@@ -12,29 +12,31 @@ Querying Your Data Lake
1212 :depth: 2
1313 :class: singlecol
1414
15+ You can use the MongoDB Query Language (MQL) on {+adl+} to query and
16+ analyze data in your data stores. {+adl+} supports most, but not all, of
17+ the standard server commands. To learn more about the supported and
18+ unsupported MongoDB server commands and aggregation pipeline stages,
19+ see :ref:`data-lake-mql-support`.
20+
1521You can run up to 30 simultaneous queries on your {+dl+} against:
1622
1723- Data in your |s3| bucket.
1824- Documents in your MongoDB |service| cluster.
1925- Data in files hosted at publicly accessible |url|\s.
2026
21- .. seealso ::
27+ .. see ::
2228
2329 - :doc:`How to Connect to Your Data Lake </tutorial/connect>`
24- - :doc:`How to Run Queries Against Your Data Lake </tutorial/run-queries>`
30+ - :doc:`How to Run Queries Against Your Data Lake
31+ </tutorial/run-queries>`
2532
2633.. _query-s3:
2734
2835Querying Data on S3
2936-------------------
3037
31- You can use {+adl+} to query and analyze data on your cloud object store
32- using MongoDB Query Language (MQL). {+adl+} supports most, but not all the
33- standard server commands. To learn more about the supported and unsupported
34- MongoDB server commands and aggregation pipleline stages, see
35- :ref:`data-lake-mql-support`.
36-
37- To query data on |s3|, your {+dl+} storage :ref:`configuration
38+ You can use {+adl+} to query and analyze data on your cloud object
39+ store. To query data on |s3|, your {+dl+} storage :ref:`configuration
3840<datalake-configuration-file>` must contain settings that define:
3941
4042- Your |s3| {+data-lake-store+}.
@@ -75,34 +77,37 @@ To query data on |s3|, your {+dl+} storage :ref:`configuration
7577 ]
7678 }
7779
78- To learn more about these settings, see :ref:`datalake-configuration-file`.
80+ To learn more about these settings, see
81+ :ref:`datalake-configuration-file`.
82+
83+ {+dl+} creates the virtual databases and collections you specified in
84+ your {+dl+} configuration for the data in your |s3| store. When you
85+ :doc:`connect </tutorial/connect>` to your {+dl+} and :doc:`run queries
86+ </tutorial/run-queries>`, {+dl+} processes your queries against the
87+ data and returns the query results.
7988
80- {+dl+} creates the virtual databases and collections you specified in your
81- {+dl+} configuration for the data in your |s3| store. When you :doc:`connect
82- </tutorial/connect>` to your {+dl+} and :doc:`run queries
83- </tutorial/run-queries>`, {+dl+} processes your queries against the data and
84- returns the query results .
89+ When :doc:`deploying </tutorial/deploy>` your {+dl+}, if you specified
90+ an |s3| bucket with both read and write permissions or the |aws| |s3|
91+ :aws:`s3:PutObject </AmazonS3/latest/dev/using-with-s3-actions.html#using-with-s3-actions-related-to-objects>`
92+ permission, you can also save your query results in your |s3| bucket
93+ using :ref:`adl-out-stage` to |s3|.
8594
86- When :doc:`deploying </tutorial/deploy>` your {+dl+}, if you specified an |s3|
87- bucket with both read and write permissions or |aws| |s3| :aws:`s3:PutObject
88- </AmazonS3/latest/dev/using-with-s3-actions.html#using-with-s3-actions-related-to-objects>`
89- permission, you can also save your query results in your |s3| bucket using
90- :ref:`adl-out-stage` to |s3|.
95+ If you successfully create or update an object in your |s3| data store,
96+ {+dl+} returns the latest version of that object for any subsequent
97+ read request, and all list operations on the objects also reflect the
98+ changes. If your query contains multiple stages, each stage receives
99+ the most recent data available from the data store as that stage is
100+ processed.
91101
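The |s3| settings described above, a store plus the virtual databases and collections that map to it, can be sketched as a Python dictionary serialized to JSON. This is a minimal sketch. It assumes the storage configuration uses top-level `stores` and `databases` arrays with `provider`, `bucket`, `region`, `storeName`, and `path` fields, and that a `$out` stage to |s3| takes `bucket`, `region`, `filename`, and `format` fields. The store, bucket, database, collection, and file names are hypothetical placeholders. See :ref:`datalake-configuration-file` and :ref:`adl-out-stage` for the authoritative formats.

```python
import json


def make_s3_storage_config(bucket: str, region: str) -> dict:
    """Build a hypothetical storage configuration for a single S3 bucket.

    Field names are assumptions modeled on the Data Lake storage
    configuration format; every name and value is a placeholder.
    """
    store_name = "s3store"  # hypothetical store name
    return {
        # The S3 data lake store: where the files physically live.
        "stores": [
            {
                "name": store_name,
                "provider": "s3",
                "bucket": bucket,
                "region": region,
                "delimiter": "/",
            }
        ],
        # Virtual databases and collections that map to the store.
        "databases": [
            {
                "name": "sales",  # hypothetical virtual database
                "collections": [
                    {
                        "name": "orders",  # hypothetical virtual collection
                        "dataSources": [
                            {"storeName": store_name, "path": "/orders/*"}
                        ],
                    }
                ],
            }
        ],
    }


def make_out_to_s3_stage(bucket: str, region: str, filename: str) -> dict:
    """Build a $out stage of the assumed shape that writes results to S3."""
    return {
        "$out": {
            "s3": {
                "bucket": bucket,
                "region": region,
                "filename": filename,
                "format": {"name": "json"},
            }
        }
    }


if __name__ == "__main__":
    config = make_s3_storage_config("my-bucket", "us-east-1")
    print(json.dumps(config, indent=2))

    # Hypothetical pipeline: filter, then write the results back to S3.
    pipeline = [
        {"$match": {"status": "shipped"}},
        make_out_to_s3_stage("my-bucket", "us-east-1", "results/shipped"),
    ]
    print(json.dumps(pipeline, indent=2))
```

Building the configuration in code keeps the store name defined in one place, so the `storeName` in each data source cannot drift from the store it references.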
92102.. _query-atlas:
93103
94104Querying Data in Your |service| Cluster
95105---------------------------------------
96106
97- You can use {+adl+} to query and analyze data in your |service| cluster using
98- MongoDB Query Language (MQL). {+adl+} supports most, but not all the standard
99- server commands. To learn more about the supported and unsupported MongoDB
100- server commands and aggregation pipleline stages, see
101- :ref:`data-lake-mql-support`.
102-
103- To query data in your |service| cluster, your {+dl+} storage
104- :ref:`configuration <datalake-configuration-file>` must contain settings that
105- define:
107+ You can use {+adl+} to query and analyze data in your |service|
108+ cluster. To query data in your |service| cluster, your {+dl+} storage
109+ :ref:`configuration <datalake-configuration-file>` must contain
110+ settings that define:
106111
107112- Your |service| {+data-lake-store+}.
108113- {+dl+} virtual databases and collections that map to your
@@ -140,19 +145,21 @@ define:
140145 ]
141146 }
142147
143- To learn more about these settings, see :ref:`datalake-configuration-file`.
148+ To learn more about these settings, see
149+ :ref:`datalake-configuration-file`.
144150
145- {+dl+} automatically detects the file format and creates the virtual databases
146- and collections you specified in your {+dl+} configuration. When you
147- :doc:`connect </tutorial/connect>` to your {+dl+} and run queries, {+dl+}
148- processes your queries against the data and returns the query results.
151+ {+dl+} automatically detects the file format and creates the virtual
152+ databases and collections you specified in your {+dl+} configuration.
153+ When you :doc:`connect </tutorial/connect>` to your {+dl+} and run
154+ queries, {+dl+} processes your queries against the data and returns the
155+ query results.
149156
150- If you query a collection in {+adl+} that is mapped to only one |service|
151- collection, {+adl+} acts as a proxy and forwards your query to |service|.
152- When acting as a proxy, {+adl+} doesn't scan data into its virtual collection
153- to proces the query thus improving performance and reducing cost. This
154- optimization is not available for queries on {+adl+} collections that are
155- mapped to multiple |service| collections.
157+ If you query a collection in {+adl+} that is mapped to only one
158+ |service| collection, {+adl+} acts as a proxy and forwards your query
159+ to |service|. When acting as a proxy, {+adl+} doesn't scan data into
160+ its virtual collection to process the query, which improves performance
161+ and reduces cost. This optimization is not available for queries on
162+ {+adl+} collections that are mapped to multiple |service| collections.
156163
157164.. example::
158165
@@ -203,31 +210,37 @@ mapped to multiple |service| collections.
203210 ]
204211 }
205212
206- For the above storage configuration, {+adl+} acts as a proxy for queries
207- on ``foo`` collection and forwards the queries to |service|. This
208- performance and cost optimization is not available for queries on ``barbaz``
209- collection because ``barbaz`` is mapped to multiple |service| collections.
213+ For the above storage configuration, {+adl+} acts as a proxy for
214+ queries on the ``foo`` collection and forwards the queries to
215+ |service|. This performance and cost optimization is not available for
216+ queries on the ``barbaz`` collection because ``barbaz`` is mapped to
217+ multiple |service| collections.
218+
219+ You can also save your query results in your |service| cluster using
220+ :ref:`adl-out-stage` to |service|.
221+
222+ If you successfully create or update a document in your collection on
223+ the |service| cluster, {+dl+} returns the latest version of that
224+ document for any subsequent read request, and all list operations on
225+ the collection also reflect the changes. If your query contains
226+ multiple stages, each stage receives the most recent data available
227+ from the data store as that stage is processed.
210228
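The proxy rule above depends only on how many |service| collections back a virtual collection. The following Python sketch applies that rule to a storage configuration. It is an illustrative model of the documented behavior, not how {+adl+} is implemented. It assumes that |service| stores are marked with `"provider": "atlas"` and that each data source names a `storeName`. The `foo` and `barbaz` mappings follow the example above, but the store, cluster, project, and source names are hypothetical.

```python
def proxy_eligible_collections(config: dict) -> dict:
    """Report, for each virtual "db.collection", whether it maps to exactly
    one Atlas collection (the condition described for proxying).

    This is an illustrative model of the documented rule, not Data Lake code.
    """
    atlas_stores = {
        store["name"]
        for store in config.get("stores", [])
        if store.get("provider") == "atlas"
    }
    eligible = {}
    for db in config.get("databases", []):
        for coll in db.get("collections", []):
            sources = coll.get("dataSources", [])
            # Exactly one data source, and that source is an Atlas store.
            eligible[f'{db["name"]}.{coll["name"]}'] = (
                len(sources) == 1 and sources[0].get("storeName") in atlas_stores
            )
    return eligible


# Hypothetical configuration mirroring the foo / barbaz example.
EXAMPLE_CONFIG = {
    "stores": [
        {"name": "atlasStore", "provider": "atlas",
         "clusterName": "myCluster", "projectId": "<project-id>"},
    ],
    "databases": [
        {
            "name": "sample",
            "collections": [
                {"name": "foo", "dataSources": [
                    {"storeName": "atlasStore", "database": "db1",
                     "collection": "foo"},
                ]},
                {"name": "barbaz", "dataSources": [
                    {"storeName": "atlasStore", "database": "db1",
                     "collection": "bar"},
                    {"storeName": "atlasStore", "database": "db1",
                     "collection": "baz"},
                ]},
            ],
        }
    ],
}

if __name__ == "__main__":
    print(proxy_eligible_collections(EXAMPLE_CONFIG))
    # {'sample.foo': True, 'sample.barbaz': False}
```

The check is deliberately strict. Any virtual collection with more than one data source is treated as not proxied, which matches the statement that the optimization applies only to collections mapped to a single |service| collection.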
211229.. _query-http:
212230
213- Querying Data at a |http| |url|
214- -------------------------------
231+ Querying Data at a |http| or |https| |url|
232+ ------------------------------------------
215233
216234.. include:: /includes/extracts/fact-http-beta-message.rst
217235
218- You can use {+adl+} to query and analyze data in files hosted at publicly
219- accessible |url|\s using MongoDB Query Language (MQL). To learn more about the
220- supported data formats, see :ref:`data-lake-data-formats`. {+adl+} supports
221- most, but not all the standard server commands. To learn more about the
222- supported and unsupported MongoDB server commands and aggregation pipleline
223- stages, see :ref:`data-lake-mql-support`.
224-
225- To query data in your publicly accessible |url|\s, your {+dl+} storage
226- :ref:`configuration <datalake-configuration-file>` must contain settings that
227- define:
236+ You can use {+adl+} to query and analyze data in files hosted at
237+ publicly accessible |url|\s. To query data in your publicly accessible
238+ |url|\s, your {+dl+} storage :ref:`configuration
239+ <datalake-configuration-file>` must contain settings that define:
228240
229241- Your |http| {+data-lake-store+}.
230- - {+dl+} virtual databases and collections that map to your {+data-lake-store+}.
242+ - {+dl+} virtual databases and collections that map to your
243+ {+data-lake-store+}.
231244
232245.. example::
233246
@@ -263,27 +276,29 @@ define:
263276 ]
264277 }
265278
266- To learn more about these settings, see :ref:`datalake-configuration-file`.
279+ To learn more about these settings, see
280+ :ref:`datalake-configuration-file`.
267281
268- {+dl+} creates the virtual databases and collections you specified in your
269- {+dl+} configuration for the data in your |url|. {+dl+} also creates one
270- partition for each |url| in your collection. When you :doc:`connect
271- </tutorial/connect>` to your {+dl+} and run queries, {+dl+} processes your
272- queries against the data and returns the query results.
282+ {+dl+} creates the virtual databases and collections you specified in
283+ your {+dl+} configuration for the data in your |url|. {+dl+} also
284+ creates one partition for each |url| in your collection. When you
285+ :doc:`connect </tutorial/connect>` to your {+dl+} and run queries,
286+ {+dl+} processes your queries against the data and returns the query
287+ results.
273288
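The one-partition-per-|url| behavior described above can be modeled in a few lines of Python. This sketch is illustrative only. It assumes that |http| stores use `"provider": "http"` and that each |http| data source lists its files in a `urls` array. The store name, collection names, and URLs are hypothetical.

```python
def http_partitions(config: dict) -> dict:
    """Return, for each virtual "db.collection", the URLs that would each
    become one partition under the one-partition-per-URL rule.

    Illustrative model only; the "http" provider value and the "urls" field
    on data sources are assumptions about the configuration format.
    """
    http_stores = {
        store["name"]
        for store in config.get("stores", [])
        if store.get("provider") == "http"
    }
    partitions = {}
    for db in config.get("databases", []):
        for coll in db.get("collections", []):
            urls = []
            for source in coll.get("dataSources", []):
                if source.get("storeName") in http_stores:
                    urls.extend(source.get("urls", []))
            partitions[f'{db["name"]}.{coll["name"]}'] = urls
    return partitions


# Hypothetical configuration with two publicly accessible files.
HTTP_CONFIG = {
    "stores": [{"name": "httpStore", "provider": "http"}],
    "databases": [
        {
            "name": "public",
            "collections": [
                {"name": "reports", "dataSources": [
                    {"storeName": "httpStore", "urls": [
                        "https://example.com/data/2020.json",
                        "https://example.com/data/2021.json",
                    ]},
                ]},
            ],
        }
    ],
}

if __name__ == "__main__":
    for name, urls in http_partitions(HTTP_CONFIG).items():
        print(f"{name}: {len(urls)} partition(s)")
```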
274289.. _federated-queries:
275290
276291Running Federated Queries
277292-------------------------
278293
279294You can use {+adl+} to query and analyze a unified view of data in your
280- |service| cluster, |s3| bucket, and at your |http| URL. For federated queries,
281- your {+dl+} storage :ref:`configuration <datalake-configuration-file>` must
282- contain the settings that define:
295+ |service| cluster, |s3| bucket, and |http| |url|. For federated
296+ queries, your {+dl+} storage :ref:`configuration
297+ <datalake-configuration-file>` must contain the settings that define:
283298
284299- Your |s3|, |service|, and |http| {+data-lake-stores+}.
285- - {+dl+} virtual databases and collections that map to your |s3|, |service|,
286- and |http| {+data-lake-store+}\s.
300+ - {+dl+} virtual databases and collections that map to your |s3|,
301+ |service|, and |http| {+data-lake-store+}\s.
287302
288303.. example::
289304
@@ -342,12 +357,13 @@ contain the settings that define:
342357 ]
343358 }
344359
345- To learn more about these settings, see :ref:`datalake-configuration-file`.
360+ To learn more about these settings, see
361+ :ref:`datalake-configuration-file`.
346362
347- When you :doc:`connect </tutorial/connect>` to your {+dl+} and run federated
348- queries, {+dl+} combines data from your |service| cluster and |s3| bucket
349- in virtual databases and collections and returns a union of data in the
350- results.
363+ When you :doc:`connect </tutorial/connect>` to your {+dl+} and run
364+ federated queries, {+dl+} combines data from your |service| cluster,
365+ |s3| bucket, and |http| store in virtual databases and collections and
366+ returns a union of data in the results.
351367
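The federated behavior can be modeled conceptually in Python. A query on a virtual collection reads every data source mapped to it and returns the union of the matching documents. In the following sketch, in-memory lists stand in for the |s3|, |service|, and |http| sources. It is a conceptual model under that assumption, not how {+dl+} executes queries. The source names and documents are hypothetical.

```python
from typing import Callable, Dict, Iterable, List

Document = Dict[str, object]


def federated_find(
    sources: Dict[str, Iterable[Document]],
    predicate: Callable[[Document], bool],
) -> List[Document]:
    """Model a query on a virtual collection backed by several stores.

    Each source is scanned in turn and the matching documents from every
    source are concatenated, so the result is the union of the per-source
    matches. In this model duplicates are kept, not removed.
    """
    results: List[Document] = []
    for docs in sources.values():
        results.extend(doc for doc in docs if predicate(doc))
    return results


if __name__ == "__main__":
    # In-memory stand-ins for the S3, Atlas, and HTTP data sources.
    sources = {
        "s3": [{"sku": "a", "qty": 5}, {"sku": "b", "qty": 0}],
        "atlas": [{"sku": "c", "qty": 2}],
        "http": [{"sku": "d", "qty": 9}],
    }
    in_stock = federated_find(sources, lambda doc: doc["qty"] > 0)
    print([doc["sku"] for doc in in_stock])  # ['a', 'c', 'd']
```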
352368.. toctree::
353369 :titlesonly: