@@ -12,11 +12,11 @@ Overview
1212The MongoDB aggregation framework provides a means to calculate
1313aggregate values without having to use :doc: `map/reduce
1414</core/map-reduce>`. While map/reduce is powerful, using map/reduce is
15- more difficult than necessary for simple aggregation tasks, such as
15+ more difficult than necessary for many simple aggregation tasks, such as
1616totaling or averaging field values.
1717
1818If you're familiar with :term: `SQL `, the aggregation framework
19- provides similar functionality as "``GROUPBY ``" and related SQL
19+ provides similar functionality to "``GROUP BY ``" and related SQL
2020operators as well as simple forms of "self joins." Additionally, the
2121aggregation framework provides projection capabilities to reshape the
2222returned data. Using projections and aggregation, you can add computed
@@ -38,23 +38,22 @@ underpin the aggregation framework: :term:`pipelines <pipeline>` and
3838Pipelines
3939~~~~~~~~~
4040
41- A pipeline is process that applies a sequence of documents when using
42- the aggregation framework. For those familiar with UNIX-like shells
43- (e.g. bash,) the concept is analogous to the pipe (i.e. "`` | ``") used
44- to string operations together.
41+ Conceptually, documents from a collection are passed through an
42+ aggregation pipeline, and are transformed as they pass through it.
43+ For those familiar with UNIX-like shells (e.g. bash), the concept is
44+ analogous to the pipe (i.e. "`` | ``") used to string text filters together.
4545
4646In a shell environment the pipe redirects a stream of characters from
4747the output of one process to the input of the next. The MongoDB
4848aggregation pipeline streams MongoDB documents from one :doc: `pipeline
4949operator </reference/aggregation>` to the next to process the
5050documents.
5151
52- All pipeline operators processes a stream of documents, and the
52+ All pipeline operators process a stream of documents, and the
5353pipeline behaves as if the operation scans a :term: `collection ` and
54- passes all matching documents into the "top" of the pipeline. Then,
55- each operator in the pipleine transforms each document as it passes
56- through the pipeline. At the end of the pipeline, the aggregation
57- framework returns documents in the same manner as all other queries.
54+ passes all matching documents into the "top" of the pipeline.
55+ Each operator in the pipeline transforms each document as it passes
56+ through the pipeline.
5857
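The pipe analogy can be sketched in plain JavaScript. The snippet below is an illustrative simulation only, not mongo shell or driver code: hypothetical documents flow through a ``$match``-like filter stage and then a ``$project``-like reshaping stage.

```javascript
// Hypothetical documents entering the "top" of the pipeline.
const docs = [
  { title: "a", pageViews: 5, author: "bob" },
  { title: "b", pageViews: 1, author: "dave" },
  { title: "c", pageViews: 7, author: "bob" },
];

// Stage 1 (like $match): only matching documents continue downstream.
const matched = docs.filter(d => d.pageViews > 2);

// Stage 2 (like $project): each surviving document is reshaped.
const result = matched.map(d => ({ author: d.author, views: d.pageViews }));

console.log(result);
```

Each stage consumes the previous stage's output, just as text filters chained with "``|``" consume the previous command's output.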
5958.. note ::
6059
@@ -72,24 +71,26 @@ framework returns documents in the same manner as all other queries.
7271 - :agg:pipeline: `$unwind `
7372 - :agg:pipeline: `$group `
7473 - :agg:pipeline: `$sort `
74+ TODO I'd remove references to $out, since we don't have it yet
7575 - :agg:pipeline: `$out `
7676
7777.. _aggregation-expressions :
7878
7979Expressions
8080~~~~~~~~~~~
8181
82- Expressions calculate values based on inputs from the pipeline, and
83- return their results to the pipeline. The aggregation framework
84- defines expressions in :term: `JSON ` using a prefix format.
82+ Expressions calculate values based on documents passing through the pipeline,
83+ and contribute their results to documents flowing through the pipeline.
84+ The aggregation framework defines expressions in :term: `JSON ` using a prefix
85+ format.
8586
8687Often, expressions are stateless and are only evaluated when seen by
8788the aggregation process. Stateless expressions perform operations such
88- as: adding the values of two fields together, or extracting the year
89+ as adding the values of two fields together or extracting the year
8990from a date.
9091
9192The :term: `accumulator ` expressions *do * retain state, and the
92- :agg:pipeline: `$group ` operator uses maintains state (e.g. counts,
93+ :agg:pipeline: `$group ` operator maintains that state (e.g.
9394totals, maximums, minimums, and related data) as documents progress
9495through the :term: `pipeline `.
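The stateless/accumulator distinction can be sketched with an illustrative JavaScript simulation. Field names are hypothetical, and the array operations below merely stand in for server-side expressions such as ``$add`` and ``$sum``.

```javascript
const docs = [
  { author: "bob", pageViews: 5 },
  { author: "dave", pageViews: 1 },
  { author: "bob", pageViews: 7 },
];

// Stateless expression: evaluated per document, with no memory
// carried between documents (analogous to an $add expression).
const withTotal = docs.map(d => ({ ...d, adjusted: d.pageViews + 1 }));

// Accumulator: a $group-like stage keeps per-key state (here a
// running $sum of pageViews) as documents stream through.
const totals = {};
for (const d of docs) {
  totals[d.author] = (totals[d.author] || 0) + d.pageViews;
}
console.log(totals);
```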
9596
@@ -104,17 +105,17 @@ Invocation
104105~~~~~~~~~~
105106
106107Invoke an :term: `aggregation ` operation with the :func: `aggregate `
107- wrapper in the :program: `mongo ` shell for the :dbcommand: `aggregate `
108+ wrapper in the :program: `mongo ` shell or the :dbcommand: `aggregate `
108109:term: `database command `. Always call :func: `aggregate ` on a
109110collection object, which will determine the documents that contribute
110111to the beginning of the aggregation :term: `pipeline `. The arguments to
111- the :func: `aggregate ` function specify a sequence :ref: `pipeline
112+ the :func: `aggregate ` function specify a sequence of :ref: `pipeline
112113operators <aggregation-pipeline-operator-reference>`, where each
113114:ref: `pipeline operator <aggregation-pipeline-operator-reference >` may
114115have a number of operands.
115116
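As a sketch of that invocation shape, each pipeline operator is a document with a single operator key and its operands. The operators and field names below are illustrative only.

```javascript
// Hypothetical sequence of pipeline operators, each a document with
// one operator key and its operands.
const ops = [
  { $project: { author: 1, tags: 1 } },
  { $unwind: "$tags" },
  { $group: { _id: "$tags", count: { $sum: 1 } } },
];

// In the mongo shell these would be passed as the arguments to
// db.article.aggregate(...) (not executed here).
const operatorNames = ops.map(o => Object.keys(o)[0]);
console.log(operatorNames);
```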
116117First, consider a :term: `collection ` of documents named "``article ``"
117- using the following schema or and format:
118+ using the following format:
118119
119120.. code-block :: javascript
120121
@@ -169,7 +170,10 @@ The aggregation operation in the previous section returns a
169170 if there was an error
170171
171172As a document, the result is subject to the current :ref: `BSON
172- Document size <limit-maximum-bson-document-size>`. If you expect the
173+ Document size <limit-maximum-bson-document-size>` limit.
174+
175+ TODO $out is not going to be available in 2.2, so I'd eliminate this reference
176+ If you expect the
173177aggregation framework to return a larger result, consider using
174178the :agg:pipeline: `$out ` pipeline operator to write the output to a
175179collection.
@@ -181,22 +185,21 @@ Early Filtering
181185~~~~~~~~~~~~~~~
182186
183187Because you will always call :func: `aggregate ` on a
184- :term: `collection ` object, which inserts the *entire * collection into
185- the aggregation pipeline, you may want to increase efficiency in some
186- situations by avoiding scanning an entire collection.
188+ :term: `collection ` object, which logically inserts the *entire * collection into
189+ the aggregation pipeline, you may want to optimize the operation
190+ by avoiding scanning the entire collection whenever possible.
187191
188192If your aggregation operation requires only a subset of the data in a
189- collection, use the :agg:pipeline: `$match ` to limit the items in the
190- pipeline, as in a query. These :agg:pipeline: `$match ` operations will use
191- suitable indexes to access the matching element or elements in a
192- collection.
193-
194- When :agg:pipeline: `$match ` appears first in the :term: `pipeline `, the
195- :dbcommand: `pipeline ` begins with results of a :term: `query ` rather than
196- the entire contents of a collection.
197-
193+ collection, use the :agg:pipeline: `$match ` operator to restrict which
194+ items go into the top of the
195+ pipeline, as in a query. When placed early in a pipeline, these
196+ :agg:pipeline: `$match ` operations will use
197+ suitable indexes to scan only the matching documents in a collection.
198+
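A minimal sketch of this placement, with field names assumed for illustration: ``$match`` sits first, so the pipeline starts from query-like results rather than the whole collection.

```javascript
// Hypothetical pipeline for an "article" collection; the author and
// title fields are assumed for illustration. $match comes first, so
// only matching documents enter the top of the pipeline.
const pipeline = [
  { $match: { author: "dave" } },     // filter first, like a query
  { $project: { _id: 0, title: 1 } }, // reshape only the survivors
];

// In the mongo shell (not executed here):
//   db.article.aggregate(pipeline[0], pipeline[1])
console.log(pipeline.map(stage => Object.keys(stage)[0]));
```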
199+ TODO we don't do the following yet, but there's a ticket for it. Should we
200+ leave it out for now?
198201:term: `Aggregation ` operations have an optimization phase, before
199- execution, attempts to re-arrange the pipeline by moving
202+ execution, which attempts to re-arrange the pipeline by moving
200203:agg:pipeline: `$match ` operators towards the beginning to the greatest
201204extent possible. For example, if a :term: `pipeline ` begins with a
202205:agg:pipeline: `$project ` that renames fields, followed by a
@@ -221,7 +224,7 @@ must fit in memory.
221224
222225:agg:pipeline: `$group ` has similar characteristics: Before any
223226:agg:pipeline: `$group ` passes its output along the pipeline, it must
224- receive the entity of its input. For the case of :agg:pipeline: `$group `
227+ receive the entirety of its input. For the case of :agg:pipeline: `$group `
225228this frequently does not require as much memory as
226229:agg:pipeline: `$sort `, because it only needs to retain one record for
227230each unique key in the grouping specification.
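Why :agg:pipeline:`$group` often needs less memory than :agg:pipeline:`$sort` can be sketched as follows: the retained state is one record per distinct grouping key, regardless of how many documents stream through. This is an illustrative JavaScript simulation, not server code.

```javascript
// Build 1000 hypothetical input documents with only two distinct
// values of the grouping key.
const docs = [];
for (let i = 0; i < 1000; i++) {
  docs.push({ author: i % 3 === 0 ? "bob" : "dave", pageViews: 1 });
}

// One retained record per unique key, as in a $group with $sum.
const state = new Map();
for (const d of docs) {
  state.set(d.author, (state.get(d.author) || 0) + d.pageViews);
}

// 1000 input documents, but only 2 retained group records.
console.log(state.size);
```

A sort, by contrast, must hold every input document before it can emit the first output document.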
@@ -236,14 +239,14 @@ Sharded Operation
236239
237240The aggregation framework is compatible with sharded collections.
238241
239- When the operating on a sharded collection, the aggregation pipeline
240- splits into two parts. The aggregation framework pushes all of the
242+ When operating on a sharded collection, the aggregation framework
243+ splits the pipeline into two parts. It pushes all of the
241244operators up to and including the first :agg:pipeline: `$group ` or
242- :agg:pipeline: `$sort ` to each shard using the results received from the
243- shards. [#match-sharding ]_ Then, a second pipeline on the
245+ :agg:pipeline: `$sort ` to each shard.
246+ [#match-sharding ]_ Then, a second pipeline on the
244247:program: `mongos ` runs. This pipeline consists of the first
245248:agg:pipeline: `$group ` or :agg:pipeline: `$sort ` and any remaining pipeline
246- operators
249+ operators; this is run on the results received from the shards.
247250
248251The :program: `mongos ` pipeline merges :agg:pipeline: `$sort ` operations
249252from the shards. The :agg:pipeline: `$group ` brings any “sub-totals”