@@ -124,7 +124,10 @@ You can configure the following properties to read from MongoDB:
124124 **Default:** ``10000``
125125
126126 * - ``partitioner``
127- - The name of the partitioner to use to partition the data.
127+ - The name of the partitioner to use to split collection data into
128+ partitions. Each partition covers a range of values of a field
129+ (for example, ``_id`` values 1 to 100).
130+
128131 The connector provides the following partitioners:
129132
130133 - ``MongoDefaultPartitioner``
@@ -135,8 +138,8 @@ You can configure the following properties to read from MongoDB:
135138 **Requires MongoDB 3.2**. A general purpose partitioner for
136139 all deployments. Uses the average document size and random
137140 sampling of the collection to determine suitable
138- partitions for the collection. For configuration settings
139- for the MongoSamplePartitioner, see
141+ partitions for the collection. For configuration
142+ settings for the MongoSamplePartitioner, see
140143 :ref:`conf-mongosamplepartitioner`.
141144
142145 - ``MongoShardedPartitioner``
@@ -249,15 +252,41 @@ Partitioner Configuration
249252 **Default:** ``_id``
250253
251254 * - ``partitionSizeMB``
252- - The size (in MB) for each partition
255+ - The size (in MB) for each partition. Smaller partition sizes
256+ create more partitions containing fewer documents.
253257
254258 **Default:** ``64``
255259
256260 * - ``samplesPerPartition``
257- - The number of sample documents to take for each partition.
261+ - The number of sample documents to take for each partition to
262+ establish the ``partitionKey`` range of each partition.
263+
264+ A greater ``samplesPerPartition`` value helps the connector find
265+ ``partitionKey`` ranges that more closely match the
266+ ``partitionSizeMB`` you specify.
267+
268+ .. note::
269+
270+ For sampling to improve performance, ``samplesPerPartition``
271+ must be less than the number of documents in each of
272+ your partitions.
273+
274+ You can estimate the number of documents within each of your
275+ partitions by dividing your ``partitionSizeMB`` by the
276+ average document size (in MB) in your collection.
258277
259278 **Default:** ``10``
260279
280+ .. example::
281+
282+ For a collection of 640 documents with an average document
283+ size of 0.5 MB (320 MB in total), the default ``MongoSamplePartitioner``
284+ configuration values create 5 partitions of 64 MB with 128 documents each.
285+
286+ The MongoDB Spark Connector samples 50 documents (the default
287+ ``samplesPerPartition`` value of 10 for each of the 5 intended
288+ partitions) and defines the partitions by selecting ``partitionKey`` ranges from the sampled documents.
289+
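The arithmetic in the preceding example can be sketched in Python. This is a simplified model under stated assumptions, not the connector's implementation: the helpers ``plan_sample_partitions``, ``split_points``, and ``to_ranges`` are hypothetical, sizes are in MB, the ``_id`` values are assumed to be the integers 0 to 639, and ``random.Random.sample`` stands in for the connector's random sampling of the collection.

```python
import math
import random


def plan_sample_partitions(count, avg_doc_size_mb,
                           partition_size_mb=64, samples_per_partition=10):
    """Hypothetical helper: simplified model of the sample partitioner arithmetic.

    Returns (documents per partition, intended partitions, documents to sample).
    """
    # How many average-sized documents fit in one partition (at least one).
    docs_per_partition = max(1, math.floor(partition_size_mb / avg_doc_size_mb))
    # How many partitions the whole collection needs at that size.
    num_partitions = math.ceil(count / docs_per_partition)
    # Sample samples_per_partition documents for each intended partition,
    # but never more documents than the collection holds.
    num_samples = min(count, samples_per_partition * num_partitions)
    return docs_per_partition, num_partitions, num_samples


def split_points(sampled_keys, samples_per_partition=10):
    """Hypothetical helper: every samples_per_partition-th sorted sample is a boundary."""
    ordered = sorted(sampled_keys)
    # Start at index samples_per_partition so the first range keeps its
    # first samples_per_partition samples.
    return ordered[samples_per_partition::samples_per_partition]


def to_ranges(boundaries):
    """Hypothetical helper: turn boundaries into [lower, upper) ranges.

    None marks an unbounded end, so the first and last ranges also cover
    keys smaller or larger than any sampled key.
    """
    edges = [None] + list(boundaries) + [None]
    return list(zip(edges[:-1], edges[1:]))


# Values from the example: 640 documents averaging 0.5 MB, default settings.
docs_per_partition, num_partitions, num_samples = plan_sample_partitions(640, 0.5)
print(docs_per_partition, num_partitions, num_samples)  # 128 5 50

# Assumed _id values 0..639; a fixed seed keeps the run repeatable.
rng = random.Random(42)
samples = rng.sample(range(640), num_samples)
ranges = to_ranges(split_points(samples))
print(len(ranges))  # 5
```

In this model, halving ``partitionSizeMB`` to 32 gives 64 documents per partition, 10 partitions, and 100 samples, which matches the rule that smaller partition sizes create more partitions containing fewer documents.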
261290.. _conf-mongoshardedpartitioner:
262291
263292``MongoShardedPartitioner`` Configuration
@@ -303,7 +332,8 @@ Partitioner Configuration
303332 **Default:** ``_id``
304333
305334 * - ``partitionSizeMB``
306- - The size (in MB) for each partition
335+ - The size (in MB) for each partition. Smaller partition sizes
336+ create more partitions containing fewer documents.
307337
308338 **Default:** ``64``
309339
@@ -328,7 +358,8 @@ Partitioner Configuration
328358 **Default:** ``_id``
329359
330360 * - ``numberOfPartitions``
331- - The number of partitions to create.
361+ - The number of partitions to create. A greater number of
362+ partitions means fewer documents per partition.
332363
333364 **Default:** ``64``
334365
@@ -353,7 +384,8 @@ Partitioner Configuration
353384 **Default:** ``_id``
354385
355386 * - ``partitionSizeMB``
356- - The size (in MB) for each partition
387+ - The size (in MB) for each partition. Smaller partition sizes
388+ create more partitions containing fewer documents.
357389
358390 **Default:** ``64``
359391