@@ -151,13 +151,14 @@ Partitioners change the read behavior of batch reads that use the {+connector-sh
 dividing the data into partitions, you can run transformations in parallel.
 
 This section contains configuration information for the following
-partitioners:
+partitioner:
 
 - :ref:`SamplePartitioner <conf-samplepartitioner>`
 - :ref:`ShardedPartitioner <conf-shardedpartitioner>`
 - :ref:`PaginateBySizePartitioner <conf-paginatebysizepartitioner>`
 - :ref:`PaginateIntoPartitionsPartitioner <conf-paginateintopartitionspartitioner>`
 - :ref:`SinglePartitionPartitioner <conf-singlepartitionpartitioner>`
+- :ref:`AutoBucketPartitioner <conf-autobucketpartitioner>`
 
 .. note:: Batch Reads Only
 
@@ -302,6 +303,54 @@ The ``SinglePartitionPartitioner`` configuration creates a single partition.
 To use this configuration, set the ``partitioner`` configuration option to
 ``com.mongodb.spark.sql.connector.read.partitioner.SinglePartitionPartitioner``.
 
+.. _conf-autobucketpartitioner:
+
+``AutoBucketPartitioner`` Configuration
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The ``AutoBucketPartitioner`` configuration is similar to the
+:ref:`SamplePartitioner <conf-samplepartitioner>`
+configuration, but uses the :manual:`$bucketAuto </reference/operator/aggregation/bucketAuto/>`
+aggregation stage to paginate the data. By using this configuration,
+you can partition the data across single or multiple fields, including nested fields.
+
+To use this configuration, set the ``partitioner`` configuration option to
+``com.mongodb.spark.sql.connector.read.partitioner.AutoBucketPartitioner``.
+
+.. list-table::
+   :header-rows: 1
+   :widths: 35 65
+
+   * - Property name
+     - Description
+
+   * - ``partitioner.options.partition.fieldList``
+     - The list of fields to use for partitioning. The value can be either a single field
+       name or a list of comma-separated fields.
+
+       **Default:** ``_id``
+
+   * - ``partitioner.options.partition.chunkSize``
+     - The average size (MB) for each partition. Smaller partition sizes
+       create more partitions containing fewer documents.
+       Because this configuration uses the average document size to determine the number of
+       documents per partition, partitions might not be the same size.
+
+       **Default:** ``64``
+
+   * - ``partitioner.options.partition.samplesPerPartition``
+     - The number of samples to take per partition.
+
+       **Default:** ``100``
+
+   * - ``partitioner.options.partition.partitionKeyProjectionField``
+     - The field name to use for a projected field that contains all the
+       fields used to partition the collection.
+       We recommend changing the value of this property only if each document already
+       contains the ``__idx`` field.
+
+       **Default:** ``__idx``
+
 Specifying Properties in ``connection.uri``
 -------------------------------------------
 
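For readers who want to try the ``AutoBucketPartitioner`` properties documented in this change, the following is a minimal Python sketch that collects them into a dictionary of read options. The property names, the partitioner class name, and the defaults come from the table added above. The helper names ``auto_bucket_options`` and ``estimated_docs_per_partition`` are hypothetical, and the documents-per-partition arithmetic is only an illustrative assumption based on the ``chunkSize`` description, not the connector's actual computation.

```python
# Values taken from the documentation added in this change.
AUTO_BUCKET_PARTITIONER = (
    "com.mongodb.spark.sql.connector.read.partitioner.AutoBucketPartitioner"
)
OPTION_PREFIX = "partitioner.options.partition."


def auto_bucket_options(
    field_list=("_id",),
    chunk_size_mb=64,
    samples_per_partition=100,
    projection_field="__idx",
):
    """Build read options for the AutoBucketPartitioner (hypothetical helper).

    The defaults mirror the documented defaults of each property.
    """
    # Accept a single field name as well as a list of names.
    if isinstance(field_list, str):
        field_list = [field_list]
    if not field_list:
        raise ValueError("field_list must contain at least one field name")
    if chunk_size_mb <= 0 or samples_per_partition <= 0:
        raise ValueError("sizes and sample counts must be positive")

    return {
        "partitioner": AUTO_BUCKET_PARTITIONER,
        # Multiple fields are passed as one comma-separated string.
        OPTION_PREFIX + "fieldList": ",".join(field_list),
        OPTION_PREFIX + "chunkSize": str(chunk_size_mb),
        OPTION_PREFIX + "samplesPerPartition": str(samples_per_partition),
        OPTION_PREFIX + "partitionKeyProjectionField": projection_field,
    }


def estimated_docs_per_partition(chunk_size_mb, avg_doc_size_bytes):
    """Rough estimate only (assumption): partition size / average document size."""
    if avg_doc_size_bytes <= 0:
        raise ValueError("avg_doc_size_bytes must be positive")
    return max(1, (chunk_size_mb * 1024 * 1024) // avg_doc_size_bytes)


# Partition on a top-level field and a nested field, with 32 MB partitions.
options = auto_bucket_options(["region", "address.zip"], chunk_size_mb=32)
```

Assuming a ``SparkSession`` named ``spark`` that already has the connector and a connection URI configured, the dictionary could then be passed to a batch read with something like ``spark.read.format("mongodb").options(**options).load()``. Smaller ``chunk_size_mb`` values yield more partitions with fewer documents each, as the ``chunkSize`` description notes.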