@@ -415,10 +415,10 @@ directly.
 Splitting Chunks
 ~~~~~~~~~~~~~~~~
 
-Normally, MongoDB splits a :term:`chunk` when a chunk exceeds the
-:ref:`chunk size <sharding-chunk-size>`.
-Recently split chunks may be moved immediately to a new shard
-if :program:`mongos` predicts future insertions will benefit from the
+Normally, MongoDB splits a :term:`chunk` following inserts when a
+chunk exceeds the :ref:`chunk size <sharding-chunk-size>`. Recently
+split chunks may be moved immediately to a new shard if
+:program:`mongos` predicts future insertions will benefit from the
 move.
 
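+For reference, a quick way to check the configured chunk size from the
+:program:`mongo` shell (a sketch only: the value is in megabytes, and
+the document may be absent if the default chunk size is in use):
+
+.. code-block:: javascript
+
+   db.getSiblingDB("config").settings.find( { _id : "chunksize" } )
+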
 MongoDB treats all chunks the same, whether split manually or
@@ -444,9 +444,9 @@ You may want to split chunks manually if:
   keys are between ``250`` and ``500`` are in a single chunk.
 
 To determine the current chunk ranges across the cluster, use
-:func:`sh.status()` or :func:`db.printShardingStatus()`.
+:func:`sh.status()` or :func:`db.printShardingStatus()`.
 
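+For example, a sketch of listing the chunk ranges for a single
+collection by querying the ``config`` database directly (the
+``myapp.users`` namespace is only an illustration):
+
+.. code-block:: javascript
+
+   db.getSiblingDB("config").chunks.find( { ns : "myapp.users" } )
+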
-Split chunks in a collection using the :dbcommand:`split` command with
+To split chunks manually, use the :dbcommand:`split` command with
 operators: ``middle`` and ``find``. The equivalent shell helpers are
 :func:`sh.splitAt()` or :func:`sh.splitFind()`.
 
@@ -459,12 +459,11 @@ operators: ``middle`` and ``find``. The equivalent shell helpers are
 
       sh.splitFind( { "zipcode": 63109 } )
 
-:func:`sh.splitFind()` will split the chunk that contains the *first* document returned
-that matches this query into two equal sized chunks.
-The query in :func:`sh.splitFind()` may
-not be based on the shard key, though it almost always makes sense to
-query for the shard key in this case, and including the shard key will
-expedite the operation.
+:func:`sh.splitFind()` will split the chunk containing the queried
+document into two equal sized chunks, dividing the chunk using the
+balancer algorithm. The :func:`sh.splitFind()` query may not be based
+on the shard key, but it almost always makes sense to query on the
+shard key, and including the shard key will expedite the operation.
 
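+For comparison, a sketch of the equivalent :dbcommand:`split` command,
+assuming a collection named ``records.zipcodes`` (the namespace here
+is only an illustration):
+
+.. code-block:: javascript
+
+   // split the chunk that contains this zipcode at its median point
+   db.adminCommand( { split : "records.zipcodes", find : { zipcode : 63109 } } )
+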
 Use :func:`sh.splitAt()` to split a chunk in two using the queried
 document as the partition point:
@@ -477,15 +476,40 @@ However, the location of the document that this query finds with
 respect to the other documents in the chunk does not affect how the
 chunk splits.
 
-Pre-splitting Chunks
-~~~~~~~~~~~~~~~~~~~~
+Pre-Split
+~~~~~~~~~
+
+Splitting chunks beforehand improves shard cluster performance when
+importing data, such as when:
+
+- migrating data from another system to a shard cluster
+
+- performing a full shard cluster restore from a backup
+
+In these cases, importing data into an empty MongoDB shard cluster can
+be slower than importing into a replica set because of how MongoDB
+writes data and manages chunks in a shard cluster.
 
-For large imports, pre-splitting and pre-migrating many chunks
-will dramatically improve performance because the system does not need
-to split and migrate new chunks during import.
+MongoDB inserts data into a chunk until the chunk grows too large,
+then splits it on the same shard. If the balancer notices a chunk
+imbalance between shards, it begins migrating chunks to distribute
+them evenly.
 
-#. Make many chunks by splitting empty chunks in your
-   collection.
+Migrating chunks between shards is resource intensive: the shards must
+copy the chunk's documents, update the configuration database, and
+delete the moved data from the source shard. During a high-volume
+import, this migration activity can degrade performance enough to
+stall the import.
+
+To improve import performance, split and migrate chunks in the empty
+shard cluster before importing, so that during the import the cluster
+only writes data rather than managing migrations at the same time.
+
+To prepare your shard cluster for data import, split and migrate
+empty chunks as follows:
+
+#. Split empty chunks in your collection by manually running the
+   :dbcommand:`split` command.
 
    .. example::
 
@@ -498,14 +522,34 @@ to split and migrate new chunks during import.
          for ( var x=97; x<97+26; x++ ){
            for( var y=97; y<97+26; y+=6 ) {
              var prefix = String.fromCharCode(x) + String.fromCharCode(y);
-             db.runCommand( { split : <collection> , middle : { email : prefix } } );
+             db.runCommand( { split : "myapp.users" , middle : { email : prefix } } );
            }
          }
 
-#. Move chunks to different shard by using the balancer or manually
-   moving chunks.
+#. Move chunks to a different shard manually using the
+   :dbcommand:`moveChunk` command.
+
+   .. example::
+
+      To migrate the chunks created for the 100 million user profiles
+      evenly, assigning each prefix chunk to the next shard in turn,
+      run the following commands in the mongo shell:
+
+      .. code-block:: javascript
+
+         var shServer = [ "sh0.example.net", "sh1.example.net", "sh2.example.net", "sh3.example.net", "sh4.example.net" ];
+         // assign each prefix chunk to a shard, cycling through shServer
+         for ( var x=97; x<97+26; x++ ){
+           for( var y=97; y<97+26; y+=6 ) {
+             var prefix = String.fromCharCode(x) + String.fromCharCode(y);
+             db.adminCommand( { moveChunk : "myapp.users", find : { email : prefix }, to : shServer[ (y-97)/6 ] } )
+           }
+         }
+
+   Optionally, you can also let the balancer automatically
+   redistribute chunks in your shard cluster.
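+
+   For instance, a minimal sketch of confirming that the balancer is
+   enabled (and enabling it if necessary) with the shell helpers,
+   assuming a shell version that provides them:
+
+   .. code-block:: javascript
+
+      sh.getBalancerState()      // returns true when the balancer is enabled
+      sh.setBalancerState(true)  // enable automatic chunk balancing if it was off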
 
-#. Insert data into the shard cluster using a custom script for your data.
+Once the empty chunks are distributed, you can import data through
+multiple :program:`mongos` instances, improving overall import
+performance.
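+
+After the import, a sketch of one way to check how the collection's
+data is distributed across the shards (``myapp.users`` is the
+illustrative namespace from the examples above, and the
+``getShardDistribution()`` helper may not exist in older shells):
+
+.. code-block:: javascript
+
+   db.getSiblingDB("myapp").users.getShardDistribution()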
 
 .. _sharding-balancing-modify-chunk-size:
 