Commit 06dcd61

Author: Andrew Leung

rewriting pre-split section and cleaning up other parts

1 parent f11feef commit 06dcd61

File tree: 1 file changed (+67 -23 lines)


source/administration/sharding.txt

Lines changed: 67 additions & 23 deletions
@@ -415,10 +415,10 @@ directly.
 Splitting Chunks
 ~~~~~~~~~~~~~~~~

-Normally, MongoDB splits a :term:`chunk` when a chunk exceeds the
-:ref:`chunk size <sharding-chunk-size>`.
-Recently split chunks may be moved immediately to a new shard
-if :program:`mongos` predicts future insertions will benefit from the
+Normally, MongoDB splits a :term:`chunk` following inserts when a
+chunk exceeds the :ref:`chunk size <sharding-chunk-size>`. Recently
+split chunks may be moved immediately to a new shard if
+:program:`mongos` predicts future insertions will benefit from the
 move.

 MongoDB treats all chunks the same, whether split manually or
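The automatic split behavior described in this hunk can be illustrated with a rough, standalone sketch. This is purely conceptual and not MongoDB's implementation: the array-of-keys chunk model, the `insertIntoChunk` helper, and the `maxDocs` threshold are all hypothetical stand-ins for the real key-range chunks and the configured chunk size.

```javascript
// Conceptual sketch only: a "chunk" is modeled as a sorted array of
// shard-key values, and maxDocs stands in for the configured chunk
// size. Real MongoDB chunks are key ranges, not document arrays.
function insertIntoChunk(chunk, key, maxDocs) {
  chunk.push(key);
  chunk.sort(function (a, b) { return a - b; });
  if (chunk.length <= maxDocs) {
    return [chunk];                       // under the limit: no split
  }
  // Over the limit: split at the median key. Both halves initially
  // stay on the same shard; the balancer may migrate one later.
  var mid = Math.floor(chunk.length / 2);
  return [chunk.slice(0, mid), chunk.slice(mid)];
}

console.log(insertIntoChunk([1, 2, 3, 4], 5, 4));
// one oversized chunk becomes two: [ [ 1, 2 ], [ 3, 4, 5 ] ]
```

The point of the sketch is only the trigger: inserts accumulate in one chunk until the size threshold is crossed, and only then does a split occur, on the same shard.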
@@ -444,9 +444,9 @@ You may want to split chunks manually if:
 keys between ``250`` and ``500`` are in a single chunk.

 To determine the current chunk ranges across the cluster, use
 :func:`sh.status()` or :func:`db.printShardingStatus()`.

-Split chunks in a collection using the :dbcommand:`split` command with
+To split chunks manually, use the :dbcommand:`split` command with
 operators: ``middle`` and ``find``. The equivalent shell helpers are
 :func:`sh.splitAt()` or :func:`sh.splitFind()`.

@@ -459,12 +459,11 @@ operators: ``middle`` and ``find``. The equivalent shell helpers are
 
    sh.splitFind( { "zipcode": 63109 } )

-:func:`sh.splitFind()` will split the chunk that contains the *first* document returned
-that matches this query into two equal sized chunks.
-The query in :func:`sh.splitFind()` may
-not be based on the shard key, though it almost always makes sense to
-query for the shard key in this case, and including the shard key will
-expedite the operation.
+:func:`sh.splitFind()` will split the chunk containing the queried
+document into two equal-sized chunks, dividing the chunk at its
+median point. The :func:`sh.splitFind()` query need not be based on
+the shard key, though it almost always makes sense to include the
+shard key, which will also expedite the operation.

 Use :func:`sh.splitAt()` to split a chunk in two using the queried
 document as the partition point:
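The difference between the two helpers can be made concrete with a small standalone sketch. The `splitFindSketch` and `splitAtSketch` functions below are hypothetical illustrations, not the server's algorithm: both divide one key range in two, but they choose the split point differently.

```javascript
// Conceptual sketch: a chunk is a sorted array of shard-key values.
// splitFindSketch splits the chunk at its median, like sh.splitFind();
// splitAtSketch splits at the queried key itself, like sh.splitAt().
function splitFindSketch(chunk) {
  var mid = Math.floor(chunk.length / 2);
  return [chunk.slice(0, mid), chunk.slice(mid)];
}

function splitAtSketch(chunk, key) {
  var idx = chunk.indexOf(key);   // queried key becomes the lower
  return [chunk.slice(0, idx),    // bound of the upper chunk
          chunk.slice(idx)];
}

var chunk = [10, 20, 30, 40, 50, 60];
console.log(splitFindSketch(chunk));    // [[10,20,30],[40,50,60]]
console.log(splitAtSketch(chunk, 50));  // [[10,20,30,40],[50,60]]
```

So `sh.splitFind()` produces two equal halves regardless of where the matched document sits, while `sh.splitAt()` lets you pick the exact partition point.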
@@ -477,15 +476,40 @@ However, the location of the document that this query finds with
 respect to the other documents in the chunk does not affect how the
 chunk splits.

-Pre-splitting Chunks
-~~~~~~~~~~~~~~~~~~~~
+Pre-Split
+~~~~~~~~~
+
+Splitting chunks will improve shard cluster performance when
+importing data, for example when:
+
+- migrating data from another system to a shard cluster
+
+- performing a full shard cluster restore from backup
+
+In such cases, data import to an empty MongoDB shard cluster can be
+slower than to a replica set. The reason for this is how MongoDB
+writes data and manages chunks in a shard cluster.

-For large imports, pre-splitting and pre-migrating many chunks
-will dramatically improve performance because the system does not need
-to split and migrate new chunks during import.
+MongoDB inserts data into a chunk until it becomes large and splits
+on the same shard. If the balancer notices a chunk imbalance between
+shards, a migration process will begin to distribute chunks evenly.

-#. Make many chunks by splitting empty chunks in your
-   collection.
+Migrating chunks between shards is extremely resource intensive, as
+shard members must notify, copy, update, and delete chunks between
+each other and the configuration database. With a high-volume
+import, this migration process can degrade system performance to the
+point that the import cannot proceed.
+
+To improve import performance, manually split and migrate chunks in
+an empty shard cluster beforehand. This allows the shard cluster to
+only write the imported data instead of managing migrations while
+writing data.
+
+To prepare your shard cluster for data import, split and migrate
+empty chunks:
+
+#. Split empty chunks in your collection by manually running the
+   :dbcommand:`split` command on chunks.

 .. example::

@@ -498,14 +522,34 @@ to split and migrate new chunks during import.
         for ( var x=97; x<97+26; x++ ){
           for( var y=97; y<97+26; y+=6 ) {
             var prefix = String.fromCharCode(x) + String.fromCharCode(y);
-            db.runCommand( { split : <collection> , middle : { email : prefix } } );
+            db.runCommand( { split : "myapp.users" , middle : { email : prefix } } );
           }
         }

-#. Move chunks to different shard by using the balancer or manually
-   moving chunks.
+#. Move chunks to a different shard manually using the
+   :dbcommand:`moveChunk` command.
+
+   .. example::
+
+      To migrate all of the chunks created for 100 million user
+      profiles evenly, placing each prefix chunk on the next shard
+      in sequence, run the following commands in the mongo shell:
+
+      .. code-block:: javascript
+
+         var shServer = [ "sh0.example.net", "sh1.example.net", "sh2.example.net", "sh3.example.net", "sh4.example.net" ];
+         for ( var x=97; x<97+26; x++ ){
+           for( var y=97; y<97+26; y+=6 ) {
+             var prefix = String.fromCharCode(x) + String.fromCharCode(y);
+             db.adminCommand( { moveChunk : "myapp.users", find : { email : prefix }, to : shServer[(y-97)/6] } );
+           }
+         }
+
+   Optionally, you can also let the balancer automatically
+   redistribute chunks in your shard cluster.

-#. Insert data into the shard cluster using a custom script for your data.
+When empty chunks are distributed, data import can occur with
+multiple :program:`mongos` instances, improving overall import
+performance.

 .. _sharding-balancing-modify-chunk-size:

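The two loops in the hunks above can be dry-run outside MongoDB. The following standalone sketch reproduces the prefix generation and the round-robin shard assignment without a running cluster; the hostnames are the placeholder names from the example, and `assignments` is a hypothetical holder for what the real loops would pass to `split` and `moveChunk`.

```javascript
// Reproduce the prefix/shard mapping from the example loops, without
// a cluster: 26 first letters x 5 second letters = 130 chunks.
var shServer = [ "sh0.example.net", "sh1.example.net",
                 "sh2.example.net", "sh3.example.net",
                 "sh4.example.net" ];
var assignments = [];
for ( var x = 97; x < 97 + 26; x++ ) {
  for ( var y = 97; y < 97 + 26; y += 6 ) {
    var prefix = String.fromCharCode(x) + String.fromCharCode(y);
    // (y-97)/6 cycles through 0..4, so each first letter's five
    // prefix chunks land on five different shards.
    assignments.push({ prefix: prefix, shard: shServer[(y - 97) / 6] });
  }
}
console.log(assignments.length);     // 130 pre-split chunks
console.log(assignments[0]);         // { prefix: 'aa', shard: 'sh0.example.net' }
```

Checking the mapping this way before running the real commands makes it easy to confirm that the index arithmetic spreads chunks evenly across all five shards.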