@@ -104,7 +104,7 @@ created subsequently, may reside on any shard in the cluster.
 Failover scenarios within MongoDB
 ---------------------------------
 
-A properly deployed MongoDB shard cluster will not have a single point
+A properly deployed MongoDB shard cluster will have no single point
 of failure. This section describes potential points of failure within
 a shard cluster and their recovery methods.
 
@@ -116,42 +116,50 @@ For reference, a properly deployed MongoDB shard cluster consists of:
 
 - :program:`mongos` running on each application server.
 
-Scenarios :
+Potential failure scenarios:
 
 - A :term:`mongos` or the application server failing.
 
   As each application server is running its own :program:`mongos`
   instance, the database is still accessible for other application
-  servers. :program:`mongos` is stateless, so if it fails, no critical
-  information is lost. When :program:`mongos` restarts, it will retrieve a copy
-  of the configuration from the :term:`config database` and resume
-  working.
+  servers and the data is intact. :program:`mongos` is stateless, so
+  if it fails, no critical information is lost. When :program:`mongos`
+  restarts, it will retrieve a copy of the configuration from the
+  :term:`config database` and resume working.
 
   Suggested user intervention: restart application servers and/or
   :program:`mongos`.
 
 - A single :term:`mongod` suffers a failure in a shard.
 
-  A single :term:`mongod` instance failing will be recovered by a
-  :term:`secondary` member of the shard replica set. As each shard
-  will have a single :term:`primary` and two :term:`secondary` members
-  with the exact same copy of the information, any member will be able
-  to replace the failed member.
+  If a single :term:`mongod` instance fails within a shard, a
+  :term:`secondary` member of the :term:`replica set` will take its
+  place. As each shard has two :term:`secondary` members with an
+  exact copy of the data, a :term:`secondary` member can replace the
+  failed :term:`primary` member.
 
-  Suggested course of action: investigate failure and replace member
-  as soon as possible. Additional loss of members on same shard will
-  reduce availablility.
+  Suggested course of action: investigate the failure and replace the
+  failed :term:`primary` member as soon as possible. Additional loss
+  of members on the same shard will reduce availability and the
+  reliability of the shard cluster's data set.
 
 - All three replica set members of a shard fail.
 
   All data within that shard will be unavailable, but the shard
-  cluster will still be operational for applications. Data on other
-  shards will be accessible and new data can be written to other shard
-  members.
+  cluster's other data will remain available to applications, and new
+  data can still be written to the remaining shards.
 
-- A :term:`config database` suffers a failure.
+  Suggested course of action: investigate the situation immediately.
+
+- A :term:`config database` server suffers a failure.
 
   As the :term:`config database` is deployed in a 3 member
   configuration with two-phase commits to maintain synchronization
-  between all members. Any single member failing will not result in a
-  loss of operation
+  between all members, shard cluster operation will continue as
+  normal, but :ref:`chunk migration` will not occur.
+
+  Suggested course of action: replace the :term:`config database`
+  server as soon as possible. Without the ability to migrate chunks,
+  shards will become unbalanced. Additional loss of :term:`config
+  database` servers will put the shard cluster's metadata in jeopardy.
+
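The member-failure scenarios above can be diagnosed from the legacy ``mongo`` shell before deciding on an intervention. A minimal sketch, not part of the original change; the hostname and port (``shard0-a.example.net:27018``) are placeholders for any reachable member of the shard's replica set:

```shell
# Report the health of every member of the shard's replica set.
# rs.status() is the standard replica-set status helper; a member
# reported with health=0 is the one to investigate and replace.
mongo shard0-a.example.net:27018 --eval '
  rs.status().members.forEach(function (m) {
    print(m.name + " : " + m.stateStr + " (health=" + m.health + ")");
  });
'
```

A member in any state other than ``PRIMARY`` or ``SECONDARY`` warrants the investigation suggested above.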
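For the :term:`config database` scenario, the effect on chunk migration can be observed from any :program:`mongos`. A sketch under the assumption that a ``mongos`` is reachable at the placeholder address ``mongos0.example.net:27017``:

```shell
# Print the cluster's sharding metadata and whether the balancer is
# enabled. With a config server down, reads and writes continue, but
# chunks are not migrated, so watch here for growing shard imbalance.
mongo mongos0.example.net:27017 --eval '
  sh.status();
  print("balancer enabled: " + sh.getBalancerState());
'
```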