In MongoDB, is "sh.getBalancerState() is false" the same as "sh.isBalancerRunning() is false"?

All we need is an easy explanation of the problem, so here it is.

While upgrading my config servers to use WiredTiger, I stopped the balancer with sh.setBalancerState(false) and then ran sh.getBalancerState(), which returned false. Does this mean the balancer is not running? After that, I went ahead and upgraded the config servers to WiredTiger. But after reading the documentation more carefully, I am not sure whether sh.setBalancerState(false) actually guarantees that no migrations are in progress. If some migrations were still running while I backed up the config data and stopped the config servers one by one, what would the bad effect be? Now that all the config servers are up on WiredTiger, how can I check whether they all hold the same data, especially the config data and metadata?

How to solve :

We know this issue can be frustrating, so we are here to help! Take a deep breath and look at the explanation of your problem. There is more than one solution below, but we recommend the first method because it is the tried-and-true approach.

Method 1

You can verify that no migrations are running by checking the balancer with

 sh.isBalancerRunning()

which returns true if chunks are currently being migrated and false if not. sh.getBalancerState() only tells you whether the balancer is enabled or disabled, not whether it is actually running at that moment. While it depends on what the specific documentation says, I would feel safer setting the balancer state to false, checking the migration status with the command above, and then stopping the balancer:

 sh.stopBalancer()
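Putting those pieces together, here is a minimal mongo shell sketch of that sequence. It assumes you are connected to a mongos and that (as in 3.x-era shells) sh.isBalancerRunning() returns a boolean; the polling loop is deliberately bare-bones.

 // Sketch: disable the balancer, wait for any in-flight migration to finish,
 // then stop it for good measure before touching the config servers.
 sh.setBalancerState(false)            // no new balancing rounds will start
 while (sh.isBalancerRunning()) {      // true while a migration is in progress
     print("waiting for the current migration to finish...")
     sleep(1000)                       // shell helper: pause for one second
 }
 sh.stopBalancer()                     // also waits for in-progress balancing to stop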

So, now that we have clarified the proper method, on to the second question:

What is the bad effect?

I’m not entirely sure how gracefully MongoDB would handle this situation. However, you should be able to find the steps that occurred during a migration in your logs.

Always check the logs!
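If you would rather not grep the log files on disk, a sketch along these lines pulls the in-memory log buffer from a shard's mongod (or a mongos) via the shell and filters it for migration messages. The string "moveChunk" is an assumption about what the relevant log lines contain.

 // Sketch: fetch the recent log lines and keep only those mentioning chunk migrations.
 var res = db.adminCommand({ getLog: "global" })
 res.log.filter(function (line) { return line.indexOf("moveChunk") !== -1 })
        .forEach(print)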

Update: As noted by the OP, if the work occurred within the last 24 hours you can also use sh.status() to check whether any migration errors were recorded by the balancer. If it was more than 24 hours ago, go check the logs.
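Beyond sh.status(), the balancer also records every migration in the config.changelog collection, so a sketch like the following (field names assume the 3.x-era metadata format) can show the most recent moveChunk events and whether any of them reported errors in their details:

 // Sketch: list the latest migration-related changelog entries, newest first.
 var cfg = db.getSiblingDB("config")
 cfg.changelog.find({ what: /^moveChunk/ }).sort({ time: -1 }).limit(10).pretty()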

Update 2: Marcus clarified in the comments that partial migrations are not possible, so this should not be a concern.

Method 2

Nothing bad will have happened, even if you took a config server offline during a chunk migration. For a chunk to be marked as migrated, all three config servers need to be up (contrary to popular belief, they do not form a replica set).

(The following is slightly simplified for the sake of brevity.)
When a chunk is moved, a global (read: cluster-wide) lock is written to the config servers. Then the chunk is copied over to the target machine. Next, the metadata is updated so that every mongos asking for data contained in that chunk is pointed to the new server. Only after a successful metadata update is the chunk deleted on the old machine. So if anything goes wrong during the chunk migration, the metadata will still point to the correct location, while the lock prevents any data from being written to the chunk. I am not sure exactly when the lock is lifted, but if I recall correctly it is released after the metadata update.
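You can actually see those locks from the shell: they live in the config.locks collection on the config servers. A rough sketch, assuming the 3.0-era metadata format where a state of 2 means the lock is held:

 // Sketch: inspect the distributed locks described above.
 var cfg = db.getSiblingDB("config")
 cfg.locks.find({ _id: "balancer" }).pretty()        // the balancer's own lock
 cfg.locks.find({ state: { $ne: 0 } }).pretty()      // any lock currently held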

So, since all three config servers need to be online for metadata updates to succeed, taking a config server offline prevents any chunk migration from taking effect.
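As for the OP's last question, verifying that the mirrored config servers ended up with identical metadata after the upgrade, one common check is to compare the hash of the config database on each server. A sketch, where the host names are placeholders for your three config servers:

 // Sketch: connect to each config server and compare the config database hash.
 // If the reported "md5" values match, the metadata is identical.
 var hosts = ["cfg0.example.net:27019", "cfg1.example.net:27019", "cfg2.example.net:27019"]
 hosts.forEach(function (h) {
     var hash = new Mongo(h).getDB("config").runCommand({ dbHash: 1 })
     print(h + " -> " + hash.md5)
 })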

Note: Method 1 is the recommended approach, as it is the most thoroughly tested.
Thank you 🙂

All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
