Below I show two possible questions that could come up on this topic:
Q1: Is it required to explicitly call the db.collection.reIndex() command (http://docs.mongodb.org/v2.4/reference/method/db.collection.reIndex/) after deleting a significant number of documents? Or will TokuMX re-use and/or give back the disk space after some time?
A1: Fractal trees are different from binary trees because they use message buffers for updates, which allow them to be fast (for reference, see "Fractal Trees, what are they?": https://www.percona.com/blog/2013/07/02/tokumx-fractal-treer-indexes-what-are-they/). In the case of deletes, a delete message is inserted into a buffer in the fractal tree, but the actual entry containing the data may still be in a leaf node. The cleaner thread will flush these messages down to the leaves, but only if the node is in memory. The algorithm to flush those messages is weighted by how frequently they’re accessed and how large the messages are. Delete messages are small and may not have a high priority.
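To make the buffering behaviour concrete, here is a toy model in JavaScript. It is only an illustration of the idea described above, not TokuMX's actual data structure: ToyNode, its buffer array and its leaf map are all hypothetical names, and a real fractal tree has many levels, each with its own buffers.

```javascript
// Toy model (hypothetical, NOT TokuMX internals): one node with a message
// buffer sitting above one leaf that holds the stored entries.
function ToyNode() {
  this.buffer = [];                 // pending messages, oldest first
  this.leaf = Object.create(null);  // entries physically stored in the leaf
}

// Writes and deletes only append a small message to the buffer.
ToyNode.prototype.insert = function (key, value) {
  this.buffer.push({ op: "insert", key: key, value: value });
};
ToyNode.prototype.remove = function (key) {
  this.buffer.push({ op: "delete", key: key });
};

// Reads consult the buffer first: the newest message for a key wins.
ToyNode.prototype.get = function (key) {
  for (var i = this.buffer.length - 1; i >= 0; i--) {
    var msg = this.buffer[i];
    if (msg.key === key) {
      return msg.op === "delete" ? undefined : msg.value;
    }
  }
  return this.leaf[key];
};

// Flushing applies the buffered messages to the leaf, in order.
ToyNode.prototype.flush = function () {
  var leaf = this.leaf;
  this.buffer.forEach(function (msg) {
    if (msg.op === "delete") {
      delete leaf[msg.key];
    } else {
      leaf[msg.key] = msg.value;
    }
  });
  this.buffer = [];
};
```

In this model, after remove("a") a read of "a" already returns undefined, yet the entry still occupies the leaf until flush() runs. That gap between logical and physical deletion is the situation the answer describes.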
Therefore, cleanup is not always guaranteed and an explicit reIndex() should be done (https://www.percona.com/doc/percona-tokumx/collection_index_options.html). A reIndex() operation may not always result in a smaller file due to several factors including when checkpoints have been made and where the end of file marker is. It’s worth reindexing to improve performance in most cases.
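As a rough way to see whether a reIndex() actually reclaimed anything, the following sketch wraps the call and compares collection statistics before and after. The helper name reindexAndReport is made up for this example, and it assumes the stats() output exposes a storageSize field as MongoDB 2.4 does; check the field names on your own TokuMX version before relying on them.

```javascript
// Sketch: run reIndex() on a collection and report the size difference.
// Assumption: coll.stats() returns a document with a "storageSize" field.
function reindexAndReport(coll) {
  var before = coll.stats().storageSize;
  var result = coll.reIndex();          // rebuilds all indexes on the collection
  var after = coll.stats().storageSize;
  return {
    result: result,                     // whatever reIndex() returned
    before: before,
    after: after,
    reclaimed: before - after           // may be 0 or negative, as noted above
  };
}
```

In the shell this could be called as reindexAndReport(db.events), where events is a placeholder collection name. A reclaimed value of zero does not mean the reIndex() was wasted, since the file size depends on checkpoints and the end of file marker.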
Q2: If running reIndex() is recommended, what is the recommendation for replica sets? It seems that this command is not replicated, but has to be invoked explicitly on all replica members. Can/should all nodes run reIndex() at the same time?
A2: That’s correct. A reIndex() on the primary will only operate on the primary; it will not propagate to the secondaries.
One strategy would be to perform a reIndex() on the secondary(ies), then stepDown() the primary and reindex on that host once it becomes a secondary. This will work if all the workload is directed to the primary. If you direct reads to the secondaries, you’ll want to account for that workload and likely reindex in the background; a reIndex() will rebuild indexes in the background if the index was originally specified with this option. However, db.collection.reIndex() will rebuild the _id index in the foreground, which takes the database’s write lock.
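The ordering of that rolling strategy can be sketched as a small planning function. This is only an illustration: planRollingReindex is a hypothetical helper, it reads just the name and stateStr fields of an rs.status()-style document, and it produces a list of steps rather than executing anything. Each step still has to be run on the named host, one at a time.

```javascript
// Sketch: derive the order of operations for a rolling reIndex from an
// rs.status()-style document. Only "name" and "stateStr" are read.
function planRollingReindex(status, collName) {
  var steps = [];
  var primary = null;
  var reindexCmd = "db." + collName + ".reIndex()";

  status.members.forEach(function (member) {
    if (member.stateStr === "PRIMARY") {
      primary = member.name;
    } else if (member.stateStr === "SECONDARY") {
      // Secondaries are reindexed first, while the primary keeps serving.
      steps.push({ host: member.name, run: reindexCmd });
    }
  });

  if (primary !== null) {
    // Last: demote the primary, then reindex it once it is a secondary.
    steps.push({ host: primary, run: "rs.stepDown()" });
    steps.push({ host: primary, run: reindexCmd });
  }
  return steps;
}
```

In the shell this might be called as planRollingReindex(rs.status(), "events"), with events standing in for your collection name. The plan deliberately leaves out the waiting: before the final step you would confirm that a new primary has been elected and that the old one reports SECONDARY.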
Further information on the fractal tree and what happens on deletes can be found in this blog:
This was written for TokuDB but the way messages are written for TokuMX is the same.