TokuMX: Wrong collection size & document count

Per official documentation… “TokuMX does not require routine maintenance to deal with fragmentation, as performance remains steady through heavy usage. Data does not ever need to be compacted, repaired, or re-indexed to restore performance.”

However, I am observing that the size of one of our collections has grown far beyond what the data justifies (40+ GB when it should be under 10 GB), and the collection stats are reporting an inflated document count as well (60m+ when it should be under 10m). This is specifically a collection from which we frequently delete old documents.
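
(For context, the numbers above would come from a collection stats call along these lines in the mongo shell; the database and collection names here are hypothetical placeholders.)

    // Minimal sketch -- "mydb" and "events" are hypothetical names.
    var coll = db.getSiblingDB("mydb").events;
    var s = coll.stats();
    // Fields of interest:
    //   s.count        -> document count (reported 60m+ here, expected <10m)
    //   s.storageSize  -> on-disk size in bytes (reported 40+ GB here, expected <10 GB)
    printjson({ count: s.count, size: s.size, storageSize: s.storageSize });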

See below…


After looking at this, I am beginning to suspect that the space for deleted documents is not being reclaimed and that the document count is not being adjusted either.
So my question is: do we need to compact the collection or repair the database? If so, does TokuMX support these commands?

Hi Niraj,

Below are two questions that could come up on this topic:

Q1: Is it required to explicitly call the db.collection.reIndex() command (see db.collection.reIndex() in the MongoDB Manual) after deleting a significant number of documents? Or will TokuMX re-use and/or give back the disk space after some time?

A1: Fractal trees differ from B-trees in that they buffer updates as messages, which is what makes them fast (for background, see "TokuMX Fractal Tree(R) indexes, what are they?" on the Percona Database Performance Blog). In the case of deletes, a delete message is inserted into a buffer in the fractal tree, but the actual entry containing the data may still be in a leaf node. The cleaner thread flushes these messages down to the leaves, but only if the node is in memory. The algorithm that flushes those messages is weighted by how frequently nodes are accessed and how large the messages are; delete messages are small and may not have a high priority.
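
(As a rough illustration of the behaviour described above: immediately after a large delete, the collection stats can still show the old size and count, because the delete messages may not yet have been flushed to the leaves. A minimal sketch, with a hypothetical collection and date field:)

    // Hypothetical collection "events" with a "createdAt" field.
    // Delete a large batch of old documents.
    db.events.remove({ createdAt: { $lt: ISODate("2015-01-01") } });

    // Stats taken right away may still reflect the pre-delete state,
    // since the delete messages can still be sitting in message buffers.
    printjson(db.events.stats());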

Therefore, cleanup is not always guaranteed, and an explicit reIndex() should be done (see "4. Collection and Index Options"). A reIndex() operation may not always result in a smaller file, due to several factors including when checkpoints were taken and where the end-of-file marker is, but in most cases it is worth reindexing to improve performance.
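
(A minimal sketch of the explicit rebuild being recommended; "events" is a hypothetical collection name, and comparing stats() before and after shows whether the size and count were actually brought back in line.)

    // Hypothetical collection "events".
    var before = db.events.stats();

    // Rebuild all indexes on the collection; note that the _id index
    // is rebuilt in the foreground and takes the database write lock.
    db.events.reIndex();

    var after = db.events.stats();
    print("count:       " + before.count + " -> " + after.count);
    print("storageSize: " + before.storageSize + " -> " + after.storageSize);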

Q2: If running reIndex() is recommended, what is the recommendation for replica sets? It seems that this command is not replicated, but has to be invoked explicitly on all replica members. Can/should all nodes run reIndex() at the same time?

A2: That's correct: a reIndex() on the primary will only operate on the primary; it will not propagate to the secondaries.
One strategy would be to perform a reIndex() on the secondary(ies), then stepDown() the primary and reindex on that host once it becomes a secondary. This works if all of the workload is directed to the primary. If you direct reads to the secondaries, you'll want to account for that workload and likely rebuild in the background: a reIndex() will rebuild an index in the background if the index was originally created with the background option. However, db.collection.reIndex() will rebuild the _id index in the foreground, which takes the database's write lock.
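
(A sketch of that rolling approach; the database and collection names are hypothetical, and each step is run from a mongo shell connected to the member indicated in the comments.)

    // Step 1: on each secondary in turn, rebuild the collection's indexes
    //         (run from a shell connected to that secondary).
    db.getSiblingDB("mydb").events.reIndex();

    // Step 2: on the current primary, step down so it becomes a secondary
    //         (run from a shell connected to the primary).
    rs.stepDown(60);    // do not stand for re-election for 60 seconds

    // Step 3: once the former primary is a secondary, rebuild its indexes too.
    db.getSiblingDB("mydb").events.reIndex();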

Further information about fractal trees and what happens on deletes can be found in this blog post:
https://www.percona.com/blog/2015/02…-improvements/

This was written for TokuDB but the way messages are written for TokuMX is the same.

Thank you, Luis. This is great information. Let me try out reIndex() and see how it goes.

You are welcome.

Excellent!