When we checked the mongo logs, we noticed that the actual map and reduce functions run at the same speed both times; the difference comes from “M/R Reduce Post Processing”.
In the first case, it finishes in a jiffy, and we see the following lines in the log…
So post-processing is done in 4-5 seconds.
In the second case, post-processing takes more than 10 minutes for ~50k records.
In the case of deletes, TokuMX inserts a delete message into a buffer in the fractal tree, but the actual entry containing the data can still be present in the leaf node. So our guess is that the actual data entries are being deleted one by one during Reduce Post Processing, since new entries with the same IDs need to be added.
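To make that guess concrete, here is a toy Python sketch of the behaviour we suspect. It is not TokuMX code and does not reflect its real internals: the class `ToyBufferedLeaf`, its methods, and the counter are all hypothetical names invented for illustration. It assumes that a delete only records a pending message, and that re-inserting the same ID forces that pending delete to be applied to the leaf first.

```python
class ToyBufferedLeaf:
    """Toy model only (hypothetical, not TokuMX's implementation)."""

    def __init__(self):
        self.leaf = {}                  # _id -> document physically held in the leaf
        self.pending_deletes = set()    # buffered delete messages, not yet applied
        self.deletes_applied_on_insert = 0

    def insert(self, key, doc):
        # Assumption: a pending delete for this key must be resolved
        # before a new entry with the same key can be written.
        if key in self.pending_deletes:
            self.pending_deletes.discard(key)
            self.leaf.pop(key, None)    # the old entry is physically removed now
            self.deletes_applied_on_insert += 1
        self.leaf[key] = doc

    def delete(self, key):
        # Cheap: only a message is buffered, the leaf entry stays in place.
        self.pending_deletes.add(key)

    def get(self, key):
        # A buffered delete hides the entry even though the leaf still holds it.
        if key in self.pending_deletes:
            return None
        return self.leaf.get(key)


def run_demo(n=5):
    tree = ToyBufferedLeaf()
    for i in range(n):
        tree.insert(i, {"value": i})
    for i in range(n):
        tree.delete(i)
    still_in_leaf = len(tree.leaf)      # entries physically present after the deletes
    visible = sum(tree.get(i) is not None for i in range(n))
    for i in range(n):                  # re-add entries with the same IDs
        tree.insert(i, {"value": i * 10})
    return still_in_leaf, visible, tree.deletes_applied_on_insert


if __name__ == "__main__":
    print(run_demo())
```

In this model the deletes themselves are cheap, and the cost shows up later: every re-insert of an existing ID pays for one deferred delete, which is the pattern we think we are seeing in the slow run.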