When we checked the mongo logs, we noticed that the actual map and reduce functions run at the same speed both times; the difference is in the “M/R Reduce Post Processing” phase.
In the first case it finishes in a jiffy, and we see the following lines in the log…
So post processing finishes in 4-5 seconds in the first case, but takes more than 10 minutes for ~50k records in the second.
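For context on what the post-processing phase is writing out, here is a minimal plain-JavaScript emulation of mapReduce's emit/group/reduce semantics. This is an illustration only: the word-count `map`/`reduce` functions and the `emit`-as-parameter adaptation are stand-ins, not our actual job or mongod's real plumbing.

```javascript
// Minimal in-memory emulation of MongoDB's mapReduce grouping:
// map() calls emit(key, value); values for each key are grouped,
// then reduce(key, values) folds them into one result per key.
// (In mongod, emit is a global and map takes no arguments; here we
// pass emit explicitly to keep the sketch self-contained.)
function runMapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs.forEach((doc) => map.call(doc, emit)); // `this` is the document, as in mongod
  const out = {};
  for (const [key, values] of groups) {
    // reduce is skipped when a key has a single value, as mongod does
    out[key] = values.length === 1 ? values[0] : reduce(key, values);
  }
  return out;
}

// Illustrative word-count job
const wordMap = function (emit) {
  this.words.forEach((w) => emit(w, 1));
};
const wordReduce = (key, values) => values.reduce((a, b) => a + b, 0);

const result = runMapReduce(
  [{ words: ["a", "b"] }, { words: ["a"] }],
  wordMap,
  wordReduce
);
// result: { a: 2, b: 1 }
```

The post-processing phase is then responsible for writing each reduced `{key: value}` pair into the output collection, which is where the per-document deletes and inserts described below come in.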
In the case of deletes, TokuMX inserts a delete message into a buffer in the fractal tree, but the actual entry containing the data may still be present in the leaf node. Our guess is that the actual data entries are being deleted one by one during Reduce Post Processing, since new entries with the same IDs need to be added.
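Our mental model of that delete path, sketched as a toy JavaScript structure. This is an assumption-level illustration of message buffering, not TokuMX's actual implementation; class and method names are invented:

```javascript
// Toy sketch of message buffering in a fractal-tree-like structure:
// writes and deletes are appended as messages to a buffer and only
// applied to the leaf when the buffer is flushed. Issuing a delete is
// cheap, but the stale entry still occupies the leaf until flush time.
class BufferedLeaf {
  constructor() {
    this.leaf = new Map(); // materialized key -> value entries
    this.buffer = [];      // pending messages, applied lazily
  }
  insert(key, value) {
    this.buffer.push({ op: "insert", key, value });
  }
  remove(key) {
    // Just enqueue a delete message; the leaf entry stays behind for now.
    this.buffer.push({ op: "delete", key });
  }
  flush() {
    // Applying buffered deletes is where the deferred work surfaces,
    // e.g. when post-processing rewrites documents with existing IDs.
    for (const msg of this.buffer) {
      if (msg.op === "insert") this.leaf.set(msg.key, msg.value);
      else this.leaf.delete(msg.key);
    }
    this.buffer = [];
  }
  get(key) {
    this.flush(); // reads must observe pending messages
    return this.leaf.get(key);
  }
}

const t = new BufferedLeaf();
t.insert("doc1", { n: 1 });
t.remove("doc1");
t.insert("doc1", { n: 2 }); // re-insert under the same ID, as post-processing does
// t.get("doc1") -> { n: 2 }, once the delete message has been applied
```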
Yes, we have observed the mapReduce performance issue in both TokuMX and PSMDB using the PerconaFT engine. Your observations regarding the delete messages are also correct. Node deletes are the Achilles' heel of the fractal tree. Sadly, now that PerconaFT has been deprecated for PSMDB and the Aggregation Pipeline is preferred over mapReduce in most scenarios, addressing this issue is not on our road map at present.
If you require mapReduce functionality, I recommend migrating to Percona Server for MongoDB with the rocksdb or wiredTiger storage engines.
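For anyone following along, the engine is selected at startup via the standard `storage.engine` option in `mongod.conf`. The snippet below is a sketch; the `dbPath` is illustrative, a fresh data directory is needed when switching engines, and rocksdb availability depends on the PSMDB build:

```yaml
# mongod.conf -- choose the storage engine at startup
storage:
  dbPath: /var/lib/mongodb   # illustrative path
  engine: wiredTiger         # or: rocksdb (MongoRocks builds of PSMDB)
```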
We appreciate your response and understand the situation.
Aggregation would not suffice for our needs, and we cannot avoid using map-reduce, so we will consider migrating to PSMDB with rocksdb or wiredTiger.