Decrease collection size, fastest way?

Hi there,

We have Percona Server for MongoDB v3.6.17-4.0 running on Debian. One of our backends has been adding too much data: the collection is now about 100 GB and growing roughly 5 GB every 2-3 weeks. We realized we have a bunch of documents we could delete; of the roughly 85 million documents, perhaps a quarter can be gone for good.

I am doing a few steps, sketched in the shell snippet below:

  1. Add a compound index to use for the deleteMany.
  2. Run the first deleteMany, which uses the new compound index on 2 fields.
  3. Run another deleteMany, which uses an existing index.
  4. Compact the collection.
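
For reference, a rough mongo-shell sketch of those four steps; the collection name, field names, and filter values are placeholders, not the real schema:

    // Placeholder names: "events" collection, "status" / "createdAt" / "customerId" fields
    // 1. Compound index covering the first delete filter
    db.events.createIndex({ status: 1, createdAt: 1 })
    // 2. First pass, served by the new compound index
    db.events.deleteMany({ status: "archived", createdAt: { $lt: ISODate("2019-01-01") } })
    // 3. Second pass, served by an index that already exists
    db.events.deleteMany({ customerId: { $in: [101, 102, 103] } })
    // 4. Reclaim disk space; on 3.6 compact blocks other operations on the database,
    //    so run it in a maintenance window
    db.runCommand({ compact: "events" })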

It takes a long time, around 20 hours on our ESXi servers, and I was wondering how I can improve it. We have added more RAM to the VM (64 GB), and I am assigning 48 GB to cacheSizeGB. We use the WiredTiger engine, with the journal set to false and checkpointSizeMB set to 500.
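
If it helps, you can confirm from the shell that the cache setting actually took effect; the statistic names below come from serverStatus() on WiredTiger and are reported in bytes:

    // Quick check that the 48 GB cacheSizeGB setting is live
    var cache = db.serverStatus().wiredTiger.cache;
    print("configured: " + (cache["maximum bytes configured"] / 1024 / 1024 / 1024).toFixed(1) + " GB");
    print("in use:     " + (cache["bytes currently in the cache"] / 1024 / 1024 / 1024).toFixed(1) + " GB");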

The original database runs on a replica set, but I have set up this node as a standalone; I figured I might get better results on a single node for this particular task.

I was wondering what else I can do. Add more RAM?

The collection I am referring to already has many indexes, around 45-50, so another test I am considering is the following (a rough sketch of saving and rebuilding the index definitions comes after the list):

  1. Drop all the indexes.
  2. Add the 2 or 3 indexes I need for this specific task.
  3. Run the first deleteMany, which uses the compound index on 2 fields.
  4. Run another deleteMany, which uses an existing index.
  5. Compact the collection.
  6. Re-add all the other indexes we have and our backends utilize.
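
A rough shell sketch of that plan, again with a placeholder "events" collection; note that each index's options (unique, sparse, TTL, partial filter, etc.) have to be carried over when rebuilding:

    // Save the current index definitions (everything except the mandatory _id index)
    var specs = db.events.getIndexes().filter(function (ix) { return ix.name !== "_id_"; });
    db.events.dropIndexes();                            // drops all indexes except _id
    db.events.createIndex({ status: 1, createdAt: 1 }); // index needed by the first deleteMany
    // ... run the two deleteMany passes and the compact here ...
    // Rebuild the original indexes, copying each spec's options
    specs.forEach(function (ix) {
        var opts = {};
        Object.keys(ix).forEach(function (k) {
            if (k !== "key" && k !== "v" && k !== "ns") opts[k] = ix[k];
        });
        db.events.createIndex(ix.key, opts);
    });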

Help me, Obi Wan Kenobi, you’re my only hope.


Hi.

The only low-cost data deletion is when an entire collection is dropped in one go.

Deletion of individual documents is approximately as much work as an update. E.g., deleting 1 million documents will take about as much time and consume about as many hardware resources as updating 1 million documents.

For each deletion:

  • The document’s position in the collection data is looked up
  • The entire document is read
  • For every index on that collection: take the index’s field values from the document and remove the index entry for that doc (see the note just after this list)
  • Release the document’s data address+span and add it to a freelist.
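
Given the 45-50 indexes mentioned earlier in the thread, you can put a number on that per-index cost yourself (placeholder collection name again):

    // With ~45-50 indexes, every single delete also performs ~45-50 index-entry removals
    db.events.getIndexes().length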

The batch-write method (deleteMany) does reduce the number of network requests being made, but to my understanding it doesn’t batch the work above in any way.

Adding an index to find the documents to delete improves the efficiency of the first step, but only the first step. There is still the full document read and the other index-entry deletions to be performed, plus the random-access write to mark the document’s space as free.
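
If you want to confirm that lookup step is at least cheap, an explain of the same predicate (placeholder names again) shows whether the winning plan is an IXSCAN rather than a COLLSCAN:

    // Check the plan the delete filter would use
    var plan = db.events.find({ status: "archived", createdAt: { $lt: ISODate("2019-01-01") } })
                        .explain().queryPlanner.winningPlan;
    printjson(plan);   // expect an IXSCAN stage on the compound index, not COLLSCAN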

Having everything in memory will make it fast, at least until the flushing to disk during checkpoints saturates that capacity.
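
One rough way to watch that pressure while the deletes run, using statistic names from serverStatus() (exact names may vary slightly by version):

    // Dirty data waiting to be checkpointed, versus the configured cache size
    var ss = db.serverStatus();
    var c  = ss.wiredTiger.cache;
    print("dirty bytes:   " + c["tracked dirty bytes in the cache"]);
    print("cache size:    " + c["maximum bytes configured"]);
    print("checkpointing: " + ss.wiredTiger.transaction["transaction checkpoint currently running"]);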

In case you’re wondering: using TTL indexes won’t make this more efficient. The deletes will be issued automatically, but it is still the same work.
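
For context, a TTL index is just an ordinary single-field index with an expiry option, and the background TTL monitor then issues the same kind of deletes (placeholder field name):

    // Documents whose "createdAt" is older than 30 days are removed automatically,
    // but each removal still does the full per-document work described above
    db.events.createIndex({ createdAt: 1 }, { expireAfterSeconds: 2592000 })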
