Fastest way to delete documents by _id?

Hi there,

I have a MongoDB collection with around 92 MM documents, and I need to delete around 24 MM of them. I have the "_id" of every document I want to delete.

I am looking for the best/fastest way to delete those records. I could write a Java application that iterates over the 24 MM "_id" values and deletes the documents one by one, OR I could put the "_id" values in a JSON file and use the Mongo CLI or mongoimport --mode=delete --file=data.json.
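For the mongoimport option, the input file is newline-delimited JSON with one document per "_id". With --mode=delete, mongoimport deletes the existing documents that match the imported ones, matching on _id by default. Below is a minimal Java sketch that produces such a file. It assumes every _id is an ObjectId and that the ids are listed as hex strings, one per line, in a hypothetical ids.txt.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Pattern;

public class DeleteFileWriter {
    // 24 hex characters: the usual textual form of a MongoDB ObjectId.
    private static final Pattern OBJECT_ID_HEX = Pattern.compile("[0-9a-fA-F]{24}");

    // Turns one hex _id into one line of Extended JSON: {"_id": {"$oid": "..."}}.
    // Assumes every _id is an ObjectId; other _id types need a different encoding.
    static String toDeleteLine(String hexId) {
        if (!OBJECT_ID_HEX.matcher(hexId).matches()) {
            throw new IllegalArgumentException("Not an ObjectId hex string: " + hexId);
        }
        return "{\"_id\": {\"$oid\": \"" + hexId + "\"}}";
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical input file ids.txt: one hex _id per line.
        // Streams line by line so the 24 MM ids never have to fit in memory at once.
        try (BufferedReader in = Files.newBufferedReader(Paths.get("ids.txt"), StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(Paths.get("data.json"), StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                String id = line.trim();
                if (id.isEmpty()) {
                    continue; // skip blank lines
                }
                out.write(toDeleteLine(id));
                out.newLine();
            }
        }
    }
}
```

The resulting data.json could then be passed to something like mongoimport --db=<database> --collection=<collection> --mode=delete --file=data.json. The database and collection names here are placeholders.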

Any idea which option would be fastest?

Thanks in advance.


For the record: I ended up generating 6 text files of 4 MM lines each, with one statement per line, e.g. deleteOne (Object of primary key _id)…
Each line contains the primary key of a record I want to delete. Out of 95+ MM records, I want to delete around 24 MM.
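The file-generation step above can be sketched in Java as follows. It streams a hypothetical ids.txt (one hex ObjectId per line) and writes /tmp/deletes1.json, /tmp/deletes2.json, and so on, with 4 MM statements per file. The exact statement used in the post is not shown. The db.getCollection(...).deleteOne(...) form and the collection name myCollection are therefore assumptions.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class DeleteScriptSplitter {
    // One mongo shell statement per _id (assumed form; collection name is a placeholder).
    static String deleteStatement(String collection, String hexId) {
        return "db.getCollection(\"" + collection + "\").deleteOne({_id: ObjectId(\"" + hexId + "\")})";
    }

    // 1-based file number for a 0-based line index.
    static int fileNumber(long lineIndex, long linesPerFile) {
        return (int) (lineIndex / linesPerFile) + 1;
    }

    public static void main(String[] args) throws IOException {
        String collection = "myCollection"; // hypothetical collection name
        long linesPerFile = 4_000_000L;     // 4 MM statements per file, as in the post
        BufferedWriter out = null;
        int currentFile = 0;
        long index = 0;
        try (BufferedReader in = Files.newBufferedReader(Paths.get("ids.txt"), StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                String id = line.trim();
                if (id.isEmpty()) {
                    continue; // skip blank lines
                }
                int file = fileNumber(index++, linesPerFile);
                if (file != currentFile) { // start the next output file
                    if (out != null) {
                        out.close();
                    }
                    out = Files.newBufferedWriter(Paths.get("/tmp/deletes" + file + ".json"), StandardCharsets.UTF_8);
                    currentFile = file;
                }
                out.write(deleteStatement(collection, id));
                out.newLine();
            }
        } finally {
            if (out != null) {
                out.close();
            }
        }
    }
}
```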

I loaded the files like this, one command per file (deletes1.json through deletes6.json):

mongo mongodb://%2Ftmp%2Fmongodb-27017.sock/myCollectionName < /tmp/deletes1.json &

mongo mongodb://%2Ftmp%2Fmongodb-27017.sock/myCollectionName < /tmp/deletes6.json &

Each command runs in the background (the trailing &). Loading files 1 and 2 together took about 70 minutes. Then I loaded files 3 and 4 together, and finally the last 2 files (5 and 6). In total it took a bit over 3.5 hours. That was faster than the Java application deleting one document at a time, which took around 7.5 hours. Of course, the Java application deletes one record at a time without using threads.
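The Java run above deletes one record at a time on a single thread. A common alternative, not tested in this thread, is to group the _ids into batches, delete each batch with a single deleteMany using $in, and spread the batches over a few threads. Below is a minimal sketch of that idea. The delete call is left as a parameter so the sketch does not depend on a particular driver version. The batch size and thread count are arbitrary placeholders to tune.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelBatchDeleter {
    // Splits ids into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> batches(List<T> ids, int batchSize) {
        List<List<T>> result = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += batchSize) {
            result.add(ids.subList(i, Math.min(i + batchSize, ids.size())));
        }
        return result;
    }

    // Runs deleteBatch on every batch using a fixed pool of `threads` workers and
    // returns the sum of the deleted counts that deleteBatch reports.
    static <T> long deleteAll(List<T> ids, int batchSize, int threads,
                              Function<List<T>, Long> deleteBatch) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (List<T> batch : batches(ids, batchSize)) {
                futures.add(pool.submit(() -> deleteBatch.apply(batch)));
            }
            long total = 0;
            for (Future<Long> f : futures) {
                total += f.get(); // rethrows (wrapped) any failure from a worker
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

With the MongoDB Java driver, deleteBatch could plausibly be batch -> collection.deleteMany(Filters.in("_id", batch)).getDeletedCount(), where collection is a MongoCollection<Document> and the ids are ObjectId values. Whether this beats the parallel shell approach on a given system would have to be measured. As the runs in this thread suggest, more concurrency does not necessarily finish sooner, so a small thread count is a reasonable starting point.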

On another day, I loaded files 1, 2 and 3 at the same time, and when those 3 finished I loaded files 4, 5 and 6. That took 2.5 hours. On yet another day I loaded all 6 files at the same time (6 processes), and it also took 2.5 hours.
So I think it is better to load files 1, 2 and 3 and, once those finish, load files 4, 5 and 6. That doesn't stress the system as much as loading all 6 files at the same time, and the outcome is the same.
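As a rough comparison, the timings above can be converted into throughput. These figures use the rounded numbers from the post (around 24 MM deletes, and "a bit over" 3.5 hours for the two-at-a-time run), so they are approximations only:

```latex
\begin{aligned}
\text{Java, one delete at a time (7.5 h):} &\quad \frac{24\times 10^{6}}{7.5 \times 3600\ \text{s}} \approx 889\ \text{deletes/s} \\
\text{2 shell processes at a time (3.5 h):} &\quad \frac{24\times 10^{6}}{3.5 \times 3600\ \text{s}} \approx 1905\ \text{deletes/s} \\
\text{3 or 6 shell processes at a time (2.5 h):} &\quad \frac{24\times 10^{6}}{2.5 \times 3600\ \text{s}} \approx 2667\ \text{deletes/s}
\end{aligned}
```

In these runs, going from 3 to 6 concurrent processes gained nothing further.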

Cheers,


Thank you for sharing your results and solution.