Better way to do backups

Good afternoon. I have 3 instances of MongoDB (a replica set) for storing logs, and I use PBM to make backups to an NFS share. Because of the large data volume I delete some of the data (old logs) from MongoDB to save disk space, but I need the backups to stay available for 1 year (if regulators ask for logs from 12 months ago, I have to be able to restore them). To save disk space on the NFS share I want to use incremental backups.

How should I set this up according to best practices? Take a single full backup at the beginning and then rely on PITR with 12-hour slices, or take a full backup every day at 00:00 and then make incremental backups? In general, I need to keep backups for 1 year and be able to recover to any point within that year.


And what will happen if I restore from 03.03.2021, for example? Will my current data be deleted, or will the restored data just be added without touching the newer data?


Hi, you need to evaluate a few different aspects: how large a full backup is, how large the oplog generated in 1 day is, and how much space you have available on your NFS share.

If you have a database that writes a lot (in particular many updates), the oplog for 1 day can end up larger than the entire database, so PITR alone cannot solve your problem. I don't think that's your case, though: you're talking about logs, so you should have mostly inserts, not deletes or updates.

PITR can be a suitable solution in your case. It works by starting from a full backup and then streaming the oplog slices to the remote storage. There's no universally valid suggestion: 1 full backup per day is fine, and so is 1 per week, and so on. You just need to calculate how much space you need and how much space you have.
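As a rough sketch, with PBM this could look like the following (the command names are PBM's; the slice span and the cron schedule are just example values to adapt):

```shell
# Enable PITR so PBM continuously uploads oplog slices
# to the configured remote storage (your NFS share):
pbm config --set pitr.enabled=true

# Optional: how often a new oplog slice is cut, in minutes:
pbm config --set pitr.oplogSpanMin=10

# Take the base full backup that PITR starts from:
pbm backup

# Then schedule periodic fulls, e.g. daily at midnight via cron:
# 0 0 * * * pbm backup
```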

For example, let's suppose your DB is 100GB in size and the oplog grows by 5GB/day.
If you do 1 full backup per day, you need 105GB per day, multiplied by 365 for 1 year of retention. Makes sense? If you don't have enough space, take a full backup once per week instead.
By the way, you should obviously also consider the growth rate of the DB size.
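To put numbers on it, here is a quick back-of-the-envelope calculation using the example figures above (100GB full, 5GB/day oplog; it assumes a constant DB size, so add headroom for growth):

```python
def yearly_storage_gb(full_size_gb, oplog_gb_per_day, fulls_per_year):
    """Space for 1 year of retention: all full backups plus a year of oplog slices."""
    return full_size_gb * fulls_per_year + oplog_gb_per_day * 365

# 1 full per day: 100 * 365 fulls + 5 * 365 days of oplog
daily = yearly_storage_gb(100, 5, 365)   # 38325 GB
# 1 full per week: 100 * 52 fulls + the same year of oplog
weekly = yearly_storage_gb(100, 5, 52)   # 7025 GB
print(daily, weekly)
```

As you can see, the full-backup frequency dominates the total, which is why checking your available NFS space first matters.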

Also consider the recovery. If you take full backups more frequently, recovery is faster because there are not that many events to replay from the oplog slices. But if full backups are very infrequent, recovery can take very long, depending on the size of the oplog to replay.

If you are in the few-GB range for 1 day of oplog, I'd take a full backup once per day.


When you recover the database to a specific point in time, the existing data is lost: the restore replaces the current contents rather than merging into them.
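For example, a point-in-time restore with PBM looks like this (the date is the one from your question; the target time must be covered by a full backup plus subsequent oplog slices):

```shell
# Roll the replica set back to its state at that moment;
# this overwrites the current data, it does not merge:
pbm restore --time="2021-03-03T00:00:00"
```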
