We tested pbm backup for sharded cluster and our cluster will grow to 1TB. When we tested with 50GB of sharded data the time taken for the full backup is around 1GB/minute, so if we calculate 1TB its huge can we set parallelism or is there any other way to optimize the backup. Do you suggest allocating more cpu,memory would help here. As of now we are backing to AWS S3.
We understand the mongo documents is already compressed on db so we are not getting good compression ratio with default s2 do you suggest we will get better compression with other algorithm ?
1 Like
Hi @ilanchezhianm
You can test out different compression options with Troubleshooting Percona Backup for MongoDB — Percona Backup for MongoDB 1.8 Documentation. s2
the fastest one, but pgzip
should have a better compression ratio. Maybe it would work better with your load.
Although there is no way to set parallelism or turn other backup performance-related knobs manually for now. But trowing more cpu resources might help. PBM adjusts concurrency for the backup processes based on available cpu cores. Plus more cpu power should help compression.
1 Like
Thank you @Andrew_Pogrebnoi for the reply we did test with gzip found the backup took extraordinarily longer (atleast 3x slower than s2), we are convinced S2 is optimal but 37 minutes for an 59GB database seem to be longer even with s2 compression.
Any suggestion on this would be appreciated. We see currently a thread is spawned per shard it would be nice to see if the replicaset with in that shard is utilized as well so it can go with parallel documents (just a thought).
1 Like
It is a good idea to utilize all (secondary) nodes for the backup. But there are quite a few challenges to implement this. And I’m not sure if we’ll approach this any time soon.
1 Like
Hi @ilanchezhianm
Yes, the gzip is a good deal slower. That’s only in there for, well, legacy expectations. We found that other algorithms were almost ~x10 as fast, single-threaded. Parallelized compression makes it faster if you have the cores to use (the default s2 is parallelized snappy).
Regarding your originally test the compressed size in the backup storage with the default s2 should be approximately the same as the live mongod node’s data directory size, because wiredtiger also uses the snappy compression library. So if it’s 50GB on disk it should be ~50GB in backup storage too.
This also means that transferring 50GB was slow in my opinion too. Even with a 100MB/s bandwidth (let’s say you have a 1GB ethernet as the tightest bottleneck in the network) that should be more like 10m, not 37m.
But now that I think about it I have observed speeds well below 100MB/s when uploading to AWS S3 from my home. I forget for now the speeds possible when doing it from a EC2 server with AWS to S3.
Could it be that the network bandwidth through to / accepted by AWS S3 from where you’re doing the test is the bottleneck?
In the end though backing up 1 TB of data (compressed) will take a long time unless you can achieve, say, 1GB/s or at least 0.5GB/s a second.
1 Like
Thank you @Akira_Kurogane and @Andrew_Pogrebnoi for the inputs.
Akira - Appreciate the detailed explanation and analysis. We are testing against decent server with 4vcpu, 16gb ram to AWS S3 with direct connect enabled so as you explained was expecting the same.
We are moving on as we it may not quite fit well for the large dataset. Thinking of a different way to stream the data out of mongo to longterm storage so we avoid this backup bottleneck.
Thanks again for the input.
1 Like
For speed nothing is faster than a hot backup, which Percona Server for MongoDB has inbuilt. That provides snapshot restore (not arbitrary point in time restore). With the development of $backupCursor and $backupCursorExtend in PSMDB-802, which is expected to be released as an experimental feature in 4.4.6, it will also be possible to have consistent hot backup for clusters.
In general once backup size approaches 1TB I would suggest hot backup method instead of ‘logical data format’ one such as PBM uses.
Also I assumed when PBM was being designed that if there were 1TB sizes then there would be plenty (say 16 or more) CPUs on each server. So there might be a CPU bottleneck in your case. 4 vCPUs are only two real CPU cores after all.