Guidance Needed — 8TB MongoDB Atlas to PSMDB Migration Using PCSM v0.7.0

I am planning an 8TB migration from MongoDB Atlas (3-node replica set) to a 3-node Percona Server for MongoDB replica set using PCSM v0.7.0. Looking for guidance from anyone who has used PCSM at multi-TB scale, or from the Percona team.

Environment:

  • Source: MongoDB Atlas Replica Set, ~8TB
  • Target: 3-node PSMDB Replica Set (same major version)
  • PCSM host: Dedicated EC2 node, same VPC as target
  • Oplog churn: ~6 GB/hr average
  • Oplog size: Increasing from 325 GB to 650 GB (~96-hour replication window, extrapolated from the observed ~48 hours at 325 GB)
  • Cannot set minimum oplog retention on Atlas (requires disk autoscaling, disabled by org policy)
  • Post-migration: Plan to shard the PSMDB cluster
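As a sanity check on the window math above, here is a quick back-of-envelope calculation (Python used purely for illustration; the only inputs are the figures already quoted in this post):

```python
# Back-of-envelope oplog window estimate. Illustrative only: the 6 GB/hr
# average churn, 325 GB / 650 GB sizes, and 48 h observation come from
# this post, not from any PCSM tooling.
def oplog_window_hours(oplog_gb: float, churn_gb_per_hr: float) -> float:
    """Estimate how many hours of writes the oplog can hold."""
    return oplog_gb / churn_gb_per_hr

# Observed ~48 h at 325 GB implies an effective churn of ~6.8 GB/hr,
# slightly above the 6 GB/hr average (write peaks shorten the window).
effective_churn = 325 / 48  # ~6.77 GB/hr

print(f"At 650 GB, average-churn estimate:  {oplog_window_hours(650, 6):.0f} h")
print(f"At 650 GB, observed-churn estimate: {oplog_window_hours(650, effective_churn):.0f} h")
```

The observed-churn figure is where the ~96-hour estimate comes from; the gap between the two numbers is a reminder that peak churn, not average churn, sets the real window.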

Key concerns:

  1. Scale confidence — Has PCSM been used for multi-TB migrations? The largest tests I found in the codebase are ~2GB collections. I’d like to know if there are known scale limits or internal benchmarks.
  2. Oplog window risk — Without a minimum retention guarantee on Atlas, the 96-hour window is an estimate. Does PCSM monitor or warn when the source oplog window is shrinking during the clone? If ChangeStreamHistoryLost occurs, is the only recovery path `pcsm reset` and a full restart?
  3. Retry limits — I noticed DefaultMaxRetries=3 with exponential backoff totalling ~35 seconds. For a multi-day migration, transient network issues or Atlas maintenance could exceed this. Can these be made configurable?
  4. No mid-collection resume — If a large collection (hundreds of GB) fails partway through clone, it restarts from the beginning. Is segment-level checkpointing on the roadmap?
  5. Operation timeout — The default 5-minute timeout (PCSM_MONGODB_CLI_OPERATION_TIMEOUT) applies to all operations including index creation on large collections. Is it safe to set this to 60m?
  6. Undocumented parameters — I found these env vars in the codebase that aren’t on the Percona ClusterSync for MongoDB startup configuration documentation page. Which are stable for production use?
    • PCSM_REPL_NUM_WORKERS
    • PCSM_REPL_CHANGE_STREAM_BATCH_SIZE
    • PCSM_REPL_EVENT_QUEUE_SIZE
    • PCSM_REPL_WORKER_QUEUE_SIZE
    • PCSM_REPL_BULK_OPS_SIZE
    • PCSM_CLONE_SEGMENT_SIZE
    • PCSM_CLONE_READ_BATCH_SIZE
    • PCSM_DEV_TARGET_CLIENT_COMPRESSORS
    • PCSM_MONGODB_OPERATION_TIMEOUT
  7. Connection pool and compression — PCSM strips maxPoolSize and related options from connection strings. With high parallelism, could this bottleneck? Also, source-side compression isn’t available (only target via PCSM_DEV_TARGET_CLIENT_COMPRESSORS) — any plans to add it?
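Regarding concern #2, whether or not PCSM itself warns about this, the source oplog window can be watched externally during the clone. Below is a minimal sketch of such a sidecar, assuming pymongo and read access to `local.oplog.rs` on the source (the URI, alert threshold, and polling loop are all my own placeholders, not PCSM features):

```python
# Sketch of an external oplog-window watchdog to run during the clone.
# Assumptions: pymongo is installed, and the connecting user can read
# local.oplog.rs on the source. ALERT_THRESHOLD_HOURS is an arbitrary
# safety margin, not anything PCSM-defined.
import time

ALERT_THRESHOLD_HOURS = 24  # assumed safety margin; tune to taste

def window_hours(first_ts: float, last_ts: float) -> float:
    """Oplog window = time spanned by the oldest and newest oplog entries."""
    return (last_ts - first_ts) / 3600

def fetch_oplog_window(uri: str) -> float:
    """Query local.oplog.rs on the source for its current window."""
    from pymongo import MongoClient  # deferred import: pip install pymongo
    client = MongoClient(uri)
    oplog = client.local["oplog.rs"]
    first = oplog.find_one(sort=[("$natural", 1)])   # oldest entry
    last = oplog.find_one(sort=[("$natural", -1)])   # newest entry
    return window_hours(first["ts"].time, last["ts"].time)

# Synthetic demo (no database needed): a window shrinking past the threshold.
for hours in (96, 48, 20):
    now = time.time()
    w = window_hours(now - hours * 3600, now)
    if w < ALERT_THRESHOLD_HOURS:
        print(f"WARNING: oplog window down to {w:.1f} h")
```

In practice you would call `fetch_oplog_window` on an interval and page someone (or abort the clone) when the window trends below the threshold; `rs.printReplicationInfo()` in mongosh shows the same first/last timestamps interactively.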

Proposed implementation plan:

  1. Increase Atlas oplog to 650 GB, wait 4-5 days
  2. Run a full PCSM test against a staging cluster first (though at ~4 GB it is very small, so not the best performance indicator)
  3. Execute production migration during lowest write activity with active oplog monitoring

Any recommended PCSM configuration values for this scale would be greatly appreciated.

Thanks in advance.

Hello @Daniel_Osuntoyinbo ,

Thanks for reaching out and for considering PCSM. Here are some answers to your concerns:

  1. We run perf tests internally. With random 200 KB documents (which are nearly incompressible, so the database’s logical size roughly equals its physical size), we cloned 1000 GB in 53.4 minutes. PCSM was running on an i3en.3xlarge (12 CPU, 96 GB RAM). Of course, performance will depend heavily on your infrastructure.
  2. No, PCSM does not monitor or warn about this. When it starts listening for changes after the clone is finished, replication will fail if ChangeStreamHistoryLost occurs.
  3. Currently the user can’t configure these retry options, but we could potentially add this in our next 0.8.0 release (planned for next month).
  4. Currently there is no recovery on clone failure; the only option is a full restart. And yes, clone recovery is planned.
  5. I don’t see any drawbacks in setting this to a high value.
  6. In our next 0.8.0 release, all of these options will be exposed (as env vars as well as CLI options) and documented.
  7. So far we don’t have it in our backlog, but we will discuss it with the team and plan for it. It does seem useful.

I would also suggest, if possible, waiting for the next 0.8.0 release (planned for next month): we are improving replication performance (it will be much faster, which you will need for a fast catch-up after the multi-hour clone) and adding a bunch of config options, including all the ones mentioned in concern #6.


@Inel_Pandzic Thank you so much for your feedback; it was quite helpful. As you advised, I think I will wait for the release of the 0.8 binary. Do you have an idea of when next month it will be released?

Great, glad it helped. The plan is to release it by the end of March; we don’t have a specific date yet.

In the meantime, here is what you could do, especially to test clone speed.

You can set up your environment, target cluster, and PCSM host (it should generally be as close to the target as possible, because of write latency). Start PCSM and let it clone some data, checking `pcsm status`, in particular the `initialSync.clonedSizeBytes` field, until it reaches, say, 10 GB; then stop PCSM. Based on that number you can estimate how fast the clone will be for your setup.
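To put numbers on that suggestion: two `clonedSizeBytes` samples taken some minutes apart give a throughput figure that can be extrapolated to the full 8 TB. A rough sketch (the sample values are invented for illustration, not measurements):

```python
# Extrapolating full-clone time from two initialSync.clonedSizeBytes
# samples read off `pcsm status`. The 10 GB / 30 min figures below are
# made-up example inputs, not real measurements.
def clone_throughput_gb_per_hr(bytes_a: float, bytes_b: float,
                               seconds_elapsed: float) -> float:
    """Clone rate in GB/hr between two clonedSizeBytes samples."""
    return (bytes_b - bytes_a) / 1e9 / (seconds_elapsed / 3600)

def full_clone_hours(total_gb: float, gb_per_hr: float) -> float:
    """Hours to clone total_gb at the measured rate."""
    return total_gb / gb_per_hr

# Example: 10 GB cloned in 30 minutes -> 20 GB/hr -> ~400 h for 8 TB.
rate = clone_throughput_gb_per_hr(0, 10e9, 30 * 60)
print(f"{rate:.0f} GB/hr, full 8 TB clone ~= {full_clone_hours(8000, rate):.0f} h")
```

For comparison, the internal benchmark quoted earlier in this thread (1000 GB in 53.4 minutes, about 1120 GB/hr) would put an 8 TB clone at roughly 7 hours; your own sampled rate will tell you where your setup lands between these extremes.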