Error when restoring

I encounter a problem when I want to restore. I get this message:

2020/04/28 00:18:01 Got command restore 2020-04-21T09:57:33Z
2020/04/28 00:18:01 [ERROR] unbale to run the restore while another backup or restore process running

Any idea? I restarted all servers before starting the restore to be sure there was no other process running.
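Is there a way to check whether a stale lock is still recorded somewhere? I was thinking of something like this (just a sketch; the pbmLock collection name in the admin database is my assumption, not something I found in the docs):

# Sketch only: look for leftover PBM lock documents.
# The admin.pbmLock collection name below is an assumption.
mongo "mongodb://pbmuser:password@10.0.4.111:27019/?replicaSet=rs0" \
  --eval 'db.getSiblingDB("admin").pbmLock.find().forEach(printjson)'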

I use this pbm version:

Version:   1.1.1
Platform:  linux/amd64
GitCommit: 457bc0eaf861c8c15c997333ce1d8108a138874b
GitBranch: master
BuildTime: 2020-01-31_13:17_UTC
GoVersion: go1.12.9

Thanks for your help.


So I have updated to version 1.1.3 and I have clearer logs. It is really strange; this is my backup list, with some errors:

2020-04-21T08:00:18Z  Failed with "some pbm-agents were lost during the backup"
2020-04-21T09:57:33Z
2020-04-27T09:15:23Z  Failed with "some pbm-agents were lost during the backup"

But in 1.1.1 all was clear. If we look at 2020-04-21T09:57:33Z we have these files:

-rw-r--r-- 1 jonathan jonathan 2,3K avril 21 12:25 2020-04-21T09:57:33Z.pbm.json
-rw-r--r-- 1 jonathan jonathan 123K avril 21 11:57 2020-04-21T09:57:33Z_rs0.dump.gz
-rw-r--r-- 1 jonathan jonathan  51K avril 21 12:25 2020-04-21T09:57:33Z_rs0.oplog.gz
-rw-r--r-- 1 jonathan jonathan 1,9G avril 21 12:25 2020-04-21T09:57:33Z_srs0.dump.gz
-rw-r--r-- 1 jonathan jonathan 1,8K avril 21 12:25 2020-04-21T09:57:33Z_srs0.oplog.gz
-rw-r--r-- 1 jonathan jonathan 1,9G avril 21 12:25 2020-04-21T09:57:33Z_srs1.dump.gz
-rw-r--r-- 1 jonathan jonathan 1,3K avril 21 12:25 2020-04-21T09:57:33Z_srs1.oplog.gz
So we can see it is clearly a sharded database.
When I use the command:

pbm restore 2020-04-21T09:57:33Z --mongodb-uri="mongodb://pbmuser:password@10.0.4.111:27019/?replicaSet=rs0"

it restores only the first shard srs0 and says the collection is not sharded:

db.newtrans.stats()
{
        "sharded" : false,
        "primary" : "srs0",
}

Any clue what is happening?





Hi @j2geu
1) Do you have pbm-agents running on all shards?
2) What does "pbm list --restore" show?

Speaking of the failed backups: "some pbm-agents were lost during the backup" means that some pbm-agent(s) which ran the backup failed to send a heartbeat. It can happen if an agent was restarted or a node experienced some network issues. Maybe the agents' logs reveal more details.
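If it helps, a rough way to check both points (a sketch only; it assumes pbm-agent runs as a systemd service on every node and uses placeholder hostnames, so replace them with yours):

# Check that an agent is active on every shard / config-server node
# (MONGO1 MONGO2 MONGO3 are placeholder hostnames).
for host in MONGO1 MONGO2 MONGO3; do
  ssh "$host" 'systemctl is-active pbm-agent'
done

# Show the restore history/state known to PBM:
pbm list --restore --mongodb-uri="mongodb://pbmuser:password@10.0.4.111:27019/?replicaSet=rs0"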

@AndrewPogrebnoi Thanks for your help. So I erased everything and started from the beginning: I created a new sharded collection and made a new backup. pbm list result:

  2020-04-29T07:46:41Z
  2020-04-29T07:54:40Z

And when I restore, I have this:

avril 29 09:59:14 MONGO1 pbm-agent[32399]: 2020/04/29 09:59:14 Mark restore as failed restore mongo dump (successes: 0 / fails: 0): cannot restore with conflicting namespace destinations: <nil>
avril 29 09:59:14 MONGO1 pbm-agent[32399]: 2020/04/29 09:59:14 [ERROR] restore: restore mongo dump (successes: 0 / fails: 0): cannot restore with conflicting namespace destinations

What does it mean?


Hello @AndrewPogrebnoi and @"Akira Kurogane", I have dropped the database with db.dropDatabase() and db.adminCommand({flushRouterConfig:1}). I have restarted all nodes and restarted all pbm-agents. I have created a really small sharded collection and backed it up without error. But when I want to restore it, I keep getting the error:

avril 29 11:12:56 MONGO1 pbm-agent[1549]: 2020/04/29 11:12:56 Mark restore as failed restore mongo dump (successes: 0 / fails: 0): cannot restore with conflicting namespace destinations: <nil>
avril 29 11:12:56 MONGO1 pbm-agent[1549]: 2020/04/29 11:12:56 [ERROR] restore: restore mongo dump (successes: 0 / fails: 0): cannot restore with conflicting namespace destinations



@j2geu
Are all the required roles granted to your pbmuser on the new cluster, as described in the documentation (Authentication — Percona Backup for MongoDB 1.8 Documentation)?
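For reference, the setup from that page looks roughly like this (run it against the admin database on every replica set); the connection string and passwords below are placeholders:

# Rough sketch of the role/user creation from the PBM docs -- adjust
# the URI and passwords to your environment.
mongo "mongodb://admin:adminpwd@10.0.4.111:27019/?replicaSet=rs0" <<'EOF'
db.getSiblingDB("admin").createRole({
  role: "pbmAnyAction",
  privileges: [ { resource: { anyResource: true }, actions: [ "anyAction" ] } ],
  roles: []
});
db.getSiblingDB("admin").createUser({
  user: "pbmuser",
  pwd: "secretpwd",   // placeholder password
  roles: [
    { db: "admin", role: "readWrite" },
    { db: "admin", role: "backup" },
    { db: "admin", role: "clusterMonitor" },
    { db: "admin", role: "restore" },
    { db: "admin", role: "pbmAnyAction" }
  ]
});
EOF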

@AndrewPogrebnoi
I have not recreated the cluster, only deleted the data and flushed some caches. Yes, the user and roles are OK:

2020-04-29T15:09:23.952+0200 E  QUERY    [js] uncaught exception: Error: Role "pbmAnyAction@admin" already exists :
2020-04-29T15:09:26.622+0200 E  QUERY    [js] uncaught exception: Error: couldn't add user: User "pbmuser@admin" already exists :

When I restart the pbm-agent all seems to be OK:

avril 29 15:12:55 MONGO1 systemd[1]: Stopping pbm-agent...
avril 29 15:12:55 MONGO1 systemd[1]: Stopped pbm-agent.
avril 29 15:12:55 MONGO1 systemd[1]: Started pbm-agent.
avril 29 15:12:55 MONGO1 pbm-agent[13571]: pbm agent is listening for the commands

I don't understand the meaning of "conflicting namespace destinations".


Hello @AndrewPogrebnoi and @"Akira Kurogane", must I stop something during the restore, as was written for version 1.0.0, or do I just need to execute the restore command? I will delete everything and recreate a completely new architecture to be sure the problem is not there. Thank you.

Hi @j2geu. You don't have to stop or restart anything before the restore. But keep in mind that if you do restart nodes and/or recreate the pbm user, it's better to restart the pbm-agents as well.
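For example, after restarting a node or changing the pbm user, something like this on each node (assuming the pbm-agent systemd unit shown in your logs):

# Restart the agent so it reconnects with the current topology/credentials.
sudo systemctl restart pbm-agent
sudo systemctl status pbm-agent --no-pager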
Let us know how it went with the new architecture.
Thank you!

@AndrewPogrebnoi and @"Akira Kurogane", so, since I completely reinstalled MongoDB and the pbm-agents, everything works well. Among the different mistakes identified, there were the ulimit settings on some servers; I set the maximum values on all servers as recommended by MongoDB. For the moment I have only tested with a really small set of data; I will test with a bigger one. What about the compression option set to "none"? Is it OK to use it in production or not? Thank you.
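For context, here is how I understand the compression selection (a sketch only; I am assuming the backup command accepts a --compression flag with a "none" value, please correct me if that is wrong):

# Sketch: take a backup without compression.
pbm backup --compression=none \
  --mongodb-uri="mongodb://pbmuser:password@10.0.4.111:27019/?replicaSet=rs0"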

@AndrewPogrebnoi
I encounter an out-of-memory problem. I have tried to restore a "bigger" database (8GB) and pbm-agent failed. I have 4GB of RAM.
Here is my ulimit -a on all servers:
jonathan@MONGO3:~$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 14908
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 65535
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 65535
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

And here is the stack trace (the complete stack is available here):

mai 02 11:09:10 MONGO3 pbm-agent[22765]: 2020-05-02T11:09:10.914+0200        restoring bank.transaction from archive on stdin
mai 02 11:11:44 MONGO3 pbm-agent[22765]: fatal error: runtime: out of memory
mai 02 11:11:44 MONGO3 pbm-agent[22765]: runtime stack:
mai 02 11:11:44 MONGO3 pbm-agent[22765]: runtime.throw(0xe83778, 0x16)
mai 02 11:11:44 MONGO3 pbm-agent[22765]:         /usr/local/go/src/runtime/panic.go:617 +0x72
mai 02 11:11:44 MONGO3 pbm-agent[22765]: runtime.sysMap(0xc0d4000000, 0x4000000, 0x18713f8)
mai 02 11:11:44 MONGO3 pbm-agent[22765]:         /usr/local/go/src/runtime/mem_linux.go:170 +0xc7
mai 02 11:11:44 MONGO3 pbm-agent[22765]: runtime.(*mheap).sysAlloc(0x1858d00, 0x8ac000, 0x1858d10, 0x456)
mai 02 11:11:44 MONGO3 pbm-agent[22765]:         /usr/local/go/src/runtime/malloc.go:633 +0x1cd
mai 02 11:11:44 MONGO3 pbm-agent[22765]: runtime.(*mheap).grow(0x1858d00, 0x456, 0x0)
mai 02 11:11:44 MONGO3 pbm-agent[22765]:         /usr/local/go/src/runtime/mheap.go:1222 +0x42
mai 02 11:11:44 MONGO3 pbm-agent[22765]: runtime.(*mheap).allocSpanLocked(0x1858d00, 0x456, 0x1871408, 0x0)
mai 02 11:11:44 MONGO3 pbm-agent[22765]:         /usr/local/go/src/runtime/mheap.go:1150 +0x37f
mai 02 11:11:44 MONGO3 pbm-agent[22765]: runtime.(*mheap).alloc_m(0x1858d00, 0x456, 0x101, 0x0)
mai 02 11:11:44 MONGO3 pbm-agent[22765]:         /usr/local/go/src/runtime/mheap.go:977 +0xc2
mai 02 11:11:44 MONGO3 pbm-agent[22765]: runtime.(*mheap).alloc.func1()
mai 02 11:11:44 MONGO3 pbm-agent[22765]:         /usr/local/go/src/runtime/mheap.go:1048 +0x4c
mai 02 11:11:44 MONGO3 pbm-agent[22765]: runtime.(*mheap).alloc(0x1858d00, 0x456, 0x7fdbcf000101, 0x427cb0)
mai 02 11:11:44 MONGO3 pbm-agent[22765]:         /usr/local/go/src/runtime/mheap.go:1047 +0x8a
mai 02 11:11:44 MONGO3 pbm-agent[22765]: runtime.largeAlloc(0x8ac000, 0x450100, 0xc0c2e3a000)
mai 02 11:11:44 MONGO3 pbm-agent[22765]:         /usr/local/go/src/runtime/malloc.go:1055 +0x99
mai 02 11:11:44 MONGO3 pbm-agent[22765]: runtime.mallocgc.func1()
mai 02 11:11:44 MONGO3 pbm-agent[22765]:         /usr/local/go/src/runtime/malloc.go:950 +0x46
mai 02 11:11:44 MONGO3 pbm-agent[22765]: runtime.systemstack(0x0)
mai 02 11:11:44 MONGO3 pbm-agent[22765]:         /usr/local/go/src/runtime/asm_amd64.s:351 +0x66
mai 02 11:11:44 MONGO3 pbm-agent[22765]: runtime.mstart()
mai 02 11:11:44 MONGO3 pbm-agent[22765]:         /usr/local/go/src/runtime/proc.go:1153
mai 02 11:11:44 MONGO3 pbm-agent[22765]: goroutine 415 [running]:
mai 02 11:11:44 MONGO3 pbm-agent[22765]: runtime.systemstack_switch()

@AndrewPogrebnoi
So I continued debugging. I increased my swap file to 4,3GB, and the restore was OK for an 8GB database (1,5GB of swap was used). But after the restore, pbm-agent does not release the memory, and I need to restart it. That explains why sometimes I can restore a file and, just after, I can't restore the same file and get an out-of-memory error.
I will try restoring a bigger database to see whether there is a "memory leak" or whether it uses a fixed amount of memory.
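For reference, growing the swap file was roughly the usual Linux procedure (the /swapfile path below is an assumption, adapt it to your system):

# Rough sketch of enlarging the swap file to ~4,3GB.
sudo swapoff /swapfile
sudo fallocate -l 4300M /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
free -h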

@j2geu That’s interesting. We created a ticket ([PBM-462] Possible memory leak during the restore - Percona JIRA) and will investigate that.
Thanks for the investigation and reporting it! I’ll keep you posted on the outcome.

@AndrewPogrebnoi
Hello, I encounter a new problem on a new platform:
Jun 24 10:29:03 QPN-DS-MNG-CN04 pbm-agent[2115]: 2020/06/24 10:29:03 [INFO] Restore of ‘2020-06-24T08:21:01Z’ started
Jun 24 10:29:07 QPN-DS-MNG-CN04 pbm-agent[2115]: 2020-06-24T10:29:07.023+0200        preparing collections to restore from
Jun 24 10:29:07 QPN-DS-MNG-CN04 pbm-agent[2115]: 2020-06-24T10:29:07.023+0200        destination conflict: admin.pbmRUsers (src) => admin.pbmRUsers (dst)
Jun 24 10:29:07 QPN-DS-MNG-CN04 pbm-agent[2115]: 2020-06-24T10:29:07.023+0200        destination conflict: admin.system.users (src) => admin.pbmRUsers (dst)
Jun 24 10:29:07 QPN-DS-MNG-CN04 pbm-agent[2115]: 2020-06-24T10:29:07.023+0200        destination conflict: admin.pbmRRoles (src) => admin.pbmRRoles (dst)
Jun 24 10:29:07 QPN-DS-MNG-CN04 pbm-agent[2115]: 2020-06-24T10:29:07.023+0200        destination conflict: admin.system.roles (src) => admin.pbmRRoles (dst)
Jun 24 10:29:07 QPN-DS-MNG-CN04 pbm-agent[2115]: 2020/06/24 10:29:07 Mark restore as failed restore mongo dump (successes: 0 / fails: 0): cannot restore with conflicting namespace destinations: <nil>
Jun 24 10:29:07 QPN-DS-MNG-CN04 pbm-agent[2115]: 2020/06/24 10:29:07 [ERROR] restore: restore mongo dump (successes: 0 / fails: 0): cannot restore with conflicting namespace destinations

Do you have any idea where this comes from? I don't understand; for information, my pbm user is named pbmuser, not pbmRuser.

@j2geu
Is the destination cluster brand new? Were there any issues before that restore?
Can you try to drop the admin.pbmRRoles and admin.pbmRUsers collections manually and re-run the restore?
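Roughly something like this against the admin database (a sketch; the connection string is just the one from earlier in the thread, so use your own):

# Sketch: drop the conflicting collections, then re-run the restore.
mongo "mongodb://pbmuser:password@10.0.4.111:27019/?replicaSet=rs0" --eval '
  db.getSiblingDB("admin").pbmRUsers.drop();
  db.getSiblingDB("admin").pbmRRoles.drop();
'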

@AndrewPogrebnoi The cluster is brand new, but I have found there is a process that recreates the collections with this index. So that was my error. I will continue the tests because I had some other errors with 1TB of data before. Thanks for your help, and sorry to bother you for nothing.

@j2geu No worries, glad that you've found the issue.

Cheers!