Error when restore

j2geu · April 27, 2020, 4:57pm

I encounter a problem when I try to restore. I get this message:
2020/04/28 00:18:01 Got command restore 2020-04-21T09:57:33Z
2020/04/28 00:18:01 [ERROR] unbale to run the restore while another backup or restore process running
Any idea? I restarted all servers before starting the restore, to be sure no other process was running.

I use this pbm version:
Version: 1.1.1
Platform: linux/amd64
GitCommit: 457bc0eaf861c8c15c997333ce1d8108a138874b
GitBranch: master
BuildTime: 2020-01-31_13:17_UTC
GoVersion: go1.12.9

Thanks for your help.

j2geu · April 28, 2020, 5:58am

So I have updated to version 1.1.3 and I now have clearer logs. It is really strange. These are my backups, with some errors:
2020-04-21T08:00:18Z Failed with “some pbm-agents were lost during the backup”
2020-04-21T09:57:33Z
2020-04-27T09:15:23Z Failed with “some pbm-agents were lost during the backup”
But in 1.1.1 everything looked fine. For 2020-04-21T09:57:33Z, we have these files:
-rw-r--r-- 1 jonathan jonathan 2,3K avril 21 12:25 2020-04-21T09:57:33Z.pbm.json
-rw-r--r-- 1 jonathan jonathan 123K avril 21 11:57 2020-04-21T09:57:33Z_rs0.dump.gz
-rw-r--r-- 1 jonathan jonathan 51K avril 21 12:25 2020-04-21T09:57:33Z_rs0.oplog.gz
-rw-r--r-- 1 jonathan jonathan 1,9G avril 21 12:25 2020-04-21T09:57:33Z_srs0.dump.gz
-rw-r--r-- 1 jonathan jonathan 1,8K avril 21 12:25 2020-04-21T09:57:33Z_srs0.oplog.gz
-rw-r--r-- 1 jonathan jonathan 1,9G avril 21 12:25 2020-04-21T09:57:33Z_srs1.dump.gz
-rw-r--r-- 1 jonathan jonathan 1,3K avril 21 12:25 2020-04-21T09:57:33Z_srs1.oplog.gz
So we can see it is clearly a sharded database.
When I use the command:
pbm restore 2020-04-21T09:57:33Z --mongodb-uri="mongodb://pbmuser:password@10.0.4.111:27019/?replicaSet=rs0"
it restores only the first shard, srs0, and says the collection is not sharded:
db.newtrans.stats()
{
"sharded" : false,
"primary" : "srs0",
}
Any clue about what is happening?
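
The file names in the listing above appear to follow the pattern `<backup name>_<replica set>.<dump|oplog>.gz`, with one dump and one oplog file per replica set. Assuming that pattern (it is inferred from the listing, not taken from the PBM documentation), a small Python sketch can group the files by replica set to check that every shard is present in the backup. The function name `group_by_replset` is hypothetical.

```python
import re
from collections import defaultdict

# Pattern inferred from the listing: <backup>_<replset>.<kind>.gz
FILE_RE = re.compile(r"^(?P<backup>.+Z)_(?P<rs>[^.]+)\.(?P<kind>dump|oplog)\.gz$")

def group_by_replset(filenames):
    """Map each replica set name to the sorted kinds of files found for it."""
    groups = defaultdict(set)
    for name in filenames:
        m = FILE_RE.match(name)
        if m:  # metadata such as *.pbm.json does not match and is skipped
            groups[m.group("rs")].add(m.group("kind"))
    return {rs: sorted(kinds) for rs, kinds in groups.items()}

files = [
    "2020-04-21T09:57:33Z.pbm.json",
    "2020-04-21T09:57:33Z_rs0.dump.gz",
    "2020-04-21T09:57:33Z_rs0.oplog.gz",
    "2020-04-21T09:57:33Z_srs0.dump.gz",
    "2020-04-21T09:57:33Z_srs0.oplog.gz",
    "2020-04-21T09:57:33Z_srs1.dump.gz",
    "2020-04-21T09:57:33Z_srs1.oplog.gz",
]
print(group_by_replset(files))
```

With the files listed in this post, three replica sets come out (rs0, srs0, srs1), each with both a dump and an oplog file, so the backup itself seems to cover both shards.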

Andrew_Pogrebnoi · April 28, 2020, 10:02am

Hi @j2geu
1) Do you have running pbm-agents on all shards?
2) What does “pbm list --restore” show?

Andrew_Pogrebnoi · April 28, 2020, 10:15am

Speaking of failed backups: “some pbm-agents were lost during the backup” means that one or more pbm-agents that were running the backup failed to send a heartbeat. This can happen if an agent was restarted or a node experienced network issues. The agents’ logs may reveal more details.
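
To illustrate what a “lost” agent means in general terms, here is a minimal Python sketch of a heartbeat staleness check. It is not PBM's actual implementation: the function name `find_lost_agents`, the agent names, and the 30-second threshold are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

def find_lost_agents(last_heartbeats, now, max_silence=timedelta(seconds=30)):
    """Return the sorted names of agents whose last heartbeat is older than max_silence.

    last_heartbeats maps a (hypothetical) agent name to the time of its last
    heartbeat. The 30-second default is an illustrative value, not a PBM setting.
    """
    return sorted(
        name for name, seen in last_heartbeats.items()
        if now - seen > max_silence
    )

now = datetime(2020, 4, 21, 8, 0, 18, tzinfo=timezone.utc)
heartbeats = {
    "rs0/mongo1": now - timedelta(seconds=5),    # recently seen
    "srs0/mongo2": now - timedelta(seconds=90),  # silent for too long
}
print(find_lost_agents(heartbeats, now))
```

In this made-up example only the second agent has been silent for longer than the threshold, so it is the one reported as lost.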

j2geu · April 29, 2020, 2:33am

@AndrewPogrebnoi Thanks for your help. So I erased everything and started from the beginning: I created a new sharded collection and made a new backup. pbm list result:
2020-04-29T07:46:41Z
2020-04-29T07:54:40Z
And when I restore, I get this:
avril 29 09:59:14 MONGO1 pbm-agent[32399]: 2020/04/29 09:59:14 Mark restore as failed restore mongo dump (successes: 0 / fails: 0): cannot restore with conflicting namespace destinations: <nil>
avril 29 09:59:14 MONGO1 pbm-agent[32399]: 2020/04/29 09:59:14 [ERROR] restore: restore mongo dump (successes: 0 / fails: 0): cannot restore with conflicting namespace destinations
What does it mean?

j2geu · April 29, 2020, 3:44am

Hello @AndrewPogrebnoi and @“Akira Kurogane”, I dropped the database with db.dropDatabase() and ran db.adminCommand({flushRouterConfig:1}). I restarted all nodes and all pbm-agents. I created a really small sharded collection and backed it up without error. But when I try to restore it, I still get the error:
avril 29 11:12:56 MONGO1 pbm-agent[1549]: 2020/04/29 11:12:56 Mark restore as failed restore mongo dump (successes: 0 / fails: 0): cannot restore with conflicting namespace destinations: <nil>
avril 29 11:12:56 MONGO1 pbm-agent[1549]: 2020/04/29 11:12:56 [ERROR] restore: restore mongo dump (successes: 0 / fails: 0): cannot restore with conflicting namespace destinations

Andrew_Pogrebnoi · April 29, 2020, 7:27am

@j2geu
Are all the required roles granted to your pbmuser on the new cluster, as described in the documentation (“Authentication” in the Percona Backup for MongoDB 1.8 Documentation)?

j2geu · April 29, 2020, 7:44am

@AndrewPogrebnoi
I have not recreated the cluster; I only deleted data and flushed some caches. Yes, the user and role are OK:
2020-04-29T15:09:23.952+0200 E QUERY [js] uncaught exception: Error: Role “pbmAnyAction@admin” already exists :
2020-04-29T15:09:26.622+0200 E QUERY [js] uncaught exception: Error: couldn’t add user: User “pbmuser@admin” already exists :
When I restart the PBM agent, all seems to be OK:
avril 29 15:12:55 MONGO1 systemd[1]: Stopping pbm-agent…
avril 29 15:12:55 MONGO1 systemd[1]: Stopped pbm-agent.
avril 29 15:12:55 MONGO1 systemd[1]: Started pbm-agent.
avril 29 15:12:55 MONGO1 pbm-agent[13571]: pbm agent is listening for the commands
I don’t understand the meaning of “conflicting namespace destinations”.

j2geu · April 30, 2020, 4:23am

Hello @AndrewPogrebnoi and @“Akira Kurogane”, must I stop something during the restore, as was written for version 1.0.0, or do I just need to execute the restore command? I will delete everything and recreate a completely new architecture to be sure the problem does not come from that. Thank you.

Andrew_Pogrebnoi · April 30, 2020, 10:54am

Hi @j2geu You don’t have to stop or restart anything before the restore. But keep in mind that if you do restart nodes and/or recreate a pbm user it’s better to restart pbm-agents as well.
Let us know how it went with the new architecture.
Thank you!

j2geu · April 30, 2020, 5:27pm

@AndrewPogrebnoi and @“Akira Kurogane”, since I completely reinstalled MongoDB and the pbm-agents, everything works well. Among the mistakes I identified were the ulimit settings on some servers; I set the maximum values on all servers, as recommended by MongoDB. For the moment I have only tested with a really small set of data; I will test with a bigger one. What about the compression option set to “none”? Is it OK to use it in production or not? Thank you.

j2geu · May 2, 2020, 3:58am

@AndrewPogrebnoi
I encountered an out-of-memory problem. I tried to restore a “bigger” database (8GB) and pbm-agent failed. I have 4GB of RAM.
Here is my ulimit -a on all servers:
jonathan@MONGO3:~$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 14908
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 65535
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 65535
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

And here is the stack trace (complete stack available here):
mai 02 11:09:10 MONGO3 pbm-agent[22765]: 2020-05-02T11:09:10.914+0200        restoring bank.transaction from archive on stdin
mai 02 11:11:44 MONGO3 pbm-agent[22765]: fatal error: runtime: out of memory
mai 02 11:11:44 MONGO3 pbm-agent[22765]: runtime stack:
mai 02 11:11:44 MONGO3 pbm-agent[22765]: runtime.throw(0xe83778, 0x16)
mai 02 11:11:44 MONGO3 pbm-agent[22765]:         /usr/local/go/src/runtime/panic.go:617 +0x72
mai 02 11:11:44 MONGO3 pbm-agent[22765]: runtime.sysMap(0xc0d4000000, 0x4000000, 0x18713f8)
mai 02 11:11:44 MONGO3 pbm-agent[22765]:         /usr/local/go/src/runtime/mem_linux.go:170 +0xc7
mai 02 11:11:44 MONGO3 pbm-agent[22765]: runtime.(*mheap).sysAlloc(0x1858d00, 0x8ac000, 0x1858d10, 0x456)
mai 02 11:11:44 MONGO3 pbm-agent[22765]:         /usr/local/go/src/runtime/malloc.go:633 +0x1cd
mai 02 11:11:44 MONGO3 pbm-agent[22765]: runtime.(*mheap).grow(0x1858d00, 0x456, 0x0)
mai 02 11:11:44 MONGO3 pbm-agent[22765]:         /usr/local/go/src/runtime/mheap.go:1222 +0x42
mai 02 11:11:44 MONGO3 pbm-agent[22765]: runtime.(*mheap).allocSpanLocked(0x1858d00, 0x456, 0x1871408, 0x0)
mai 02 11:11:44 MONGO3 pbm-agent[22765]:         /usr/local/go/src/runtime/mheap.go:1150 +0x37f
mai 02 11:11:44 MONGO3 pbm-agent[22765]: runtime.(*mheap).alloc_m(0x1858d00, 0x456, 0x101, 0x0)
mai 02 11:11:44 MONGO3 pbm-agent[22765]:         /usr/local/go/src/runtime/mheap.go:977 +0xc2
mai 02 11:11:44 MONGO3 pbm-agent[22765]: runtime.(*mheap).alloc.func1()
mai 02 11:11:44 MONGO3 pbm-agent[22765]:         /usr/local/go/src/runtime/mheap.go:1048 +0x4c
mai 02 11:11:44 MONGO3 pbm-agent[22765]: runtime.(*mheap).alloc(0x1858d00, 0x456, 0x7fdbcf000101, 0x427cb0)
mai 02 11:11:44 MONGO3 pbm-agent[22765]:         /usr/local/go/src/runtime/mheap.go:1047 +0x8a
mai 02 11:11:44 MONGO3 pbm-agent[22765]: runtime.largeAlloc(0x8ac000, 0x450100, 0xc0c2e3a000)
mai 02 11:11:44 MONGO3 pbm-agent[22765]:         /usr/local/go/src/runtime/malloc.go:1055 +0x99
mai 02 11:11:44 MONGO3 pbm-agent[22765]: runtime.mallocgc.func1()
mai 02 11:11:44 MONGO3 pbm-agent[22765]:         /usr/local/go/src/runtime/malloc.go:950 +0x46
mai 02 11:11:44 MONGO3 pbm-agent[22765]: runtime.systemstack(0x0)
mai 02 11:11:44 MONGO3 pbm-agent[22765]:         /usr/local/go/src/runtime/asm_amd64.s:351 +0x66
mai 02 11:11:44 MONGO3 pbm-agent[22765]: runtime.mstart()
mai 02 11:11:44 MONGO3 pbm-agent[22765]:         /usr/local/go/src/runtime/proc.go:1153
mai 02 11:11:44 MONGO3 pbm-agent[22765]: goroutine 415 [running]:
mai 02 11:11:44 MONGO3 pbm-agent[22765]: runtime.systemstack_switch()

j2geu · May 2, 2020, 4:51am

@AndrewPogrebnoi
So I continued debugging. I increased my swap file to 4.3GB, and the restore succeeded for an 8GB database (1.5GB of swap was used). But after the restore, pbm-agent doesn’t release the memory; I need to restart it. That explains why I can sometimes restore a file and then, right afterwards, fail to restore the same file with an out-of-memory error.
I will try restoring a bigger database to see whether there is a “memory leak” or whether a fixed amount of memory is used.
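
One way to tell a leak from a fixed working set is to record the resident memory of pbm-agent before and after each restore. The Python sketch below assumes Linux, where `/proc/<pid>/status` contains a `VmRSS` line expressed in kB. The helper names are hypothetical, and the PID has to be supplied by the reader (for example from `pidof pbm-agent`).

```python
import re
from pathlib import Path

def parse_vm_rss_kb(status_text):
    """Extract the VmRSS value (in kB) from the text of /proc/<pid>/status."""
    m = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(m.group(1)) if m else None

def rss_kb(pid):
    """Read the current resident set size of a process, in kB (Linux only)."""
    return parse_vm_rss_kb(Path(f"/proc/{pid}/status").read_text())

# Demonstration on a made-up excerpt; real values come from rss_kb(<pid>).
sample = "Name:\tpbm-agent\nVmPeak:\t 5000000 kB\nVmRSS:\t 3145728 kB\n"
print(parse_vm_rss_kb(sample))
```

If the value recorded after each successive restore keeps growing, that points towards a leak; if it plateaus, the agent is more likely holding on to a fixed amount of memory.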

Andrew_Pogrebnoi · May 7, 2020, 9:54am

@j2geu That’s interesting. We created a ticket ([PBM-462] Possible memory leak during the restore - Percona JIRA) and will investigate that.
Thanks for the investigation and reporting it! I’ll keep you posted on the outcome.

j2geu · June 24, 2020, 3:09am

@AndrewPogrebnoi
Hello, I have encountered a new problem on a new platform:
Jun 24 10:29:03 QPN-DS-MNG-CN04 pbm-agent[2115]: 2020/06/24 10:29:03 [INFO] Restore of ‘2020-06-24T08:21:01Z’ started
Jun 24 10:29:07 QPN-DS-MNG-CN04 pbm-agent[2115]: 2020-06-24T10:29:07.023+0200        preparing collections to restore from
Jun 24 10:29:07 QPN-DS-MNG-CN04 pbm-agent[2115]: 2020-06-24T10:29:07.023+0200        destination conflict: admin.pbmRUsers (src) => admin.pbmRUsers (dst)
Jun 24 10:29:07 QPN-DS-MNG-CN04 pbm-agent[2115]: 2020-06-24T10:29:07.023+0200        destination conflict: admin.system.users (src) => admin.pbmRUsers (dst)
Jun 24 10:29:07 QPN-DS-MNG-CN04 pbm-agent[2115]: 2020-06-24T10:29:07.023+0200        destination conflict: admin.pbmRRoles (src) => admin.pbmRRoles (dst)
Jun 24 10:29:07 QPN-DS-MNG-CN04 pbm-agent[2115]: 2020-06-24T10:29:07.023+0200        destination conflict: admin.system.roles (src) => admin.pbmRRoles (dst)
Jun 24 10:29:07 QPN-DS-MNG-CN04 pbm-agent[2115]: 2020/06/24 10:29:07 Mark restore as failed restore mongo dump (successes: 0 / fails: 0): cannot restore with conflicting namespace destinations: <nil>
Jun 24 10:29:07 QPN-DS-MNG-CN04 pbm-agent[2115]: 2020/06/24 10:29:07 [ERROR] restore: restore mongo dump (successes: 0 / fails: 0): cannot restore with conflicting namespace destinations

Do you have any idea where this comes from? I don’t understand it; for information, my PBM user is named pbmuser, not pbmRuser.
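
The `destination conflict` lines in the log above name each source and destination namespace, so they can be summarised mechanically. Here is a minimal Python sketch, assuming the line format shown in this post; the function name `conflicts_by_destination` is hypothetical.

```python
import re
from collections import defaultdict

CONFLICT_RE = re.compile(r"destination conflict: (\S+) \(src\) => (\S+) \(dst\)")

def conflicts_by_destination(log_lines):
    """Map each destination namespace to the list of sources restoring into it."""
    result = defaultdict(list)
    for line in log_lines:
        m = CONFLICT_RE.search(line)
        if m:
            src, dst = m.groups()
            result[dst].append(src)
    return dict(result)

log = [
    "destination conflict: admin.pbmRUsers (src) => admin.pbmRUsers (dst)",
    "destination conflict: admin.system.users (src) => admin.pbmRUsers (dst)",
    "destination conflict: admin.pbmRRoles (src) => admin.pbmRRoles (dst)",
    "destination conflict: admin.system.roles (src) => admin.pbmRRoles (dst)",
]
for dst, sources in conflicts_by_destination(log).items():
    print(dst, "<-", sources)
```

Each destination here receives two sources: admin.pbmRUsers and admin.pbmRRoles are present as sources in their own right, while admin.system.users and admin.system.roles are mapped onto the same names. That suggests the backup already contained collections called pbmRUsers and pbmRRoles.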

Andrew_Pogrebnoi · June 25, 2020, 6:31am

@j2geu
Is the destination cluster brand new? Were there any issues before that restore?
Can you try to drop the admin.pbmRRoles and admin.pbmRUsers collections manually and re-run the restore?

j2geu · June 25, 2020, 6:34am

@AndrewPogrebnoi The cluster is brand new, but I found that there is a process that recreates collections with this index. So that was my error. I am continuing the tests because I had some other errors with 1TB of data before. Thanks for your help, and sorry to have disturbed you for nothing.

Andrew_Pogrebnoi · June 25, 2020, 9:26am

@j2geu no worries, glad that you’ve found the issue.

Cheers!
