
Error when restoring

j2geu Current User Role Supporter
I encounter a problem when I want to restore.
I get this message:
2020/04/28 00:18:01 Got command restore 2020-04-21T09:57:33Z
2020/04/28 00:18:01 [ERROR] unbale to run the restore while another backup or restore process running
Any idea?
I restarted all servers before starting the restore, to be sure there was no other process running.
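For reference: PBM serializes operations with a lock stored in its control collections, so a stale lock left over from a crashed backup can also trigger this message. A quick way to check (I'm assuming here that the lock lives in the admin.pbmLock collection — control collection names may differ between versions — and the URI is a placeholder):

```shell
# Inspect PBM's operation lock (assumption: admin.pbmLock; placeholder URI)
mongo "mongodb://pbmuser:pbmpassword@localhost:27019/?replicaSet=rs0" --quiet <<'EOF'
db.getSiblingDB("admin").pbmLock.find().forEach(printjson)
EOF
```

If a document is left over from a crashed operation, that would explain the error even when nothing is actually running.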

I am using this pbm version:
Version:   1.1.1
Platform:  linux/amd64
GitCommit: 457bc0eaf861c8c15c997333ce1d8108a138874b
GitBranch: master
BuildTime: 2020-01-31_13:17_UTC
GoVersion: go1.12.9

Thanks for your help.


Answers

  • j2geu Current User Role Supporter
    So I updated to version 1.1.3 and I now have clearer logs.
    It is really strange; this is my backup list, with some errors:
    2020-04-21T08:00:18Z  Failed with "some pbm-agents were lost during the backup"
    2020-04-21T09:57:33Z
    2020-04-27T09:15:23Z  Failed with "some pbm-agents were lost during the backup"
    But in 1.1.1 everything looked clear.
    If we look at 2020-04-21T09:57:33Z, we have these files:
    -rw-r--r-- 1 jonathan jonathan 2,3K avril 21 12:25 2020-04-21T09:57:33Z.pbm.json
    -rw-r--r-- 1 jonathan jonathan 123K avril 21 11:57 2020-04-21T09:57:33Z_rs0.dump.gz
    -rw-r--r-- 1 jonathan jonathan  51K avril 21 12:25 2020-04-21T09:57:33Z_rs0.oplog.gz
    -rw-r--r-- 1 jonathan jonathan 1,9G avril 21 12:25 2020-04-21T09:57:33Z_srs0.dump.gz
    -rw-r--r-- 1 jonathan jonathan 1,8K avril 21 12:25 2020-04-21T09:57:33Z_srs0.oplog.gz
    -rw-r--r-- 1 jonathan jonathan 1,9G avril 21 12:25 2020-04-21T09:57:33Z_srs1.dump.gz
    -rw-r--r-- 1 jonathan jonathan 1,3K avril 21 12:25 2020-04-21T09:57:33Z_srs1.oplog.gz
    So we can see it is clearly a sharded database.
    When I use the command:
    pbm restore 2020-04-21T09:57:33Z --mongodb-uri="mongodb://pbmuser:[email protected]:27019/?replicaSet=rs0"
    It restores only the first shard srs0, and the collection is reported as not sharded:
    db.newtrans.stats()
    {
            "sharded" : false,
            "primary" : "srs0",
    }
    Any clue what is happening?
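One thing worth checking: sharding state should be queried through a mongos, since stats() run directly against a single shard's replica set can report the collection as unsharded. A sketch, assuming a mongos on localhost:27017 and the database name test (both placeholders):

```shell
# Query sharding state via mongos (placeholder host/port and db name)
mongo "mongodb://localhost:27017/" --quiet <<'EOF'
sh.status()                                      // shards and sharded collections
db.getSiblingDB("test").newtrans.stats().sharded // true if the collection is sharded
EOF
```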





  • AndrewPogrebnoi Percona Staff Role
    Hi @j2geu
    1) Do you have running pbm-agents on all shards?
    2) What does "pbm list --restore" show?
  • AndrewPogrebnoi Percona Staff Role
    Speaking of failed backups: "some pbm-agents were lost during the backup" means that some pbm-agent(s) that ran the backup failed to send a heartbeat. It can happen if an agent was restarted or a node experienced network issues. Maybe the agents' logs reveal more details.
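If journald is in use, the agent's log around the backup window can be pulled like this (the timestamps are illustrative):

```shell
# Show pbm-agent log entries around the failed backup window
journalctl -u pbm-agent --since "2020-04-21 09:50" --until "2020-04-21 12:30" --no-pager
```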
  • j2geu Current User Role Supporter
    @AndrewPogrebnoi Thanks for your help.
    So I erased everything and started from the beginning: I created a new sharded collection
    and made a new backup.
    pbm list result:
      2020-04-29T07:46:41Z
      2020-04-29T07:54:40Z
    And when I restore, I get this:
    avril 29 09:59:14 MONGO1 pbm-agent[32399]: 2020/04/29 09:59:14 Mark restore as failed `restore mongo dump (successes: 0 / fails: 0): cannot restore with conflicting namespace destinations`: <nil>
    avril 29 09:59:14 MONGO1 pbm-agent[32399]: 2020/04/29 09:59:14 [ERROR] restore: restore mongo dump (successes: 0 / fails: 0): cannot restore with conflicting namespace destinations
    What does this mean?


  • j2geu Current User Role Supporter
    I dropped the database with db.dropDatabase() and ran db.adminCommand({flushRouterConfig: 1}).
    I restarted all nodes and all pbm-agents.
    I created a really small sharded collection.
    I backed it up without error.
    But when I try to restore it, I keep getting the error:
    avril 29 11:12:56 MONGO1 pbm-agent[1549]: 2020/04/29 11:12:56 Mark restore as failed `restore mongo dump (successes: 0 / fails: 0): cannot restore with conflicting namespace destinations`: <nil>
    avril 29 11:12:56 MONGO1 pbm-agent[1549]: 2020/04/29 11:12:56 [ERROR] restore: restore mongo dump (successes: 0 / fails: 0): cannot restore with conflicting namespace destinations



  • AndrewPogrebnoi Percona Staff Role
    @j2geu
    Are all the required roles granted to your pbmuser on the new cluster, as described in the documentation: https://www.percona.com/doc/percona-backup-mongodb/authentication.html#create-the-pbm-user?
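For comparison, the linked documentation creates a custom anyAction role and grants it, together with the built-in backup-related roles, to the pbm user. Roughly (the password and URI are placeholders; run against the primary of each replica set):

```shell
# Create the pbm role and user along the lines of the PBM authentication docs
# (placeholder password and URI)
mongo "mongodb://localhost:27019/?replicaSet=rs0" <<'EOF'
db.getSiblingDB("admin").createRole({
  role: "pbmAnyAction",
  privileges: [{ resource: { anyResource: true }, actions: ["anyAction"] }],
  roles: []
});
db.getSiblingDB("admin").createUser({
  user: "pbmuser",
  pwd: "secretpwd",
  roles: [
    { role: "readWrite", db: "admin" },
    { role: "backup", db: "admin" },
    { role: "clusterMonitor", db: "admin" },
    { role: "restore", db: "admin" },
    { role: "pbmAnyAction", db: "admin" }
  ]
});
EOF
```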
  • j2geu Current User Role Supporter
    I have not recreated the cluster, only deleted the data and flushed some caches.
    Yes, the user and role are OK:
    2020-04-29T15:09:23.952+0200 E  QUERY    [js] uncaught exception: Error: Role "[email protected]" already exists :
    2020-04-29T15:09:26.622+0200 E  QUERY    [js] uncaught exception: Error: couldn't add user: User "[email protected]" already exists :
    When I restart the pbm agent, all seems to be OK:
    avril 29 15:12:55 MONGO1 systemd[1]: Stopping pbm-agent...
    avril 29 15:12:55 MONGO1 systemd[1]: Stopped pbm-agent.
    avril 29 15:12:55 MONGO1 systemd[1]: Started pbm-agent.
    avril 29 15:12:55 MONGO1 pbm-agent[13571]: pbm agent is listening for the commands
    I don't understand the meaning of "conflicting namespace destinations".



  • j2geu Current User Role Supporter
    Must I stop something during the restore, as was written for version 1.0.0, or do I just need to execute the restore command?
    I will delete everything and recreate a completely new architecture, to be sure the problem does not come from that.
    Thank you.
  • AndrewPogrebnoi Percona Staff Role
    You don't have to stop or restart anything before the restore. But keep in mind that if you do restart nodes and/or recreate a pbm user it's better to restart pbm-agents as well.

    Let us know how it went with the new architecture.

    Thank you!
  • j2geu Current User Role Supporter
    So, since I completely reinstalled MongoDB and the pbm agents, everything works well.
    Among the different mistakes identified, there were ulimit settings on some servers. I set the maximum values on all servers, as recommended by MongoDB.
    For the moment I have only tested with a really small set of data. I will test with a bigger one.
    What about the compression option set to "none"? Is it OK to use it in production or not?
    Thank you.
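On the compression question: the method can be selected per backup. I'm assuming the --compression flag here, which may not exist on every PBM version (check pbm backup --help); compression=none produces larger backup files but uses less CPU during backup and restore:

```shell
# Take an uncompressed backup (flag availability may vary by PBM version;
# placeholder URI)
pbm backup --compression=none \
  --mongodb-uri="mongodb://pbmuser:pbmpassword@localhost:27019/?replicaSet=rs0"
```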

  • j2geu Current User Role Supporter
    I encountered an out-of-memory problem.
    I tried to restore a "bigger" database (8GB) and pbm-agent failed.
    I have 4GB of RAM.
    Here is my ulimit -a on all servers:
    [email protected]:~$ ulimit -a
    core file size          (blocks, -c) 0
    data seg size           (kbytes, -d) unlimited
    scheduling priority             (-e) 0
    file size               (blocks, -f) unlimited
    pending signals                 (-i) 14908
    max locked memory       (kbytes, -l) unlimited
    max memory size         (kbytes, -m) unlimited
    open files                      (-n) 65535
    pipe size            (512 bytes, -p) 8
    POSIX message queues     (bytes, -q) 819200
    real-time priority              (-r) 0
    stack size              (kbytes, -s) 8192
    cpu time               (seconds, -t) unlimited
    max user processes              (-u) 65535
    virtual memory          (kbytes, -v) unlimited
    file locks                      (-x) unlimited
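Note that a systemd service such as pbm-agent does not inherit shell ulimit values; if limits must apply to the agent itself, they can be set in a drop-in override instead (the values below are examples, not Percona recommendations):

```shell
# Example systemd drop-in raising resource limits for pbm-agent
sudo mkdir -p /etc/systemd/system/pbm-agent.service.d
sudo tee /etc/systemd/system/pbm-agent.service.d/limits.conf <<'EOF'
[Service]
LimitNOFILE=65535
LimitNPROC=65535
EOF
sudo systemctl daemon-reload
sudo systemctl restart pbm-agent
```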

    And here is the stack trace (complete stack available here):
    mai 02 11:09:10 MONGO3 pbm-agent[22765]: 2020-05-02T11:09:10.914+0200        restoring bank.transaction from archive on stdin
    mai 02 11:11:44 MONGO3 pbm-agent[22765]: fatal error: runtime: out of memory
    mai 02 11:11:44 MONGO3 pbm-agent[22765]: runtime stack:
    mai 02 11:11:44 MONGO3 pbm-agent[22765]: runtime.throw(0xe83778, 0x16)
    mai 02 11:11:44 MONGO3 pbm-agent[22765]:         /usr/local/go/src/runtime/panic.go:617 +0x72
    mai 02 11:11:44 MONGO3 pbm-agent[22765]: runtime.sysMap(0xc0d4000000, 0x4000000, 0x18713f8)
    mai 02 11:11:44 MONGO3 pbm-agent[22765]:         /usr/local/go/src/runtime/mem_linux.go:170 +0xc7
    mai 02 11:11:44 MONGO3 pbm-agent[22765]: runtime.(*mheap).sysAlloc(0x1858d00, 0x8ac000, 0x1858d10, 0x456)
    mai 02 11:11:44 MONGO3 pbm-agent[22765]:         /usr/local/go/src/runtime/malloc.go:633 +0x1cd
    mai 02 11:11:44 MONGO3 pbm-agent[22765]: runtime.(*mheap).grow(0x1858d00, 0x456, 0x0)
    mai 02 11:11:44 MONGO3 pbm-agent[22765]:         /usr/local/go/src/runtime/mheap.go:1222 +0x42
    mai 02 11:11:44 MONGO3 pbm-agent[22765]: runtime.(*mheap).allocSpanLocked(0x1858d00, 0x456, 0x1871408, 0x0)
    mai 02 11:11:44 MONGO3 pbm-agent[22765]:         /usr/local/go/src/runtime/mheap.go:1150 +0x37f
    mai 02 11:11:44 MONGO3 pbm-agent[22765]: runtime.(*mheap).alloc_m(0x1858d00, 0x456, 0x101, 0x0)
    mai 02 11:11:44 MONGO3 pbm-agent[22765]:         /usr/local/go/src/runtime/mheap.go:977 +0xc2
    mai 02 11:11:44 MONGO3 pbm-agent[22765]: runtime.(*mheap).alloc.func1()
    mai 02 11:11:44 MONGO3 pbm-agent[22765]:         /usr/local/go/src/runtime/mheap.go:1048 +0x4c
    mai 02 11:11:44 MONGO3 pbm-agent[22765]: runtime.(*mheap).alloc(0x1858d00, 0x456, 0x7fdbcf000101, 0x427cb0)
    mai 02 11:11:44 MONGO3 pbm-agent[22765]:         /usr/local/go/src/runtime/mheap.go:1047 +0x8a
    mai 02 11:11:44 MONGO3 pbm-agent[22765]: runtime.largeAlloc(0x8ac000, 0x450100, 0xc0c2e3a000)
    mai 02 11:11:44 MONGO3 pbm-agent[22765]:         /usr/local/go/src/runtime/malloc.go:1055 +0x99
    mai 02 11:11:44 MONGO3 pbm-agent[22765]: runtime.mallocgc.func1()
    mai 02 11:11:44 MONGO3 pbm-agent[22765]:         /usr/local/go/src/runtime/malloc.go:950 +0x46
    mai 02 11:11:44 MONGO3 pbm-agent[22765]: runtime.systemstack(0x0)
    mai 02 11:11:44 MONGO3 pbm-agent[22765]:         /usr/local/go/src/runtime/asm_amd64.s:351 +0x66
    mai 02 11:11:44 MONGO3 pbm-agent[22765]: runtime.mstart()
    mai 02 11:11:44 MONGO3 pbm-agent[22765]:         /usr/local/go/src/runtime/proc.go:1153
    mai 02 11:11:44 MONGO3 pbm-agent[22765]: goroutine 415 [running]:
    mai 02 11:11:44 MONGO3 pbm-agent[22765]: runtime.systemstack_switch()
  • j2geu Current User Role Supporter
    edited May 2
    So I pursued the debugging.
    I increased my swap file to 4.3GB, and the restore was OK for an 8GB database (1.5GB of swap was used).
    But after the restore, pbm-agent does not release the memory; I need to restart it.
    That explains why sometimes I can restore a file and, just after, I cannot restore the same file and get an out-of-memory error.
    I will try restoring a bigger database to see if there is a "memory leak" or if a fixed amount of memory is used.
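A simple, hypothetical way to confirm this is to record the agent's resident memory before and after each restore:

```shell
# Print PID, resident memory (kB) and command of running pbm-agent processes;
# compare the values before and after a restore to spot unreleased memory.
ps -o pid=,rss=,comm= -C pbm-agent || echo "no pbm-agent process found"
```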

  • AndrewPogrebnoi Percona Staff Role
    That's interesting. We created a ticket (https://jira.percona.com/browse/PBM-462) and will investigate that.

    Thanks for the investigation and reporting it! I'll keep you posted on the outcome.
  • j2geu Current User Role Supporter
    Hello, I have encountered a new problem on a new platform:
    Jun 24 10:29:03 QPN-DS-MNG-CN04 pbm-agent[2115]: 2020/06/24 10:29:03 [INFO] Restore of '2020-06-24T08:21:01Z' started
    Jun 24 10:29:07 QPN-DS-MNG-CN04 pbm-agent[2115]: 2020-06-24T10:29:07.023+0200        preparing collections to restore from
    Jun 24 10:29:07 QPN-DS-MNG-CN04 pbm-agent[2115]: 2020-06-24T10:29:07.023+0200        destination conflict: admin.pbmRUsers (src) => admin.pbmRUsers (dst)
    Jun 24 10:29:07 QPN-DS-MNG-CN04 pbm-agent[2115]: 2020-06-24T10:29:07.023+0200        destination conflict: admin.system.users (src) => admin.pbmRUsers (dst)
    Jun 24 10:29:07 QPN-DS-MNG-CN04 pbm-agent[2115]: 2020-06-24T10:29:07.023+0200        destination conflict: admin.pbmRRoles (src) => admin.pbmRRoles (dst)
    Jun 24 10:29:07 QPN-DS-MNG-CN04 pbm-agent[2115]: 2020-06-24T10:29:07.023+0200        destination conflict: admin.system.roles (src) => admin.pbmRRoles (dst)
    Jun 24 10:29:07 QPN-DS-MNG-CN04 pbm-agent[2115]: 2020/06/24 10:29:07 Mark restore as failed `restore mongo dump (successes: 0 / fails: 0): cannot restore with conflicting namespace destinations`: <nil>
    Jun 24 10:29:07 QPN-DS-MNG-CN04 pbm-agent[2115]: 2020/06/24 10:29:07 [ERROR] restore: restore mongo dump (successes: 0 / fails: 0): cannot restore with conflicting namespace destinations

    Do you have any idea where this comes from?
    I don't understand; for information, my pbm user is named pbmuser, not pbmRuser.

  • AndrewPogrebnoi Percona Staff Role

    Is the destination cluster brand new? Were there any issues before that restore?

    Can you try to drop the admin.pbmRRoles and admin.pbmRUsers collections manually and re-run the restore?
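A sketch of that cleanup (the URI is a placeholder; run it against the cluster's admin database):

```shell
# Drop the temporary restore collections that cause the namespace conflict
mongo "mongodb://pbmuser:pbmpassword@localhost:27019/?replicaSet=rs0" --quiet <<'EOF'
db.getSiblingDB("admin").pbmRUsers.drop();
db.getSiblingDB("admin").pbmRRoles.drop();
EOF
```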
  • j2geu Current User Role Supporter
    @AndrewPogrebnoi The cluster is brand new, but I have found that there is a process which recreates the collection with this index. So that is my error. I will continue the tests, because I had some other errors with 1TB of data before. Thanks for your help, and sorry to disturb you for nothing.
  • AndrewPogrebnoi Percona Staff Role
    @j2geu  no worries, glad that you've found the issue.

    Cheers!
