PBM physical backup fails when opening $backupCursor with a "No such file or directory" error, please advise

I am trying to verify PBM's physical backup functionality by following the post "Experimental Feature: $backupCursorExtend in Percona Server for MongoDB" on the Percona Database Performance Blog, with no luck.
I hit a problem when running "pbm backup -t physical". The same error can also be triggered by running "db.aggregate([{$backupCursor: {}}])" in the mongo shell, which always fails with an exception like this:

m1:PRIMARY> db.aggregate([{$backupCursor: {}}])
uncaught exception: Error: command failed: {
  "operationTime" : Timestamp(1655812554, 4),
  "ok" : 0,
  "errmsg" : "Failed to get a file's size. Filename: /data/db/key.db/journal/WiredTigerLog.0000000004 Error: No such file or directory",
  "code" : 31403,
  "codeName" : "Location31403",
  "$clusterTime" : {
    "clusterTime" : Timestamp(1655812554, 4),
    "signature" : {
      "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
      "keyId" : NumberLong(0)
    }
  }
} : aggregate failed :

In the db folder, the directory "/data/db/key.db/journal/" does not even exist, but I still get the same error if I create it in advance. The MongoDB server version is v4.4.6-8. Does anyone have any advice on this? Thank you.

After some digging, I found that this error occurs because PBM cannot find the journal folder at its expected fixed location. I saw some discussion about making the journal folder configurable a year ago, but on the current master branch the journal folder is still fixed at /data/db/journal. So I added a symlink to work around the error, and the physical backup now succeeds. Wooha~

However, during the restore, the cluster never reaches a ready state. Still digging…

Hello @Xiaolu,

I assume your dbpath is set to /data/db. PSMDB uses /data/db/key.db only if data-at-rest encryption is enabled, so this looks like a situation where encryption is enabled but /data/db/key.db does not exist.
I have no idea how this happened on your machine. The key.db subdirectory is created when you start PSMDB for the first time with an empty dbpath, and PSMDB never deletes it.
Perhaps you enabled encryption after your dbpath directory had already been initialized? That scenario is not supported, because an existing unencrypted instance cannot be switched to encrypted mode.

Hello @Igor_Solodovnikov,
You are right that my instance has data-at-rest encryption enabled by default. In my case /data/db/key.db exists, but /data/db/key.db/journal does not. Since the physical backup copies journal files from /data/db/key.db/journal when data-at-rest encryption is enabled, I had to add a symlink to work around this error:

cd /data/db/key.db && ln -s …/journal journal
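The layout described above (key.db present, key.db/journal missing, the regular journal directly under the dbpath) can be checked before attempting a backup. Below is a minimal Node.js sketch, assuming only the directory names that appear in this thread; diagnoseJournalLayout and isDir are hypothetical helpers, not part of PBM or PSMDB.

```javascript
// Hypothetical diagnostic helper, not part of PBM or PSMDB. It only checks the
// directory names mentioned in this thread under a given dbpath.
const fs = require("fs");
const path = require("path");

// True if `p` exists and is a directory (symlinks are followed).
function isDir(p) {
  try {
    return fs.statSync(p).isDirectory();
  } catch (err) {
    // Missing path, or a parent component that is not a directory.
    if (err.code === "ENOENT" || err.code === "ENOTDIR") return false;
    throw err; // e.g. EACCES: surface permission problems instead of hiding them
  }
}

function diagnoseJournalLayout(dbpath) {
  const keyDb = path.join(dbpath, "key.db");
  const report = {
    // PSMDB uses key.db only when data-at-rest encryption is enabled.
    keyDbExists: isDir(keyDb),
    // The journal path that appeared in the $backupCursor error.
    keyDbJournalExists: isDir(path.join(keyDb, "journal")),
    // The regular journal directory under the dbpath.
    dbJournalExists: isDir(path.join(dbpath, "journal")),
  };
  // The combination reported in this thread: key.db present, key.db/journal missing.
  report.matchesReportedProblem = report.keyDbExists && !report.keyDbJournalExists;
  return report;
}

module.exports = { diagnoseJournalLayout };
```

For example, diagnoseJournalLayout("/data/db") returns the four flags. Because fs.statSync follows symlinks, a workaround symlink like the one above makes keyDbJournalExists report true.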

Hi @Xiaolu
Did the original issue happen before any pbm restore? I mean, it was the first time you tried to run "pbm backup -t physical" that led to the error, right?

That’s right, @Andrew_Pogrebnoi
Any idea about this error?

Not yet, @Xiaolu. I wanted to make sure first that pbm restore isn't the cause.

Hi @Xiaolu

I believe this issue is fixed by PSMDB-1119, which is included in these releases of Percona Server for MongoDB: 4.2.22-22, 4.4.16-16, and 5.0.11-10.

Hi, Igor

Thank you. I will try that fix.

Hello everyone, I am still facing this issue with the latest PBM 2.0.5 release while performing a physical backup:

2023-03-29T12:24:42Z I [replicaset/:27019] [backup/2023-03-29T12:24:40Z] mark RS as error get backup files: create backupCursor: (Location31403) Failed to get a file's size. Filename: /var/lib/mongo/collection-102-2653468610188052465.wt Error: No such file or directory:
2023-03-29T12:24:42Z I [replicaset/:27019] [backup/2023-03-29T12:24:40Z] mark backup as error get backup files: create backupCursor: (Location31403) Failed to get a file's size. Filename: /var/lib/mongo/collection-102-2653468610188052465.wt Error: No such file or directory:
2023-03-29T12:24:42Z E [replicaset:27019] [backup/2023-03-29T12:24:40Z] backup: get backup files: create backupCursor: (Location31403) Failed to get a file's size. Filename: /var/lib/mongo/collection-102-2653468610188052465.wt Error: No such file or directory

The file exists and is owned by the "mongod" user:
[root@mongo]# ls -alh | grep -i "collection-102-2653468610188052465.wt"
-rw-------. 1 mongod mongod 36K Mar 29 12:08 collection-102-2653468610188052465.wt

Hi @Tin_Cvitkovic
Which PSMDB version are you using?
What happens if you run db.getSiblingDB("admin").aggregate([{$backupCursor: {}}]) in the mongo shell?

I'm using PSMDB 4.2.23.
It works without issues: I get the correct file locations and their respective fileSize values.
replicaset:PRIMARY> db.getSiblingDB("admin").aggregate([{$backupCursor: {}}])
{ "metadata" : { "backupId" : UUID("8f2885c0-6b07-457f-af4b-aea03eebf794"), "dbpath" : "/var/lib/mongo", "oplogStart" : { "ts" : Timestamp(1680171353, 1), "t" : NumberLong(1) }, "oplogEnd" : { "ts" : Timestamp(1680174336, 1), "t" : NumberLong(2) }, "checkpointTimestamp" : Timestamp(1680174303, 1) } }
{ "filename" : "/var/lib/mongo/collection-126--2497108409019845431.wt", "fileSize" : NumberLong(4096) }
{ "filename" : "/var/lib/mongo/collection-144--2497108409019845431.wt", "fileSize" : NumberLong(69632) }
{ "filename" : "/var/lib/mongo/index-131--2497108409019845431.wt", "fileSize" : NumberLong(8192) }
{ "filename" : "/var/lib/mongo/index-29--2497108409019845431.wt", "fileSize" : NumberLong(36864) }
{ "filename" : "/var/lib/mongo/index-106--2497108409019845431.wt", "fileSize" : NumberLong(208896) }

Hi @Tin_Cvitkovic

Please check this behavior with the recently released PSMDB 4.2.24. It includes some fixes to the $backupCursor functionality.

The same thing happens with version 4.2.24:
2023-03-30T12:11:56Z I [replicaset/:27018] [backup/2023-03-30T12:11:48Z] mark RS as error upload file /var/lib/mongo/collection-6--5627329573051636936.wt: get file stat: stat /var/lib/mongo/collection-6--5627329573051636936.wt: no such file or directory:
2023-03-30T12:11:56Z D [replicaset/:27018] [backup/2023-03-30T12:11:48Z] set balancer on
2023-03-30T12:11:56Z E [replicaset/:27018] [backup/2023-03-30T12:11:48Z] backup: upload file /var/lib/mongo/collection-6--5627329573051636936.wt: get file stat: stat /var/lib/mongo/collection-6--5627329573051636936.wt: no such file or directory

This is a sharded cluster, by the way, but it also fails on a plain replica set with 4.2.24. I saw it working with PBM 2.0.3 before upgrading. What could have caused this? When can I expect to hear back from you?
Thanks in advance.

The strange thing is that previously the error came from PSMDB while opening the cursor (create backupCursor: (Location31403)), but the latest one comes from PBM trying to copy the file (get file stat). In both cases, though, the root cause is "No such file or directory".

Can you check the permissions on the data files and whether the pbm-agents have access to them?

Does /var/lib/mongo/collection-6--5627329573051636936.wt exist on the same node that produced the error? A pbm-agent must run on each replica set node and have access to that node's local data directory.
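One quick way to answer these questions is to take the filenames from the $backupCursor output on the failing node and try to stat and read each one as the OS user the pbm-agent runs as. Below is a minimal Node.js sketch of that check; findUnreadableFiles is a hypothetical helper, not a PBM API, and it assumes you run it as the agent's user rather than as root, since root passes most permission checks.

```javascript
// Hypothetical helper (not part of PBM): run it as the same OS user as the
// pbm-agent to see which data files that user cannot stat or read.
const fs = require("fs");

// Returns one entry per problematic file: { filename, code }, where `code` is
// the Node.js error code, e.g. "ENOENT" (missing) or "EACCES" (permission denied).
function findUnreadableFiles(filenames) {
  const problems = [];
  for (const filename of filenames) {
    try {
      fs.statSync(filename); // similar in spirit to the "get file stat" step in the PBM log
      fs.accessSync(filename, fs.constants.R_OK); // can this user open it for reading?
    } catch (err) {
      problems.push({ filename, code: err.code });
    }
  }
  return problems;
}

module.exports = { findUnreadableFiles };
```

An ENOENT entry for a file that ls shows as present would point at the agent looking at a different node or filesystem, while EACCES would point at permissions.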

What's in the pbm status output?

Cheers

The previous issue was fixed by PSMDB-1119, so we are now facing a different issue.
You are right: the new issue looks very much like an access rights problem.