After deploying a cluster (3 node), I am getting repeated errors when the backup runs (on demand and scheduled). I’ve attached the yaml file used to deploy the cluster as well as the backup pod logs.
The end result is the following error each time:
“Process completed with error: /usr/bin/run_backup.sh: 1 (Operation not permitted)”
Not sure why it’s not permitted within the pod. Since it stops, exec’ing into it is not possible to see why.
While I don’t have an immediate solution, I suggest you gather the logs while the backup is running. What you’ve shared in backuplog.txt is from the garbd node, which is a “fake” PXC member that presents as a JOINER and asks the cluster for a DONOR to provide SST. What I want to see is what the node that was elected DONOR has to say.
The way I do this is to set up a shell running oc logs -f <pod> for each PXC pod.
You can guess which node will be DONOR, but you won’t always guess right, so it is best to do it against all three PXC containers.
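A minimal sketch of following all three PXC pods at once, so the DONOR is captured whichever node is elected. It assumes the pods are named <cluster>-pxc-0 through <cluster>-pxc-2 and that the database container is called pxc; the function names are hypothetical, so adjust everything to match your deployment.

```shell
#!/bin/sh
# Sketch only: pod naming (<cluster>-pxc-N) and the container name "pxc"
# are assumptions; check them with "oc get pods" before relying on this.

# Print the assumed pod names for a 3-node cluster, one per line.
pxc_pods() {
  for i in 0 1 2; do
    echo "$1-pxc-$i"
  done
}

# Follow the logs of every PXC pod in the background, one file per pod.
# Stop with Ctrl-C once the backup has failed, then collect the *.log files.
tail_all() {
  for pod in $(pxc_pods "$1"); do
    oc logs -f "$pod" -c pxc > "$pod.log" 2>&1 &
  done
  wait
}
```

Start it with something like tail_all cluster1 (cluster1 is a placeholder cluster name) just before triggering the backup.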
What you are looking for is an error on the DONOR side related to SST, so send us the log entries from slightly before to slightly after that error. Then we’ll have a look at what’s going on within the cluster.
PXC is just like that - you generally (always?) want to be looking at logs from all machines. Good luck and looking forward to assisting you further!
Thanks @Michael_Coburn - apologies on the delay in response.
I recreated the deployment and triggered the backup manually. The error still occurred and I’ve attached the output of all 3 pxc instances. pxc-2 seems to have had the error, but I’m not sure what the cause is, and I’d appreciate it if you could review. My deployment for pxc is attached as well. pxc-2.txt (58.2 KB) pxc-1.txt (46.1 KB) pxc-0.txt (47.6 KB) cr-sws.txt (6.4 KB)
Hello @rtanner,
I’ve seen this before. Operation not permitted is most likely caused by the underlying storage. I see you are trying to back up to PVCs with storageClassName: thin.
What is the storage behind this PVC? Which CNI is that?
Please check if Pods can write to this storage by creating some dummy Pod and attaching the PVC to it.
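One possible way to run that check is a throwaway Pod that mounts the PVC and tries to write a file. This is a sketch only: the Pod name pvc-write-test, the busybox image, the /backup mount path, and the helper function name are all placeholders chosen for illustration.

```shell
#!/bin/sh
# Sketch only: every name in the manifest (pvc-write-test, writer, /backup)
# is a placeholder; pass the name of your real backup PVC as the argument.

# Print a Pod manifest that mounts the given PVC and writes a test file to it.
pvc_test_manifest() {
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: pvc-write-test
spec:
  restartPolicy: Never
  containers:
  - name: writer
    image: busybox
    command: ["sh", "-c", "echo ok > /backup/write-test && cat /backup/write-test"]
    volumeMounts:
    - name: target
      mountPath: /backup
  volumes:
  - name: target
    persistentVolumeClaim:
      claimName: $1
EOF
}
```

Apply it with pvc_test_manifest <pvc-name> | oc apply -f - and then read the result with oc logs pvc-write-test. If the write is refused, the Pod’s log or status should show it. Delete the Pod afterwards with oc delete pod pvc-write-test.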
Thank you @Sergey_Pronin, @Michael_Coburn and @Mykola. I ended up running a few tests and came up with a workaround. In the 1.6 operator, I had to remove the compression option. Not sure why it caused an issue, but it was the final factor after multiple changes. However, that same option worked fine when I upgraded to 1.7, so I’m working ok in 1.7 and all compression options work as expected.
I’m going to mark this closed with migration to 1.7 as the best option/answer.