Hey folks!
I found the issue! It turns out the missing sst_info was the symptom, not the root cause.
We’re using Linkerd as our service mesh, with a policy rule that does not enable Linkerd when a pod is created by a Job. The backup job listens on port 4444 and, as I mentioned above, it was stuck at:
socat -u TCP-LISTEN:4444,reuseaddr,retry=30 stdio
So I suspected it was waiting for packets that never arrived, and that Linkerd might be sitting in the middle of the connection.
I added port 4444 to the list of ports that Linkerd skips for outbound connections, and it worked!
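
For anyone hitting the same thing: one way to skip a port for outbound traffic is the `config.linkerd.io/skip-outbound-ports` annotation on the pod template of the meshed pods that connect to the backup job. This is only a sketch and not a copy of our manifest. The workload kind and name below are placeholders, and your setup may apply the annotation through an operator or Helm values instead.

```yaml
# Fragment only: kind and name are hypothetical placeholders.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: example-db
spec:
  template:
    metadata:
      annotations:
        # Outbound connections to port 4444 bypass the Linkerd proxy.
        config.linkerd.io/skip-outbound-ports: "4444"
```

The annotation has to sit on the pod template (not on the workload's own metadata), and the pods need to be recreated before it takes effect, since the proxy's traffic redirection rules are set up when the pod starts.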
So I think we can close this, though it might be interesting to others!