I’m in the process of upgrading a 3-node cluster and have run into some problems. Along the way I got into a state where 2 cluster nodes were upgraded and 1 wasn’t. Only one node was live, and the other two needed an SST but kept failing to complete it.
After considerable experimentation, and reading stuff on here, there seem to be two key factors:
- The version of percona-xtrabackup-80 needs to be the same on the donor and the consuming node in order to do an SST.
- So does the version of percona-xtradb-cluster-server.
I can kind of understand the first one, and since you can upgrade/downgrade that package without affecting a live service, it’s not too inconvenient. But all I actually saw was the prepare crashing with a cryptic error.
The second one was more inconvenient. In the end, as an experiment, I downgraded one of the nodes that was down to see if it could do an SST, which it could. Then I upgraded it again and this time the IST worked so it stayed up.
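In case it saves someone else the day, the two version requirements above can be checked before attempting an SST. This is only a rough sketch in Python: it assumes you have already collected the installed package versions from each node yourself (for example with your package manager), and the node names and version strings below are made-up placeholders, not real release numbers.

```python
# Packages whose versions, in my experience above, had to match for an SST to work.
SST_PACKAGES = ("percona-xtrabackup-80", "percona-xtradb-cluster-server")


def find_version_mismatches(node_versions, packages=SST_PACKAGES):
    """Return {package: {version: [nodes]}} for packages that differ across nodes.

    node_versions maps a node name to a {package: version} dict that you
    gathered yourself. A package missing on a node is reported as
    "not installed".
    """
    mismatches = {}
    for package in packages:
        by_version = {}
        for node, installed in node_versions.items():
            version = installed.get(package, "not installed")
            by_version.setdefault(version, []).append(node)
        # More than one distinct version means the nodes disagree.
        if len(by_version) > 1:
            mismatches[package] = by_version
    return mismatches


if __name__ == "__main__":
    # Hypothetical placeholder data: node1 not yet upgraded, node2/node3 upgraded.
    nodes = {
        "node1": {"percona-xtrabackup-80": "XB-NEW",
                  "percona-xtradb-cluster-server": "PXC-OLD"},
        "node2": {"percona-xtrabackup-80": "XB-NEW",
                  "percona-xtradb-cluster-server": "PXC-NEW"},
        "node3": {"percona-xtrabackup-80": "XB-NEW",
                  "percona-xtradb-cluster-server": "PXC-NEW"},
    }
    for package, by_version in find_version_mismatches(nodes).items():
        print(f"{package} differs across nodes:")
        for version, names in by_version.items():
            print(f"  {version}: {', '.join(names)}")
```

It only compares the version strings for exact equality, which matches what I observed; it does not know whether some differing versions might actually be compatible.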
Could we have a more informative message that made it clear on the consuming node that the donor SST had been generated with xtrabackup version X and server version Y and could only be consumed with the same version?
I also wondered: if the server can upgrade an old database when it starts up on a new version, could the same logic not be applied during the prepare? That way we could consume an SST from an old version.
Now that I’ve hit this, I’ll know the answers. But it took a day, and I suspect other people will spend time on it too.