I started upgrading my cluster from 5.7 to 8.0 this weekend. The first upgraded node was started 14 hours ago, is still working, and has not come online yet. There isn’t any output showing what it’s actually doing, how far along it is, or how much longer it will take. The only sign of activity I can see is that the mysql.ibd file keeps growing.
This is a multi-tenant database with roughly 160 GB of data and about 8,000 individual databases.
Can anyone shed light on what this node is doing and how long I should expect the process to take?
Disaster recovery on any node in the cluster usually takes about 2.5 hours using 5.7. I was pretty surprised that this node has been initializing(?) for 14+ hours.
@mwarble As Pep says, this is expected. For your production upgrade, we recommend that you first upgrade a replica and then promote that replica to become the new source. This should reduce your overall downtime to just a few minutes. You are absolutely safe to have a 5.7 source and an 8.0 replica.
That’s what I was attempting to do, but after 24 hours the upgraded node was still doing its upgrade thing and I had no way of knowing how much longer it was going to take. The total amount of data on disk for the 5.7 node was 160 GB, and the only progress I could measure was the growing size of mysql.ibd on the 8.0 node. After 24 hours it was up to 5 GB. I’m not sure whether that file needs to reach 160 GB for the process to complete, but if it does, then at 5 GB a day the upgrade would take about a month (160 / 5 = 32 days), which is far too long. This may not be the case at all, but like I said, there doesn’t seem to be any way to measure how far along the process is or how long it is going to take.
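For anyone who wants to put numbers on that kind of guess, here is a minimal Python sketch that samples a file's size and extrapolates from its growth rate. It rests on assumptions that are not established in this thread: the path `/var/lib/mysql/mysql.ibd` is only a common default data directory, the 160 GB target is just the guess described above, and growth may not be linear. The function names are made up for illustration.

```python
import os
import time


def growth_rate_bytes_per_hour(samples):
    """Average growth between the first and last sample.

    samples: list of (unix_seconds, size_bytes) tuples, oldest first.
    """
    (t0, s0), (t1, s1) = samples[0], samples[-1]
    elapsed_hours = (t1 - t0) / 3600.0
    if elapsed_hours <= 0:
        raise ValueError("need two samples taken at different times")
    return (s1 - s0) / elapsed_hours


def days_to_reach(current_bytes, target_bytes, rate_per_hour):
    """Days until target size, assuming growth stays linear (an assumption)."""
    if rate_per_hour <= 0:
        return float("inf")
    remaining = max(target_bytes - current_bytes, 0)
    return remaining / rate_per_hour / 24.0


def sample_file(path, interval_seconds=600, count=6):
    """Record the file size `count` times, `interval_seconds` apart."""
    samples = []
    for i in range(count):
        samples.append((time.time(), os.path.getsize(path)))
        if i < count - 1:
            time.sleep(interval_seconds)
    return samples


if __name__ == "__main__":
    GB = 1000 ** 3
    path = "/var/lib/mysql/mysql.ibd"  # assumed default location; adjust
    samples = sample_file(path)
    rate = growth_rate_bytes_per_hour(samples)
    current = samples[-1][1]
    # 160 GB is only the guessed target discussed above, not a known endpoint.
    print(f"rate: {rate / GB:.3f} GB/hour")
    print(f"days to 160 GB: {days_to_reach(current, 160 * GB, rate):.1f}")
```

Fed the figures from this thread (0 to 5 GB over 24 hours, target 160 GB), the two helper functions report about 31 more days, which matches the back-of-the-envelope estimate.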
Maybe you can tell me if this approach will work. If I dump all the data from the 5.7 node using mysqldump, which I do on a daily basis anyway, can those dump files be loaded into the 8.0 node safely? I tested loading a few of them and they seemed to load into a clean 8.0 node fine without any noticeable delay as compared to loading them into a 5.7 node.
If this is an acceptable approach, I’ll just do this, because it lets me minimize downtime and gives me a better idea of how long the process will take. Given the test I did, it should only be a few hours, compared with the 24+ hours the first approach was taking.
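The dump-and-load approach described above could be scripted and timed roughly as follows. This is a sketch under assumptions, not a tested migration procedure: it assumes one `.sql` file per database in a directory, that each dump was made with `mysqldump --databases` so it carries its own `CREATE DATABASE` and `USE` statements, and that the `mysql` client picks up credentials from an option file. The function name `load_dumps` and the directory path are hypothetical.

```python
import subprocess
import time
from pathlib import Path


def load_dumps(dump_dir, base_cmd=("mysql",), runner=subprocess.run):
    """Feed each *.sql file in dump_dir to the mysql client, in name order.

    Returns a list of (file_name, return_code, seconds) so failures and
    slow databases are easy to spot. `runner` is injectable for dry runs.
    """
    results = []
    for dump in sorted(Path(dump_dir).glob("*.sql")):
        started = time.monotonic()
        with dump.open("rb") as fh:
            # Equivalent to: mysql < dump.sql
            proc = runner(list(base_cmd), stdin=fh)
        elapsed = time.monotonic() - started
        results.append((dump.name, proc.returncode, elapsed))
    return results


if __name__ == "__main__":
    # Hypothetical dump directory; point this at the daily dump location.
    results = load_dumps("/backups/daily")
    failed = [name for name, code, _ in results if code != 0]
    total = sum(seconds for _, _, seconds in results)
    print(f"loaded {len(results)} dumps in {total / 3600:.2f} h, "
          f"{len(failed)} failed")
```

Timing a handful of dumps this way and scaling up by the number of databases gives a load-time estimate before committing to the full run.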