WSREP has not yet prepared node for application use after restarting service

I have a 5-node cluster. I checked the status of one node and it still says Joining, after more than 10 hours. When I query the database I get:
ERROR 1047 (08S01) at line 1: WSREP has not yet prepared node for application use
The other nodes report a cluster size of 5, and I can also see the bad node's IP address in the wsrep_incoming_addresses status variable. Do I just reboot the bad node again?

Looking at the error log, I am seeing messages like this:
2022-09-15T08:19:59.021032Z 0 [Note] [MY-011953] [InnoDB] Page cleaner took 7223ms to flush 267 pages

I also noticed that my galera.cache file is at its maximum size. My configuration has:

wsrep_provider_options="gcache.size = 5G"

The file itself is 5.1G (5.1G Sep 15 09:00 galera.cache).
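Galera preallocates galera.cache as a ring buffer of gcache.size bytes, so a file roughly equal to the configured size is expected and does not by itself mean the cache has filled up. The Python sketch below parses a wsrep_provider_options string and converts the size to bytes so it can be compared with the file on disk. It assumes semicolon-separated `key = value` pairs and 1024-based K/M/G/T suffixes; the function names are hypothetical.

```python
import os

# Assumption: size suffixes are binary multiples (K = 1024, M = 1024**2, ...).
_SUFFIXES = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_provider_options(options: str) -> dict:
    """Split a wsrep_provider_options string ("a = 1; b = 2") into a dict."""
    result = {}
    for item in options.split(";"):
        if "=" not in item:
            continue  # skip empty fragments such as a trailing ';'
        key, value = item.split("=", 1)
        result[key.strip()] = value.strip()
    return result


def size_to_bytes(size: str) -> int:
    """Convert '5G', '128M' or a plain byte count into an integer number of bytes."""
    size = size.strip().upper()
    if size and size[-1] in _SUFFIXES:
        return int(float(size[:-1]) * _SUFFIXES[size[-1]])
    return int(size)


def gcache_file_matches_config(options: str, cache_path: str, tolerance: float = 0.05) -> bool:
    """True if the galera.cache file is within `tolerance` of the configured gcache.size."""
    configured = size_to_bytes(parse_provider_options(options)["gcache.size"])
    actual = os.path.getsize(cache_path)
    return abs(actual - configured) <= tolerance * configured


if __name__ == "__main__":
    opts = parse_provider_options("gcache.size = 5G")
    print(opts["gcache.size"], size_to_bytes(opts["gcache.size"]))  # 5G 5368709120
```

With the 5G setting from this thread, a file a little over 5 GiB falls inside the 5% tolerance.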

When I run mysql -e "show global status like '%wsrep_local_state_comment%'\G" from the command line, I get Variable_name: wsrep_local_state_comment and Value: Joined. But when I log in to mysql and run a query, I still get WSREP has not yet prepared node for application use. The log file still shows messages like:
2022-09-15T10:24:46.252378Z 0 [Note] [MY-011953] [InnoDB] Page cleaner took 37403ms to flush 1493 pages

wsrep_local_recv_queue is 500. My guess is that I should reboot; is that right?
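A minimal Python sketch for reading the `\G` (vertical) output of `SHOW GLOBAL STATUS` and listing why a node might be rejecting queries. It assumes the mysql client's vertical format (`Variable_name:` / `Value:` pairs) and treats wsrep_ready = ON as the condition that ERROR 1047 depends on; a non-zero wsrep_local_recv_queue is reported as write-sets still waiting to be applied. The helper names are hypothetical.

```python
def parse_vertical_status(output: str) -> dict:
    """Parse `SHOW GLOBAL STATUS ... \\G` output into {Variable_name: Value}."""
    status = {}
    name = None
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("Variable_name:"):
            name = line.split(":", 1)[1].strip()
        elif line.startswith("Value:") and name is not None:
            status[name] = line.split(":", 1)[1].strip()
            name = None
    return status


def readiness_problems(status: dict) -> list:
    """Return reasons the node may reject queries; an empty list means it looks ready."""
    problems = []
    if status.get("wsrep_ready", "").upper() != "ON":
        problems.append("wsrep_ready is not ON (queries fail with ERROR 1047)")
    state = status.get("wsrep_local_state_comment")
    if state is not None and state != "Synced":
        problems.append(f"local state is {state}, not Synced")
    queue = int(status.get("wsrep_local_recv_queue", "0"))
    if queue > 0:
        problems.append(f"{queue} write-sets still waiting in the receive queue")
    return problems


if __name__ == "__main__":
    # Values taken from this thread: recv queue 500, state Joined, wsrep_ready OFF.
    sample = """*************************** 1. row ***************************
Variable_name: wsrep_local_recv_queue
        Value: 500
*************************** 2. row ***************************
Variable_name: wsrep_local_state_comment
        Value: Joined
*************************** 3. row ***************************
Variable_name: wsrep_ready
        Value: OFF
"""
    for problem in readiness_problems(parse_vertical_status(sample)):
        print(problem)
```

The sample text could come from something like `mysql -e "show global status like 'wsrep%'\G"`.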


Hello, did you check the bad node's logs for problems? Could the node still be going through SST?


The log shows:

0.0 (DB404): State transfer from 3.0 (DB403) complete.
[Galera] Shifting JOINER -> JOINED (TO: 799642)

But wsrep_local_state_comment still said Joining (even after 2 hours), and wsrep_ready was OFF, even though the SST had completed.
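The excerpt above stops at JOINER -> JOINED; a node typically logs a further JOINED -> SYNCED shift before it starts accepting queries. A small Python sketch that pulls the Galera state shifts out of an error log to see where a node stopped. The regular expression assumes the "Shifting X -> Y (TO: n)" wording shown in the log above; the function names are hypothetical.

```python
import re

# Matches Galera lines such as "Shifting JOINER -> JOINED (TO: 800712)".
SHIFT_RE = re.compile(r"Shifting (\S+) -> (\S+) \(TO: (-?\d+)\)")


def state_shifts(log_text: str) -> list:
    """Return (from_state, to_state, seqno) tuples in the order they appear in the log."""
    return [(m.group(1), m.group(2), int(m.group(3))) for m in SHIFT_RE.finditer(log_text)]


def reached_synced(log_text: str) -> bool:
    """True if the most recent logged shift ends in SYNCED."""
    shifts = state_shifts(log_text)
    return bool(shifts) and shifts[-1][1] == "SYNCED"


if __name__ == "__main__":
    excerpt = "[Galera] Shifting JOINER -> JOINED (TO: 799642)\n"
    print(state_shifts(excerpt))    # [('JOINER', 'JOINED', 799642)]
    print(reached_synced(excerpt))  # False
```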


Look in the datadir of the joining node. Do you have an actual datadir, or is it mostly empty? There might be an sst_in_progress file or some xtrabackup logs to inspect.


The datadir was full. I can check the sst_in_progress log. I rebooted, and the sst_in_progress file is empty.


If your datadir was full, then that's your issue. You need more disk space. :)


After the reboot and sync, the node is still not joining the cluster: wsrep_local_state_comment is still Joining, wsrep_local_state is 1, and there is no sst_in_progress file.
I can see all of the nodes in the cluster, but this bad node refuses to join. Here is the joiner's error log:

2022-09-15T21:55:41.141321Z 0 [Note] [MY-000000] [WSREP-SST] Running post-processing...........
2022-09-15T21:55:41.147995Z 0 [Note] [MY-000000] [WSREP-SST] Skipping mysql_upgrade (sst): local version (8.0.27) == donor version (8.0.27)
2022-09-15T21:55:41.241224Z 0 [Note] [MY-000000] [WSREP-SST] Waiting for server instance to start.....  This may take some time
2022-09-15T21:55:48.582854Z 0 [Note] [MY-000000] [WSREP-SST] ...........post-processing done
2022-09-15T21:55:49.437058Z 0 [Note] [MY-011952] [InnoDB] If the mysqld execution user is authorized, page cleaner and LRU manager thread priority can be changed. See the man page of setpriority().
2022-09-15T21:55:49.437808Z 4 [Note] [MY-013532] [InnoDB] Using './#ib_16384_0.dblwr' for doublewrite
[Note] [MY-011089] [Server] Data dictionary restarting version '80023'.
[System] [MY-000000] [WSREP] PXC upgrade completed successfully
[Note] [MY-010006] [Server] Using data dictionary with version '80023'.
[Note] [MY-011025] [Repl] Failed to start slave threads for channel ''.
[System] [MY-000000] [WSREP] SST completed
[Note] [MY-000000] [Galera] Receiving IST...  0.0% ( 0/85 events) complete.
[Note] [MY-000000] [Galera] Receiving IST...100.0% (85/85 events) complete.
2022-09-15T21:55:54.972261Z 0 [Note] [MY-000000] [Galera] 3.0 (DB404): State transfer from 1.0 (DB405) complete.
2022-09-15T21:55:54.972301Z 0 [Note] [MY-000000] [Galera] SST leaving flow control
2022-09-15T21:55:54.972317Z 0 [Note] [MY-000000] [Galera] Shifting JOINER -> JOINED (TO: 800712)
2022-09-15T21:56:04.286806Z 0 [Note] [MY-011953] [InnoDB] Page cleaner took 6203ms to flush 100 pages

As soon as that process finishes, the joiner starts logging Page cleaner messages.
I have plenty of space left: total 1.8T, used 1.3T, available 493G. By "the datadir was full" I meant that it contained all of the expected data, not that the disk was actually full.
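Each Page cleaner note reports how long one flush batch took, so dividing pages by seconds gives a rough flush rate to compare against innodb_io_capacity. For the three messages quoted in this thread that works out to only about 16 to 40 pages per second. A Python sketch, assuming the message wording shown in the logs above; the function name is hypothetical.

```python
import re

# Matches InnoDB notes such as "Page cleaner took 7223ms to flush 267 pages".
PAGE_CLEANER_RE = re.compile(r"Page cleaner took (\d+)ms to flush (\d+) pages")


def flush_rates(log_text: str) -> list:
    """Return pages flushed per second for every page cleaner message in the log."""
    rates = []
    for match in PAGE_CLEANER_RE.finditer(log_text):
        elapsed_ms, pages = int(match.group(1)), int(match.group(2))
        if elapsed_ms == 0:
            continue  # avoid dividing by zero on a malformed line
        rates.append(pages / (elapsed_ms / 1000))
    return rates


if __name__ == "__main__":
    # The three page cleaner messages quoted in this thread.
    log = (
        "Page cleaner took 7223ms to flush 267 pages\n"
        "Page cleaner took 37403ms to flush 1493 pages\n"
        "Page cleaner took 6203ms to flush 100 pages\n"
    )
    for rate in flush_rates(log):
        print(f"{rate:.1f} pages/s")
```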

The donor node's log has:
2022-09-15T21:55:54.972452Z 0 [Note] [MY-000000] [Galera] 3.0 (DB404): State transfer from 1.0 (DB405) complete.
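The numeric wsrep_local_state of 1 reported earlier in this post can be decoded with a lookup table. A minimal Python sketch, assuming the numbering documented for Galera's wsrep_local_state status variable; the exact wsrep_local_state_comment text a server prints may differ slightly, and the function name is hypothetical.

```python
# Assumption: numbering as documented for Galera's wsrep_local_state.
LOCAL_STATES = {
    1: "Joining",         # requesting or receiving a state transfer
    2: "Donor/Desynced",  # serving a state transfer or manually desynced
    3: "Joined",          # state transfer done, catching up on queued write-sets
    4: "Synced",          # fully in sync with the cluster
}


def describe_local_state(value: int) -> str:
    """Translate a numeric wsrep_local_state into a readable state name."""
    return LOCAL_STATES.get(value, f"unknown state {value}")


if __name__ == "__main__":
    print(describe_local_state(1))  # Joining
```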


Try doubling innodb_io_capacity to help with the page cleaner issue. But is there nothing else in the log file?
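Doubling innodb_io_capacity can collide with innodb_io_capacity_max, because MySQL does not allow the former to exceed the latter. A Python sketch that computes the doubled value, raises the maximum only when needed, and renders SET GLOBAL statements in a safe order. The starting values in the usage line (200 and 2000, the MySQL 8.0 defaults) are an assumption, not values from this cluster, and the function names are hypothetical. SET GLOBAL changes do not survive a restart, so the same values would also need to go into my.cnf.

```python
def doubled_io_capacity(io_capacity: int, io_capacity_max: int) -> dict:
    """Double innodb_io_capacity and raise innodb_io_capacity_max if it would be exceeded."""
    new_capacity = io_capacity * 2
    # innodb_io_capacity may not be larger than innodb_io_capacity_max.
    new_max = max(io_capacity_max, new_capacity)
    return {"innodb_io_capacity": new_capacity, "innodb_io_capacity_max": new_max}


def as_set_statements(settings: dict) -> list:
    """Render SET GLOBAL statements, setting the max first so the new capacity is accepted."""
    order = ["innodb_io_capacity_max", "innodb_io_capacity"]
    return [f"SET GLOBAL {name} = {settings[name]};" for name in order]


if __name__ == "__main__":
    # Assumed starting point: MySQL 8.0 defaults, not this cluster's real settings.
    for statement in as_set_statements(doubled_io_capacity(200, 2000)):
        print(statement)
```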


I will add that and see how it goes. And no, there is nothing else in the log file, only the page cleaner messages.
