WSREP has not yet prepared node for application use after restarting service

I have a 5-node cluster. I checked the status of one node and it still says joining, and it has been over 10 hours. When I query the db I get ERROR 1047 (08S01) at line 1: WSREP has not yet prepared node for application use. The other nodes say the cluster size is 5 as well, and I can also see the bad node's IP address in the wsrep_incoming_addresses status. Do I just reboot the bad node again?

Looking at the error log, I am seeing messages like 2022-09-15T08:19:59.021032Z 0 [Note] [MY-011953] [InnoDB] Page cleaner took 7223ms to flush 267 pages.
I also noticed my galera.cache file is at its maximum size:

wsrep_provider_options="gcache.size = 5G"

Size of file is 5.1G Sep 15 09:00 galera.cache

When I run mysql -e "show global status like '%wsrep_local_state_comment%'\G" from the command line, I get Variable_name: wsrep_local_state_comment and Value: Joined. When I log in to mysql and run a query, I still get WSREP has not yet prepared node for application use. The log file still has messages like 2022-09-15T10:24:46.252378Z 0 [Note] [MY-011953] [InnoDB] Page cleaner took 37403ms to flush 1493 pages.

The wsrep_local_recv_queue is 500. I am guessing I should reboot?

Hello, did you check for problems in the logs of the bad node? Did you check whether the node is still going through SST?

The log has:

0.0 (DB404): State transfer from 3.0 (DB403) complete.
[Galera] Shifting JOINER -> JOINED (TO: 799642)

But when I looked at wsrep_local_state_comment, it said joining (even after 2 hours), and wsrep_ready said OFF, even though the SST was complete.
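For anyone following along, here is a minimal Python sketch of how the status values mentioned so far can be read together. The function name `describe_node` and its messages are hypothetical, not part of PXC or Galera. It assumes you have already collected the `SHOW GLOBAL STATUS` values into a dict of strings, and it relies on the usual Galera numbering for wsrep_local_state (1 Joining, 2 Donor/Desynced, 3 Joined, 4 Synced).

```python
# Hypothetical helper for illustration only; not part of PXC or Galera.
# Usual Galera numbering for wsrep_local_state.
STATE_NAMES = {1: "Joining", 2: "Donor/Desynced", 3: "Joined", 4: "Synced"}


def describe_node(status):
    """Summarise a node from a dict of SHOW GLOBAL STATUS values (strings)."""
    state = status.get("wsrep_local_state_comment", "").strip()
    if not state:
        # Fall back to the numeric state if the comment is missing.
        code = status.get("wsrep_local_state", "").strip()
        state = STATE_NAMES.get(int(code), "Unknown") if code.isdigit() else "Unknown"
    ready = status.get("wsrep_ready", "OFF").strip().upper() == "ON"
    queue = int(status.get("wsrep_local_recv_queue", "0"))

    if ready and state == "Synced":
        return "ready: node is Synced and accepts queries"
    if state == "Joined":
        # State transfer is done; the node still has to apply queued write-sets.
        return f"catching up: state transfer done, {queue} write-sets still queued"
    if state.startswith("Joining"):
        return "joining: state transfer requested or in progress"
    return f"not ready: state={state}, wsrep_ready={'ON' if ready else 'OFF'}"
```

With the values reported above (Joined, wsrep_ready OFF, wsrep_local_recv_queue 500) this returns the "catching up" message, which is consistent with ERROR 1047 being returned until the node reaches Synced.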

Look in the datadir of the joining node. Do you have an actual datadir or is it mostly empty? There might be an sst_in_progress file or some xtrabackup logs to inspect.

The datadir was full. I can check the sst_in_progress log. I rebooted, and the sst_in_progress file is empty.

If your datadir was full, then that’s your issue. You need more disk space. :)

After the reboot and sync, the node is still not joining the cluster. The status is joining and there is no sst_in_progress file.
Here is the error log of the joiner. The joiner still has wsrep_local_state_comment as joining. I can see all of the nodes in the cluster, but this bad node refuses to join. The wsrep_local_state status is set to 1.

2022-09-15T21:55:41.141321Z 0 [Note] [MY-000000] [WSREP-SST] Running post-processing...........
2022-09-15T21:55:41.147995Z 0 [Note] [MY-000000] [WSREP-SST] Skipping mysql_upgrade (sst): local version (8.0.27) == donor version (8.0.27)
2022-09-15T21:55:41.241224Z 0 [Note] [MY-000000] [WSREP-SST] Waiting for server instance to start.....  This may take some time
2022-09-15T21:55:48.582854Z 0 [Note] [MY-000000] [WSREP-SST] ...........post-processing done
2022-09-15T21:55:49.437058Z 0 [Note] [MY-011952] [InnoDB] If the mysqld execution user is authorized, page cleaner and LRU manager thread priority can be changed. See the man page of setpriority().
2022-09-15T21:55:49.437808Z 4 [Note] [MY-013532] [InnoDB] Using './#ib_16384_0.dblwr' for doublewrite
[Note] [MY-011089] [Server] Data dictionary restarting version '80023'.
[System] [MY-000000] [WSREP] PXC upgrade completed successfully
[Note] [MY-010006] [Server] Using data dictionary with version '80023'.
[Note] [MY-011025] [Repl] Failed to start slave threads for channel ''.
[System] [MY-000000] [WSREP] SST completed
[Note] [MY-000000] [Galera] Receiving IST...  0.0% ( 0/85 events) complete.
[Note] [MY-000000] [Galera] Receiving IST...100.0% (85/85 events) complete.
2022-09-15T21:55:54.972261Z 0 [Note] [MY-000000] [Galera] 3.0 (DB404): State transfer from 1.0 (DB405) complete.
2022-09-15T21:55:54.972301Z 0 [Note] [MY-000000] [Galera] SST leaving flow control
2022-09-15T21:55:54.972317Z 0 [Note] [MY-000000] [Galera] Shifting JOINER -> JOINED (TO: 800712)
2022-09-15T21:56:04.286806Z 0 [Note] [MY-011953] [InnoDB] Page cleaner took 6203ms to flush 100 pages

The joiner starts logging Page cleaner messages as soon as it finishes the process.
I have plenty of space left: total 1.8T, used 1.3T, available 493G. By the datadir being full I meant that it had all of the expected data, not that the disk was actually full.

Donor node has 2022-09-15T21:55:54.972452Z 0 [Note] [MY-000000] [Galera] 3.0 (DB404): State transfer from 1.0 (DB405) complete.

Try doubling innodb_io_capacity to help with the page cleaner issue. Is there really nothing more in the log file?
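To get a rough feel for how slow the flushing is before and after changing the setting, the page cleaner notes can be turned into a pages-per-second figure. This is only a sketch: the name `flush_rates` is made up for illustration, the regular expression matches the message format shown in the log excerpts in this thread, and the result is a crude indicator of flush throughput, not a measurement of what the disk can do.

```python
import re

# Matches the InnoDB note seen in the log excerpts in this thread.
PAGE_CLEANER = re.compile(r"Page cleaner took (\d+)ms to flush (\d+) pages")


def flush_rates(log_lines):
    """Yield (milliseconds, pages, pages_per_second) for each page cleaner note."""
    for line in log_lines:
        match = PAGE_CLEANER.search(line)
        if match:
            ms, pages = int(match.group(1)), int(match.group(2))
            if ms > 0:  # skip degenerate entries to avoid dividing by zero
                yield ms, pages, pages / (ms / 1000.0)


if __name__ == "__main__":
    sample = [
        "2022-09-15T08:19:59.021032Z 0 [Note] [MY-011953] [InnoDB] "
        "Page cleaner took 7223ms to flush 267 pages",
        "2022-09-15T21:56:04.286806Z 0 [Note] [MY-011953] [InnoDB] "
        "Page cleaner took 6203ms to flush 100 pages",
    ]
    for ms, pages, rate in flush_rates(sample):
        print(f"{pages} pages in {ms} ms = {rate:.1f} pages/s")
```

If the rate stays this low after the change, the bottleneck is more likely the storage itself than the InnoDB setting.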

I will add that and see how it goes. And no, there is no more info in the log file, only the page cleaner messages.