[WSREP-SST] bin-log post-processing failed - When adding node02 to bootstraped node01

Hello all, thanks for checking into this post.

I’ve followed the documentation for building a Percona XtraDB cluster and spun up my first node PXC node, pxc-dbnode01. It is also a slave of an existing DB that we are replicating from, to mock up our environment that we are trying to upgrade here. We used xtradb-backup24 to copy over a backup from the ExistingDB (Percona 5.7) to pxc-dbnode01. We set the binary position and started slave and replication from ExistingDB looks good. Adds to ExistingDB show up on pxc-dbnode01.

After starting pxc-dbnode01 in bootstrap mode systemctl start mysql@bootstrap.service, it shows it’s up in a single node cluster.

| wsrep_cluster_conf_id            | 3                                                                                                                              
| wsrep_cluster_size               | 1

Replication for ExistingDB still works as well.

I’ve spun up pxc-dbnode02. To keep things simple for now, I’ve disabled encryption. Using a simple config I start mysql and it looks good… to start…

2021-10-08T02:05:33.404617Z 0 [Note] [MY-000000] [Galera] (38641378-8a31, 'tcp://0.0.0.0:4567') connection established to 1a418e15-8efd tcp://172.16.5.79:4567
2021-10-08T02:05:33.404777Z 0 [Note] [MY-000000] [Galera] (38641378-8a31, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
2021-10-08T02:05:33.904462Z 0 [Note] [MY-000000] [Galera] EVS version upgrade 0 -> 1
2021-10-08T02:05:33.904918Z 0 [Note] [MY-000000] [Galera] declaring 1a418e15-8efd at tcp://172.16.5.79:4567 stable
2021-10-08T02:05:33.905290Z 0 [Note] [MY-000000] [Galera] PC protocol upgrade 0 -> 1
2021-10-08T02:05:33.906031Z 0 [Note] [MY-000000] [Galera] Node 1a418e15-8efd state primary
2021-10-08T02:05:33.907792Z 0 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node
view (view_id(PRIM,1a418e15-8efd,2)
memb {
	1a418e15-8efd,0
	38641378-8a31,0
	}
joined {
	}
left {
	}
partitioned {
	}
)

We get to SST

-08T02:05:37.133598Z 0 [Note] [MY-000000] [WSREP-SST] Proceeding with SST.........
2021-10-08T02:05:37.176922Z 0 [Note] [MY-000000] [WSREP-SST] ............Waiting for SST streaming to complete!
2021-10-08T02:05:52.507165Z 0 [Note] [MY-000000] [WSREP-SST] Preparing the backup at /var/lib/mysql//sst-xb-tmpdir
2021-10-08T02:05:52.515072Z 0 [Note] [MY-000000] [Galera] 0.0 (pxc-dbnode01): State transfer to 1.0 (pxc-dbnode02) complete.

Which looks good.

But then we die when starting post processing.

2021-10-08T02:05:55.398390Z 0 [Note] [MY-000000] [WSREP-SST] Running post-processing...........
2021-10-08T02:05:55.403056Z 0 [Note] [MY-000000] [WSREP-SST] Skipping mysql_upgrade (sst): local version (8.0.23) == donor version (8.0.23)
2021-10-08T02:05:55.499786Z 0 [Note] [MY-000000] [WSREP-SST] Waiting for server instance to start.....  This may take some time
2021-10-08T02:05:58.594425Z 0 [ERROR] [MY-000000] [WSREP-SST] ******************* FATAL ERROR **********************
2021-10-08T02:05:58.595011Z 0 [ERROR] [MY-000000] [WSREP-SST] Failed to start the mysql server that checks for async replication.
2021-10-08T02:05:58.595044Z 0 [ERROR] [MY-000000] [WSREP-SST] Check the parameters and retry
2021-10-08T02:05:58.595067Z 0 [ERROR] [MY-000000] [WSREP-SST] Line 488  pid:6880
2021-10-08T02:05:58.596237Z 0 [ERROR] [MY-000000] [WSREP-SST] ------------ mysql error log (START) ------------
	---- Starting the MySQL server used for post-processing ----
	2021-10-08T02:05:55.800429Z 0 [Warning] [MY-000000] [WSREP] Node is not a cluster node. Disabling pxc_strict_mode
	2021-10-08T02:05:55.800950Z 0 [System] [MY-010116] [Server] /usr/sbin/mysqld (mysqld 8.0.23-14.1) starting as process 6880
	2021-10-08T02:05:55.810194Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
	2021-10-08T02:05:55.992372Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.
	2021-10-08T02:05:56.134306Z 1 [Note] [MY-000000] [WSREP] wsrep_init_schema_and_SR (nil)
	2021-10-08T02:05:56.144152Z 1 [System] [MY-000000] [WSREP] PXC upgrade completed successfully
	2021-10-08T02:05:56.150353Z 0 [System] [MY-011323] [Server] X Plugin ready for connections. Socket: /var/run/mysqld/mysqlx.sock
	2021-10-08T02:05:56.170501Z 0 [Warning] [MY-010075] [Server] No existing UUID has been found, so we assume that this is the first time that this server has been started. Generating a new UUID: 45f65d13-27dc-11ec-bb09-0050569114c8.
	mysqld: File '/var/lib/mysql/.000002' not found (OS errno 2 - No such file or directory)
	2021-10-08T02:05:56.171167Z 0 [ERROR] [MY-010958] [Server] Could not open log file.
	2021-10-08T02:05:56.171303Z 0 [ERROR] [MY-010041] [Server] Can't init tc log
	2021-10-08T02:05:56.171815Z 0 [ERROR] [MY-010119] [Server] Aborting
	2021-10-08T02:05:57.649895Z 0 [System] [MY-010910] [Server] /usr/sbin/mysqld: Shutdown complete (mysqld 8.0.23-14.1)  Percona XtraDB Cluster (GPL), Release rel14, Revision d3b9a1d, WSREP version 26.4.3.

2021-10-08T02:05:58.596319Z 0 [ERROR] [MY-000000] [WSREP-SST] ------------ mysql error log (END) ------------
2021-10-08T02:05:58.596346Z 0 [ERROR] [MY-000000] [WSREP-SST] ******************************************************
2021-10-08T02:05:58.596606Z 0 [Note] [MY-000000] [WSREP-SST] ...........post-processing failed.  Exiting
2021-10-08T02:05:58.596731Z 0 [ERROR] [MY-000000] [WSREP-SST] Cleanup after exit with status:3
2021-10-08T02:05:58.614879Z 0 [ERROR] [MY-000000] [WSREP] Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '172.16.5.72' --datadir '/var/lib/mysql/' --basedir '/usr/' --plugindir '/usr/lib/mysql/plugin/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --parent '6186' --mysqld-version '8.0.23-14.1'   '' : 3 (No such process)
2021-10-08T02:05:58.615018Z 0 [ERROR] [MY-000000] [WSREP] Failed to read uuid:seqno from joiner script.
2021-10-08T02:05:58.615081Z 0 [ERROR] [MY-000000] [WSREP] SST script aborted with error 3 (No such process)
2021-10-08T02:05:58.615199Z 3 [Note] [MY-000000] [Galera] Processing SST received
2021-10-08T02:05:58.615258Z 3 [Note] [MY-000000] [Galera] SST received: 00000000-0000-0000-0000-000000000000:-1
2021-10-08T02:05:58.615308Z 3 [System] [MY-000000] [WSREP] SST completed
2021-10-08T02:05:58.615474Z 1 [Note] [MY-000000] [Galera]  str_proto_ver_: 3 sst_seqno_: -1 cc_seqno: 2 req->ist_len(): 65
2021-10-08T02:05:58.615538Z 1 [ERROR] [MY-000000] [Galera] Application received wrong state:
	Received: 00000000-0000-0000-0000-000000000000
	Required: 1a420a1b-27db-11ec-a8f7-fe3cee190057
2021-10-08T02:05:58.615584Z 1 [ERROR] [MY-000000] [Galera] Application state transfer failed. This is unrecoverable condition, restart required.
2021-10-08T02:05:58.615628Z 1 [Note] [MY-000000] [Galera] ReplicatorSMM::abort()
2021-10-08T02:05:58.615672Z 1 [Note] [MY-000000] [Galera] Closing send monitor...
2021-10-08T02:05:58.615714Z 1 [Note] [MY-000000] [Galera] Closed send monitor.
2021-10-08T02:05:58.615756Z 1 [Note] [MY-000000] [Galera] gcomm: terminating thread

There is one line that worries me. It’s inside the fatal error notification, in the log of the MySQL instance “that checks for async replication”. starting after 2021-10-08T02:05:58.594425Z. The specific line is:

	mysqld: File '/var/lib/mysql/.000002' not found (OS errno 2 - No such file or directory)
	2021-10-08T02:05:56.171167Z 0 [ERROR] [MY-010958] [Server] Could not open log file.

Every time I restart mysql, the binlog number increments as usual, but for some reason, the filename prefix appears to be gone. In the configuration we have log_bin = /var/lib/mysql/mysql-bin.log, and that file is in the file system on pxc-dbnode02 as this happens.

-rw-r-----  1 mysql mysql  156 Oct  7 22:05  mysql-bin.log.000002

/var/lib/mysql is owned by mysql. The file is owned by mysql. It honestly looks like the filename prefix is being trimmed off.

I’m sure there must be something small that I’m missing here. But I don’t know what next to try. Thoughts?

pxc-dbnode01

[client]
port		= 3306
socket		= /var/run/mysqld/mysqld.sock

[mysqld]

server-id                     = 79

wsrep_cluster_address         = gcomm://
wsrep_cluster_name            = test-cluster

wsrep_node_address            = 172.16.5.79
wsrep_node_name               = pxc-dbnode01

wsrep_sst_method              = xtrabackup-v2
wsrep_provider                = /usr/lib/galera4/libgalera_smm.so
wsrep_slave_threads           = 8
wsrep_log_conflicts

pxc-encrypt-cluster-traffic   = off
pxc_strict_mode               = ENFORCING

default-authentication-plugin = mysql_native_password

log-slave-updates
log_bin                       = /var/lib/mysql/mysql-bin.log
binlog_expire_logs_seconds    = 604800
binlog_format                 = ROW
max_binlog_size               = 100M
innodb_autoinc_lock_mode      = 2

user                          = mysql
datadir                       = /var/lib/mysql
lc-messages-dir	              = /usr/share/mysql
socket                        = /var/run/mysqld/mysqld.sock
pid-file                      = /var/run/mysqld/mysqld.pid
log-error                     = /var/log/mysql/error.log

slow_query_log                = 1
long_query_time               = 1
slow_query_log_file           = /var/log/mysql/slow.log

skip-external-locking
wait_timeout                  = 30
max_allowed_packet            = 16M
thread_stack                  = 192K
thread_cache_size             = 8

pxc-dbnode02

[client]
port		= 3306
socket		= /var/run/mysqld/mysqld.sock

[mysqld]

server-id                     = 72

wsrep_cluster_address         = gcomm://172.16.5.79,172.16.5.72
wsrep_cluster_name            = test-cluster

wsrep_node_address            = 172.16.5.72
wsrep_node_name               = pxc-dbnode02

wsrep_sst_method              = xtrabackup-v2
wsrep_provider                = /usr/lib/galera4/libgalera_smm.so
wsrep_slave_threads           = 8
wsrep_log_conflicts

pxc-encrypt-cluster-traffic   = off
pxc_strict_mode               = ENFORCING

default-authentication-plugin = mysql_native_password

log-slave-updates
log_bin                       = /var/lib/mysql/mysql-bin.log
binlog_expire_logs_seconds    = 604800
binlog_format                 = ROW
max_binlog_size               = 100M
innodb_autoinc_lock_mode      = 2

user                          = mysql
datadir                       = /var/lib/mysql
lc-messages-dir	              = /usr/share/mysql
socket                        = /var/run/mysqld/mysqld.sock
pid-file                      = /var/run/mysqld/mysqld.pid
log-error                     = /var/log/mysql/error.log

slow_query_log                = 1
long_query_time               = 1
slow_query_log_file           = /var/log/mysql/slow.log

skip-external-locking
wait_timeout                  = 30
max_allowed_packet            = 16M
thread_stack                  = 192K
thread_cache_size             = 8
1 Like

Hi, there is a bug related to having a dot (.) in the log-bin argument. Please replace this line:
log_bin = /var/lib/mysql/mysql-bin.log
with
log_bin = /var/lib/mysql/mysql-bin

and try again

1 Like

Hello @Ivan_Groenewold , thank you for checking in and I’ll give that a shot right now. I honestly thought I had tried that earlier just to see if there was an issue with the name format. Will report back.

1 Like

@Ivan_Groenewold - No joy on that.

Just to be safe and consistent, we started with the ExistingDB and set it’s configuration to log_bin = /var/lib/mysql/binlog. Then we rebuilt all the other nodes from stratch using the configurations above but with log_bin = /var/lib/mysql/binlog.

We get to the same place. The files exist on the node trying to join the cluster:

-rw-r-----  1 mysql mysql  156 Oct  8 11:08  binlog.000003
-rw-r-----  1 mysql mysql   23 Oct  8 11:08  binlog.index

But the process is dying at the same place:

	2021-10-08T15:08:14.282001Z 0 [Warning] [MY-010075] [Server] No existing UUID has been found, so we assume that this is the first time that this server has been started. Generating a new UUID: 8f41c9b6-2849-11ec-a843-005056913d5c.
	mysqld: File '/var/lib/mysql/.000003' not found (OS errno 2 - No such file or directory)
	2021-10-08T15:08:14.282514Z 0 [ERROR] [MY-010958] [Server] Could not open log file.

Any other thoughts? Or can you point me to any documentation of that bug you mention?

1 Like

Hi, sorry to hear you are still having problems. This is the bug I thought you were hitting: [PXC-3599] SST fails with gtid=on: Move file mysqld-bin.index to /otherfs/mysqld-bin.log.index failed: Destination file exists - Percona JIRA
I found [PXC-3396] SST failing because of wrong binlog name - Percona JIRA also that might apply to you. Can you try:
log_bin = binlog
without the full path?

2 Likes

Hey @Ivan_Groenewold, I think you edited to add that workaround just about the same time I found it on the JIRA. You pointed me in the right direction and yes that workaround works. Thanks so much. Very happy to report that is up and running right now. Hope this post can help surface this info for other people who bump into the same thing.

Cheers.

1 Like

glad to hear it’s sorted out!

1 Like