Hello,
We have been using a plain master-slave setup until now and are looking into moving to PXC.
To have the smoothest live transition possible and take 0 risks of harming our live production, we want to have the PXC cluster be an asynchronous slave of the current plain master-slave pair, and then move reads and finally writes to PXC.
My method so far to provision the slave PXC cluster has been:
- backup/restore with xtrabackup for the first PXC node, with wsrep_* commented out (ie running as a plain node)
- check that replication is all ok
- Uncomment wsrep_* and bootstrap the node
- check that replication is still all ok with wsrep_* flags uncommented
- add other PXC nodes
This has worked just fine in dev so far, but doesn’t work for our production DB with identical master/slave and PXC configurations, and I cannot diagnose the issue so far.
The problem is that my prod PXC node cannot replicate and I am not sure why. Replication starts but immediately gets stuck, without any particular error/warning in the logs. The Exec_Source_Log_Pos variable never changes…
I can then re-comment-out wsrep_* and catch up, but whenever I uncomment them and restart I’m back to square 1.
This is more or less the same issue as PXC 8.x replication issues, and if I try to force-start things similarly I run into the same errors as that user in the end…
In there @matthewb suggests:
As my last piece of advice, I would try the following:
- Erase PXC1 datadir
- Ensure that all GTID-related configuration is set on mysql1
- Take a new xtrabackup of mysql1 and restore to PXC1
- Start PXC1 in bootstrap mode
- Configure PXC1 as replica using GTID
Which I’ve tried multiple times, with either of SOURCE_LOG_{FILE, POS} and SOURCE_AUTO_POS, both without success…
In case it somehow helps, these are the relevant configurations, on master/slave’s master:
[mysqld]
user = mysql
pid-file = /var/run/mysqld/mysqld.pid
socket = /var/run/mysqld/mysqld.sock
port = 3306
datadir = /var/lib/mysql
skip-name-resolve
myisam-recover-options = BACKUP
log_error = /var/log/mysql/error.log
server-id = 1
log_bin = /var/log/mysql/mysql-bin.log
binlog_expire_logs_seconds = 864000
binlog_format = ROW
max_binlog_size = 100M
gtid_mode=ON
enforce-gtid-consistency=ON
innodb_buffer_pool_size = 24G
sql_mode = "STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION,NO_AUTO_VALUE_ON_ZERO"
character-set-server=utf8mb4
wait-timeout=200
interactive-timeout=200
and on PXC1 side:
[client]
socket=/var/run/mysqld/mysqld.sock
[mysqld]
bind-address=0.0.0.0
user = mysql
pid-file = /var/run/mysqld/mysqld.pid
socket = /var/run/mysqld/mysqld.sock
port = 3306
datadir = /var/lib/mysql
skip-name-resolve
log_output = FILE
log_error = /var/log/mysql/error.log
server-id = 12
log_bin = /var/log/mysql/mysql-bin
binlog_expire_logs_seconds = 864000
binlog_format = ROW
log_replica_updates = 1
max_binlog_size = 100M
gtid_mode=ON
enforce-gtid-consistency=ON
relay_log=relay-bin
innodb_buffer_pool_size = 8G
innodb_log_file_size = 256M
innodb_log_files_in_group=2
innodb_autoinc_lock_mode=2
sql_mode = "STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION,NO_AUTO_VALUE_ON_ZERO"
character-set-server=utf8mb4
wait-timeout=200
interactive-timeout=200
default_authentication_plugin=mysql_native_password
binlog_expire_logs_seconds=604800
######## wsrep ###############
pxc_strict_mode=ENFORCING
wsrep_provider=/usr/lib/galera4/libgalera_smm.so
wsrep_cluster_address=gcomm://
wsrep_applier_threads=8
wsrep_log_conflicts=ON
wsrep_cluster_name=pxc-cluster
wsrep_node_name=pxc-cluster-node-1
wsrep_sst_method=skip
wsrep_debug=CLIENT
#wsrep_node_address=xxxxx
When running with wsrep_* commented out:
mysql> SHOW REPLICA STATUS \G
*************************** 1. row ***************************
Replica_IO_State: Waiting for source to send event
Source_Host: $redacted
Source_User: repl
Source_Port: 3306
Connect_Retry: 60
Source_Log_File: mysql-bin.013277
Read_Source_Log_Pos: 44033822
Relay_Log_File: relay-bin.000009
Relay_Log_Pos: 44009214
Relay_Source_Log_File: mysql-bin.013277
Replica_IO_Running: Yes
Replica_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Source_Log_Pos: 44009078
Relay_Log_Space: 44034249
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Source_SSL_Allowed: No
Source_SSL_CA_File:
Source_SSL_CA_Path:
Source_SSL_Cert:
Source_SSL_Cipher:
Source_SSL_Key:
Seconds_Behind_Source: 0
Source_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Source_Server_Id: 1
Source_UUID: ffd08e3b-5120-11ec-b049-bac207f913f9
Source_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Replica_SQL_Running_State: Replica has read all relay log; waiting for more updates
Source_Retry_Count: 86400
Source_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Source_SSL_Crl:
Source_SSL_Crlpath:
Retrieved_Gtid_Set: ffd08e3b-5120-11ec-b049-bac207f913f9:476689337-476750187
Executed_Gtid_Set: 0ab9f887-a52d-11eb-840e-bac207f913f9:1-141256875,
90cd250e-a4f0-11eb-9917-bac207f913f9:1-2,
ffd08e3b-5120-11ec-b049-bac207f913f9:1-476750187
Auto_Position: 1
Replicate_Rewrite_DB:
Channel_Name:
Source_TLS_Version:
Source_public_key_path:
Get_Source_public_key: 0
Network_Namespace:
1 row in set (0.00 sec)
And when running with wsrep_* uncommented:
mysql> SHOW REPLICA STATUS \G;
*************************** 1. row ***************************
Replica_IO_State: Waiting for source to send event
Source_Host: $redacted
Source_User: repl
Source_Port: 3306
Connect_Retry: 60
Source_Log_File: mysql-bin.013277
Read_Source_Log_Pos: 68378769
Relay_Log_File: relay-bin.000003
Relay_Log_Pos: 419
Relay_Source_Log_File: mysql-bin.013277
Replica_IO_Running: Yes
Replica_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Source_Log_Pos: 51875810
Relay_Log_Space: 16503762
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Source_SSL_Allowed: No
Source_SSL_CA_File:
Source_SSL_CA_Path:
Source_SSL_Cert:
Source_SSL_Cipher:
Source_SSL_Key:
Seconds_Behind_Source: 244
Source_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Source_Server_Id: 1
Source_UUID: ffd08e3b-5120-11ec-b049-bac207f913f9
Source_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Replica_SQL_Running_State: Waiting for dependent transaction to commit
Source_Retry_Count: 86400
Source_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Source_SSL_Crl:
Source_SSL_Crlpath:
Retrieved_Gtid_Set: ffd08e3b-5120-11ec-b049-bac207f913f9:476752120-476756546
Executed_Gtid_Set: 0ab9f887-a52d-11eb-840e-bac207f913f9:1-141256875,
90cd250e-a4f0-11eb-9917-bac207f913f9:1-2,
ffd08e3b-5120-11ec-b049-bac207f913f9:1-476752121
Auto_Position: 1
Replicate_Rewrite_DB:
Channel_Name:
Source_TLS_Version:
Source_public_key_path:
Get_Source_public_key: 0
Network_Namespace:
1 row in set (0.00 sec)
ERROR:
No query specified
mysql> show processlist;
+----+-----------------+-----------------+------+---------+------+------------------------------------------------+------------------+---------+-----------+---------------+
| Id | User | Host | db | Command | Time | State | Info | Time_ms | Rows_sent | Rows_examined |
+----+-----------------+-----------------+------+---------+------+------------------------------------------------+------------------+---------+-----------+---------------+
| 5 | event_scheduler | localhost | NULL | Daemon | 88 | Waiting on empty queue | NULL | 87886 | 0 | 0 |
| 6 | system user | | NULL | Sleep | 88 | innobase_commit_low (-1) | NULL | 87865 | 0 | 0 |
| 7 | system user | | NULL | Sleep | 88 | wsrep aborter idle | NULL | 87865 | 0 | 0 |
| 10 | system user | | NULL | Sleep | 88 | wsrep: applier idle | NULL | 87860 | 0 | 0 |
| 11 | system user | | NULL | Sleep | 88 | wsrep: applier idle | NULL | 87859 | 0 | 0 |
| 12 | system user | | NULL | Sleep | 88 | wsrep: applier idle | NULL | 87859 | 0 | 0 |
| 13 | system user | | NULL | Sleep | 88 | wsrep: applier idle | NULL | 87859 | 0 | 0 |
| 14 | system user | | NULL | Sleep | 88 | wsrep: applier idle | NULL | 87859 | 0 | 0 |
| 15 | system user | | NULL | Sleep | 88 | wsrep: applier idle | NULL | 87859 | 0 | 0 |
| 16 | system user | | NULL | Sleep | 88 | wsrep: applier idle | NULL | 87859 | 0 | 0 |
| 18 | root | localhost | NULL | Query | 0 | init | show processlist | 0 | 0 | 0 |
| 19 | system user | connecting host | NULL | Connect | 16 | Waiting for source to send event | NULL | 16015 | 0 | 0 |
| 20 | system user | | NULL | Query | 16 | Waiting for dependent transaction to commit | NULL | 15956 | 0 | 0 |
| 21 | system user | | NULL | Query | 250 | Waiting for an event from Coordinator | NULL | 15945 | 0 | 0 |
| 22 | system user | | NULL | Query | 250 | wsrep: write set replicated and certified (20) | NULL | 15921 | 0 | 0 |
| 23 | system user | | NULL | Query | 250 | Waiting for preceding transaction to commit | NULL | 15923 | 0 | 0 |
| 24 | system user | | NULL | Connect | 16 | Waiting for an event from Coordinator | NULL | 16012 | 0 | 0 |
+----+-----------------+-----------------+------+---------+------+------------------------------------------------+------------------+---------+-----------+---------------+
17 rows in set (0.00 sec)
Any suggestions of where to look for what’s wrong would be greatly appreciated, as START REPLICA doesn’t cause more to show in the logs than:
2022-08-06T11:11:13.538547Z 19 [Warning] [MY-010897] [Repl] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2022-08-06T11:11:13.539763Z 19 [System] [MY-010562] [Repl] Slave I/O thread for channel '': connected to master 'repl@$redacted:3306',replication started in log 'FIRST' at position 4