Problem initializing XtraDB Cluster 5.6.32-25.17.1 and 5.7.14-26.17.1

We are experiencing a problem with bringing up new XtraDB Cluster installations in both the latest 5.6 and 5.7 releases. The problem looks like this on PXC 5.7:
[LIST]
[]When bootstrapped from a newly created data directory, the first node does not properly enter PRIMARY state.
[
]Because it’s not primary, you can’t execute any statements that change data.
[]You can’t force the node primary by setting wsrep_provider_options=‘pc.bootstrap=true’, because you can’t do anything until you set a permanent superuser password to replace the temporary password created by MySQL 5.7.
[
]You can’t set a valid superuser password because with the node non-primary, you can’t execute any statements. So you can’t even look at global status to see what the state actually is.
[/LIST]
So on 5.7.14, you’re stuck in a catch-22 situation. You can’t set the node primary without first setting a new superuser password, and you can’t set a new superuser password without setting the node primary.

On 5.6.32, the problem seems to be related — you still end up with your bootstrapped node non-primary, though you don’t have a forced temporary password to reset. You can execute the command to set the node primary, but it doesn’t work:

mysql> set global wsrep_provider_options=‘pc.bootstrap=true’;
ERROR 1210 (HY000): Incorrect arguments to SET

Investigating this further, it seems that this is because the pc.bootstrap parameter no longer exists:

2016-11-23T21:47:42.232304Z 2 [Warning] WSREP: Unknown parameter ‘pc.bootstrap’
2016-11-23T21:47:42.232418Z 2 [ERROR] WSREP: Set options returned 7

I can find no documentation stating what has replaced it.

It looks as though the following MAY be why the node is not properly coming up primary:

2016-11-23T15:43:15.983090Z 0 [ERROR] Incorrect definition of table performance_schema.replication_connection_status: expected column ‘RECEIVED_TRANSACTION_SET’ at position 7 to have type longtext, found type text.
2016-11-23T15:43:15.984024Z 0 [ERROR] Incorrect definition of table performance_schema.replication_group_member_stats: expected column ‘COUNT_TRANSACTIONS_ROWS_VALIDATING’ at position 6, found ‘COUNT_TRANSACTIONS_VALIDATING’.

It’s not clear to me what can be done by the user to fix this.

Can anyone assist?

Hunting reason behind why node is unable to start (or enter PRIMARY) state is core issue.
That would solve both the problem you are facing (5.6 and 5.7 issue).

a. Is this a fresh installation or you just upgraded (for 5.7 case)

b. Can you share the complete log file for 5.6 and 5.7. I can take a look to see if there is anything wrong. Also, just ensure there are no other infrastructure issue given that both 5.6 and 5.7 are facing problem.

Thank you Krunal. With further study we’ve found the following:
— PXC 5.6.32 and 5.7.14 clusters updated from 5.6.28 or earlier seem to work fine
— PXC 5.6.30, 5.6.32, 5.7.14 new installs using my.cnf wsrep configuration copied from the working clusters cannot form a cluster because no node can enter PRIMARY state

I’m trying to attach log filed but the upload function says the logs are invalid files and won’t let me attach them. I’ll have to paste inline, and I’m going to have to do it one per reply because of the 10K-char comment limit.

This log is from a new PXC 5.6.32 install with a brand new clean data directory populated using mysql_install_db:

161129 20:07:03 mysqld_safe Starting mysqld daemon with databases from /data/mysql
161129 20:07:03 mysqld_safe Skipping wsrep-recover for empty datadir: /data/mysql
161129 20:07:03 mysqld_safe Assigning 00000000-0000-0000-0000-000000000000:-1 to wsrep_start_position
2016-11-29 20:07:04 0 [Note] /usr/sbin/mysqld (mysqld 5.6.32-78.1-56) starting as process 14339 ...
2016-11-29 20:07:04 14339 [Note] WSREP: Read nil XID from storage engines, skipping position init
2016-11-29 20:07:04 14339 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/libgalera_smm.so'
2016-11-29 20:07:04 14339 [Note] WSREP: wsrep_load(): Galera 3.17(r447d194) by Codership Oy <info&#64;codership.com> loaded successfully.
2016-11-29 20:07:04 14339 [Note] WSREP: CRC-32C: using "slicing-by-8" algorithm.
2016-11-29 20:07:04 14339 [Warning] WSREP: Could not open state file for reading: '/data/mysql//grastate.dat'
2016-11-29 20:07:04 14339 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
2016-11-29 20:07:04 14339 [Note] WSREP: Passing config to GCS: base_dir = /data/mysql/; base_host = 10.20.10.16; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /data/mysql/; gcache.keep_pages_count = 0; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /data/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 5G; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 128; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.listen_addr = tcp://10.20.10.16; gmcast.segment = 0; gmcast.version = 0; ist.recv_addr =
2016-11-29 20:07:04 14339 [Note] WSREP: Service thread queue flushed.
2016-11-29 20:07:04 14339 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
2016-11-29 20:07:04 14339 [Note] Plugin 'FEDERATED' is disabled.
2016-11-29 20:07:04 14339 [Note] InnoDB: Using atomics to ref count buffer pool pages
2016-11-29 20:07:04 14339 [Note] InnoDB: The InnoDB memory heap is disabled
2016-11-29 20:07:04 14339 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2016-11-29 20:07:04 14339 [Note] InnoDB: Memory barrier is not used
2016-11-29 20:07:04 14339 [Note] InnoDB: Compressed tables use zlib 1.2.3
2016-11-29 20:07:04 14339 [Note] InnoDB: Using Linux native AIO
2016-11-29 20:07:04 14339 [Note] InnoDB: Not using CPU crc32 instructions
2016-11-29 20:07:04 14339 [Note] InnoDB: Initializing buffer pool, size = 4.0G
2016-11-29 20:07:04 14339 [Note] InnoDB: Completed initialization of buffer pool
2016-11-29 20:07:04 14339 [Note] InnoDB: Highest supported file format is Barracuda.
2016-11-29 20:07:04 14339 [Note] InnoDB: 128 rollback segment(s) are active.
2016-11-29 20:07:04 14339 [Note] InnoDB: Waiting for purge to start
2016-11-29 20:07:04 14339 [Note] InnoDB: Percona XtraDB (http://www.percona.com) 5.6.32-78.1 started; log sequence number 1625997
2016-11-29 20:07:04 14339 [Note] RSA private key file not found: /data/mysql//private_key.pem. Some authentication plugins will not work.
2016-11-29 20:07:04 14339 [Note] RSA public key file not found: /data/mysql//public_key.pem. Some authentication plugins will not work.
2016-11-29 20:07:04 14339 [Note] Server hostname (bind-address): '*'; port: 3306
2016-11-29 20:07:04 14339 [Note] IPv6 is available.
2016-11-29 20:07:04 14339 [Note] - '::' resolves to '::';
2016-11-29 20:07:04 14339 [Note] Server socket created on IP: '::'.
2016-11-29 20:07:04 14339 [Note] Event Scheduler: Loaded 0 events
2016-11-29 20:07:04 14339 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.6.32-78.1-56' socket: '/data/mysql/mysql.sock' port: 3306 Percona XtraDB Cluster (GPL), Release rel78.1, Revision 979409a, WSREP version 25.17, wsrep_25.17

This next log is from a clean PXC 5.7.14 install allowed to initialize its own data directory upon bootstrap:

2016-11-29T20:16:34.891939Z 0 [Warning] 'NO_ZERO_DATE', 'NO_ZERO_IN_DATE' and 'ERROR_FOR_DIVISION_BY_ZERO' sql modes should be used with strict mode. They will be merged with strict mode in a future release.
2016-11-29T20:16:34.892028Z 0 [Warning] WSREP: Node is running in bootstrap/initialize mode. Disabling pxc_strict_mode checks
100
100
100
100
2016-11-29T20:16:41.686667Z 0 [Warning] InnoDB: New log files created, LSN=45790
2016-11-29T20:16:41.991630Z 0 [Warning] InnoDB: Creating foreign key constraint system tables.
2016-11-29T20:16:42.046520Z 0 [Warning] No existing UUID has been found, so we assume that this is the first time that this server has been started. Generating a new UUID: bdee098a-b670-11e6-949c-18a9053d3bf4.
2016-11-29T20:16:42.063087Z 0 [Warning] Gtid table is not ready to be used. Table 'mysql.gtid_executed' cannot be opened.
2016-11-29T20:16:42.632124Z 0 [Warning] CA certificate ca.pem is self signed.
2016-11-29T20:16:42.693380Z 1 [Note] A temporary password is generated for root&#64;localhost: Ze1a_wQ9%Fy,
2016-11-29T20:16:53.099353Z mysqld_safe Starting mysqld daemon with databases from /data/mysql
2016-11-29T20:16:53.107278Z mysqld_safe Skipping wsrep-recover for empty datadir: /data/mysql
2016-11-29T20:16:53.109622Z mysqld_safe Assigning 00000000-0000-0000-0000-000000000000:-1 to wsrep_start_position
2016-11-29T20:16:53.362056Z 0 [Warning] 'NO_ZERO_DATE', 'NO_ZERO_IN_DATE' and 'ERROR_FOR_DIVISION_BY_ZERO' sql modes should be used with strict mode. They will be merged with strict mode in a future release.
2016-11-29T20:16:53.362132Z 0 [Warning] 'NO_AUTO_CREATE_USER' sql mode was not set.
2016-11-29T20:16:53.364126Z 0 [Note] /usr/sbin/mysqld (mysqld 5.7.14-8-57) starting as process 30434 ...
2016-11-29T20:16:53.367508Z 0 [Note] WSREP: Read nil XID from storage engines, skipping position init
2016-11-29T20:16:53.367523Z 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/libgalera_smm.so'
2016-11-29T20:16:53.372842Z 0 [Note] WSREP: wsrep_load(): Galera 3.17(r447d194) by Codership Oy <info&#64;codership.com> loaded successfully.
2016-11-29T20:16:53.372910Z 0 [Note] WSREP: CRC-32C: using "slicing-by-8" algorithm.
2016-11-29T20:16:53.373260Z 0 [Warning] WSREP: Could not open state file for reading: '/data/mysql//grastate.dat'
2016-11-29T20:16:53.373323Z 0 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
2016-11-29T20:16:53.374702Z 0 [Note] WSREP: Passing config to GCS: base_dir = /data/mysql/; base_host = 10.20.10.17; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /data/mysql/; gcache.keep_pages_count = 0; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /data/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 5G; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 128; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.listen_addr = tcp://10.20.10.17; gmcast.segment = 0; gmcast.version = 0; ist.recv_addr =
2016-11-29T20:16:53.392154Z 0 [Note] WSREP: Service thread queue flushed.
2016-11-29T20:16:53.392232Z 0 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
2016-11-29T20:16:53.394322Z 0 [Note] InnoDB: PUNCH HOLE support available
2016-11-29T20:16:53.394344Z 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2016-11-29T20:16:53.394354Z 0 [Note] InnoDB: Uses event mutexes
2016-11-29T20:16:53.394363Z 0 [Note] InnoDB: GCC builtin __sync_synchronize() is used for memory barrier
2016-11-29T20:16:53.394372Z 0 [Note] InnoDB: Compressed tables use zlib 1.2.3
2016-11-29T20:16:53.394382Z 0 [Note] InnoDB: Using Linux native AIO
2016-11-29T20:16:53.394738Z 0 [Note] InnoDB: Number of pools: 1
2016-11-29T20:16:53.394877Z 0 [Note] InnoDB: Not using CPU crc32 instructions
2016-11-29T20:16:53.397024Z 0 [Note] InnoDB: Initializing buffer pool, total size = 4G, instances = 4, chunk size = 128M
2016-11-29T20:16:53.590496Z 0 [Note] InnoDB: Completed initialization of buffer pool
2016-11-29T20:16:53.663167Z 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2016-11-29T20:16:53.665472Z 0 [Note] InnoDB: Crash recovery did not find the parallel doublewrite buffer at /data/mysql/xb_doublewrite
2016-11-29T20:16:53.667106Z 0 [Note] InnoDB: Highest supported file format is Barracuda.
2016-11-29T20:16:54.027535Z 0 [Note] InnoDB: Created parallel doublewrite buffer at /data/mysql/xb_doublewrite, size 15728640 bytes
2016-11-29T20:16:54.143416Z 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2016-11-29T20:16:54.143571Z 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2016-11-29T20:16:54.327794Z 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2016-11-29T20:16:54.329814Z 0 [Note] InnoDB: 96 redo rollback segment(s) found. 96 redo rollback segment(s) are active.
2016-11-29T20:16:54.339821Z 0 [Note] InnoDB: 32 non-redo rollback segment(s) are active.
2016-11-29T20:16:54.340428Z 0 [Note] InnoDB: Waiting for purge to start
2016-11-29T20:16:54.390611Z 0 [Note] InnoDB: Percona XtraDB (http://www.percona.com) 5.7.14-8 started; log sequence number 2491825
2016-11-29T20:16:54.390991Z 0 [Note] Plugin 'FEDERATED' is disabled.
2016-11-29T20:16:54.398116Z 0 [Note] Found ca.pem, server-cert.pem and server-key.pem in data directory. Trying to enable SSL support using them.
2016-11-29T20:16:54.398160Z 0 [Note] Skipping generation of SSL certificates as certificate files are present in data directory.
2016-11-29T20:16:54.410828Z 0 [Warning] CA certificate ca.pem is self signed.
2016-11-29T20:16:54.410899Z 0 [Note] Skipping generation of RSA key pair as key files are present in data directory.
2016-11-29T20:16:54.411048Z 0 [Note] Server hostname (bind-address): '*'; port: 3306
2016-11-29T20:16:54.411161Z 0 [Note] IPv6 is available.
2016-11-29T20:16:54.411177Z 0 [Note] - '::' resolves to '::';
2016-11-29T20:16:54.411221Z 0 [Note] Server socket created on IP: '::'.
2016-11-29T20:16:54.442001Z 0 [Note] Event Scheduler: Loaded 0 events
2016-11-29T20:16:54.442265Z 0 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.7.14-8-57' socket: '/data/mysql/mysql.sock' port: 3306 Percona XtraDB Cluster (GPL), Release rel8, Revision a3f9d06, WSREP version 26.17, wsrep_26.17

I also have a log from a new 5.6.32 install with a data directory pre-populated via rsync from a standalone Percona Server 5.6.29. It doesn’t really offer any additional information but I can provide it as well if needed.

Both of the 5.6.32 instances are in INIT state and cannot be made to leave it. Trying to force them primary via “set global wsrep_provider_options=‘pc.bootstrap=true’” results in an error in the mysql client “ERROR 1210 (HY000): Incorrect arguments to SET” and an error in the log file:
2016-11-29 20:21:05 14339 [Warning] WSREP: Unknown parameter ‘pc.bootstrap’

The 5.7.14 instance will not allow any commands to be executed until you update the temporary superuser password using ALTER USER, but ALTER USER cannot be executed because the node is in INIT state and will not allow any tables to be changed.

Oh, and I forgot to add … we are unaware of any infrastructure problem. All of the servers on which we are experiencing problems run Percona Server 5.6.34 or 5.7.16 with no problems whatsoever.

We found the problem. It was a non-obvious configuration error. All of the correct values and variables were present … but somehow the values of wsrep_cluster_address and wsrep_sst_receive_address were swapped, and the error got propagated throughout all the other new nodes.

It is at times like this that I really wish for a robust configuration checker.

Good to know you found the problem.

Configuration checker may help but in limited fashion as the format looks quite similar except gcomm:// prefix. But yes, we will consider it in due-course.