Joiner nodes failing to create pid file and dies

I have created a new cluster, bootstrapped the first node. The node comes up. When I start node 2 it fails. I checked to see if the pid file had been created and it doesn’t create a pid file. Does the pid file get created after SST completes.

2 Likes

Hi @danarashad
Could you please provide the log file with the issue?

1 Like

Donor error.log It gets connected and sends data to other node. Then just fails with broken pipe.There is no netwoking issues going one i monitored the ports and it just fails after 25-40 minutes. While the sync’ing occurs the pid file is not created.

2021-12-21T20:56:36.331824Z 0 [Note] [MY-000000] [Galera] (aea03d45-a657, 'tcp://0.0.0.0:4567') connection established to 7c6a995b-85ca tcp://192.168.4.71:4567
2021-12-21T20:56:36.332472Z 0 [Note] [MY-000000] [Galera] (aea03d45-a657, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: 
2021-12-21T20:56:36.832750Z 0 [Note] [MY-000000] [Galera] declaring 7c6a995b-85ca at tcp://192.168.4.71:4567 stable
2021-12-21T20:56:36.833098Z 0 [Note] [MY-000000] [Galera] Node aea03d45-a657 state primary
2021-12-21T20:56:36.949752Z 0 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node
view (view_id(PRIM,7c6a995b-85ca,2)
memb {
	7c6a995b-85ca,0
	aea03d45-a657,0
	}
joined {
	}
left {
	}
partitioned {
	}
)
2021-12-21T20:56:36.949816Z 0 [Note] [MY-000000] [Galera] Save the discovered primary-component to disk
2021-12-21T20:56:36.955865Z 0 [Note] [MY-000000] [Galera] New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2
2021-12-21T20:56:36.955931Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: Waiting for state UUID.
2021-12-21T20:56:37.332909Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: sent state msg: 7d03aa54-62a0-11ec-9d7f-db65bba8ba6d
2021-12-21T20:56:37.333248Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: got state msg: 7d03aa54-62a0-11ec-9d7f-db65bba8ba6d from 0 (DBDEV)
2021-12-21T20:56:37.333299Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: got state msg: 7d03aa54-62a0-11ec-9d7f-db65bba8ba6d from 1 (DB403)
2021-12-21T20:56:37.333327Z 0 [Note] [MY-000000] [Galera] Quorum results:
	version    = 6,
	component  = PRIMARY,
	conf_id    = 1,
	members    = 1/2 (primary/total),
	act_id     = 156,
	last_appl. = 155,
	protocols  = 2/10/4 (gcs/repl/appl),
	vote policy= 0,
	group UUID = bc54efc2-52bc-11ec-9ba9-fbe9dc3b2f33
2021-12-21T20:56:37.333423Z 0 [Note] [MY-000000] [Galera] Flow-control interval: [141, 141]
2021-12-21T20:56:37.333582Z 2 [Note] [MY-000000] [Galera] ####### processing CC 157, local, ordered
2021-12-21T20:56:37.333649Z 2 [Note] [MY-000000] [Galera] Maybe drain monitors from 156 upto current CC event 157 upto:156
2021-12-21T20:56:37.333677Z 2 [Note] [MY-000000] [Galera] Drain monitors from 156 up to 156
2021-12-21T20:56:37.333706Z 2 [Note] [MY-000000] [Galera] ####### My UUID: aea03d45-629d-11ec-a657-d25199295426
2021-12-21T20:56:37.333730Z 2 [Note] [MY-000000] [Galera] Skipping cert index reset
2021-12-21T20:56:37.333754Z 2 [Note] [MY-000000] [Galera] REPL Protocols: 10 (5)
2021-12-21T20:56:37.333779Z 2 [Note] [MY-000000] [Galera] ####### Adjusting cert position: 156 -> 157
2021-12-21T20:56:37.333849Z 0 [Note] [MY-000000] [Galera] Service thread queue flushed.
2021-12-21T20:56:37.344183Z 2 [Note] [MY-000000] [Galera] ================================================
View:
  id: bc54efc2-52bc-11ec-9ba9-fbe9dc3b2f33:157
  status: primary
  protocol_version: 4
  capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
  final: no
  own_index: 1
  members(2):
	0: 7c6a995b-62a0-11ec-85ca-5a7b6cded7eb, DBDEV
	1: aea03d45-629d-11ec-a657-d25199295426, DB403
=================================================
2021-12-21T20:56:37.344253Z 2 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2021-12-21T20:56:37.363002Z 2 [Note] [MY-000000] [Galera] Recording CC from group: 157
2021-12-21T20:56:37.363062Z 2 [Note] [MY-000000] [Galera] Lowest cert index boundary for CC from group: 157
2021-12-21T20:56:37.363087Z 2 [Note] [MY-000000] [Galera] Min available from gcache for CC from group: 1
2021-12-21T20:56:37.880495Z 0 [Note] [MY-000000] [Galera] Member 0.0 (DBDEV) requested state transfer from 'DB403'. Selected 1.0 (DB403)(SYNCED) as donor.
2021-12-21T20:56:37.880555Z 0 [Note] [MY-000000] [Galera] Shifting SYNCED -> DONOR/DESYNCED (TO: 157)
2021-12-21T20:56:37.880611Z 2 [Note] [MY-000000] [Galera] Detected STR version: 1, req_len: 132, req: STRv1
2021-12-21T20:56:37.880703Z 2 [Note] [MY-000000] [Galera] Cert index preload: 157 -> 157
2021-12-21T20:56:37.881401Z 2 [Note] [MY-000000] [WSREP] Server status change synced -> donor
2021-12-21T20:56:37.881449Z 0 [Note] [MY-000000] [Galera] async IST sender starting to serve tcp://192.168.4.71:4568 sending 157-157, preload starts from 157
2021-12-21T20:56:37.881480Z 2 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2021-12-21T20:56:37.881778Z 0 [Note] [MY-000000] [Galera] IST sender 157 -> 157
2021-12-21T20:56:37.881976Z 0 [Note] [MY-000000] [WSREP] Initiating SST/IST transfer on DONOR side (wsrep_sst_xtrabackup-v2 --role 'donor' --address '192.168.4.71:4444/xtrabackup_sst//1' --socket '/var/run/mysqld/mysqld.sock' --datadir '/db1/' --basedir '/usr/' --plugindir '/usr/lib/mysql/plugin/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --mysqld-version '8.0.25-15.1'   '' --gtid 'bc54efc2-52bc-11ec-9ba9-fbe9dc3b2f33:157' )
2021-12-21T20:56:37.914565Z 2 [Note] [MY-000000] [WSREP] DONOR thread signaled with 0
2021-12-21T20:56:39.801913Z 0 [Note] [MY-000000] [Galera] (aea03d45-a657, 'tcp://0.0.0.0:4567') turning message relay requesting off
2021-12-21T20:56:48.396235Z 0 [Note] [MY-000000] [WSREP-SST] Streaming the backup to joiner at 192.168.4.71 4444
2021-12-21T20:56:48.568314Z 1161 [Warning] [MY-013712] [Server] No suitable 'keyring_component_metadata_query' service implementation found to fulfill the request.
2021-12-21T21:11:12.218512Z 0 [Note] [MY-000000] [WSREP-SST] 2021/12/21 21:11:12 socat[261581] E write(7, 0x5605d4e26570, 8192): Connection reset by peer
2021-12-21T21:11:12.390175Z 0 [ERROR] [MY-000000] [WSREP-SST] ******************* FATAL ERROR ********************** 
2021-12-21T21:11:12.390242Z 0 [ERROR] [MY-000000] [WSREP-SST] xtrabackup finished with error: 1.  Check /db1//innobackup.backup.log
2021-12-21T21:11:12.390287Z 0 [ERROR] [MY-000000] [WSREP-SST] Line 1948
2021-12-21T21:11:12.392099Z 0 [ERROR] [MY-000000] [WSREP-SST] ------------ innobackup.backup.log (START) ------------
	xtrabackup: recognized server arguments: --server-id=403 --tmpdir=/db3/tmp --datadir=/db1 --innodb_buffer_pool_size=20G --log_bin=binlog --defaults_group=mysqld 
	xtrabackup: recognized client arguments: --socket=/var/run/mysqld/mysqld.sock --parallel=4 --user=mysql.pxc.sst.user --password=* --socket=/var/run/mysqld/mysqld.sock --lock-ddl=1 --backup=1 --galera-info=1 --stream=xbstream --xtrabackup-plugin-dir=/usr/bin/pxc_extra/pxb-8.0/lib/plugin --target-dir=/db3/tmp/pxc_sst_GodV/donor_xb_D6ts 
	/usr/bin/pxc_extra/pxb-8.0/bin/xtrabackup version 8.0.25-17 based on MySQL server 8.0.25 Linux (x86_64) (revision id: d27028b)
	211221 20:56:48  version_check Connecting to MySQL server with DSN 'dbi:mysql:;mysql_read_default_group=xtrabackup;mysql_socket=/var/run/mysqld/mysqld.sock' as 'mysql.pxc.sst.user'  (using password: YES).
	211221 20:56:48  version_check Connected to MySQL server
	211221 20:56:48  version_check Executing a version check against the server...
	211221 20:56:48  version_check Done.
	211221 20:56:48 Connecting to MySQL server host: localhost, user: mysql.pxc.sst.user, password: set, port: not set, socket: /var/run/mysqld/mysqld.sock
	Using server version 8.0.25-15.1
	211221 20:56:48 Executing LOCK TABLES FOR BACKUP...
	xtrabackup: uses posix_fadvise().
	xtrabackup: cd to /db1
	xtrabackup: open files limit requested 0, set to 16364
	xtrabackup: using the following InnoDB configuration:
	xtrabackup:   innodb_data_home_dir = .
	xtrabackup:   innodb_data_file_path = ibdata1:12M:autoextend
	xtrabackup:   innodb_log_group_home_dir = ./
	xtrabackup:   innodb_log_files_in_group = 2
	xtrabackup:   innodb_log_file_size = 50331648
	Number of pools: 1
	xtrabackup: inititialize_service_handles suceeded
	211221 20:56:48 Connecting to MySQL server host: localhost, user: mysql.pxc.sst.user, password: set, port: not set, socket: /var/run/mysqld/mysqld.sock
	xtrabackup: Redo Log Archiving is not set up.
	WARNING: unknown option --binlog-info=ON
	211221 20:56:48 >> log scanned up to (10790819610221)
	xtrabackup: Generating a list of tablespaces
	xtrabackup: Generating a list of tablespaces
	Scanning './'
	Completed space ID check of 2 files.
	Allocated tablespace ID 2481 for anon_reporting/users, old maximum was 0
	Using undo tablespace './undo_001'.
	Using undo tablespace './undo_002'.
	Opened 2 existing undo tablespaces.
	xtrabackup: Starting 4 threads for parallel data files transfer
	211221 20:56:49 [01] Streaming ./mwf/pollVotes.ibd
	211221 20:56:49 [02] Streaming ./socon/planning.ibd
	211221 20:56:49 [03] Streaming ./webstf/quadsphere#p#p0#sp#p0sp7.ibd
	211221 20:56:49 [04] Streaming ./webstf/fts_0000000002c45a01_being_deleted_cache.ibd
	211221 20:56:49 [01]        ...done
	211221 20:56:49 [04]        ...done
	211221 20:56:49 [01] Streaming ./wstre/cyan_products.ibd
	211221 20:56:49 [01]        ...done
	211221 20:56:49 [04] Streaming ./document_browser/sponsor.ibd
	211221 20:56:49 [04]        ...done
	211221 20:56:49 [01] Streaming ./a_docs/keywords_new.ibd
	211221 20:56:49 [04] Streaming ./rf/val_old.ibd
	211221 20:56:49 [04]        ...done
	211221 20:56:49 [04] Streaming ./dst/publish_oceancolor_apps.ibd
	211221 20:56:49 [04]        ...done
	211221 20:56:49 [04] Streaming ./ibdata1
	211221 20:56:49 [02]        ...done
	211221 20:56:49 >> log scanned up to (10790819610221)
	211221 20:56:49 [02] Streaming ./rf/val_lookup.ibd
	211221 20:56:49 [02]        ...done
	211221 20:56:50 [02] Streaming ./socon/target_request_tracking.ibd
	211221 20:56:50 [02]        ...done
	211221 20:56:50 [02] Streaming ./webstf/l_resolutions.ibd
	211221 20:56:50 [02]        ...done
	211221 20:56:50 [02] Streaming ./webstf/fts_0000000002c45a01_config.ibd
	211221 20:56:50 [02]        ...done
	211221 20:56:50 >> log scanned up to (10790819610221)
	211221 20:56:51 [02] Streaming ./webstf/quadsphere#p#p1#sp#p1sp0.ibd
	211221 20:56:51 >> log scanned up to (10790819610221)
	211221 20:56:52 >> log scanned up to (10790819610221)
	211221 20:56:53 >> log scanned up to (10790819610221)
	211221 20:56:54 >> log scanned up to (10790819610221)
	211221 20:56:55 >> log scanned up to (10790819610221)
	211221 20:56:56 >> log scanned up to (10790819610221)
	211221 20:56:56 [01]        ...done
	211221 20:56:57 >> log scanned up to (10790819610221)
	211221 20:56:57 [01] Streaming ./mwf/polls.ibd
	211221 20:56:57 [01]        ...done
	211221 20:56:57 [01] Streaming ./mola/sst_s_data.ibd
	211221 20:56:58 >> log scanned up to (10790819610221)
2021-12-21T21:11:12.392228Z 0 [ERROR] [MY-000000] [WSREP-SST] ------------ innobackup.backup.log (cont) ------------
	211221 20:58:00 >> log scanned up to (10790819610221)
	.......
	211221 21:00:23 >> log scanned up to (10790819610521)
2021-12-21T21:11:12.392275Z 0 [ERROR] [MY-000000] [WSREP-SST] ------------ innobackup.backup.log (cont) ------------
	211221 21:00:24 >> log scanned up to (10790819610521)
	211221 21:02:13 >> log scanned up to (10790819610521)
	211221 21:02:13 [02]        ...done
	211221 21:02:23 >> log scanned up to (10790819610521)
	211221 21:02:23 [02] Streaming ./webstf/quadsphere#p#p1#sp#p1sp1.ibd
	211221 21:02:24 >> log scanned up to (10790819610521)
	...
2021-12-21T21:11:12.392316Z 0 [ERROR] [MY-000000] [WSREP-SST] ------------ innobackup.backup.log (cont) ------------
	...
	211221 21:03:49 >> log scanned up to (10790819610521)
	211221 21:03:50 [03]        ...done
	211221 21:03:52 >> log scanned up to (10790819610521)
	211221 21:03:53 [03] Streaming ./mola/sst_s_data_stats.ibd
	211221 21:03:53 >> log scanned up to (10790819610521)
	...
2021-12-21T21:11:12.392355Z 0 [ERROR] [MY-000000] [WSREP-SST] ------------ innobackup.backup.log (cont) ------------
	211221 21:04:55 >> log scanned up to (10790819610521)
	...
2021-12-21T21:11:12.392394Z 0 [ERROR] [MY-000000] [WSREP-SST] ------------ innobackup.backup.log (cont) ------------
	...
2021-12-21T21:11:12.392431Z 0 [ERROR] [MY-000000] [WSREP-SST] ------------ innobackup.backup.log (cont) ------------
	211221 21:09:43 >> log scanned up to (10790819610521)
	...
	211221 21:11:11 >> log scanned up to (10790819610521)
	xtrabackup: Error writing file '<unopen fd>' (OS errno 32 - Broken pipe)
	xb_stream_write_data() failed.
	xtrabackup: Error writing file '<unopen fd>' (OS errno 32 - Broken pipe)
	[01] xtrabackup: Error: xtrabackup_copy_datafile() failed.
	[01] xtrabackup: Error: failed to copy datafile.
	xtrabackup: Error writing file '<unopen fd>' (OS errno 32 - Broken pipe)
	xb_stream_write_data() failed.
	xtrabackup: Error writing file '<unopen fd>' (OS errno 32 - Broken pipe)
	[04] xtrabackup: Error: xtrabackup_copy_datafile() failed.
	[04] xtrabackup: Error: failed to copy datafile.
	xtrabackup: Error writing file '<unopen fd>' (OS errno 32 - Broken pipe)
	xb_stream_write_data() failed.
	xtrabackup: Error writing file '<unopen fd>' (OS errno 32 - Broken pipe)
	xb_stream_write_data() failed.
	xtrabackup: Error writing file '<unopen fd>' (OS errno 32 - Broken pipe)
	[02] xtrabackup: Error: xtrabackup_copy_datafile() failed.
	[02] xtrabackup: Error: failed to copy datafile.
	xtrabackup: Error writing file '<unopen fd>' (OS errno 32 - Broken pipe)
	[03] xtrabackup: Error: xtrabackup_copy_datafile() failed.
	[03] xtrabackup: Error: failed to copy datafile.

2021-12-21T21:11:12.392450Z 0 [ERROR] [MY-000000] [WSREP-SST] ------------ innobackup.backup.log (END) ------------
2021-12-21T21:11:12.392511Z 0 [ERROR] [MY-000000] [WSREP-SST] ****************************************************** 
2021-12-21T21:11:12.392836Z 0 [ERROR] [MY-000000] [WSREP-SST] Cleanup after exit with status:22
2021-12-21T21:11:12.418216Z 0 [ERROR] [MY-000000] [WSREP] Process completed with error: wsrep_sst_xtrabackup-v2 --role 'donor' --address '192.168.4.71:4444/xtrabackup_sst//1' --socket '/var/run/mysqld/mysqld.sock' --datadir '/db1/' --basedir '/usr/' --plugindir '/usr/lib/mysql/plugin/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --mysqld-version '8.0.25-15.1'   '' --gtid 'bc54efc2-52bc-11ec-9ba9-fbe9dc3b2f33:157' : 22 (Invalid argument)
2021-12-21T21:11:12.426236Z 0 [Note] [MY-000000] [Galera] SST sending failed: -22
2021-12-21T21:11:12.426285Z 0 [Note] [MY-000000] [WSREP] Server status change donor -> joined
2021-12-21T21:11:12.426318Z 0 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2021-12-21T21:11:12.426383Z 0 [ERROR] [MY-000000] [WSREP] Command did not run: wsrep_sst_xtrabackup-v2 --role 'donor' --address '192.168.4.71:4444/xtrabackup_sst//1' --socket '/var/run/mysqld/mysqld.sock' --datadir '/db1/' --basedir '/usr/' --plugindir '/usr/lib/mysql/plugin/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --mysqld-version '8.0.25-15.1'   '' --gtid 'bc54efc2-52bc-11ec-9ba9-fbe9dc3b2f33:157' 
2021-12-21T21:11:12.426855Z 0 [Warning] [MY-000000] [Galera] 1.0 (DB403): State transfer to 0.0 (DBDEV) failed: -22 (Invalid argument)
2021-12-21T21:11:12.426917Z 0 [Note] [MY-000000] [Galera] Shifting DONOR/DESYNCED -> JOINED (TO: 157)
2021-12-21T21:11:12.427243Z 0 [Note] [MY-000000] [Galera] Member 1.0 (DB403) synced with group.
2021-12-21T21:11:12.427278Z 0 [Note] [MY-000000] [Galera] Shifting JOINED -> SYNCED (TO: 157)
2021-12-21T21:11:12.427316Z 12 [Note] [MY-000000] [Galera] Server DB403 synced with group
2021-12-21T21:11:12.427373Z 12 [Note] [MY-000000] [WSREP] Server status change joined -> synced
2021-12-21T21:11:12.427390Z 12 [Note] [MY-000000] [WSREP] Synchronized with group, ready for connections
2021-12-21T21:11:12.427430Z 12 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2021-12-21T21:11:13.427784Z 0 [Note] [MY-000000] [Galera] forgetting 7c6a995b-85ca (tcp://192.168.4.71:4567)
2021-12-21T21:11:13.427879Z 0 [Note] [MY-000000] [Galera] Node aea03d45-a657 state primary
2021-12-21T21:11:13.427941Z 0 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node
view (view_id(PRIM,aea03d45-a657,3)
memb {
	aea03d45-a657,0
	}
joined {
	}
left {
	}
partitioned {
	7c6a995b-85ca,0
	}
)
2021-12-21T21:11:13.427975Z 0 [Note] [MY-000000] [Galera] Save the discovered primary-component to disk
2021-12-21T21:11:13.434284Z 0 [Note] [MY-000000] [Galera] forgetting 7c6a995b-85ca (tcp://192.168.4.71:4567)
2021-12-21T21:11:13.434342Z 0 [Note] [MY-000000] [Galera] New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 1
2021-12-21T21:11:13.434499Z 0 [Note] [MY-000000] [Galera] STATE_EXCHANGE: sent state UUID: 873636ec-62a2-11ec-8053-e2e19efefe38
2021-12-21T21:11:13.434548Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: sent state msg: 873636ec-62a2-11ec-8053-e2e19efefe38
2021-12-21T21:11:13.434590Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: got state msg: 873636ec-62a2-11ec-8053-e2e19efefe38 from 0 (DB403)
2021-12-21T21:11:13.434620Z 0 [Note] [MY-000000] [Galera] Quorum results:
	version    = 6,
	component  = PRIMARY,
	conf_id    = 2,
	members    = 1/1 (primary/total),
	act_id     = 157,
	last_appl. = 155,
	protocols  = 2/10/4 (gcs/repl/appl),
	vote policy= 0,
	group UUID = bc54efc2-52bc-11ec-9ba9-fbe9dc3b2f33
2021-12-21T21:11:13.434787Z 0 [Note] [MY-000000] [Galera] Flow-control interval: [100, 100]
2021-12-21T21:11:13.434969Z 14 [Note] [MY-000000] [Galera] ####### processing CC 158, local, ordered
2021-12-21T21:11:13.435030Z 14 [Warning] [MY-000000] [Galera] Peer (IST receiver) 7c6a995b-62a0-11ec-85ca-5a7b6cded7eb for IST AsyncSender seems to be disconnected. Terminating IST AsyncSender.

2021-12-21T21:11:13.435180Z 0 [Note] [MY-000000] [Galera] async IST sender served
2021-12-21T21:11:13.435342Z 14 [Note] [MY-000000] [Galera] Maybe drain monitors from 157 upto current CC event 158 upto:157
2021-12-21T21:11:13.435382Z 14 [Note] [MY-000000] [Galera] Drain monitors from 157 up to 157
2021-12-21T21:11:13.435410Z 14 [Note] [MY-000000] [Galera] ####### My UUID: aea03d45-629d-11ec-a657-d25199295426
2021-12-21T21:11:13.435434Z 14 [Note] [MY-000000] [Galera] Skipping cert index reset
2021-12-21T21:11:13.435457Z 14 [Note] [MY-000000] [Galera] REPL Protocols: 10 (5)
2021-12-21T21:11:13.435481Z 14 [Note] [MY-000000] [Galera] ####### Adjusting cert position: 157 -> 158
2021-12-21T21:11:13.435561Z 0 [Note] [MY-000000] [Galera] Service thread queue flushed.
2021-12-21T21:11:13.445520Z 14 [Note] [MY-000000] [Galera] ================================================
View:
  id: bc54efc2-52bc-11ec-9ba9-fbe9dc3b2f33:158
  status: primary
  protocol_version: 4
  capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
  final: no
  own_index: 0
  members(1):
	0: aea03d45-629d-11ec-a657-d25199295426, DB403
=================================================
2021-12-21T21:11:13.445656Z 14 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2021-12-21T21:11:13.464623Z 14 [Note] [MY-000000] [Galera] Recording CC from group: 158
2021-12-21T21:11:13.464683Z 14 [Note] [MY-000000] [Galera] Lowest cert index boundary for CC from group: 158
2021-12-21T21:11:13.464707Z 14 [Note] [MY-000000] [Galera] Min available from gcache for CC from group: 1
2021-12-21T21:11:18.460928Z 0 [Note] [MY-000000] [Galera]  cleaning up 7c6a995b-85ca (tcp://192.168.4.71:4567)

Here is the joiner log. Data starts exchanging then just fails.

2021-12-21T20:54:26.339199Z 0 [Warning] [MY-000000] [WSREP] Node is running in bootstrap/initialize mode. Disabling pxc_strict_mode checks
2021-12-21T20:54:26.339846Z 0 [System] [MY-013169] [Server] /usr/sbin/mysqld (mysqld 8.0.25-15.1) initializing of server in progress as process 407574
2021-12-21T20:54:26.342616Z 0 [Note] [MY-000000] [Galera] Loading provider none initial position: 00000000-0000-0000-0000-000000000000:-1
2021-12-21T20:54:26.342730Z 0 [Note] [MY-000000] [Galera] wsrep_load(): loading provider library 'none'
2021-12-21T20:54:26.573953Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
2021-12-21T20:54:42.899335Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.
2021-12-21T20:55:19.033503Z 6 [Note] [MY-010454] [Server] A temporary password is generated for root@localhost: %PpeS_N2,)sw
2021-12-21T20:55:37.861836Z 6 [Note] [MY-000000] [WSREP] MDL conflict db=sys table=sys_config ticket=1 solved by abort
Log of wsrep recovery (--wsrep-recover):
 INFO: WSREP: Running position recovery with --log_error='/db1/wsrep_recovery_verbose.MC2rKA' --pid-file='/db1/oceandbdev401-recover.pid'
 INFO: WSREP: Recovered position 00000000-0000-0000-0000-000000000000:-1
2021-12-21T20:56:36.220423Z 0 [System] [MY-010116] [Server] /usr/sbin/mysqld (mysqld 8.0.25-15.1) starting as process 407789
2021-12-21T20:56:36.222793Z 0 [Note] [MY-000000] [Galera] Loading provider /usr/lib/galera4/libgalera_smm.so initial position: 00000000-0000-0000-0000-000000000000:-1
2021-12-21T20:56:36.222893Z 0 [Note] [MY-000000] [Galera] wsrep_load(): loading provider library '/usr/lib/galera4/libgalera_smm.so'
2021-12-21T20:56:36.224808Z 0 [Note] [MY-000000] [Galera] wsrep_load(): Galera 4.7(8c375ca) by Codership Oy <info@codership.com> (modified by Percona <https://percona.com/>) loaded successfully.
2021-12-21T20:56:36.224919Z 0 [Note] [MY-000000] [Galera] CRC-32C: using 64-bit x86 acceleration.
2021-12-21T20:56:36.225354Z 0 [Warning] [MY-000000] [Galera] Could not open state file for reading: '/db1//grastate.dat'
2021-12-21T20:56:36.225409Z 0 [Warning] [MY-000000] [Galera] No persistent state found. Bootstraping with default state
2021-12-21T20:56:36.225491Z 0 [Note] [MY-000000] [Galera] Found saved state: 00000000-0000-0000-0000-000000000000:-1, safe_to_bootstrap: 1
2021-12-21T20:56:36.225743Z 0 [Note] [MY-000000] [Galera] GCache DEBUG: opened preamble:
Version: 0
UUID: 00000000-0000-0000-0000-000000000000
Seqno: -1 - -1
Offset: -1
Synced: 0
2021-12-21T20:56:36.225804Z 0 [Note] [MY-000000] [Galera] Skipped GCache ring buffer recovery: could not determine history UUID.
2021-12-21T20:56:36.315811Z 0 [Note] [MY-000000] [Galera] Passing config to GCS: base_dir = /db1/; base_host = 192.168.4.71; base_port = 4567; cert.log_conflicts = no; cert.optimistic_pa = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 10; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 4; evs.view_forget_timeout = PT24H; gcache.dir = /db1/; gcache.freeze_purge_at_seqno = -1; gcache.keep_pages_count = 0; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = galera.cache; gcache.page_size = 1G; gcache.recover = yes; gcache.size = 3G; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 100; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo = false; pc.recovery = true; pc.version = 0; pc.wait_prim = true; pc.wait_prim_timeout = PT30S; pc.weight = 1; protonet.backend = asio; protonet.version = 0; repl.causal_read_timeout = PT30S; repl.commit_order = 3; repl.key_format = FLAT8; repl.max_ws_size = 2147483647; repl.proto_max = 10; socket.checksum = 2; socket.recv_buf_size = auto; socket.send_buf_size = auto; 
2021-12-21T20:56:36.327844Z 0 [Note] [MY-000000] [WSREP] Starting replication
2021-12-21T20:56:36.327955Z 0 [Note] [MY-000000] [Galera] Connecting with bootstrap option: 0
2021-12-21T20:56:36.328037Z 0 [Note] [MY-000000] [Galera] Setting GCS initial position to 00000000-0000-0000-0000-000000000000:-1
2021-12-21T20:56:36.328184Z 0 [Note] [MY-000000] [Galera] protonet asio version 0
2021-12-21T20:56:36.328494Z 0 [Note] [MY-000000] [Galera] Using CRC-32C for message checksums.
2021-12-21T20:56:36.328594Z 0 [Note] [MY-000000] [Galera] backend: asio
2021-12-21T20:56:36.328796Z 0 [Note] [MY-000000] [Galera] gcomm thread scheduling priority set to other:0 
2021-12-21T20:56:36.329050Z 0 [Warning] [MY-000000] [Galera] Fail to access the file (/db1//gvwstate.dat) error (No such file or directory). It is possible if node is booting for first time or re-booting after a graceful shutdown
2021-12-21T20:56:36.329155Z 0 [Note] [MY-000000] [Galera] Restoring primary-component from disk failed. Either node is booting for first time or re-booting after a graceful shutdown
2021-12-21T20:56:36.329499Z 0 [Note] [MY-000000] [Galera] GMCast version 0
2021-12-21T20:56:36.329756Z 0 [Note] [MY-000000] [Galera] (7c6a995b-85ca, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2021-12-21T20:56:36.329890Z 0 [Note] [MY-000000] [Galera] (7c6a995b-85ca, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2021-12-21T20:56:36.330300Z 0 [Note] [MY-000000] [Galera] EVS version 1
2021-12-21T20:56:36.330559Z 0 [Note] [MY-000000] [Galera] gcomm: connecting to group 'WebDB-cluster', peer '192.168.2.61:'
2021-12-21T20:56:36.332184Z 0 [Note] [MY-000000] [Galera] (7c6a995b-85ca, 'tcp://0.0.0.0:4567') connection established to aea03d45-a657 tcp://192.168.2.61:4567
2021-12-21T20:56:36.332410Z 0 [Note] [MY-000000] [Galera] (7c6a995b-85ca, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: 
2021-12-21T20:56:36.832046Z 0 [Note] [MY-000000] [Galera] EVS version upgrade 0 -> 1
2021-12-21T20:56:36.832224Z 0 [Note] [MY-000000] [Galera] declaring aea03d45-a657 at tcp://192.168.2.61:4567 stable
2021-12-21T20:56:36.832402Z 0 [Note] [MY-000000] [Galera] PC protocol upgrade 0 -> 1
2021-12-21T20:56:36.833237Z 0 [Note] [MY-000000] [Galera] Node aea03d45-a657 state primary
2021-12-21T20:56:36.833652Z 0 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node
view (view_id(PRIM,7c6a995b-85ca,2)
memb {
    7c6a995b-85ca,0
    aea03d45-a657,0
    }
joined {
    }
left {
    }
partitioned {
    }
)
2021-12-21T20:56:36.833829Z 0 [Note] [MY-000000] [Galera] Save the discovered primary-component to disk
2021-12-21T20:56:37.331366Z 0 [Note] [MY-000000] [Galera] gcomm: connected
2021-12-21T20:56:37.331571Z 0 [Note] [MY-000000] [Galera] Changing maximum packet size to 64500, resulting msg size: 32636
2021-12-21T20:56:37.331784Z 0 [Note] [MY-000000] [Galera] Shifting CLOSED -> OPEN (TO: 0)
2021-12-21T20:56:37.331975Z 0 [Note] [MY-000000] [Galera] Opened channel 'WebDB-cluster'
2021-12-21T20:56:37.332225Z 0 [Note] [MY-000000] [Galera] New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2
2021-12-21T20:56:37.332466Z 2 [Note] [MY-000000] [WSREP] Starting rollbacker thread 2
2021-12-21T20:56:37.332646Z 0 [Note] [MY-000000] [Galera] STATE_EXCHANGE: sent state UUID: 7d03aa54-62a0-11ec-9d7f-db65bba8ba6d
2021-12-21T20:56:37.332888Z 1 [Note] [MY-000000] [WSREP] Starting applier thread 1
2021-12-21T20:56:37.333112Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: sent state msg: 7d03aa54-62a0-11ec-9d7f-db65bba8ba6d
2021-12-21T20:56:37.333297Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: got state msg: 7d03aa54-62a0-11ec-9d7f-db65bba8ba6d from 0 (DBDEV)
2021-12-21T20:56:37.333503Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: got state msg: 7d03aa54-62a0-11ec-9d7f-db65bba8ba6d from 1 (DB403)
2021-12-21T20:56:37.333658Z 0 [Note] [MY-000000] [Galera] Quorum results:
    version    = 6,
    component  = PRIMARY,
    conf_id    = 1,
    members    = 1/2 (primary/total),
    act_id     = 156,
    last_appl. = 155,
    protocols  = 2/10/4 (gcs/repl/appl),
    vote policy= 0,
    group UUID = bc54efc2-52bc-11ec-9ba9-fbe9dc3b2f33
2021-12-21T20:56:37.333904Z 0 [Note] [MY-000000] [Galera] Flow-control interval: [141, 141]
2021-12-21T20:56:37.334065Z 0 [Note] [MY-000000] [Galera] Shifting OPEN -> PRIMARY (TO: 157)
2021-12-21T20:56:37.334311Z 1 [Note] [MY-000000] [Galera] ####### processing CC 157, local, ordered
2021-12-21T20:56:37.334541Z 1 [Note] [MY-000000] [Galera] Maybe drain monitors from -1 upto current CC event 157 upto:-1
2021-12-21T20:56:37.334741Z 1 [Note] [MY-000000] [Galera] Drain monitors from -1 up to -1
2021-12-21T20:56:37.334944Z 1 [Note] [MY-000000] [Galera] Process first view: bc54efc2-52bc-11ec-9ba9-fbe9dc3b2f33 my uuid: 7c6a995b-62a0-11ec-85ca-5a7b6cded7eb
2021-12-21T20:56:37.335176Z 1 [Note] [MY-000000] [Galera] Server DBDEV connected to cluster at position bc54efc2-52bc-11ec-9ba9-fbe9dc3b2f33:157 with ID 7c6a995b-62a0-11ec-85ca-5a7b6cded7eb
2021-12-21T20:56:37.335376Z 1 [Note] [MY-000000] [WSREP] Server status change disconnected -> connected
2021-12-21T20:56:37.335632Z 1 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2021-12-21T20:56:37.335889Z 1 [Note] [MY-000000] [Galera] ####### My UUID: 7c6a995b-62a0-11ec-85ca-5a7b6cded7eb
2021-12-21T20:56:37.336124Z 1 [Note] [MY-000000] [Galera] Cert index reset to 00000000-0000-0000-0000-000000000000:-1 (proto: 10), state transfer needed: yes
2021-12-21T20:56:37.336407Z 0 [Note] [MY-000000] [Galera] Service thread queue flushed.
2021-12-21T20:56:37.336675Z 1 [Note] [MY-000000] [Galera] ####### Assign initial position for certification: 00000000-0000-0000-0000-000000000000:-1, protocol version: -1
2021-12-21T20:56:37.336939Z 1 [Note] [MY-000000] [Galera] State transfer required: 
    Group state: bc54efc2-52bc-11ec-9ba9-fbe9dc3b2f33:157
    Local state: 00000000-0000-0000-0000-000000000000:-1
2021-12-21T20:56:37.337182Z 1 [Note] [MY-000000] [WSREP] Server status change connected -> joiner
2021-12-21T20:56:37.337392Z 1 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2021-12-21T20:56:37.337746Z 0 [Note] [MY-000000] [WSREP] Initiating SST/IST transfer on JOINER side (wsrep_sst_xtrabackup-v2 --role 'joiner' --address '192.168.4.71' --datadir '/db1/' --basedir '/usr/' --plugindir '/usr/lib/mysql/plugin/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --parent '407789' --mysqld-version '8.0.25-15.1'   '' )
2021-12-21T20:56:37.877746Z 1 [Note] [MY-000000] [WSREP] Prepared SST request: xtrabackup-v2|192.168.4.71:4444/xtrabackup_sst//1
2021-12-21T20:56:37.878122Z 1 [Note] [MY-000000] [Galera] Check if state gap can be serviced using IST
2021-12-21T20:56:37.878458Z 1 [Note] [MY-000000] [Galera] Local UUID: 00000000-0000-0000-0000-000000000000 != Group UUID: bc54efc2-52bc-11ec-9ba9-fbe9dc3b2f33
2021-12-21T20:56:37.878831Z 1 [Note] [MY-000000] [Galera] ####### IST uuid:00000000-0000-0000-0000-000000000000 f: 0, l: 157, STRv: 3
2021-12-21T20:56:37.879234Z 1 [Note] [MY-000000] [Galera] IST receiver addr using tcp://192.168.4.71:4568
2021-12-21T20:56:37.879710Z 1 [Note] [MY-000000] [Galera] Prepared IST receiver for 0-157, listening at: tcp://192.168.4.71:4568
2021-12-21T20:56:37.880486Z 0 [Note] [MY-000000] [Galera] Member 0.0 (DBDEV) requested state transfer from 'DB403'. Selected 1.0 (DB403)(SYNCED) as donor.
2021-12-21T20:56:37.880739Z 0 [Note] [MY-000000] [Galera] Shifting PRIMARY -> JOINER (TO: 157)
2021-12-21T20:56:37.881128Z 1 [Note] [MY-000000] [Galera] Requesting state transfer: success, donor: 1
2021-12-21T20:56:37.881533Z 1 [Note] [MY-000000] [Galera] Resetting GCache seqno map due to different histories.
2021-12-21T20:56:37.881835Z 1 [Note] [MY-000000] [Galera] GCache history reset: 00000000-0000-0000-0000-000000000000:0 -> bc54efc2-52bc-11ec-9ba9-fbe9dc3b2f33:157
2021-12-21T20:56:38.670343Z 0 [Note] [MY-000000] [WSREP-SST] Proceeding with SST.........
2021-12-21T20:56:38.692808Z 0 [Note] [MY-000000] [WSREP-SST] ............Waiting for SST streaming to complete!
2021-12-21T20:56:39.831734Z 0 [Note] [MY-000000] [Galera] (7c6a995b-85ca, 'tcp://0.0.0.0:4567') turning message relay requesting off
2021-12-21T21:11:12.212590Z 0 [ERROR] [MY-000000] [WSREP-SST] Killing SST (408260) with SIGKILL after stalling for 120 seconds
2021-12-21T21:11:12.427229Z 0 [Warning] [MY-000000] [Galera] 1.0 (DB403): State transfer to 0.0 (DBDEV) failed: -22 (Invalid argument)
2021-12-21T21:11:12.427303Z 0 [ERROR] [MY-000000] [Galera] gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():1214: Will never receive state. Need to abort.
2021-12-21T21:11:12.427333Z 0 [Note] [MY-000000] [Galera] gcomm: terminating thread
2021-12-21T21:11:12.427385Z 0 [Note] [MY-000000] [Galera] gcomm: joining thread
2021-12-21T21:11:12.427528Z 0 [Note] [MY-000000] [Galera] gcomm: closing backend
2021-12-21T21:11:12.494647Z 0 [Note] [MY-000000] [WSREP-SST] /usr/bin/wsrep_sst_xtrabackup-v2: line 185: 408262 Killed                  socat -u TCP-LISTEN:4444,reuseaddr,retry=30 stdio
2021-12-21T21:11:12.494698Z 0 [Note] [MY-000000] [WSREP-SST]      408263                       | /usr/bin/pxc_extra/pxb-8.0/bin/xbstream -x
2021-12-21T21:11:12.494872Z 0 [ERROR] [MY-000000] [WSREP-SST] ******************* FATAL ERROR ********************** 
2021-12-21T21:11:12.494957Z 0 [ERROR] [MY-000000] [WSREP-SST] Error while getting data from donor node:  exit codes: 137 137
2021-12-21T21:11:12.494994Z 0 [ERROR] [MY-000000] [WSREP-SST] Line 1296
2021-12-21T21:11:12.495214Z 0 [ERROR] [MY-000000] [WSREP-SST] ****************************************************** 
2021-12-21T21:11:12.495706Z 0 [ERROR] [MY-000000] [WSREP-SST] Cleanup after exit with status:32
2021-12-21T21:11:12.692837Z 0 [ERROR] [MY-000000] [WSREP] Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '192.168.4.71' --datadir '/db1/' --basedir '/usr/' --plugindir '/usr/lib/mysql/plugin/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --parent '407789' --mysqld-version '8.0.25-15.1'   '' : 32 (Broken pipe)
2021-12-21T21:11:12.692944Z 0 [ERROR] [MY-000000] [WSREP] Failed to read uuid:seqno from joiner script.
2021-12-21T21:11:12.692986Z 0 [ERROR] [MY-000000] [WSREP] SST script aborted with error 32 (Broken pipe)
2021-12-21T21:11:12.693270Z 3 [Note] [MY-000000] [Galera] Processing SST received
2021-12-21T21:11:12.693356Z 3 [Note] [MY-000000] [Galera] SST request was cancelled
2021-12-21T21:11:12.693426Z 3 [ERROR] [MY-000000] [Galera] State transfer request failed unrecoverably: 32 (Broken pipe). Most likely it is due to inability to communicate with the cluster primary component. Restart required.
2021-12-21T21:11:12.693487Z 3 [Note] [MY-000000] [Galera] ReplicatorSMM::abort()
2021-12-21T21:11:12.693545Z 3 [Note] [MY-000000] [Galera] Closing send monitor...
2021-12-21T21:11:12.693591Z 3 [Note] [MY-000000] [Galera] Closed send monitor.
2021-12-21T21:11:13.434758Z 0 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node
view (view_id(NON_PRIM,7c6a995b-85ca,2)
memb {
    7c6a995b-85ca,0
    }
joined {
    }
left {
    }
partitioned {
    aea03d45-a657,0
    }
)
2021-12-21T21:11:13.434845Z 0 [Note] [MY-000000] [Galera] PC protocol downgrade 1 -> 0
2021-12-21T21:11:13.434865Z 0 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node
view ((empty))
2021-12-21T21:11:13.435129Z 0 [Note] [MY-000000] [Galera] gcomm: closed
2021-12-21T21:11:13.435170Z 3 [Note] [MY-000000] [Galera] gcomm: terminating thread
2021-12-21T21:11:13.435229Z 3 [Note] [MY-000000] [Galera] gcomm: joining thread
2021-12-21T21:11:13.435285Z 0 [Note] [MY-000000] [Galera] /usr/sbin/mysqld: Terminated.
2021-12-21T21:11:13.435401Z 0 [Note] [MY-000000] [WSREP] Initiating SST cancellation
2021-12-21T21:11:13.435443Z 3 [Note] [MY-000000] [Galera] gcomm: backend closed already
2021-12-21T21:11:13.435512Z 0 [Note] [MY-000000] [WSREP] Terminating SST process
2021-12-21T21:11:13.435558Z 3 [Note] [MY-000000] [Galera] gcomm: closed


1 Like

This smells like the systemd timeout of 15 min. You can try increasing it from the unit file.

1 Like

In mysql.service I set
TimeoutStartSec=infinity

I’ve also used this
TimeoutSec=14440

Both yield the same result.

1 Like

The joiner killed itself after 120s of receiving no data. Ensure that ports 4567, 4568, and 4444 are open between both nodes.

2021-12-21T21:11:12.390242Z 0 [ERROR] [MY-000000] [WSREP-SST] xtrabackup finished with error: 1. Check /db1//innobackup.backup.log

What’s the contents of this log?

1 Like

The ports are open. I get a response.

node1: socat - TCP-LISTEN:4444
node2: echo "hello" | socat - TCP:ip.adr.of.node1:4444

``
innobackup.backup.log included.
innobackup_backup_log.txt (51.2 KB)

1 Like

How do I overwrite service_startup_timeout=900

I checked the vi /usr/bin/mysql-systemd file and the value is set to 900 how can I overwrite that variable.

1 Like

You can edit the file directly. Also might want to check the alternative method listed here: [PXC-2080] SST Timeout After 2 hours 30 minutes On Systemd Implementation - Percona JIRA

1 Like

This seems to be for mysql version 5.6, I am using 8.0 which comes with Percona XtraDB Cluster 8.0. I have set service_startup_timeout=14400 in the [mysqld_safe] block of my.cnf. Still getting a failure. Edited /usr/bin/mysql-systemd and changed the service_startup_timeout to 14400 and time out still occurs.

2 Likes

Did you ever find a fix for this? I’m having the exact same problem after upgrading to 8.027-18.1 from 5.7.

The joiner updates with SST and reaches the same state (same UUID), but it doesn’t return for a long time and then eventually times out/shuts down. I have also tried extending the timeout in both the mysql-systemd and my.cnf under [mysqld_safe].

1 Like

The solution that worked for me was commenting out the line
log-bin =

as outline here:

1 Like

I am not sure what the fix was. But it wasn’t related to bin log. I tried so much that I really don’t remember. I think it might have been a hard drive space issue.

2 Likes