I have created a new cluster, bootstrapped the first node. The node comes up. When I start node 2 it fails. I checked to see if the pid file had been created and it doesn’t create a pid file. Does the pid file get created after SST completes.
Hi @danarashad
Could you please provide the log file with the issue?
Donor error.log It gets connected and sends data to other node. Then just fails with broken pipe.There is no netwoking issues going one i monitored the ports and it just fails after 25-40 minutes. While the sync’ing occurs the pid file is not created.
2021-12-21T20:56:36.331824Z 0 [Note] [MY-000000] [Galera] (aea03d45-a657, 'tcp://0.0.0.0:4567') connection established to 7c6a995b-85ca tcp://192.168.4.71:4567
2021-12-21T20:56:36.332472Z 0 [Note] [MY-000000] [Galera] (aea03d45-a657, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
2021-12-21T20:56:36.832750Z 0 [Note] [MY-000000] [Galera] declaring 7c6a995b-85ca at tcp://192.168.4.71:4567 stable
2021-12-21T20:56:36.833098Z 0 [Note] [MY-000000] [Galera] Node aea03d45-a657 state primary
2021-12-21T20:56:36.949752Z 0 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node
view (view_id(PRIM,7c6a995b-85ca,2)
memb {
7c6a995b-85ca,0
aea03d45-a657,0
}
joined {
}
left {
}
partitioned {
}
)
2021-12-21T20:56:36.949816Z 0 [Note] [MY-000000] [Galera] Save the discovered primary-component to disk
2021-12-21T20:56:36.955865Z 0 [Note] [MY-000000] [Galera] New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2
2021-12-21T20:56:36.955931Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: Waiting for state UUID.
2021-12-21T20:56:37.332909Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: sent state msg: 7d03aa54-62a0-11ec-9d7f-db65bba8ba6d
2021-12-21T20:56:37.333248Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: got state msg: 7d03aa54-62a0-11ec-9d7f-db65bba8ba6d from 0 (DBDEV)
2021-12-21T20:56:37.333299Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: got state msg: 7d03aa54-62a0-11ec-9d7f-db65bba8ba6d from 1 (DB403)
2021-12-21T20:56:37.333327Z 0 [Note] [MY-000000] [Galera] Quorum results:
version = 6,
component = PRIMARY,
conf_id = 1,
members = 1/2 (primary/total),
act_id = 156,
last_appl. = 155,
protocols = 2/10/4 (gcs/repl/appl),
vote policy= 0,
group UUID = bc54efc2-52bc-11ec-9ba9-fbe9dc3b2f33
2021-12-21T20:56:37.333423Z 0 [Note] [MY-000000] [Galera] Flow-control interval: [141, 141]
2021-12-21T20:56:37.333582Z 2 [Note] [MY-000000] [Galera] ####### processing CC 157, local, ordered
2021-12-21T20:56:37.333649Z 2 [Note] [MY-000000] [Galera] Maybe drain monitors from 156 upto current CC event 157 upto:156
2021-12-21T20:56:37.333677Z 2 [Note] [MY-000000] [Galera] Drain monitors from 156 up to 156
2021-12-21T20:56:37.333706Z 2 [Note] [MY-000000] [Galera] ####### My UUID: aea03d45-629d-11ec-a657-d25199295426
2021-12-21T20:56:37.333730Z 2 [Note] [MY-000000] [Galera] Skipping cert index reset
2021-12-21T20:56:37.333754Z 2 [Note] [MY-000000] [Galera] REPL Protocols: 10 (5)
2021-12-21T20:56:37.333779Z 2 [Note] [MY-000000] [Galera] ####### Adjusting cert position: 156 -> 157
2021-12-21T20:56:37.333849Z 0 [Note] [MY-000000] [Galera] Service thread queue flushed.
2021-12-21T20:56:37.344183Z 2 [Note] [MY-000000] [Galera] ================================================
View:
id: bc54efc2-52bc-11ec-9ba9-fbe9dc3b2f33:157
status: primary
protocol_version: 4
capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
final: no
own_index: 1
members(2):
0: 7c6a995b-62a0-11ec-85ca-5a7b6cded7eb, DBDEV
1: aea03d45-629d-11ec-a657-d25199295426, DB403
=================================================
2021-12-21T20:56:37.344253Z 2 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2021-12-21T20:56:37.363002Z 2 [Note] [MY-000000] [Galera] Recording CC from group: 157
2021-12-21T20:56:37.363062Z 2 [Note] [MY-000000] [Galera] Lowest cert index boundary for CC from group: 157
2021-12-21T20:56:37.363087Z 2 [Note] [MY-000000] [Galera] Min available from gcache for CC from group: 1
2021-12-21T20:56:37.880495Z 0 [Note] [MY-000000] [Galera] Member 0.0 (DBDEV) requested state transfer from 'DB403'. Selected 1.0 (DB403)(SYNCED) as donor.
2021-12-21T20:56:37.880555Z 0 [Note] [MY-000000] [Galera] Shifting SYNCED -> DONOR/DESYNCED (TO: 157)
2021-12-21T20:56:37.880611Z 2 [Note] [MY-000000] [Galera] Detected STR version: 1, req_len: 132, req: STRv1
2021-12-21T20:56:37.880703Z 2 [Note] [MY-000000] [Galera] Cert index preload: 157 -> 157
2021-12-21T20:56:37.881401Z 2 [Note] [MY-000000] [WSREP] Server status change synced -> donor
2021-12-21T20:56:37.881449Z 0 [Note] [MY-000000] [Galera] async IST sender starting to serve tcp://192.168.4.71:4568 sending 157-157, preload starts from 157
2021-12-21T20:56:37.881480Z 2 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2021-12-21T20:56:37.881778Z 0 [Note] [MY-000000] [Galera] IST sender 157 -> 157
2021-12-21T20:56:37.881976Z 0 [Note] [MY-000000] [WSREP] Initiating SST/IST transfer on DONOR side (wsrep_sst_xtrabackup-v2 --role 'donor' --address '192.168.4.71:4444/xtrabackup_sst//1' --socket '/var/run/mysqld/mysqld.sock' --datadir '/db1/' --basedir '/usr/' --plugindir '/usr/lib/mysql/plugin/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --mysqld-version '8.0.25-15.1' '' --gtid 'bc54efc2-52bc-11ec-9ba9-fbe9dc3b2f33:157' )
2021-12-21T20:56:37.914565Z 2 [Note] [MY-000000] [WSREP] DONOR thread signaled with 0
2021-12-21T20:56:39.801913Z 0 [Note] [MY-000000] [Galera] (aea03d45-a657, 'tcp://0.0.0.0:4567') turning message relay requesting off
2021-12-21T20:56:48.396235Z 0 [Note] [MY-000000] [WSREP-SST] Streaming the backup to joiner at 192.168.4.71 4444
2021-12-21T20:56:48.568314Z 1161 [Warning] [MY-013712] [Server] No suitable 'keyring_component_metadata_query' service implementation found to fulfill the request.
2021-12-21T21:11:12.218512Z 0 [Note] [MY-000000] [WSREP-SST] 2021/12/21 21:11:12 socat[261581] E write(7, 0x5605d4e26570, 8192): Connection reset by peer
2021-12-21T21:11:12.390175Z 0 [ERROR] [MY-000000] [WSREP-SST] ******************* FATAL ERROR **********************
2021-12-21T21:11:12.390242Z 0 [ERROR] [MY-000000] [WSREP-SST] xtrabackup finished with error: 1. Check /db1//innobackup.backup.log
2021-12-21T21:11:12.390287Z 0 [ERROR] [MY-000000] [WSREP-SST] Line 1948
2021-12-21T21:11:12.392099Z 0 [ERROR] [MY-000000] [WSREP-SST] ------------ innobackup.backup.log (START) ------------
xtrabackup: recognized server arguments: --server-id=403 --tmpdir=/db3/tmp --datadir=/db1 --innodb_buffer_pool_size=20G --log_bin=binlog --defaults_group=mysqld
xtrabackup: recognized client arguments: --socket=/var/run/mysqld/mysqld.sock --parallel=4 --user=mysql.pxc.sst.user --password=* --socket=/var/run/mysqld/mysqld.sock --lock-ddl=1 --backup=1 --galera-info=1 --stream=xbstream --xtrabackup-plugin-dir=/usr/bin/pxc_extra/pxb-8.0/lib/plugin --target-dir=/db3/tmp/pxc_sst_GodV/donor_xb_D6ts
/usr/bin/pxc_extra/pxb-8.0/bin/xtrabackup version 8.0.25-17 based on MySQL server 8.0.25 Linux (x86_64) (revision id: d27028b)
211221 20:56:48 version_check Connecting to MySQL server with DSN 'dbi:mysql:;mysql_read_default_group=xtrabackup;mysql_socket=/var/run/mysqld/mysqld.sock' as 'mysql.pxc.sst.user' (using password: YES).
211221 20:56:48 version_check Connected to MySQL server
211221 20:56:48 version_check Executing a version check against the server...
211221 20:56:48 version_check Done.
211221 20:56:48 Connecting to MySQL server host: localhost, user: mysql.pxc.sst.user, password: set, port: not set, socket: /var/run/mysqld/mysqld.sock
Using server version 8.0.25-15.1
211221 20:56:48 Executing LOCK TABLES FOR BACKUP...
xtrabackup: uses posix_fadvise().
xtrabackup: cd to /db1
xtrabackup: open files limit requested 0, set to 16364
xtrabackup: using the following InnoDB configuration:
xtrabackup: innodb_data_home_dir = .
xtrabackup: innodb_data_file_path = ibdata1:12M:autoextend
xtrabackup: innodb_log_group_home_dir = ./
xtrabackup: innodb_log_files_in_group = 2
xtrabackup: innodb_log_file_size = 50331648
Number of pools: 1
xtrabackup: inititialize_service_handles suceeded
211221 20:56:48 Connecting to MySQL server host: localhost, user: mysql.pxc.sst.user, password: set, port: not set, socket: /var/run/mysqld/mysqld.sock
xtrabackup: Redo Log Archiving is not set up.
WARNING: unknown option --binlog-info=ON
211221 20:56:48 >> log scanned up to (10790819610221)
xtrabackup: Generating a list of tablespaces
xtrabackup: Generating a list of tablespaces
Scanning './'
Completed space ID check of 2 files.
Allocated tablespace ID 2481 for anon_reporting/users, old maximum was 0
Using undo tablespace './undo_001'.
Using undo tablespace './undo_002'.
Opened 2 existing undo tablespaces.
xtrabackup: Starting 4 threads for parallel data files transfer
211221 20:56:49 [01] Streaming ./mwf/pollVotes.ibd
211221 20:56:49 [02] Streaming ./socon/planning.ibd
211221 20:56:49 [03] Streaming ./webstf/quadsphere#p#p0#sp#p0sp7.ibd
211221 20:56:49 [04] Streaming ./webstf/fts_0000000002c45a01_being_deleted_cache.ibd
211221 20:56:49 [01] ...done
211221 20:56:49 [04] ...done
211221 20:56:49 [01] Streaming ./wstre/cyan_products.ibd
211221 20:56:49 [01] ...done
211221 20:56:49 [04] Streaming ./document_browser/sponsor.ibd
211221 20:56:49 [04] ...done
211221 20:56:49 [01] Streaming ./a_docs/keywords_new.ibd
211221 20:56:49 [04] Streaming ./rf/val_old.ibd
211221 20:56:49 [04] ...done
211221 20:56:49 [04] Streaming ./dst/publish_oceancolor_apps.ibd
211221 20:56:49 [04] ...done
211221 20:56:49 [04] Streaming ./ibdata1
211221 20:56:49 [02] ...done
211221 20:56:49 >> log scanned up to (10790819610221)
211221 20:56:49 [02] Streaming ./rf/val_lookup.ibd
211221 20:56:49 [02] ...done
211221 20:56:50 [02] Streaming ./socon/target_request_tracking.ibd
211221 20:56:50 [02] ...done
211221 20:56:50 [02] Streaming ./webstf/l_resolutions.ibd
211221 20:56:50 [02] ...done
211221 20:56:50 [02] Streaming ./webstf/fts_0000000002c45a01_config.ibd
211221 20:56:50 [02] ...done
211221 20:56:50 >> log scanned up to (10790819610221)
211221 20:56:51 [02] Streaming ./webstf/quadsphere#p#p1#sp#p1sp0.ibd
211221 20:56:51 >> log scanned up to (10790819610221)
211221 20:56:52 >> log scanned up to (10790819610221)
211221 20:56:53 >> log scanned up to (10790819610221)
211221 20:56:54 >> log scanned up to (10790819610221)
211221 20:56:55 >> log scanned up to (10790819610221)
211221 20:56:56 >> log scanned up to (10790819610221)
211221 20:56:56 [01] ...done
211221 20:56:57 >> log scanned up to (10790819610221)
211221 20:56:57 [01] Streaming ./mwf/polls.ibd
211221 20:56:57 [01] ...done
211221 20:56:57 [01] Streaming ./mola/sst_s_data.ibd
211221 20:56:58 >> log scanned up to (10790819610221)
2021-12-21T21:11:12.392228Z 0 [ERROR] [MY-000000] [WSREP-SST] ------------ innobackup.backup.log (cont) ------------
211221 20:58:00 >> log scanned up to (10790819610221)
.......
211221 21:00:23 >> log scanned up to (10790819610521)
2021-12-21T21:11:12.392275Z 0 [ERROR] [MY-000000] [WSREP-SST] ------------ innobackup.backup.log (cont) ------------
211221 21:00:24 >> log scanned up to (10790819610521)
211221 21:02:13 >> log scanned up to (10790819610521)
211221 21:02:13 [02] ...done
211221 21:02:23 >> log scanned up to (10790819610521)
211221 21:02:23 [02] Streaming ./webstf/quadsphere#p#p1#sp#p1sp1.ibd
211221 21:02:24 >> log scanned up to (10790819610521)
...
2021-12-21T21:11:12.392316Z 0 [ERROR] [MY-000000] [WSREP-SST] ------------ innobackup.backup.log (cont) ------------
...
211221 21:03:49 >> log scanned up to (10790819610521)
211221 21:03:50 [03] ...done
211221 21:03:52 >> log scanned up to (10790819610521)
211221 21:03:53 [03] Streaming ./mola/sst_s_data_stats.ibd
211221 21:03:53 >> log scanned up to (10790819610521)
...
2021-12-21T21:11:12.392355Z 0 [ERROR] [MY-000000] [WSREP-SST] ------------ innobackup.backup.log (cont) ------------
211221 21:04:55 >> log scanned up to (10790819610521)
...
2021-12-21T21:11:12.392394Z 0 [ERROR] [MY-000000] [WSREP-SST] ------------ innobackup.backup.log (cont) ------------
...
2021-12-21T21:11:12.392431Z 0 [ERROR] [MY-000000] [WSREP-SST] ------------ innobackup.backup.log (cont) ------------
211221 21:09:43 >> log scanned up to (10790819610521)
...
211221 21:11:11 >> log scanned up to (10790819610521)
xtrabackup: Error writing file '<unopen fd>' (OS errno 32 - Broken pipe)
xb_stream_write_data() failed.
xtrabackup: Error writing file '<unopen fd>' (OS errno 32 - Broken pipe)
[01] xtrabackup: Error: xtrabackup_copy_datafile() failed.
[01] xtrabackup: Error: failed to copy datafile.
xtrabackup: Error writing file '<unopen fd>' (OS errno 32 - Broken pipe)
xb_stream_write_data() failed.
xtrabackup: Error writing file '<unopen fd>' (OS errno 32 - Broken pipe)
[04] xtrabackup: Error: xtrabackup_copy_datafile() failed.
[04] xtrabackup: Error: failed to copy datafile.
xtrabackup: Error writing file '<unopen fd>' (OS errno 32 - Broken pipe)
xb_stream_write_data() failed.
xtrabackup: Error writing file '<unopen fd>' (OS errno 32 - Broken pipe)
xb_stream_write_data() failed.
xtrabackup: Error writing file '<unopen fd>' (OS errno 32 - Broken pipe)
[02] xtrabackup: Error: xtrabackup_copy_datafile() failed.
[02] xtrabackup: Error: failed to copy datafile.
xtrabackup: Error writing file '<unopen fd>' (OS errno 32 - Broken pipe)
[03] xtrabackup: Error: xtrabackup_copy_datafile() failed.
[03] xtrabackup: Error: failed to copy datafile.
2021-12-21T21:11:12.392450Z 0 [ERROR] [MY-000000] [WSREP-SST] ------------ innobackup.backup.log (END) ------------
2021-12-21T21:11:12.392511Z 0 [ERROR] [MY-000000] [WSREP-SST] ******************************************************
2021-12-21T21:11:12.392836Z 0 [ERROR] [MY-000000] [WSREP-SST] Cleanup after exit with status:22
2021-12-21T21:11:12.418216Z 0 [ERROR] [MY-000000] [WSREP] Process completed with error: wsrep_sst_xtrabackup-v2 --role 'donor' --address '192.168.4.71:4444/xtrabackup_sst//1' --socket '/var/run/mysqld/mysqld.sock' --datadir '/db1/' --basedir '/usr/' --plugindir '/usr/lib/mysql/plugin/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --mysqld-version '8.0.25-15.1' '' --gtid 'bc54efc2-52bc-11ec-9ba9-fbe9dc3b2f33:157' : 22 (Invalid argument)
2021-12-21T21:11:12.426236Z 0 [Note] [MY-000000] [Galera] SST sending failed: -22
2021-12-21T21:11:12.426285Z 0 [Note] [MY-000000] [WSREP] Server status change donor -> joined
2021-12-21T21:11:12.426318Z 0 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2021-12-21T21:11:12.426383Z 0 [ERROR] [MY-000000] [WSREP] Command did not run: wsrep_sst_xtrabackup-v2 --role 'donor' --address '192.168.4.71:4444/xtrabackup_sst//1' --socket '/var/run/mysqld/mysqld.sock' --datadir '/db1/' --basedir '/usr/' --plugindir '/usr/lib/mysql/plugin/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --mysqld-version '8.0.25-15.1' '' --gtid 'bc54efc2-52bc-11ec-9ba9-fbe9dc3b2f33:157'
2021-12-21T21:11:12.426855Z 0 [Warning] [MY-000000] [Galera] 1.0 (DB403): State transfer to 0.0 (DBDEV) failed: -22 (Invalid argument)
2021-12-21T21:11:12.426917Z 0 [Note] [MY-000000] [Galera] Shifting DONOR/DESYNCED -> JOINED (TO: 157)
2021-12-21T21:11:12.427243Z 0 [Note] [MY-000000] [Galera] Member 1.0 (DB403) synced with group.
2021-12-21T21:11:12.427278Z 0 [Note] [MY-000000] [Galera] Shifting JOINED -> SYNCED (TO: 157)
2021-12-21T21:11:12.427316Z 12 [Note] [MY-000000] [Galera] Server DB403 synced with group
2021-12-21T21:11:12.427373Z 12 [Note] [MY-000000] [WSREP] Server status change joined -> synced
2021-12-21T21:11:12.427390Z 12 [Note] [MY-000000] [WSREP] Synchronized with group, ready for connections
2021-12-21T21:11:12.427430Z 12 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2021-12-21T21:11:13.427784Z 0 [Note] [MY-000000] [Galera] forgetting 7c6a995b-85ca (tcp://192.168.4.71:4567)
2021-12-21T21:11:13.427879Z 0 [Note] [MY-000000] [Galera] Node aea03d45-a657 state primary
2021-12-21T21:11:13.427941Z 0 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node
view (view_id(PRIM,aea03d45-a657,3)
memb {
aea03d45-a657,0
}
joined {
}
left {
}
partitioned {
7c6a995b-85ca,0
}
)
2021-12-21T21:11:13.427975Z 0 [Note] [MY-000000] [Galera] Save the discovered primary-component to disk
2021-12-21T21:11:13.434284Z 0 [Note] [MY-000000] [Galera] forgetting 7c6a995b-85ca (tcp://192.168.4.71:4567)
2021-12-21T21:11:13.434342Z 0 [Note] [MY-000000] [Galera] New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 1
2021-12-21T21:11:13.434499Z 0 [Note] [MY-000000] [Galera] STATE_EXCHANGE: sent state UUID: 873636ec-62a2-11ec-8053-e2e19efefe38
2021-12-21T21:11:13.434548Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: sent state msg: 873636ec-62a2-11ec-8053-e2e19efefe38
2021-12-21T21:11:13.434590Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: got state msg: 873636ec-62a2-11ec-8053-e2e19efefe38 from 0 (DB403)
2021-12-21T21:11:13.434620Z 0 [Note] [MY-000000] [Galera] Quorum results:
version = 6,
component = PRIMARY,
conf_id = 2,
members = 1/1 (primary/total),
act_id = 157,
last_appl. = 155,
protocols = 2/10/4 (gcs/repl/appl),
vote policy= 0,
group UUID = bc54efc2-52bc-11ec-9ba9-fbe9dc3b2f33
2021-12-21T21:11:13.434787Z 0 [Note] [MY-000000] [Galera] Flow-control interval: [100, 100]
2021-12-21T21:11:13.434969Z 14 [Note] [MY-000000] [Galera] ####### processing CC 158, local, ordered
2021-12-21T21:11:13.435030Z 14 [Warning] [MY-000000] [Galera] Peer (IST receiver) 7c6a995b-62a0-11ec-85ca-5a7b6cded7eb for IST AsyncSender seems to be disconnected. Terminating IST AsyncSender.
2021-12-21T21:11:13.435180Z 0 [Note] [MY-000000] [Galera] async IST sender served
2021-12-21T21:11:13.435342Z 14 [Note] [MY-000000] [Galera] Maybe drain monitors from 157 upto current CC event 158 upto:157
2021-12-21T21:11:13.435382Z 14 [Note] [MY-000000] [Galera] Drain monitors from 157 up to 157
2021-12-21T21:11:13.435410Z 14 [Note] [MY-000000] [Galera] ####### My UUID: aea03d45-629d-11ec-a657-d25199295426
2021-12-21T21:11:13.435434Z 14 [Note] [MY-000000] [Galera] Skipping cert index reset
2021-12-21T21:11:13.435457Z 14 [Note] [MY-000000] [Galera] REPL Protocols: 10 (5)
2021-12-21T21:11:13.435481Z 14 [Note] [MY-000000] [Galera] ####### Adjusting cert position: 157 -> 158
2021-12-21T21:11:13.435561Z 0 [Note] [MY-000000] [Galera] Service thread queue flushed.
2021-12-21T21:11:13.445520Z 14 [Note] [MY-000000] [Galera] ================================================
View:
id: bc54efc2-52bc-11ec-9ba9-fbe9dc3b2f33:158
status: primary
protocol_version: 4
capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
final: no
own_index: 0
members(1):
0: aea03d45-629d-11ec-a657-d25199295426, DB403
=================================================
2021-12-21T21:11:13.445656Z 14 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2021-12-21T21:11:13.464623Z 14 [Note] [MY-000000] [Galera] Recording CC from group: 158
2021-12-21T21:11:13.464683Z 14 [Note] [MY-000000] [Galera] Lowest cert index boundary for CC from group: 158
2021-12-21T21:11:13.464707Z 14 [Note] [MY-000000] [Galera] Min available from gcache for CC from group: 1
2021-12-21T21:11:18.460928Z 0 [Note] [MY-000000] [Galera] cleaning up 7c6a995b-85ca (tcp://192.168.4.71:4567)
Here is the joiner log. Data starts exchanging then just fails.
2021-12-21T20:54:26.339199Z 0 [Warning] [MY-000000] [WSREP] Node is running in bootstrap/initialize mode. Disabling pxc_strict_mode checks
2021-12-21T20:54:26.339846Z 0 [System] [MY-013169] [Server] /usr/sbin/mysqld (mysqld 8.0.25-15.1) initializing of server in progress as process 407574
2021-12-21T20:54:26.342616Z 0 [Note] [MY-000000] [Galera] Loading provider none initial position: 00000000-0000-0000-0000-000000000000:-1
2021-12-21T20:54:26.342730Z 0 [Note] [MY-000000] [Galera] wsrep_load(): loading provider library 'none'
2021-12-21T20:54:26.573953Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
2021-12-21T20:54:42.899335Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.
2021-12-21T20:55:19.033503Z 6 [Note] [MY-010454] [Server] A temporary password is generated for root@localhost: %PpeS_N2,)sw
2021-12-21T20:55:37.861836Z 6 [Note] [MY-000000] [WSREP] MDL conflict db=sys table=sys_config ticket=1 solved by abort
Log of wsrep recovery (--wsrep-recover):
INFO: WSREP: Running position recovery with --log_error='/db1/wsrep_recovery_verbose.MC2rKA' --pid-file='/db1/oceandbdev401-recover.pid'
INFO: WSREP: Recovered position 00000000-0000-0000-0000-000000000000:-1
2021-12-21T20:56:36.220423Z 0 [System] [MY-010116] [Server] /usr/sbin/mysqld (mysqld 8.0.25-15.1) starting as process 407789
2021-12-21T20:56:36.222793Z 0 [Note] [MY-000000] [Galera] Loading provider /usr/lib/galera4/libgalera_smm.so initial position: 00000000-0000-0000-0000-000000000000:-1
2021-12-21T20:56:36.222893Z 0 [Note] [MY-000000] [Galera] wsrep_load(): loading provider library '/usr/lib/galera4/libgalera_smm.so'
2021-12-21T20:56:36.224808Z 0 [Note] [MY-000000] [Galera] wsrep_load(): Galera 4.7(8c375ca) by Codership Oy <info@codership.com> (modified by Percona <https://percona.com/>) loaded successfully.
2021-12-21T20:56:36.224919Z 0 [Note] [MY-000000] [Galera] CRC-32C: using 64-bit x86 acceleration.
2021-12-21T20:56:36.225354Z 0 [Warning] [MY-000000] [Galera] Could not open state file for reading: '/db1//grastate.dat'
2021-12-21T20:56:36.225409Z 0 [Warning] [MY-000000] [Galera] No persistent state found. Bootstraping with default state
2021-12-21T20:56:36.225491Z 0 [Note] [MY-000000] [Galera] Found saved state: 00000000-0000-0000-0000-000000000000:-1, safe_to_bootstrap: 1
2021-12-21T20:56:36.225743Z 0 [Note] [MY-000000] [Galera] GCache DEBUG: opened preamble:
Version: 0
UUID: 00000000-0000-0000-0000-000000000000
Seqno: -1 - -1
Offset: -1
Synced: 0
2021-12-21T20:56:36.225804Z 0 [Note] [MY-000000] [Galera] Skipped GCache ring buffer recovery: could not determine history UUID.
2021-12-21T20:56:36.315811Z 0 [Note] [MY-000000] [Galera] Passing config to GCS: base_dir = /db1/; base_host = 192.168.4.71; base_port = 4567; cert.log_conflicts = no; cert.optimistic_pa = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 10; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 4; evs.view_forget_timeout = PT24H; gcache.dir = /db1/; gcache.freeze_purge_at_seqno = -1; gcache.keep_pages_count = 0; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = galera.cache; gcache.page_size = 1G; gcache.recover = yes; gcache.size = 3G; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 100; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo = false; pc.recovery = true; pc.version = 0; pc.wait_prim = true; pc.wait_prim_timeout = PT30S; pc.weight = 1; protonet.backend = asio; protonet.version = 0; repl.causal_read_timeout = PT30S; repl.commit_order = 3; repl.key_format = FLAT8; repl.max_ws_size = 2147483647; repl.proto_max = 10; socket.checksum = 2; socket.recv_buf_size = auto; socket.send_buf_size = auto;
2021-12-21T20:56:36.327844Z 0 [Note] [MY-000000] [WSREP] Starting replication
2021-12-21T20:56:36.327955Z 0 [Note] [MY-000000] [Galera] Connecting with bootstrap option: 0
2021-12-21T20:56:36.328037Z 0 [Note] [MY-000000] [Galera] Setting GCS initial position to 00000000-0000-0000-0000-000000000000:-1
2021-12-21T20:56:36.328184Z 0 [Note] [MY-000000] [Galera] protonet asio version 0
2021-12-21T20:56:36.328494Z 0 [Note] [MY-000000] [Galera] Using CRC-32C for message checksums.
2021-12-21T20:56:36.328594Z 0 [Note] [MY-000000] [Galera] backend: asio
2021-12-21T20:56:36.328796Z 0 [Note] [MY-000000] [Galera] gcomm thread scheduling priority set to other:0
2021-12-21T20:56:36.329050Z 0 [Warning] [MY-000000] [Galera] Fail to access the file (/db1//gvwstate.dat) error (No such file or directory). It is possible if node is booting for first time or re-booting after a graceful shutdown
2021-12-21T20:56:36.329155Z 0 [Note] [MY-000000] [Galera] Restoring primary-component from disk failed. Either node is booting for first time or re-booting after a graceful shutdown
2021-12-21T20:56:36.329499Z 0 [Note] [MY-000000] [Galera] GMCast version 0
2021-12-21T20:56:36.329756Z 0 [Note] [MY-000000] [Galera] (7c6a995b-85ca, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2021-12-21T20:56:36.329890Z 0 [Note] [MY-000000] [Galera] (7c6a995b-85ca, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2021-12-21T20:56:36.330300Z 0 [Note] [MY-000000] [Galera] EVS version 1
2021-12-21T20:56:36.330559Z 0 [Note] [MY-000000] [Galera] gcomm: connecting to group 'WebDB-cluster', peer '192.168.2.61:'
2021-12-21T20:56:36.332184Z 0 [Note] [MY-000000] [Galera] (7c6a995b-85ca, 'tcp://0.0.0.0:4567') connection established to aea03d45-a657 tcp://192.168.2.61:4567
2021-12-21T20:56:36.332410Z 0 [Note] [MY-000000] [Galera] (7c6a995b-85ca, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
2021-12-21T20:56:36.832046Z 0 [Note] [MY-000000] [Galera] EVS version upgrade 0 -> 1
2021-12-21T20:56:36.832224Z 0 [Note] [MY-000000] [Galera] declaring aea03d45-a657 at tcp://192.168.2.61:4567 stable
2021-12-21T20:56:36.832402Z 0 [Note] [MY-000000] [Galera] PC protocol upgrade 0 -> 1
2021-12-21T20:56:36.833237Z 0 [Note] [MY-000000] [Galera] Node aea03d45-a657 state primary
2021-12-21T20:56:36.833652Z 0 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node
view (view_id(PRIM,7c6a995b-85ca,2)
memb {
7c6a995b-85ca,0
aea03d45-a657,0
}
joined {
}
left {
}
partitioned {
}
)
2021-12-21T20:56:36.833829Z 0 [Note] [MY-000000] [Galera] Save the discovered primary-component to disk
2021-12-21T20:56:37.331366Z 0 [Note] [MY-000000] [Galera] gcomm: connected
2021-12-21T20:56:37.331571Z 0 [Note] [MY-000000] [Galera] Changing maximum packet size to 64500, resulting msg size: 32636
2021-12-21T20:56:37.331784Z 0 [Note] [MY-000000] [Galera] Shifting CLOSED -> OPEN (TO: 0)
2021-12-21T20:56:37.331975Z 0 [Note] [MY-000000] [Galera] Opened channel 'WebDB-cluster'
2021-12-21T20:56:37.332225Z 0 [Note] [MY-000000] [Galera] New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2
2021-12-21T20:56:37.332466Z 2 [Note] [MY-000000] [WSREP] Starting rollbacker thread 2
2021-12-21T20:56:37.332646Z 0 [Note] [MY-000000] [Galera] STATE_EXCHANGE: sent state UUID: 7d03aa54-62a0-11ec-9d7f-db65bba8ba6d
2021-12-21T20:56:37.332888Z 1 [Note] [MY-000000] [WSREP] Starting applier thread 1
2021-12-21T20:56:37.333112Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: sent state msg: 7d03aa54-62a0-11ec-9d7f-db65bba8ba6d
2021-12-21T20:56:37.333297Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: got state msg: 7d03aa54-62a0-11ec-9d7f-db65bba8ba6d from 0 (DBDEV)
2021-12-21T20:56:37.333503Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: got state msg: 7d03aa54-62a0-11ec-9d7f-db65bba8ba6d from 1 (DB403)
2021-12-21T20:56:37.333658Z 0 [Note] [MY-000000] [Galera] Quorum results:
version = 6,
component = PRIMARY,
conf_id = 1,
members = 1/2 (primary/total),
act_id = 156,
last_appl. = 155,
protocols = 2/10/4 (gcs/repl/appl),
vote policy= 0,
group UUID = bc54efc2-52bc-11ec-9ba9-fbe9dc3b2f33
2021-12-21T20:56:37.333904Z 0 [Note] [MY-000000] [Galera] Flow-control interval: [141, 141]
2021-12-21T20:56:37.334065Z 0 [Note] [MY-000000] [Galera] Shifting OPEN -> PRIMARY (TO: 157)
2021-12-21T20:56:37.334311Z 1 [Note] [MY-000000] [Galera] ####### processing CC 157, local, ordered
2021-12-21T20:56:37.334541Z 1 [Note] [MY-000000] [Galera] Maybe drain monitors from -1 upto current CC event 157 upto:-1
2021-12-21T20:56:37.334741Z 1 [Note] [MY-000000] [Galera] Drain monitors from -1 up to -1
2021-12-21T20:56:37.334944Z 1 [Note] [MY-000000] [Galera] Process first view: bc54efc2-52bc-11ec-9ba9-fbe9dc3b2f33 my uuid: 7c6a995b-62a0-11ec-85ca-5a7b6cded7eb
2021-12-21T20:56:37.335176Z 1 [Note] [MY-000000] [Galera] Server DBDEV connected to cluster at position bc54efc2-52bc-11ec-9ba9-fbe9dc3b2f33:157 with ID 7c6a995b-62a0-11ec-85ca-5a7b6cded7eb
2021-12-21T20:56:37.335376Z 1 [Note] [MY-000000] [WSREP] Server status change disconnected -> connected
2021-12-21T20:56:37.335632Z 1 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2021-12-21T20:56:37.335889Z 1 [Note] [MY-000000] [Galera] ####### My UUID: 7c6a995b-62a0-11ec-85ca-5a7b6cded7eb
2021-12-21T20:56:37.336124Z 1 [Note] [MY-000000] [Galera] Cert index reset to 00000000-0000-0000-0000-000000000000:-1 (proto: 10), state transfer needed: yes
2021-12-21T20:56:37.336407Z 0 [Note] [MY-000000] [Galera] Service thread queue flushed.
2021-12-21T20:56:37.336675Z 1 [Note] [MY-000000] [Galera] ####### Assign initial position for certification: 00000000-0000-0000-0000-000000000000:-1, protocol version: -1
2021-12-21T20:56:37.336939Z 1 [Note] [MY-000000] [Galera] State transfer required:
Group state: bc54efc2-52bc-11ec-9ba9-fbe9dc3b2f33:157
Local state: 00000000-0000-0000-0000-000000000000:-1
2021-12-21T20:56:37.337182Z 1 [Note] [MY-000000] [WSREP] Server status change connected -> joiner
2021-12-21T20:56:37.337392Z 1 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2021-12-21T20:56:37.337746Z 0 [Note] [MY-000000] [WSREP] Initiating SST/IST transfer on JOINER side (wsrep_sst_xtrabackup-v2 --role 'joiner' --address '192.168.4.71' --datadir '/db1/' --basedir '/usr/' --plugindir '/usr/lib/mysql/plugin/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --parent '407789' --mysqld-version '8.0.25-15.1' '' )
2021-12-21T20:56:37.877746Z 1 [Note] [MY-000000] [WSREP] Prepared SST request: xtrabackup-v2|192.168.4.71:4444/xtrabackup_sst//1
2021-12-21T20:56:37.878122Z 1 [Note] [MY-000000] [Galera] Check if state gap can be serviced using IST
2021-12-21T20:56:37.878458Z 1 [Note] [MY-000000] [Galera] Local UUID: 00000000-0000-0000-0000-000000000000 != Group UUID: bc54efc2-52bc-11ec-9ba9-fbe9dc3b2f33
2021-12-21T20:56:37.878831Z 1 [Note] [MY-000000] [Galera] ####### IST uuid:00000000-0000-0000-0000-000000000000 f: 0, l: 157, STRv: 3
2021-12-21T20:56:37.879234Z 1 [Note] [MY-000000] [Galera] IST receiver addr using tcp://192.168.4.71:4568
2021-12-21T20:56:37.879710Z 1 [Note] [MY-000000] [Galera] Prepared IST receiver for 0-157, listening at: tcp://192.168.4.71:4568
2021-12-21T20:56:37.880486Z 0 [Note] [MY-000000] [Galera] Member 0.0 (DBDEV) requested state transfer from 'DB403'. Selected 1.0 (DB403)(SYNCED) as donor.
2021-12-21T20:56:37.880739Z 0 [Note] [MY-000000] [Galera] Shifting PRIMARY -> JOINER (TO: 157)
2021-12-21T20:56:37.881128Z 1 [Note] [MY-000000] [Galera] Requesting state transfer: success, donor: 1
2021-12-21T20:56:37.881533Z 1 [Note] [MY-000000] [Galera] Resetting GCache seqno map due to different histories.
2021-12-21T20:56:37.881835Z 1 [Note] [MY-000000] [Galera] GCache history reset: 00000000-0000-0000-0000-000000000000:0 -> bc54efc2-52bc-11ec-9ba9-fbe9dc3b2f33:157
2021-12-21T20:56:38.670343Z 0 [Note] [MY-000000] [WSREP-SST] Proceeding with SST.........
2021-12-21T20:56:38.692808Z 0 [Note] [MY-000000] [WSREP-SST] ............Waiting for SST streaming to complete!
2021-12-21T20:56:39.831734Z 0 [Note] [MY-000000] [Galera] (7c6a995b-85ca, 'tcp://0.0.0.0:4567') turning message relay requesting off
2021-12-21T21:11:12.212590Z 0 [ERROR] [MY-000000] [WSREP-SST] Killing SST (408260) with SIGKILL after stalling for 120 seconds
2021-12-21T21:11:12.427229Z 0 [Warning] [MY-000000] [Galera] 1.0 (DB403): State transfer to 0.0 (DBDEV) failed: -22 (Invalid argument)
2021-12-21T21:11:12.427303Z 0 [ERROR] [MY-000000] [Galera] gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():1214: Will never receive state. Need to abort.
2021-12-21T21:11:12.427333Z 0 [Note] [MY-000000] [Galera] gcomm: terminating thread
2021-12-21T21:11:12.427385Z 0 [Note] [MY-000000] [Galera] gcomm: joining thread
2021-12-21T21:11:12.427528Z 0 [Note] [MY-000000] [Galera] gcomm: closing backend
2021-12-21T21:11:12.494647Z 0 [Note] [MY-000000] [WSREP-SST] /usr/bin/wsrep_sst_xtrabackup-v2: line 185: 408262 Killed socat -u TCP-LISTEN:4444,reuseaddr,retry=30 stdio
2021-12-21T21:11:12.494698Z 0 [Note] [MY-000000] [WSREP-SST] 408263 | /usr/bin/pxc_extra/pxb-8.0/bin/xbstream -x
2021-12-21T21:11:12.494872Z 0 [ERROR] [MY-000000] [WSREP-SST] ******************* FATAL ERROR **********************
2021-12-21T21:11:12.494957Z 0 [ERROR] [MY-000000] [WSREP-SST] Error while getting data from donor node: exit codes: 137 137
2021-12-21T21:11:12.494994Z 0 [ERROR] [MY-000000] [WSREP-SST] Line 1296
2021-12-21T21:11:12.495214Z 0 [ERROR] [MY-000000] [WSREP-SST] ******************************************************
2021-12-21T21:11:12.495706Z 0 [ERROR] [MY-000000] [WSREP-SST] Cleanup after exit with status:32
2021-12-21T21:11:12.692837Z 0 [ERROR] [MY-000000] [WSREP] Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '192.168.4.71' --datadir '/db1/' --basedir '/usr/' --plugindir '/usr/lib/mysql/plugin/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --parent '407789' --mysqld-version '8.0.25-15.1' '' : 32 (Broken pipe)
2021-12-21T21:11:12.692944Z 0 [ERROR] [MY-000000] [WSREP] Failed to read uuid:seqno from joiner script.
2021-12-21T21:11:12.692986Z 0 [ERROR] [MY-000000] [WSREP] SST script aborted with error 32 (Broken pipe)
2021-12-21T21:11:12.693270Z 3 [Note] [MY-000000] [Galera] Processing SST received
2021-12-21T21:11:12.693356Z 3 [Note] [MY-000000] [Galera] SST request was cancelled
2021-12-21T21:11:12.693426Z 3 [ERROR] [MY-000000] [Galera] State transfer request failed unrecoverably: 32 (Broken pipe). Most likely it is due to inability to communicate with the cluster primary component. Restart required.
2021-12-21T21:11:12.693487Z 3 [Note] [MY-000000] [Galera] ReplicatorSMM::abort()
2021-12-21T21:11:12.693545Z 3 [Note] [MY-000000] [Galera] Closing send monitor...
2021-12-21T21:11:12.693591Z 3 [Note] [MY-000000] [Galera] Closed send monitor.
2021-12-21T21:11:13.434758Z 0 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node
view (view_id(NON_PRIM,7c6a995b-85ca,2)
memb {
7c6a995b-85ca,0
}
joined {
}
left {
}
partitioned {
aea03d45-a657,0
}
)
2021-12-21T21:11:13.434845Z 0 [Note] [MY-000000] [Galera] PC protocol downgrade 1 -> 0
2021-12-21T21:11:13.434865Z 0 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node
view ((empty))
2021-12-21T21:11:13.435129Z 0 [Note] [MY-000000] [Galera] gcomm: closed
2021-12-21T21:11:13.435170Z 3 [Note] [MY-000000] [Galera] gcomm: terminating thread
2021-12-21T21:11:13.435229Z 3 [Note] [MY-000000] [Galera] gcomm: joining thread
2021-12-21T21:11:13.435285Z 0 [Note] [MY-000000] [Galera] /usr/sbin/mysqld: Terminated.
2021-12-21T21:11:13.435401Z 0 [Note] [MY-000000] [WSREP] Initiating SST cancellation
2021-12-21T21:11:13.435443Z 3 [Note] [MY-000000] [Galera] gcomm: backend closed already
2021-12-21T21:11:13.435512Z 0 [Note] [MY-000000] [WSREP] Terminating SST process
2021-12-21T21:11:13.435558Z 3 [Note] [MY-000000] [Galera] gcomm: closed
This smells like the systemd timeout of 15 min. You can try increasing it from the unit file.
In mysql.service I set
TimeoutStartSec=infinity
I’ve also used this
TimeoutSec=14440
Both yield the same result.
The joiner killed itself after 120s of receiving no data. Ensure that ports 4567, 4568, and 4444 are open between both nodes.
2021-12-21T21:11:12.390242Z 0 [ERROR] [MY-000000] [WSREP-SST] xtrabackup finished with error: 1. Check /db1//innobackup.backup.log
What’s the contents of this log?
The ports are open. I get a response.
node1: socat - TCP-LISTEN:4444
node2: echo "hello" | socat - TCP:ip.adr.of.node1:4444
``
innobackup.backup.log included.
innobackup_backup_log.txt (51.2 KB)
How do I overwrite service_startup_timeout=900
I checked the vi /usr/bin/mysql-systemd
file and the value is set to 900 how can I overwrite that variable.
You can edit the file directly. Also might want to check the alternative method listed here: [PXC-2080] SST Timeout After 2 hours 30 minutes On Systemd Implementation - Percona JIRA
This seems to be for mysql version 5.6, I am using 8.0 which comes with Percona XtraDB Cluster 8.0. I have set service_startup_timeout=14400
in the [mysqld_safe]
block of my.cnf. Still getting a failure. Edited /usr/bin/mysql-systemd
and changed the service_startup_timeout to 14400 and time out still occurs.
Did you ever find a fix for this? I’m having the exact same problem after upgrading to 8.027-18.1 from 5.7.
The joiner updates with SST and reaches the same state (same UUID), but it doesn’t return for a long time and then eventually times out/shuts down. I have also tried extending the timeout in both the mysql-systemd and my.cnf under [mysqld_safe].
The solution that worked for me was commenting out the line
log-bin =
as outline here:
I am not sure what the fix was. But it wasn’t related to bin log. I tried so much that I really don’t remember. I think it might have been a hard drive space issue.