SST Succeeds but Node Unable to Join

I have an XtraDB Cluster node that is unable to join the cluster. Before this issue, it was able to join just fine via IST or SST, as needed.

Now the SST data transfer completes, but the log shows an “Operation not permitted” error; the node then cancels the SST request and drops out of the cluster.

The innobackup.prepare.log file shows an error: “Error reading xtrabackup_keys: wrong magic”

So the issue appears to be at the “prepare” stage of the xtrabackup process, but I am unsure how to proceed.

What version of PXC? What version of xtrabackup do you have installed?
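For example, both can be checked from the shell (assuming the default binaries are on the PATH):

	mysql -e "SELECT @@version, @@version_comment;"
	xtrabackup --version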

Oops! Did not mean to skip those details.

PXC version is 5.7.34-37
xtrabackup version is 2.4.24

Can you provide the xtrabackup log from the donor node and the other logs from the joiner?
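On a typical PXC 5.7 layout those logs end up under the datadir, for example (adjust the path to your own datadir):

	# donor side
	less /var/lib/mysql/innobackup.backup.log
	# joiner side
	less /var/lib/mysql/innobackup.prepare.log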

Node 3 (joiner node) error log:

2022-12-19T22:13:54.030185Z 0 [Warning] The syntax '--language/-l' is deprecated and will be removed in a future release. Please use '--lc-messages-dir' instead.
2022-12-19T22:13:54.030302Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2022-12-19T22:13:54.031193Z 0 [Note] /usr/sbin/mysqld (mysqld 5.7.34-37-57-log) starting as process 100248 ...
2022-12-19T22:13:54.031216Z 0 [Warning] Using pre 5.5 semantics to load error messages from /usr/share/mysql/english/.
2022-12-19T22:13:54.031217Z 0 [Warning] If this is not intended, refer to the documentation for valid usage of --lc-messages-dir and --language parameters.
2022-12-19T22:13:54.034150Z 0 [Note] Found ca.pem, server-cert.pem and server-key.pem in data directory. Trying to enable SSL support using them.
2022-12-19T22:13:54.034157Z 0 [Note] WSREP: Skipping automatic SSL certificate generation (enabled only in bootstrap mode)
2022-12-19T22:13:54.034162Z 0 [Note] WSREP: Setting wsrep_ready to false
2022-12-19T22:13:54.034168Z 0 [Note] WSREP: No pre-stored wsrep-start position found. Skipping position initialization.
2022-12-19T22:13:54.034171Z 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera3/libgalera_smm.so'
2022-12-19T22:13:54.034584Z 0 [Note] WSREP: wsrep_load(): Galera 3.51(rd992932) by Codership Oy <info@codership.com> loaded successfully.
2022-12-19T22:13:54.034616Z 0 [Note] WSREP: CRC-32C: using 64-bit x86 acceleration.
2022-12-19T22:13:54.034999Z 0 [Warning] WSREP: Could not open state file for reading: '/mydatadir/mydir/grastate.dat'
2022-12-19T22:13:54.035005Z 0 [Warning] WSREP: No persistent state found. Bootstraping with default state
2022-12-19T22:13:54.035033Z 0 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1, safe_to_bootstrap: 1
2022-12-19T22:13:54.035839Z 0 [Note] WSREP: Passing config to GCS: base_dir = /mydatadir/mydir; base_host = 172.31.255.51; base_port = 4567; cert.log_conflicts = no; cert.optimistic_pa = yes; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 10; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 4; evs.view_forget_timeout = PT24H; gcache.dir = /mydatadir/mydir; gcache.freeze_purge_at_seqno = -1; gcache.keep_pages_count = 0; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /mydatadir/mydir/galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 100; gcs.fc_master_slave = 1; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo = false; pc.recovery = true; pc.version = 0; pc.wait_prim = true; pc.wait_prim_timeout = PT30S; pc.weight = 1; protonet.backend = asio; protonet.version = 0; repl.causal_read_timeout = PT30S; repl.commit_order = 3; repl.key_format = FLAT8; repl.max_ws_size = 2147483647; repl.proto_max = 9; socket.checksum = 2; socket.recv_buf_size = auto; socket.send_buf_size = auto; socket.ssl_ca = ca.pem; socket.ssl_cert = server-cert.pem; socket.ssl_cipher = ; socket.ssl_compression = YES; socket.ssl_key = server-key.pem; 
2022-12-19T22:13:54.042446Z 0 [Note] WSREP: GCache history reset: 00000000-0000-0000-0000-000000000000:0 -> 00000000-0000-0000-0000-000000000000:-1
2022-12-19T22:13:54.042507Z 0 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
2022-12-19T22:13:54.042516Z 0 [Note] WSREP: Preparing to initiate SST/IST
2022-12-19T22:13:54.042519Z 0 [Note] WSREP: Starting replication
2022-12-19T22:13:54.042524Z 0 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
2022-12-19T22:13:54.042610Z 0 [Note] WSREP: Using CRC-32C for message checksums.
2022-12-19T22:13:54.042623Z 0 [Note] WSREP: initializing ssl context
2022-12-19T22:13:54.042791Z 0 [Note] WSREP: gcomm thread scheduling priority set to other:0 
2022-12-19T22:13:54.042839Z 0 [Warning] WSREP: Fail to access the file (/mydatadir/mydir/gvwstate.dat) error (No such file or directory). It is possible if node is booting for first time or re-booting after a graceful shutdown
2022-12-19T22:13:54.042843Z 0 [Note] WSREP: Restoring primary-component from disk failed. Either node is booting for first time or re-booting after a graceful shutdown
2022-12-19T22:13:54.042935Z 0 [Note] WSREP: GMCast version 0
2022-12-19T22:13:54.043025Z 0 [Note] WSREP: (6ca8c71d, 'ssl://0.0.0.0:4567') listening at ssl://0.0.0.0:4567
2022-12-19T22:13:54.043030Z 0 [Note] WSREP: (6ca8c71d, 'ssl://0.0.0.0:4567') multicast: , ttl: 1
2022-12-19T22:13:54.043174Z 0 [Note] WSREP: EVS version 0
2022-12-19T22:13:54.043213Z 0 [Note] WSREP: gcomm: connecting to group 'pxc_group', peer '172.31.255.47:,172.31.255.49:,172.31.255.51:'
2022-12-19T22:13:54.045658Z 0 [Note] WSREP: SSL handshake successful, remote endpoint ssl://172.31.255.51:4567 local endpoint ssl://172.31.255.51:41870 cipher: TLS_AES_256_GCM_SHA384 compression: none
2022-12-19T22:13:54.045886Z 0 [Note] WSREP: SSL handshake successful, remote endpoint ssl://172.31.255.51:41870 local endpoint ssl://172.31.255.51:4567 cipher: TLS_AES_256_GCM_SHA384 compression: none
2022-12-19T22:13:54.045970Z 0 [Note] WSREP: (6ca8c71d, 'ssl://0.0.0.0:4567') Found matching local endpoint for a connection, blacklisting address ssl://172.31.255.51:4567
2022-12-19T22:13:54.047044Z 0 [Note] WSREP: SSL handshake successful, remote endpoint ssl://172.31.255.47:4567 local endpoint ssl://172.31.255.51:35872 cipher: TLS_AES_256_GCM_SHA384 compression: none
2022-12-19T22:13:54.047798Z 0 [Note] WSREP: SSL handshake successful, remote endpoint ssl://172.31.255.49:4567 local endpoint ssl://172.31.255.51:32976 cipher: TLS_AES_256_GCM_SHA384 compression: none
2022-12-19T22:13:54.048360Z 0 [Note] WSREP: (6ca8c71d, 'ssl://0.0.0.0:4567') connection established to 6d4a85ba ssl://172.31.255.47:4567
2022-12-19T22:13:54.048458Z 0 [Note] WSREP: (6ca8c71d, 'ssl://0.0.0.0:4567') turning message relay requesting on, nonlive peers: 
2022-12-19T22:13:54.049299Z 0 [Note] WSREP: (6ca8c71d, 'ssl://0.0.0.0:4567') connection established to 2bc89d5d ssl://172.31.255.49:4567
2022-12-19T22:13:54.544306Z 0 [Note] WSREP: declaring 2bc89d5d at ssl://172.31.255.49:4567 stable
2022-12-19T22:13:54.544340Z 0 [Note] WSREP: declaring 6d4a85ba at ssl://172.31.255.47:4567 stable
2022-12-19T22:13:54.544732Z 0 [Note] WSREP: Node 2bc89d5d state primary
2022-12-19T22:13:54.545085Z 0 [Note] WSREP: Current view of cluster as seen by this node
view (view_id(PRIM,2bc89d5d,25)
memb {
	2bc89d5d,0
	6ca8c71d,0
	6d4a85ba,0
	}
joined {
	}
left {
	}
partitioned {
	}
)
2022-12-19T22:13:54.545103Z 0 [Note] WSREP: Save the discovered primary-component to disk
2022-12-19T22:13:55.043682Z 0 [Note] WSREP: gcomm: connected
2022-12-19T22:13:55.043793Z 0 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2022-12-19T22:13:55.043925Z 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 3
2022-12-19T22:13:55.043933Z 0 [Note] WSREP: Waiting for SST/IST to complete.
2022-12-19T22:13:55.043980Z 0 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2022-12-19T22:13:55.044039Z 0 [Note] WSREP: STATE EXCHANGE: sent state msg: 6cf55e09-7fea-11ed-b1e1-e6d01afe525e
2022-12-19T22:13:55.044054Z 0 [Note] WSREP: STATE EXCHANGE: got state msg: 6cf55e09-7fea-11ed-b1e1-e6d01afe525e from 0 (node2)
2022-12-19T22:13:55.044065Z 0 [Note] WSREP: STATE EXCHANGE: got state msg: 6cf55e09-7fea-11ed-b1e1-e6d01afe525e from 2 (node1)
2022-12-19T22:13:55.044536Z 0 [Note] WSREP: STATE EXCHANGE: got state msg: 6cf55e09-7fea-11ed-b1e1-e6d01afe525e from 1 (node3)
2022-12-19T22:13:55.044556Z 0 [Note] WSREP: Quorum results:
	version    = 6,
	component  = PRIMARY,
	conf_id    = 24,
	members    = 2/3 (primary/total),
	act_id     = 352732562,
	last_appl. = -1,
	protocols  = 0/9/3 (gcs/repl/appl),
	group UUID = 0e59dc44-61da-11ed-bb91-e7c08fe77924
2022-12-19T22:13:55.044575Z 0 [Note] WSREP: Flow-control interval: [100, 100]
2022-12-19T22:13:55.044585Z 0 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 352732562)
2022-12-19T22:13:55.044666Z 2 [Note] WSREP: State transfer required: 
	Group state: 0e59dc44-61da-11ed-bb91-e7c08fe77924:352732562
	Local state: 00000000-0000-0000-0000-000000000000:-1
2022-12-19T22:13:55.044707Z 2 [Note] WSREP: REPL Protocols: 9 (4, 2)
2022-12-19T22:13:55.044718Z 2 [Note] WSREP: REPL Protocols: 9 (4, 2)
2022-12-19T22:13:55.044737Z 2 [Note] WSREP: New cluster view: global state: 0e59dc44-61da-11ed-bb91-e7c08fe77924:352732562, view# 25: Primary, number of nodes: 3, my index: 1, protocol version 3
2022-12-19T22:13:55.044746Z 2 [Note] WSREP: Setting wsrep_ready to true
2022-12-19T22:13:55.044754Z 2 [Warning] WSREP: Gap in state sequence. Need state transfer.
2022-12-19T22:13:55.044761Z 2 [Note] WSREP: Setting wsrep_ready to false
2022-12-19T22:13:55.044922Z 0 [Note] WSREP: Initiating SST/IST transfer on JOINER side (wsrep_sst_xtrabackup-v2 --role 'joiner' --address '172.31.255.51' --datadir '/mydatadir/mydir/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --parent '100248' --mysqld-version '5.7.34-37-57'   '' )
	2022-12-19T22:13:55.123022Z WSREP_SST: [DEBUG] The xtrabackup version is 2.4.24
	2022-12-19T22:13:55.255050Z WSREP_SST: [WARNING] WARNING: xtrabackup-plugin-dir (under xtrabackup section) missing
	2022-12-19T22:13:55.257202Z WSREP_SST: [WARNING] xtrabackup installation will lookout for plugin at default location
	2022-12-19T22:13:55.466412Z WSREP_SST: [DEBUG] pxc_encrypt_cluster_traffic is enabled, using PXC auto-ssl configuration
	2022-12-19T22:13:55.468959Z WSREP_SST: [DEBUG] with encrypt=4  ssl_ca=/mydatadir/mydir//ca.pem  ssl_cert=/mydatadir/mydir//server-cert.pem  ssl_key=/mydatadir/mydir//server-key.pem
	2022-12-19T22:13:55.481692Z WSREP_SST: [INFO] Streaming with xbstream
	2022-12-19T22:13:55.484777Z WSREP_SST: [DEBUG] Using socat as streamer
	2022-12-19T22:13:55.506775Z WSREP_SST: [DEBUG] Using openssl based encryption with socat: with key, crt, and ca
	2022-12-19T22:13:55.534244Z WSREP_SST: [DEBUG] ssl_ca (absolute) : /mydatadir/mydir/ca.pem
	2022-12-19T22:13:55.537587Z WSREP_SST: [DEBUG] ssl_cert (absolute) : /mydatadir/mydir/server-cert.pem
	2022-12-19T22:13:55.540930Z WSREP_SST: [DEBUG] ssl_key (absolute) : /mydatadir/mydir/server-key.pem
	2022-12-19T22:13:55.569897Z WSREP_SST: [DEBUG] Decrypting with CERT: /mydatadir/mydir/server-cert.pem, KEY: /mydatadir/mydir/server-key.pem, CA: /mydatadir/mydir/ca.pem
	2022-12-19T22:13:55.615668Z WSREP_SST: [DEBUG] 1208: Using /proc/pid/net/tcp for socat/nc discovery
	2022-12-19T22:13:55.624285Z WSREP_SST: [DEBUG] 1239: local_address index is 2
	2022-12-19T22:13:55.629926Z WSREP_SST: [DEBUG] 1252: Entering loop body : 1
	2022-12-19T22:13:55.642379Z WSREP_SST: [DEBUG] Evaluating socat -u openssl-listen:4444,reuseaddr,cert=/mydatadir/mydir/server-cert.pem,key=/mydatadir/mydir/server-key.pem,cafile=/mydatadir/mydir/ca.pem,verify=1,retry=30 stdio | xbstream -x  $xbstream_eopts
	2022-12-19T22:13:55.848248Z WSREP_SST: [DEBUG] 1252: Entering loop body : 2
	2022-12-19T22:13:55.865157Z WSREP_SST: [DEBUG] 1261: Examining pid: 100770
	2022-12-19T22:13:56.012796Z WSREP_SST: [DEBUG] 1278: sockets: 2684663|2684664|2684665
	2022-12-19T22:13:56.020886Z WSREP_SST: [DEBUG] 1302: found a match for pid: 100770
	2022-12-19T22:13:56.022888Z WSREP_SST: [DEBUG] 1318: wait_for_listen() exiting
2022-12-19T22:13:56.023116Z 2 [Note] WSREP: Prepared SST/IST request: xtrabackup-v2|172.31.255.51:4444/xtrabackup_sst//1
2022-12-19T22:13:56.023135Z 2 [Note] WSREP: Auto Increment Offset/Increment re-align with cluster membership change (Offset: 1 -> 2) (Increment: 1 -> 3)
2022-12-19T22:13:56.023160Z 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2022-12-19T22:13:56.023183Z 2 [Note] WSREP: Assign initial position for certification: 352732562, protocol version: 4
2022-12-19T22:13:56.023252Z 0 [Note] WSREP: Service thread queue flushed.
2022-12-19T22:13:56.023298Z 2 [Note] WSREP: Check if state gap can be serviced using IST
2022-12-19T22:13:56.023309Z 2 [Note] WSREP: Local UUID: 00000000-0000-0000-0000-000000000000 != Group UUID: 0e59dc44-61da-11ed-bb91-e7c08fe77924
2022-12-19T22:13:56.023343Z 2 [Note] WSREP: State gap can't be serviced using IST. Switching to SST
2022-12-19T22:13:56.023349Z 2 [Note] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (0e59dc44-61da-11ed-bb91-e7c08fe77924): 1 (Operation not permitted)
	 at galera/src/replicator_str.cpp:prepare_for_IST():516. IST will be unavailable.
2022-12-19T22:13:56.023849Z 0 [Note] WSREP: Member 1.0 (node3) requested state transfer from 'node2'. Selected 0.0 (node2)(SYNCED) as donor.
2022-12-19T22:13:56.023867Z 0 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 352732641)
2022-12-19T22:13:56.023884Z 2 [Note] WSREP: Requesting state transfer: success, donor: 0
2022-12-19T22:13:56.023895Z 2 [Note] WSREP: GCache history reset: 00000000-0000-0000-0000-000000000000:0 -> 0e59dc44-61da-11ed-bb91-e7c08fe77924:352732562
2022-12-19T22:13:57.543888Z 0 [Note] WSREP: (6ca8c71d, 'ssl://0.0.0.0:4567') turning message relay requesting off
	2022-12-19T22:13:57.706190Z WSREP_SST: [INFO] Streaming with xbstream
	2022-12-19T22:13:57.715904Z WSREP_SST: [INFO] WARNING: Stale temporary SST directory: /mydatadir/mydir//.sst from previous state transfer. Removing
	2022-12-19T22:13:57.900422Z WSREP_SST: [INFO] Proceeding with SST.........
	2022-12-19T22:13:57.901073Z WSREP_SST: [DEBUG] Evaluating (@ Joiner-Recv-SST) socat -u openssl-listen:4444,reuseaddr,cert=/mydatadir/mydir/server-cert.pem,key=/mydatadir/mydir/server-key.pem,cafile=/mydatadir/mydir/ca.pem,verify=1,retry=30 stdio | xbstream -x  $xbstream_eopts; RC=( ${PIPESTATUS[@]} )
	2022-12-19T22:13:57.901810Z WSREP_SST: [DEBUG] Cleaning the existing datadir and innodb-data/log directories
	2022-12-19T22:13:57.928909Z WSREP_SST: [DEBUG] Cleaning the binlog directory /mydatadir/mydir_binlogs as well
	2022-12-19T22:13:57.937896Z WSREP_SST: [INFO] ............Waiting for SST streaming to complete!
2022-12-19T22:32:39.636350Z 0 [Note] WSREP: Created page /mydatadir/mydir/gcache.page.000000 of size 134217728 bytes
2022-12-19T22:50:35.489007Z 0 [Note] WSREP: Created page /mydatadir/mydir/gcache.page.000001 of size 134217728 bytes
2022-12-19T23:07:23.544303Z 0 [Note] WSREP: Created page /mydatadir/mydir/gcache.page.000002 of size 134217728 bytes
2022-12-19T23:15:20.068000Z 0 [Note] WSREP: 0.0 (node2): State transfer to 1.0 (node3) complete.
2022-12-19T23:15:20.068487Z 0 [Note] WSREP: Member 0.0 (node2) synced with group.
	2022-12-19T23:15:20.104065Z WSREP_SST: [INFO] Preparing the backup at /mydatadir/mydir//.sst
	2022-12-19T23:15:20.107611Z WSREP_SST: [DEBUG] Evaluating (@ Xtrabackup prepare stage) xtrabackup   --use-memory=200G --prepare --binlog-info=ON $rebuildcmd $keyringapplyopt $encrypt_prepare_options --target-dir=${DATA} &>${DATA}/innobackup.prepare.log
	2022-12-19T23:15:20.158038Z WSREP_SST: [ERROR] Cleanup after exit with status:1
2022-12-19T23:15:21.828998Z 0 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '172.31.255.51' --datadir '/mydatadir/mydir/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --parent '100248' --mysqld-version '5.7.34-37-57'   '' : 1 (Operation not permitted)
2022-12-19T23:15:21.829048Z 0 [ERROR] WSREP: Failed to read uuid:seqno from joiner script.
2022-12-19T23:15:21.829060Z 0 [ERROR] WSREP: SST script aborted with error 1 (Operation not permitted)
2022-12-19T23:15:21.829137Z 0 [ERROR] WSREP: SST failed: 1 (Operation not permitted)
2022-12-19T23:15:21.829171Z 0 [ERROR] Aborting

2022-12-19T23:15:21.829182Z 0 [Note] WSREP: Signalling cancellation of the SST request.
2022-12-19T23:15:21.829202Z 0 [Note] WSREP: SST request was cancelled
2022-12-19T23:15:21.829224Z 0 [Note] Giving 2 client threads a chance to die gracefully
2022-12-19T23:15:21.829278Z 2 [Note] WSREP: Closing send monitor...
2022-12-19T23:15:21.829313Z 2 [Note] WSREP: Closed send monitor.
2022-12-19T23:15:21.829333Z 2 [Note] WSREP: gcomm: terminating thread
2022-12-19T23:15:21.829355Z 2 [Note] WSREP: gcomm: joining thread
2022-12-19T23:15:21.829446Z 2 [Note] WSREP: gcomm: closing backend
2022-12-19T23:15:21.830032Z 2 [Note] WSREP: Current view of cluster as seen by this node
view (view_id(NON_PRIM,2bc89d5d,25)
memb {
	6ca8c71d,0
	}
joined {
	}
left {
	}
partitioned {
	2bc89d5d,0
	6d4a85ba,0
	}
)
2022-12-19T23:15:21.830085Z 2 [Note] WSREP: Current view of cluster as seen by this node
view ((empty))
2022-12-19T23:15:21.830416Z 2 [Note] WSREP: gcomm: closed
2022-12-19T23:15:21.830443Z 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2022-12-19T23:15:21.830464Z 0 [Note] WSREP: Flow-control interval: [100, 100]
2022-12-19T23:15:21.830472Z 0 [Note] WSREP: Received NON-PRIMARY.
2022-12-19T23:15:21.830482Z 0 [Note] WSREP: Shifting JOINER -> OPEN (TO: 353085465)
2022-12-19T23:15:21.830494Z 0 [Note] WSREP: Received self-leave message.
2022-12-19T23:15:21.830505Z 0 [Note] WSREP: Flow-control interval: [100, 100]
2022-12-19T23:15:21.830513Z 0 [Note] WSREP: Received SELF-LEAVE. Closing connection.
2022-12-19T23:15:21.830521Z 0 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 353085465)
2022-12-19T23:15:21.830544Z 0 [Note] WSREP: RECV thread exiting 0: Success
2022-12-19T23:15:21.830619Z 2 [Note] WSREP: recv_thread() joined.
2022-12-19T23:15:21.830629Z 2 [Note] WSREP: Closing replication queue.
2022-12-19T23:15:21.830637Z 2 [Note] WSREP: Closing slave action queue.
2022-12-19T23:15:21.830669Z 2 [Note] WSREP: Ignorng trx(352732563) due to SST failure
2022-12-19T23:15:21.830684Z 2 [Note] WSREP: Ignorng trx(352732564) due to SST failure
2022-12-19T23:15:21.830696Z 2 [Note] WSREP: Ignorng trx(352732565) due to SST failure
2022-12-19T23:15:21.830708Z 2 [Note] WSREP: Ignorng trx(352732566) due to SST failure

Joiner innobackup.prepare.log:

xtrabackup: recognized server arguments: --innodb_checksum_algorithm=crc32 --innodb_log_checksum_algorithm=strict_crc32 --innodb_data_file_path=ibdata1:12M:autoextend --innodb_log_files_in_group=2 --innodb_log_file_size=1073741824 --innodb_fast_checksum=0 --innodb_page_size=16384 --innodb_log_block_size=512 --innodb_undo_directory=./ --innodb_undo_tablespaces=0 --server-id=11 --redo-log-version=1 --transition-key=* 
xtrabackup: recognized client arguments: --use-memory=200G --prepare=1 --binlog-info=ON --target-dir=/mydatadir/mydir//.sst 
xtrabackup version 2.4.24 based on MySQL server 5.7.35 Linux (x86_64) (revision id: b4ee263)
xtrabackup: cd to /mydatadir/mydir/.sst/
xtrabackup: This target seems to be not prepared yet.
InnoDB: Number of pools: 1
xtrabackup: xtrabackup_logfile detected: size=762707968, start_lsn=(36169016808685)
xtrabackup: using the following InnoDB configuration for recovery:
xtrabackup:   innodb_data_home_dir = .
xtrabackup:   innodb_data_file_path = ibdata1:12M:autoextend
xtrabackup:   innodb_log_group_home_dir = .
xtrabackup:   innodb_log_files_in_group = 1
xtrabackup:   innodb_log_file_size = 762707968
221219 17:15:20 Loading xtrabackup_keys.
221219 17:15:20 Error reading xtrabackup_keys: wrong magic.
xtrabackup: Error: failed to load tablespace keys

Donor backup log:

xtrabackup: recognized server arguments: --datadir=/mydatadir/mydir --tmpdir=/xxx --server-id=11 --open_files_limit=65536 --innodb_buffer_pool_size=200G --innodb_log_file_size=1G --innodb_log_buffer_size=4M --innodb_flush_method=O_DIRECT --innodb_flush_log_at_trx_commit=1 --innodb_file_per_table=1 --log_bin=/mydatadir/mydir_binlogs/mysql-bin.log --open_files_limit=10000 --defaults_group=mysqld --parallel=4 --transition-key=* 
xtrabackup: recognized client arguments: --user=sstuser --password=* --socket=/var/run/mysqld/mysqld.sock --lock-ddl=1 --backup=1 --galera-info=1 --binlog-info=ON --stream=xbstream --target-dir=/xxx/pxc_sst_0T6K/donor_xb_dR9n 
221219 16:14:06  version_check Connecting to MySQL server with DSN 'dbi:mysql:;mysql_read_default_group=xtrabackup;mysql_socket=/var/run/mysqld/mysqld.sock' as 'sstuser'  (using password: YES).
221219 16:14:06  version_check Connected to MySQL server
221219 16:14:06  version_check Executing a version check against the server...
221219 16:14:06  version_check Done.
221219 16:14:06 Connecting to MySQL server host: localhost, user: sstuser, password: set, port: not set, socket: /var/run/mysqld/mysqld.sock
Using server version 5.7.34-37-57-log
221219 16:14:06 Executing LOCK TABLES FOR BACKUP...
xtrabackup version 2.4.26 based on MySQL server 5.7.35 Linux (x86_64) (revision id: 19de43b)
xtrabackup: uses posix_fadvise().
xtrabackup: cd to /mydatadir/mydir
xtrabackup: open files limit requested 10000, set to 65536
xtrabackup: using the following InnoDB configuration:
xtrabackup:   innodb_data_home_dir = .
xtrabackup:   innodb_data_file_path = ibdata1:12M:autoextend
xtrabackup:   innodb_log_group_home_dir = ./
xtrabackup:   innodb_log_files_in_group = 2
xtrabackup:   innodb_log_file_size = 1073741824
xtrabackup: using O_DIRECT
InnoDB: Number of pools: 1
221219 16:14:06 Added plugin 'keyring_vault.so' to load list.
Plugin keyring_vault reported: 'Probing mysecretmountpoint for being a mount point unsuccessful - skipped.'
221219 16:14:12 >> log scanned up to (36169346957242)
xtrabackup: Generating a list of tablespaces
InnoDB: Allocated tablespace ID 140694 for SaCon/Connection, old maximum was 0
xtrabackup: Starting 4 threads for parallel data files transfer
221219 16:14:12 [02] Streaming ./ibdata1
221219 17:15:19 [01] Streaming ./aaa/bbb.frm to <STDOUT>
221219 17:15:19 [01]        ...done
221219 17:15:19 Finished backing up non-InnoDB tables and files
221219 17:15:19 Executing LOCK BINLOG FOR BACKUP...
221219 17:15:19 [00] Streaming <STDOUT>
221219 17:15:19 [00]        ...done
221219 17:15:19 Executing FLUSH NO_WRITE_TO_BINLOG ENGINE LOGS...
xtrabackup: The latest check point (for incremental): '36169383582611'
xtrabackup: Stopping log copying thread.
.221219 17:15:19 >> log scanned up to (36169694758791)

221219 17:15:19 Executing UNLOCK BINLOG
221219 17:15:19 Executing UNLOCK TABLES
221219 17:15:19 All tables unlocked
221219 17:15:19 [00] Streaming ib_buffer_pool to <STDOUT>
221219 17:15:19 [00]        ...done
221219 17:15:19 Backup created in directory '/xxx/pxc_sst_0T6K/donor_xb_dR9n/'
MySQL binlog position: filename 'mysql-bin.006512', position '457086702'
221219 17:15:19 [00] Streaming <STDOUT>
221219 17:15:19 [00]        ...done
221219 17:15:19 [00] Streaming <STDOUT>
221219 17:15:19 [00]        ...done
221219 17:15:19 Saving xtrabackup_keys.
xtrabackup: Transaction log of lsn (36169016808685) to (36169694758791) was copied.
Shutting down plugin 'keyring_vault'
221219 17:15:20 completed OK!

You’re using HashiCorp Vault to store keys? The error log indicates the plugin is failing to fetch whatever it needs and thus cannot create the encrypted snapshot.

221219 16:14:06 Added plugin 'keyring_vault.so' to load list.
Plugin keyring_vault reported: 'Probing mysecretmountpoint for being a mount point unsuccessful - skipped.'
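That probe refers to the secret_mount_point from the keyring_vault configuration file. For reference, a minimal keyring_vault_config looks roughly like this (all values below are placeholders):

	vault_url = https://vault.example.com:8200
	secret_mount_point = mysecretmountpoint
	token = <node-token>
	vault_ca = /etc/vault/ca.pem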

If the node reporting this has no issues reaching the Vault, and the Vault shows successful requests in its audit log, is there any other place to check whether the Vault configuration is correct?

The MySQL service can be stopped and started on this donor node with no issue, and the plugin shows that it is working properly. Encrypted tables are accessible, yet it continues to report the issue you are pointing out.
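To be concrete, by “working properly” I mean a check like this comes back clean:

	mysql -e "SELECT PLUGIN_NAME, PLUGIN_STATUS FROM information_schema.PLUGINS WHERE PLUGIN_NAME LIKE 'keyring%';"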

I feel like I have isolated the issue, and it does appear to be Vault/encryption related. However, as I said above, I have verified successful requests to the Vault by the donor node. I stopped the service, disabled the plugin, and verified that things indeed stop working without it. After re-enabling it, the MySQL service on the donor interacts with the Vault with no issues and encrypted tables are visible and accessible.

The issue is in the SST/xtrabackup process, when it tries to probe the mount point; that probe is consistently unsuccessful.

On the joiner, I am consistently seeing the following:

2022-12-22T20:23:31.954460Z 2 [Note] WSREP: GCache history reset: 00000000-0000-0000-0000-000000000000:0 -> 9bdbaa7e-2485-11ed-8de0-af3e10dda87c:4730
	2022-12-22T20:23:32.718871Z WSREP_SST: [INFO] Streaming with xbstream
	2022-12-22T20:23:32.726866Z WSREP_SST: [INFO] Proceeding with SST.........
	2022-12-22T20:23:32.727582Z WSREP_SST: [DEBUG] Evaluating (@ Joiner-Recv-SST) socat -u openssl-listen:4444,reuseaddr,cert=/mydatadir/mydir/server-cert.pem,key=/mydatadir/mydir/server-key.pem,cafile=/mydatadir/mydir/ca.pem,verify=1,retry=30 stdio | xbstream -x  $xbstream_eopts; RC=( ${PIPESTATUS[@]} )
	2022-12-22T20:23:32.727834Z WSREP_SST: [DEBUG] Cleaning the existing datadir and innodb-data/log directories
	2022-12-22T20:23:32.736997Z WSREP_SST: [INFO] ............Waiting for SST streaming to complete!
2022-12-22T20:23:33.820278Z 0 [Note] WSREP: (7fd9f5c6, 'ssl://0.0.0.0:4567') turning message relay requesting off
2022-12-22T20:24:03.254661Z 0 [Note] WSREP: 0.0 (node5): State transfer to 2.0 (node6) complete.
2022-12-22T20:24:03.255123Z 0 [Note] WSREP: Member 0.0 (node5) synced with group.
2022/12/22 14:24:05 socat[229580] E SSL_read(): Connection reset by peer
xb_stream_read_chunk(): my_read() failed.
	2022-12-22T20:24:05.755751Z WSREP_SST: [ERROR] ******************* FATAL ERROR ********************** 
	2022-12-22T20:24:05.756781Z WSREP_SST: [ERROR] Error while getting data from donor node:  exit codes: 1 1
	2022-12-22T20:24:05.757791Z WSREP_SST: [ERROR] ****************************************************** 
	2022-12-22T20:24:05.759063Z WSREP_SST: [ERROR] Cleanup after exit with status:32
2022-12-22T20:24:05.794001Z 0 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '172.31.255.99' --datadir '/mydatadir/mydir/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --parent '228923' --mysqld-version '5.7.38-41-57'   '' : 32 (Broken pipe)
2022-12-22T20:24:05.794031Z 0 [ERROR] WSREP: Failed to read uuid:seqno from joiner script.
2022-12-22T20:24:05.794036Z 0 [ERROR] WSREP: SST script aborted with error 32 (Broken pipe)
2022-12-22T20:24:05.794070Z 0 [ERROR] WSREP: SST failed: 32 (Broken pipe)
2022-12-22T20:24:05.794075Z 0 [ERROR] Aborting

2022-12-22T20:24:05.794078Z 0 [Note] WSREP: Signalling cancellation of the SST request.
2022-12-22T20:24:05.794093Z 0 [Note] WSREP: SST request was cancelled
2022-12-22T20:24:05.794103Z 0 [Note] Giving 2 client threads a chance to die gracefully
2022-12-22T20:24:05.794133Z 2 [Note] WSREP: Closing send monitor...
2022-12-22T20:24:05.794137Z 2 [Note] WSREP: Closed send monitor.
2022-12-22T20:24:05.794143Z 2 [Note] WSREP: gcomm: terminating thread
2022-12-22T20:24:05.794152Z 2 [Note] WSREP: gcomm: joining thread
2022-12-22T20:24:05.794210Z 2 [Note] WSREP: gcomm: closing backend

What do the Vault logs say when the SST process begins? Maybe there is some indication of what SST is trying to do and where it is actually getting denied.
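If no audit device is enabled yet, something like this on the Vault side will give you a request/response trail (the file path is just an example):

	vault audit enable file file_path=/var/log/vault_audit.log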

Here is the Vault log from the last SST attempt. As far as I can tell, these requests are successful, but maybe I am missing something?

{"time":"2022-12-23T15:36:54.4756119Z","type":"request","auth":{"client_token":"hmac-sha256:9ea...","accessor":"hmac-sha256:3e1...","display_name":"token","policies":["my-node5-policy","default"],"token_policies":["my-node5-policy","default"],"policy_results":{"allowed":true,"granting_policies":[{"name":"my-node5-policy","namespace_id":"mynamespace","type":"acl"}]},"token_type":"service","token_ttl":86313600,"token_issue_time":"2022-12-19T11:57:17-06:00"},"request":{"id":"b0fceada-4a8c-3cc1-840f-2ecdd0080aeb","client_id":"vwCHkBsAvX3LNCkPSmy0hHQxoGMqGUucWS/j9GUwP50=","operation":"read","mount_type":"kv","client_token":"hmac-sha256:aa7...","client_token_accessor":"hmac-sha256:3e1...","namespace":{"id":"mynamespace"},"path":"mysecretmountpoint/config","remote_address":"172.31.255.98","remote_port":46988}}
{"time":"2022-12-23T15:36:54.4756119Z","type":"response","auth":{"client_token":"hmac-sha256:9ea...","accessor":"hmac-sha256:3e1...","display_name":"token","policies":["my-node5-policy","default"],"token_policies":["my-node5-policy","default"],"policy_results":{"allowed":true,"granting_policies":[{"name":"my-node5-policy","namespace_id":"mynamespace","type":"acl"}]},"token_type":"service","token_ttl":86313600,"token_issue_time":"2022-12-19T11:57:17-06:00"},"request":{"id":"b0fceada-4a8c-3cc1-840f-2ecdd0080aeb","client_id":"vwCHkBsAvX3LNCkPSmy0hHQxoGMqGUucWS/j9GUwP50=","operation":"read","mount_type":"kv","mount_accessor":"kv_e7eeae45","client_token":"hmac-sha256:aa7...","client_token_accessor":"hmac-sha256:3e1...","namespace":{"id":"mynamespace"},"path":"mysecretmountpoint/config","remote_address":"172.31.255.98","remote_port":46988},"response":{"mount_type":"kv","mount_accessor":"kv_e7eeae45"}}
{"time":"2022-12-23T15:36:54.4826149Z","type":"request","auth":{"client_token":"hmac-sha256:9ea...","accessor":"hmac-sha256:3e1...","display_name":"token","policies":["my-node5-policy","default"],"token_policies":["my-node5-policy","default"],"policy_results":{"allowed":true,"granting_policies":[{"name":"my-node5-policy","namespace_id":"mynamespace","type":"acl"}]},"token_type":"service","token_ttl":86313600,"token_issue_time":"2022-12-19T11:57:17-06:00"},"request":{"id":"2804b5d5-aedd-82f2-cc92-63bebf3c7abd","client_id":"vwCHkBsAvX3LNCkPSmy0hHQxoGMqGUucWS/j9GUwP50=","operation":"list","mount_type":"kv","client_token":"hmac-sha256:aa7...","client_token_accessor":"hmac-sha256:3e1...","namespace":{"id":"mynamespace"},"path":"mysecretmountpoint/","remote_address":"172.31.255.98","remote_port":46990}}
{"time":"2022-12-23T15:36:54.4836156Z","type":"response","auth":{"client_token":"hmac-sha256:9ea...","accessor":"hmac-sha256:3e1...","display_name":"token","policies":["my-node5-policy","default"],"token_policies":["my-node5-policy","default"],"policy_results":{"allowed":true,"granting_policies":[{"name":"my-node5-policy","namespace_id":"mynamespace","type":"acl"}]},"token_type":"service","token_ttl":86313600,"token_issue_time":"2022-12-19T11:57:17-06:00"},"request":{"id":"2804b5d5-aedd-82f2-cc92-63bebf3c7abd","client_id":"vwCHkBsAvX3LNCkPSmy0hHQxoGMqGUucWS/j9GUwP50=","operation":"list","mount_type":"kv","mount_accessor":"kv_e7eeae45","client_token":"hmac-sha256:aa7...","client_token_accessor":"hmac-sha256:3e1...","namespace":{"id":"mynamespace"},"path":"mysecretmountpoint/","remote_address":"172.31.255.98","remote_port":46990},"response":{"mount_type":"kv","mount_accessor":"kv_e7eeae45","data":{"keys":["hmac-sha256:a9c...","hmac-sha256:7e4..."]}}}
{"time":"2022-12-23T15:36:54.753526Z","type":"request","auth":{"client_token":"hmac-sha256:9ea...","accessor":"hmac-sha256:3e1...","display_name":"token","policies":["my-node5-policy","default"],"token_policies":["my-node5-policy","default"],"policy_results":{"allowed":true,"granting_policies":[{"name":"my-node5-policy","namespace_id":"mynamespace","type":"acl"}]},"token_type":"service","token_ttl":86313600,"token_issue_time":"2022-12-19T11:57:17-06:00"},"request":{"id":"92907fb1-1ecb-fa60-f542-2467694b5104","client_id":"vwCHkBsAvX3LNCkPSmy0hHQxoGMqGUucWS/j9GUwP50=","operation":"read","mount_type":"kv","client_token":"hmac-sha256:aa7...","client_token_accessor":"hmac-sha256:3e1...","namespace":{"id":"mynamespace"},"path":"mysecretmountpoint/NDhfSU5OT0RCS2V5LTU2YjgxNjVhLTI4N2ItMTFlZC1hZTEzLTAwMTU1ZDBjZDQwZS0xMF8=","remote_address":"172.31.255.98","remote_port":46992}}
{"time":"2022-12-23T15:36:54.753526Z","type":"response","auth":{"client_token":"hmac-sha256:9ea...","accessor":"hmac-sha256:3e1...","display_name":"token","policies":["my-node5-policy","default"],"token_policies":["my-node5-policy","default"],"policy_results":{"allowed":true,"granting_policies":[{"name":"my-node5-policy","namespace_id":"mynamespace","type":"acl"}]},"token_type":"service","token_ttl":86313600,"token_issue_time":"2022-12-19T11:57:17-06:00"},"request":{"id":"92907fb1-1ecb-fa60-f542-2467694b5104","client_id":"vwCHkBsAvX3LNCkPSmy0hHQxoGMqGUucWS/j9GUwP50=","operation":"read","mount_type":"kv","mount_accessor":"kv_e7eeae45","client_token":"hmac-sha256:aa7...","client_token_accessor":"hmac-sha256:3e1...","namespace":{"id":"mynamespace"},"path":"mysecretmountpoint/NDhfSU5OT0RCS2V5LTU2YjgxNjVhLTI4N2ItMTFlZC1hZTEzLTAwMTU1ZDBjZDQwZS0xMF8=","remote_address":"172.31.255.98","remote_port":46992},"response":{"mount_type":"kv","mount_accessor":"kv_e7eeae45","secret":{},"data":{"type":"hmac-sha256:71b...","value":"hmac-sha256:601..."}}}

The SST process is just a managed xtrabackup process. Can you take a manual backup of one of the nodes using xtrabackup? That should fail as well. If it instead succeeds, then something is not working properly in the overall SST process with the Vault plugin.
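Something along these lines on the donor should mirror what the SST script invokes, going by the recognized arguments in your donor log (credentials and target dir below are placeholders):

	xtrabackup --backup --user=sstuser --password='...' \
	    --socket=/var/run/mysqld/mysqld.sock \
	    --lock-ddl --galera-info --binlog-info=ON \
	    --stream=xbstream --target-dir=/tmp/sst_test > /tmp/sst_test/backup.xbstream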

I ran the xtrabackup process manually, and it hits the same error on both active nodes, just like the one you pointed out:

221219 16:14:06 Added plugin 'keyring_vault.so' to load list.
Plugin keyring_vault reported: 'Probing mysecretmountpoint for being a mount point unsuccessful - skipped.'

It does not matter which node is the donor; the result is identical. Both donors can successfully start and stop the MySQL service with the current keyring configuration file, and both show that encrypted tables are visible and accessible.

Separately, I can use their tokens to sign in to Vault and view the corresponding secrets.
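For example (mount point as in my config; token and key name are placeholders):

	VAULT_TOKEN=<node-token> vault list mysecretmountpoint
	VAULT_TOKEN=<node-token> vault read mysecretmountpoint/<key-name>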

So Vault does appear to be the issue (or at least the part of the process that does the probing), but I am not sure how to track it down.

Looks like another client has opened a similar issue. I suggest you comment on that JIRA as well to increase visibility. I would also ask our engineers in that JIRA how to enable debug mode for the plugin so that you can provide some additional information.

The last option is to test this scenario using PXC 8.0 and PXB 8.0, as those are the current GA versions. In case you don’t know, 5.7 will reach EOL in Oct ’23, so there is not much development effort going toward something that will be dead in less than a year.
