Unable to bootstrap 1st node after changing datadir and user

Hi Team,
I am trying to bootstrap a new PXC 8.0.28 cluster. I changed the datadir and logdir, and also changed the OS-level service account (mysql). After all these changes the mysqld service was up and running.

But after stopping the mysqld service and trying to bootstrap the first node, I got a lot of permission-related errors for files under the datadir, which I temporarily fixed by applying chmod 777 to those files.

I have also applied the security context of the original files/directories to the copied location, as recommended in one of the posts.

But neither chmod 777 nor the correct security context fixes the error below, which is "failed to open galera.cache file":

[ERROR] WSREP: Failed to open file '/data//galera.cache': 1 (Operation not permitted)

Thanks
Aditya

I was able to solve the above issue by setting User and Group in /usr/lib/systemd/system/mysql@.service.
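A minimal sketch of that kind of change, written as a systemd drop-in rather than an edit to the packaged unit file (that is a substitution on my part, since files under /usr/lib/systemd/system can be overwritten by package updates). The account name pxcsvc is hypothetical:

```ini
# /etc/systemd/system/mysql@.service.d/override.conf
# Drop-in for the mysql@.service template (used by mysql@bootstrap.service).
[Service]
# Hypothetical account and group; use the ones that own the datadir.
User=pxcsvc
Group=pxcsvc
# After saving, run: systemctl daemon-reload
```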
But now, after successfully bootstrapping the first node and copying the first node's PEM files to the second node, I get the error below when I try to start the second node:

2022-10-03T14:15:18.237418Z 0 [Note] [MY-000000] [Galera] gcomm thread scheduling priority set to other:0 
2022-10-03T14:15:18.237529Z 0 [Warning] [MY-000000] [Galera] Fail to access the file (/var/lib/mysql//gvwstate.dat) error (No such file or directory). It is possible if node is booting for first time or re-booting after a graceful shutdown
2022-10-03T14:15:18.237551Z 0 [Note] [MY-000000] [Galera] Restoring primary-component from disk failed. Either node is booting for first time or re-booting after a graceful shutdown
2022-10-03T14:15:18.237796Z 0 [Note] [MY-000000] [Galera] GMCast version 0
2022-10-03T14:15:18.238007Z 0 [Note] [MY-000000] [Galera] (cee5df9f-a27f, 'ssl://0.0.0.0:4567') listening at ssl://0.0.0.0:4567
2022-10-03T14:15:18.238029Z 0 [Note] [MY-000000] [Galera] (cee5df9f-a27f, 'ssl://0.0.0.0:4567') multicast: , ttl: 1
2022-10-03T14:15:18.238253Z 0 [Note] [MY-000000] [Galera] EVS version 1
2022-10-03T14:15:18.238321Z 0 [Note] [MY-000000] [Galera] gcomm: connecting to group 'pxc-cluster', peer ‘9.xx.xx.xx,9.xx.xx.xx,9.xx.xx.xx:'
2022-10-03T14:15:18.242069Z 0 [Note] [MY-000000] [Galera] (cee5df9f-a27f, 'ssl://0.0.0.0:4567') Found matching local endpoint for a connection, blacklisting address ssl://9.xx.xx.xx:4567
2022-10-03T14:15:21.240756Z 0 [Note] [MY-000000] [Galera] announce period timed out (pc.announce_timeout)
2022-10-03T14:15:21.240976Z 0 [Note] [MY-000000] [Galera] EVS version upgrade 0 -> 1
2022-10-03T14:15:21.241036Z 0 [Note] [MY-000000] [Galera] PC protocol upgrade 0 -> 1
2022-10-03T14:15:21.241100Z 0 [Warning] [MY-000000] [Galera] no nodes coming from prim view, prim not possible
2022-10-03T14:15:21.241169Z 0 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node
view (view_id(NON_PRIM,cee5df9f-a27f,1)
memb {
	cee5df9f-a27f,0
	}
joined {
	}
left {
	}
partitioned {
	}
)
2022-10-03T14:15:21.741731Z 0 [Warning] [MY-000000] [Galera] last inactive check more than PT1.5S (3*evs.inactive_check_period) ago (PT3.50345S), skipping check
2022-10-03T14:15:51.256638Z 0 [Note] [MY-000000] [Galera] PC protocol downgrade 1 -> 0
2022-10-03T14:15:51.256741Z 0 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node
view ((empty))
2022-10-03T14:15:51.257632Z 0 [ERROR] [MY-000000] [Galera] failed to open gcomm backend connection: 110: failed to reach primary view (pc.wait_prim_timeout): 110 (Connection timed out)
	 at gcomm/src/pc.cpp:connect():161
2022-10-03T14:15:51.257700Z 0 [ERROR] [MY-000000] [Galera] gcs/src/gcs_core.cpp:gcs_core_open():219: Failed to open backend connection: -110 (Connection timed out)
2022-10-03T14:15:52.258053Z 0 [Note] [MY-000000] [Galera] gcomm: terminating thread
2022-10-03T14:15:52.258155Z 0 [Note] [MY-000000] [Galera] gcomm: joining thread
2022-10-03T14:15:52.258431Z 0 [ERROR] [MY-000000] [Galera] gcs/src/gcs.cpp:gcs_open():1811: Failed to open channel 'pxc-cluster' at 'gcomm://9.xx.xx.xx,9.xx.xx.xx,9.xx.xx.xx': -110 (Connection timed out)
2022-10-03T14:15:52.258493Z 0 [ERROR] [MY-000000] [Galera] gcs connect failed: Connection timed out
2022-10-03T14:15:52.258557Z 0 [ERROR] [MY-000000] [WSREP] Provider/Node (gcomm://.xx.xx.xx,9.xx.xx.xx,9.xx.xx.xx) failed to establish connection with cluster (reason: 7)
2022-10-03T14:15:52.258616Z 0 [ERROR] [MY-010119] [Server] Aborting
2022-10-03T14:15:52.259236Z 0 [System] [MY-010910] [Server] /usr/sbin/mysqld: Shutdown complete (mysqld 8.0.28-19.1)  Percona XtraDB Cluster (GPL), Release rel19, Revision f544540, WSREP version 26.4.3.
2022-10-03T14:15:52.261029Z 0 [Note] [MY-000000] [Galera] dtor state: CLOSED
2022-10-03T14:15:52.261129Z 0 [Note] [MY-000000] [Galera] MemPool(TrxHandleSlave): hit ratio: 0, misses: 0, in use: 0, in pool: 0
2022-10-03T14:15:52.266072Z 0 [Note] [MY-000000] [Galera] apply mon: entered 0
2022-10-03T14:15:52.270845Z 0 [Note] [MY-000000] [Galera] apply mon: entered 0
2022-10-03T14:15:52.274522Z 0 [Note] [MY-000000] [Galera] apply mon: entered 0
2022-10-03T14:15:52.274560Z 0 [Note] [MY-000000] [Galera] cert index usage at exit 0
2022-10-03T14:15:52.274575Z 0 [Note] [MY-000000] [Galera] cert trx map usage at exit 0
2022-10-03T14:15:52.274589Z 0 [Note] [MY-000000] [Galera] deps set usage at exit 0
2022-10-03T14:15:52.274610Z 0 [Note] [MY-000000] [Galera] avg deps dist 0
2022-10-03T14:15:52.274625Z 0 [Note] [MY-000000] [Galera] avg cert interval 0
2022-10-03T14:15:52.274639Z 0 [Note] [MY-000000] [Galera] cert index size 0
2022-10-03T14:15:52.274752Z 0 [Note] [MY-000000] [Galera] Service thread queue flushed.
2022-10-03T14:15:52.274860Z 0 [Note] [MY-000000] [Galera] wsdb trx map usage 0 conn query map usage 0
2022-10-03T14:15:52.274887Z 0 [Note] [MY-000000] [Galera] MemPool(LocalTrxHandle): hit ratio: 0, misses: 0, in use: 0, in pool: 0
2022-10-03T14:15:52.275180Z 0 [Note] [MY-000000] [Galera] Shifting CLOSED -> DESTROYED (TO: 0)
2022-10-03T14:15:52.277260Z 0 [Note] [MY-000000] [Galera] Flushing memory map to disk..

All ports are accessible between the 2 nodes, and I have also installed xtrabackup. Please find the my.cnf of both nodes below.
1st node

[client]
socket=/var/lib/mysql/mysql.sock
ssl-ca=/var/lib/mysql/ca.pem
ssl-cert=/var/lib/mysql/client-cert.pem
ssl-key=/var/lib/mysql/client-key.pem

[mysqld]
server-id=1
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
user=please
# Binary log expiration period is 604800 seconds, which equals 7 days
binlog_expire_logs_seconds=604800

######## wsrep ###############
# Path to Galera library
wsrep_provider=/usr/lib64/galera4/libgalera_smm.so


wsrep_cluster_address=gcomm://

default_storage_engine=InnoDB

#wsrep_provider_options="gcache.size=300M;gcache.page_size=300M;gcache.dir=/var/lib/mysql;socket.ssl_key=server-key.pem;socket.ssl_cert=server- cert.pem;socket.ssl_ca=ca.pem"
wsrep_provider_options="socket.ssl_key=server-key.pem;socket.ssl_cert=server- cert.pem;socket.ssl_ca=ca.pem"

ssl-ca=/var/lib/mysql/ca.pem
ssl-cert=/var/lib/mysql/server-cert.pem
ssl-key=/var/lib/mysql/server-key.pem

# In order for Galera to work correctly binlog format should be ROW
binlog_format=ROW

##wsrep_data_home_dir=/var/lib/mysql

# Slave thread to use
wsrep_slave_threads=8

wsrep_log_conflicts

# This changes how InnoDB autoincrement locks are managed and is a requirement for Galera
innodb_autoinc_lock_mode=2

# Node IP address
wsrep_node_address=9.xx.xx.xx
# Cluster name
wsrep_cluster_name=pxc-cluster

#If wsrep_node_name is not specified,  then system hostname will be used
wsrep_node_name=rhelxxx

#pxc_strict_mode allowed values: DISABLED,PERMISSIVE,ENFORCING,MASTER
pxc_strict_mode=ENFORCING

# SST method
wsrep_sst_method=xtrabackup-v2

[sst]
encrypt=4
ssl-key=/var/lib/mysql/server-key.pem
ssl-ca=/var/lib/mysql/ca.pem 
ssl-cert=/var/lib/mysql/server-cert.pem

2nd node my.cnf

[client]
socket=/var/lib/mysql/mysql.sock
ssl-ca=/node1_cert/ca.pem
ssl-cert=/node1_cert/client-cert.pem
ssl-key=/node1_cert/client-key.pem


[mysqld]
server-id=2
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

# Binary log expiration period is 604800 seconds, which equals 7 days
binlog_expire_logs_seconds=604800

######## wsrep ###############
# Path to Galera library
wsrep_provider=/usr/lib64/galera4/libgalera_smm.so

wsrep_cluster_address=gcomm://9.xx.xx.xx,9.xx.xx.xx,9.xx.xx.xx

default_storage_engine=InnoDB


wsrep_provider_options="socket.ssl_key=server-key.pem;socket.ssl_cert=server- cert.pem;socket.ssl_ca=ca.pem"

ssl-ca=/node1_cert/ca.pem
ssl-cert=/node1_cert/server-cert.pem
ssl-key=/node1_cert/server-key.pem

# In order for Galera to work correctly binlog format should be ROW
binlog_format=ROW

# Slave thread to use
wsrep_slave_threads=8

wsrep_log_conflicts

# This changes how InnoDB autoincrement locks are managed and is a requirement for Galera
innodb_autoinc_lock_mode=2

# Node IP address
wsrep_node_address=9.xx.xx.xx
# Cluster name
wsrep_cluster_name=pxc-cluster

#If wsrep_node_name is not specified,  then system hostname will be used
wsrep_node_name=rhelxxx

#pxc_strict_mode allowed values: DISABLED,PERMISSIVE,ENFORCING,MASTER
pxc_strict_mode=ENFORCING

# SST method
wsrep_sst_method=xtrabackup-v2

[sst]
encrypt=4
ssl-key=/node1_cert/server-key.pem
ssl-ca=/node1_cert/ca.pem
ssl-cert=/node1_cert/server-cert.pem

Did you copy the .pem files from node1 over to node2 before starting node2? This is required due to the new default SSL parameters.

@matthewb Yes Sir, all the PEM files you see in the node 2 cnf file were copied from node1.

@matthewb Also wanted to mention: when I run netstat -nltp on the 1st node I only see 3 ports active. Is that normal?
tcp 0 0 0.0.0.0:4567 0.0.0.0:* LISTEN 21886/mysqld
tcp6 0 0 :::33060 :::* LISTEN 21886/mysqld
tcp6 0 0 :::3306 :::* LISTEN 21886/mysqld

There’s a mistake in wsrep_provider_options above. Check whether that was just a copy error. Also, you need to explicitly state socket.ssl=ON, so add that to wsrep_provider_options as well.
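As a sketch, the corrected line could look like this on node 1. It assumes the mistake is the stray space in "server- cert.pem" and that the PEM files sit in /var/lib/mysql as in the [mysqld] section. The absolute paths are an assumption on my part; on node 2 they would point at /node1_cert instead.

```ini
[mysqld]
# socket.ssl=ON explicitly enables TLS for Galera replication traffic.
# Assumed fix: "server- cert.pem" written without the stray space.
# Absolute paths are an assumption; adjust to where the PEM files actually live.
wsrep_provider_options="socket.ssl=ON;socket.ssl_key=/var/lib/mysql/server-key.pem;socket.ssl_cert=/var/lib/mysql/server-cert.pem;socket.ssl_ca=/var/lib/mysql/ca.pem"
```

The same key, certificate and CA files need to be used on every node, and each node has to be restarted for the provider options to take effect.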

Thanks Sir, It’s solved.
