Undocumented behavior with auto_increment columns in a cluster

Given:

A three-node cluster running wsrep_25.10.r4144.
wsrep_auto_increment_control = ON (default) on all nodes
auto_increment_increment [3] and auto_increment_offset [1, 2, or 3 depending on the node] appear to be managed properly on all three nodes.
User schema has about 110 tables with auto_increment primary key columns.
Clients connect directly to a node according to their function. There is no proxy or load balancer spreading work across nodes.
A set of tables receive inserts on node 1, and only on node 1.
A separate set of tables receive inserts on node 3, and only on node 3.
No inserts are performed on node 2.

Given the above, we would expect values in auto_increment columns to increase by 3 in all tables. We see that in only about 50 of the 110 tables.
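
For illustration, the expected pattern from consecutive inserts on a single node looks like this (a minimal sketch; the table is hypothetical, not one of our real tables):

-- With auto_increment_increment=3 and auto_increment_offset=1, consecutive
-- inserts on one node should produce ids 1, 4, 7, ...
CREATE TABLE t_example (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  v VARCHAR(10)
) ENGINE=InnoDB;

INSERT INTO t_example (v) VALUES ('a');
INSERT INTO t_example (v) VALUES ('b');
INSERT INTO t_example (v) VALUES ('c');

SELECT id FROM t_example ORDER BY id;
-- expected: 1, 4, 7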

Undocumented behavior:

In the remaining 60 of the 110 tables, the auto_increment columns increment by 1. It is reproducible: consecutive inserts sent directly to one node cause this.
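
One quick way to see the effective stride on a suspect table is to look at its most recent ids (the table name here is a placeholder):

SELECT id FROM suspect_table ORDER BY id DESC LIMIT 10;
-- On a table behaving as expected, the gaps between consecutive ids are 3;
-- on the affected tables, the gaps are 1.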

This inconsistency is concerning, and I’d like to determine its cause so I can keep it from causing problems down the road if our usage changes.

Any advice on where to look next would be appreciated.

I can’t seem to reproduce this case. I ran sysbench against 100 tables of 10,000 records each but haven’t seen any table incrementing by 1. Please share the my.cnf from an affected node and the my.cnf from at least one other node that is not affected.
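
As a minimal manual check, independent of sysbench, something like this can be run against one node (the names are hypothetical):

CREATE TABLE repro (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  payload VARCHAR(32)
) ENGINE=InnoDB;

INSERT INTO repro (payload) VALUES ('x');
INSERT INTO repro (payload) VALUES ('x');
INSERT INTO repro (payload) VALUES ('x');

SELECT id FROM repro ORDER BY id;
-- with auto_increment_increment=3, every gap between ids should be 3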

Thanks for the response!

This is from the node that accepts almost all of the inserts in the cluster.

[mysqld]

basedir=/usr/
binlog_format=ROW
datadir=/opt/data
default_storage_engine=innodb
explicit_defaults_for_timestamp=1
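# interleaved auto-increment lock mode, required by Galera and directly
# relevant to how auto_increment values are allocated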
innodb_autoinc_lock_mode=2
innodb_buffer_pool_instances=8
innodb_buffer_pool_size=8264M
innodb_data_file_path = ibdata1:100M:autoextend
innodb_doublewrite=1
innodb_file_per_table=1
innodb_flush_log_at_trx_commit=2
innodb_flush_method = O_DIRECT
innodb_io_capacity=800
innodb_log_buffer_size=96M
innodb_log_files_in_group=2
innodb_log_file_size=1024M
innodb_print_all_deadlocks = 1
innodb_read_io_threads=8
innodb_stats_on_metadata=0
innodb_thread_concurrency=64
innodb_write_io_threads=8
join_buffer_size = 15M
key_buffer_size = 128M
log_error=/var/log/mysql.log
log-queries-not-using-indexes
log-slow-verbosity=query_plan,explain
log_warnings=1
long_query_time = 1
lower_case_table_names=0
max_allowed_packet = 512M
max_connections=275
max_heap_table_size = 64M
memlock=0
performance_schema=ON
pid_file=mysqld.pid
port=3306
query_cache_limit = 4M
query_cache_size = 256M
query_cache_type = 1
read_buffer_size = 5M
read_rnd_buffer_size = 256M
slow-query-log = 1
slow-query-log-file = /var/log/mysql/mysql-slow.log
socket=/var/run/mysqld/mysqld.sock
sort_buffer_size = 5M
sysdate_is_now=1
table_open_cache=700
thread_cache_size=512
tmp_table_size = 64M
user=mysql

wsrep_auto_increment_control=1
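# (the setting above lets Galera manage auto_increment_increment/offset per node)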
wsrep_causal_reads=0
wsrep_certify_nonPK=1
wsrep_cluster_address=gcomm://10.0.2.142,10.0.2.141,10.0.2.161
wsrep_cluster_name="my_wsrep_cluster"
wsrep_convert_LOCK_to_trx=0
wsrep_debug=0
wsrep_drupal_282555_workaround=0
wsrep_max_ws_rows=131072
wsrep_max_ws_size=1073741824
wsrep_node_address=10.0.2.161
wsrep_node_name=10.0.2.161
wsrep_provider_options="gcache.size=98304M; gmcast.segment=0"
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_replicate_myisam=1
wsrep_retry_autocommit=1
wsrep_slave_threads=4
wsrep_sst_auth=root:obfuscatepassword
wsrep_sst_method=xtrabackup-v2

[mysql]

socket=/var/run/mysqld/mysqld.sock

[client]

socket=/var/run/mysqld/mysqld.sock

[mysqldump]

max_allowed_packet = 512M
socket=/var/run/mysqld/mysqld.sock

[mysqld_safe]

basedir=/usr/
datadir=/opt/data
log_error=/var/log/mysql.log
pid_file=mysqld.pid

Only one of the other two nodes receives inserts at all, and only to one table; it is not exhibiting the problem. The my.cnf there is the same except for wsrep_node_name. All three nodes started from the same my.cnf template, with only the minimal per-node changes made locally. I’m not sure we’d be so lucky as to find a smoking gun in config differences.

This database began life on RDS and was later migrated to this cluster. I came into the project after that and am trying to make the database more supportable. There were a lot of things present that could certainly provoke bugs: a mixture of MyISAM and InnoDB, a mix of UTF8 and latin1, sometimes in the same table…

Running values from the three nodes match what is expected:

node1> show variables like '%auto_increment%';
+------------------------------+-------+
| Variable_name                | Value |
+------------------------------+-------+
| auto_increment_increment     | 3     |
| auto_increment_offset        | 2     |
| wsrep_auto_increment_control | ON    |
+------------------------------+-------+

node2> show variables like '%auto_increment%';
+------------------------------+-------+
| Variable_name                | Value |
+------------------------------+-------+
| auto_increment_increment     | 3     |
| auto_increment_offset        | 3     |
| wsrep_auto_increment_control | ON    |
+------------------------------+-------+

node3> show variables like '%auto_increment%';
+------------------------------+-------+
| Variable_name                | Value |
+------------------------------+-------+
| auto_increment_increment     | 3     |
| auto_increment_offset        | 1     |
| wsrep_auto_increment_control | ON    |
+------------------------------+-------+

The auto_increment_increment settings look correct on all three nodes.
Can you post an example (SHOW CREATE TABLE …) of a table that is always populated correctly (with an increment of 3), and also of a table that is populated with an increment of 1?
Are you absolutely sure that those "wrong" tables are only written from a single node? I assume none of the cluster nodes is replicating any data from an async master?
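
For example, the tail of the SHOW CREATE TABLE output includes the table’s current counter (output abridged; the table name is hypothetical):

SHOW CREATE TABLE mytable\G
-- ...
-- ) ENGINE=InnoDB AUTO_INCREMENT=12345 DEFAULT CHARSET=utf8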

Figured out what is going on. As noted above, this database began life as an RDS instance. Many of the tables were populated there and have not had any new data inserted since. That explains why the existing values in their auto_increment columns increase by 1.

Increment by 3 is working on all tables that are currently receiving inserts. I got confused because we have two schemas with tables of the same names. As part of our release cycle we were updating the schemas one by one, and I was looking at the wrong one. The one with recent inserts was behaving correctly; the other hadn’t received an insert since the RDS days. Once it did, it too showed the expected behavior.
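
A check like the following would have caught the mix-up sooner, by comparing the latest id in both copies of a table (schema and table names are hypothetical):

SELECT 'schema_a' AS source, MAX(id) AS latest_id FROM schema_a.orders
UNION ALL
SELECT 'schema_b', MAX(id) FROM schema_b.orders;
-- the copy actually receiving inserts shows a growing latest_id, while the
-- stale copy stays frozen at its pre-migration maximum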

Thanks for your help, guys. One less mystery.