
Galera inconsistent data?

bradh352 · Contributor
I've been doing some testing of my application against Percona-XtraDB-Cluster-server-56-5.6.28-25.14.1.el7.x86_64,
and have run across something I cannot explain, which appears to occur only under load.

I've got a load balancer sitting in front of 3 nodes, directing traffic round-robin (Linux IPVS/LVS using the server
load-balancing features in keepalived). My application spawns 25 DB connections, resulting in 8-9 connections per DB node.

Essentially my application is doing something like:

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN;
SELECT var FROM foo WHERE id = ? FOR UPDATE;
# Do some math on 'var'
UPDATE foo SET var=? WHERE id=?;
COMMIT;

"id" is the primary key in this case, so this is basically the same thing as this blog post:
http://galeracluster.com/2015/09/sup...alera-cluster/
I understand "SERIALIZABLE" isn't really supported, but I'd hope Galera wouldn't downgrade it
to anything worse than REPEATABLE READ.

What I am observing is that "N" concurrent committers get the same SELECT result (expected with Galera), do their math,
issue the UPDATE with the result, which succeeds (expected with Galera), then COMMIT and all succeed (*unexpected*).
I would have expected all but one committer to hit a deadlock (error 1213), requiring me to retry the transaction on the
losing connections. I do get some occasional deadlocks reported, but far fewer than I should. This leads to
inconsistent results since the math performed is now off ... needless to say this causes major issues.

If I attempt to reproduce this with the mysql command-line tool, I cannot: all but one session gets a deadlock, as
expected.
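To make the expectation concrete, here is a toy Python model of the first-committer-wins behavior I'm assuming Galera provides, including the client-side retry on error 1213. The class and function names are illustrative only, not my actual application code, and a shared lock stands in for certification:

```python
import threading

class VersionedRow:
    """Toy model of optimistic first-committer-wins certification:
    a COMMIT succeeds only if the row version seen at SELECT time is
    still current; otherwise the client sees a deadlock (1213) and
    must retry the whole transaction."""

    def __init__(self, var):
        self.var = var
        self.version = 0
        self._lock = threading.Lock()

    def select_for_update(self):
        # Across Galera nodes there is no cluster-wide row lock, so a
        # FOR UPDATE on different nodes behaves like a versioned read.
        with self._lock:
            return self.var, self.version

    def commit(self, new_var, seen_version):
        with self._lock:
            if self.version != seen_version:
                return False  # certification failure -> error 1213
            self.var = new_var
            self.version += 1
            return True

def do_math_txn(row, delta, attempts=100):
    """SELECT ... FOR UPDATE; compute; UPDATE; COMMIT -- retried on 1213."""
    for _ in range(attempts):
        var, version = row.select_for_update()
        if row.commit(var + delta, version):
            return
    raise RuntimeError("gave up after repeated certification failures")
```

With 25 threads each adding 1, the final value is exactly 25 only because the losers retry. If a failed certification were instead reported to the client as success, increments would silently be lost, which matches the inconsistency I'm seeing.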

A few other notes:
1) The math operation performed may or may not be the same for each transaction; it depends on the operation,
but some transactions may indeed produce the *same* result. (Does this matter?)
2) The number of committers ("N") here could in theory be as high as the number of connections, but in
practice it is probably 1-3 ... and during a load test this transaction is executed a few dozen times per
*second*.
3) This works flawlessly if all traffic is directed to a single node in the cluster, so it is definitely something introduced
by using Galera.
4) The value of wsrep_sync_wait appears to have zero effect (values 0, 1, 3 tried)
5) I could change the "UPDATE" statement to include the expected original value of 'var' to try to
catch this condition, but I'm not actually sure it would work, depending on how the record updates
are transmitted. I have not tried it, and wouldn't think it should be necessary.
6) I'm 90% sure we tested this same load scenario a year or so ago and this worked fine, could there be
a regression in newer versions? I haven't yet tried to roll back to something older.
7) I haven't yet tried to create a reduced test case, I wanted to first ask to see if anyone else had seen
what I'm seeing.
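Regarding (5), the guard I have in mind is something like `UPDATE foo SET var=? WHERE id=? AND var=?`, then checking the affected-row count on the client. As a toy sketch (the class is an illustrative in-memory stand-in for the table, not real client code):

```python
class ToyFoo:
    """In-memory stand-in for table foo(id PRIMARY KEY, var)."""

    def __init__(self, rows):
        self.rows = dict(rows)  # id -> var

    def select(self, row_id):
        # SELECT var FROM foo WHERE id = %s
        return self.rows[row_id]

    def guarded_update(self, row_id, new_var, expected_var):
        # UPDATE foo SET var=%s WHERE id=%s AND var=%s
        # Returns the number of rows matched, like the client's
        # affected-rows count after an UPDATE.
        if self.rows.get(row_id) == expected_var:
            self.rows[row_id] = new_var
            return 1
        return 0
```

If two clients both read var=10, the first guarded update matches 1 row and the second matches 0, so the loser at least knows its write was discarded and can retry. Whether this actually helps under Galera depends on how the replicated row events are applied, which is why I'm not sure it's a real fix.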


And finally, I'm sure everyone wants my DB configs ...

/etc/my.cnf:
[mysqld]
datadir = /var/lib/mysql
# move tmpdir due to /tmp being a memory backed tmpfs filesystem, mysql uses this for on disk sorting
tmpdir = /var/lib/mysql/tmp

[mysqld_safe]
pid-file = /run/mysqld/mysql.pid
syslog
!includedir /etc/my.cnf.d

/etc/my.cnf.d/base.cnf:
[mysqld]
bind-address = 0.0.0.0
key_buffer = 256M
max_allowed_packet = 16M
max_connections = 256
# Some optimizations
thread_concurrency = 10
sort_buffer_size = 2M
query_cache_limit = 100M
query_cache_size = 256M
log_bin
binlog_format = ROW
gtid_mode = ON
log_slave_updates
enforce_gtid_consistency = 1
group_concat_max_len = 102400
innodb_buffer_pool_size = 10G
innodb_log_file_size = 64M
innodb_file_per_table = 1
innodb_file_format = barracuda
default_storage_engine = innodb
# SSD Tuning
innodb_flush_neighbors = 0
innodb_io_capacity = 6000

/etc/my.cnf.d/cluster.cnf:
# Galera cluster
[mysqld]
wsrep_provider = /usr/lib64/libgalera_smm.so
wsrep_sst_method = xtrabackup-v2
wsrep_sst_auth = "sstuser:s3cretPass"
wsrep_cluster_name = cluster
wsrep_slave_threads = 32
wsrep_max_ws_size = 2G
wsrep_provider_options = "gcache.size = 5G; pc.recovery = true"
wsrep_cluster_address = gcomm://10.30.30.11,10.30.30.12,10.30.30.13
wsrep_sync_wait = 0
innodb_autoinc_lock_mode = 2
innodb_locks_unsafe_for_binlog = 1
innodb_flush_log_at_trx_commit = 0
sync_binlog = 0
innodb_support_xa = 0
innodb_flush_method = ALL_O_DIRECT

[sst]
progress = 1
time = 1
streamfmt = xbstream

Thanks!
-Brad

Comments

  • bradh352 · Contributor
    I've written a test case for this, though I'm having problems attaching it here; it's not clear what the rules are for attachments, and it's just a .c file.

    Anyhow, one thing I neglected to mention previously is that I'm also seeing a couple of connections deadlock and never return results. If I restart the mysqld daemon itself, the client side unblocks so it can error out.

    We're in the process of trying to roll back to older versions with this test case we created to see if this is a recently introduced regression or not.
  • bradh352 · Contributor
    I just filed a bug here with my test case: https://bugs.launchpad.net/percona-xtradb-cluster/+bug/1560206
