Hi,
Please bear with me, as I am new to Percona and just trying to diagnose an issue. This may be obvious or explained elsewhere, but I'd like to ask the experts here first.
We are running a 6-node cluster with:
- Percona-XtraDB-Cluster-server-5.5.30-23.7.4.406
- Percona-XtraDB-Cluster-galera-2.5-1.150

We have some JDBC code that runs across many app servers, and each of these app servers may be connected to any of the 6 Percona nodes at any time.

The block of code in question does the following (a rough sketch in code follows the list):
a) acquires a clustered/distributed in-memory lock (via Hazelcast) that is enforced across all the app servers that enter this code block
b) selects a single row (rowA) to read the latest value of one column (columnA)
c) updates rowA, setting columnA to the new value
d) releases the distributed lock, permitting another app server to acquire the lock in (a) and then perform (b) and (c) in turn, and so on
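For concreteness, here is a minimal sketch of that block, assuming Hazelcast 3.x's distributed ILock API and hypothetical names (table myTable, key id = 1 for rowA, JDBC URL and credentials); it illustrates the pattern rather than our actual code:

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.ILock;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class LockedReadModifyWrite {
    public static void main(String[] args) throws Exception {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        // (a) acquire the cluster-wide lock shared by all app servers
        ILock lock = hz.getLock("rowA-lock");
        lock.lock();
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://some-percona-node:3306/mydb", "user", "pass")) {
            // (b) read the latest value of columnA from rowA
            long current;
            try (PreparedStatement select = conn.prepareStatement(
                    "SELECT columnA FROM myTable WHERE id = ?")) {
                select.setLong(1, 1L); // rowA
                try (ResultSet rs = select.executeQuery()) {
                    if (!rs.next()) throw new IllegalStateException("rowA not found");
                    current = rs.getLong(1);
                }
            }
            // (c) write the new value back (autocommit decides when this commits)
            try (PreparedStatement update = conn.prepareStatement(
                    "UPDATE myTable SET columnA = ? WHERE id = ?")) {
                update.setLong(1, current + 1);
                update.setLong(2, 1L);
                update.executeUpdate();
            }
        } finally {
            // (d) release the lock so the next app server can run (a)-(c)
            lock.unlock();
        }
    }
}
```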
(1) When the JDBC code described above uses a driver connection with autocommit = true, we see that occasionally a server entering the described block is returned the old value of columnA from rowA, implying that whatever node it is talking to has not yet received the latest value from some other server's previous write.
IMPORTANT NOTE for (1): when we force ALL app servers to talk explicitly to ONE Percona node (the same node all the time), this "old value read" behavior goes away.
(2) When the JDBC code described above uses a driver connection with autocommit = false, and the code is altered to issue its own explicit transaction start and commit demarcations, the "reads of old values" problem described in (1) goes away.
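The only difference in case (2) is the explicit transaction demarcation around the same SELECT/UPDATE pair; a sketch of that variant, with the same hypothetical names as above:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class ExplicitTxnVariant {
    // Case (2): explicit commit instead of driver autocommit.
    static void bumpColumnA(Connection conn) throws SQLException {
        conn.setAutoCommit(false); // one transaction now spans both statements
        try {
            long current;
            try (PreparedStatement select = conn.prepareStatement(
                    "SELECT columnA FROM myTable WHERE id = ?")) {
                select.setLong(1, 1L); // rowA
                try (ResultSet rs = select.executeQuery()) {
                    if (!rs.next()) throw new SQLException("rowA not found");
                    current = rs.getLong(1);
                }
            }
            try (PreparedStatement update = conn.prepareStatement(
                    "UPDATE myTable SET columnA = ? WHERE id = ?")) {
                update.setLong(1, current + 1);
                update.setLong(2, 1L);
                update.executeUpdate();
            }
            conn.commit(); // the calling thread blocks here until the node returns OK
        } catch (SQLException e) {
            conn.rollback(); // undo partial work on failure
            throw e;
        }
    }
}
```

With this variant, the Hazelcast lock is released only after commit() has returned, and the stale reads disappear.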
It is my understanding that when a client connection to a Percona node issues an SQL write of some sort, then after committing locally on the node in question, Percona/Galera will not release the calling thread with an OK until that write has been synchronously propagated to all Percona nodes in the cluster; this appears to be what happens, as I would expect, in (2) above.
I guess my question is: how does this synchronous write consistency across the cluster differ, with respect to when an OK is returned to the calling client, between autocommit = true and an explicit client-issued commit? From what I can tell, when autocommit = true, the client thread gets an OK and releases our logical "lock" prematurely, thereby permitting other app servers to acquire the lock and read from the table, getting the not-yet-updated value because the write has yet to be propagated through the cluster.