Given a 3-node cluster (percona-xtradb-cluster-server 5.5.24-23.6-341), dedicated systems, SSD storage - with the following non-default global settings:
- wsrep_retry_autocommit=4
- wsrep_causal_reads=1
Other settings of interest:
- commit_order is set to 3 and causal_read_timeout is PT30S
- user_send_window is 2, send_window is 4 (the systems are in the same location, on gigabit Ethernet)
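For reference, here is a rough sketch of how the values above can be read back on a node. It assumes the provider options show up under their prefixed names (repl.commit_order, repl.causal_read_timeout, evs.user_send_window, evs.send_window) inside wsrep_provider_options. The two status queries at the end are only my guess at what might be relevant to the failing select, not something I have confirmed.

```sql
-- Non-default global settings mentioned above.
SHOW GLOBAL VARIABLES LIKE 'wsrep_retry_autocommit';
SHOW GLOBAL VARIABLES LIKE 'wsrep_causal_reads';

-- Provider options are returned as one semicolon-separated string.
-- Assumption: the entries of interest appear as repl.commit_order,
-- repl.causal_read_timeout, evs.user_send_window and evs.send_window.
SHOW GLOBAL VARIABLES LIKE 'wsrep_provider_options';

-- Assumption: receive queue and flow control counters may indicate
-- whether a node lags behind while the updates are arriving.
SHOW GLOBAL STATUS LIKE 'wsrep_local_recv_queue%';
SHOW GLOBAL STATUS LIKE 'wsrep_flow_control%';
```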
At certain moments some tables receive a burst of updates, while another component may issue a SELECT that joins one or two of the tables being updated.
In certain situations, that SELECT fails with ‘Causal wait failed’.
I would like to avoid changing the code as much as possible (for now at least). What workarounds could eliminate this error?
I am not sure whether increasing causal_read_timeout would actually help here, or whether there is some other recommendation for this scenario.
Any hints/tips are more than welcome.