Memory problems with XtraDB

Hi,
we have a 3 node cluster (two nodes running MySQL and a third running only the garbd daemon for quorum management) on RHEL 6.2 with 8GB RAM and 4GB swap.
On this cluster we have 16 databases totaling roughly 160GB of data, and we use only one server as master to execute our application’s queries.
Many kinds of queries are run against these databases, but on DELETE queries the server starts swapping until the swap partition is saturated and mysqld crashes.
Each transaction deletes 20,000 rows.

We have reduced innodb-buffer-pool-size but the problem still occurs.
Here’s part of our my.cnf related to InnoDB and memory:


innodb-buffer-pool-size=3584M
innodb-additional-mem-pool-size=32M
innodb-log-buffer-size=16M
innodb-flush-log-at-trx_commit=1
innodb-file-per-table=1
innodb_data_file_path = ibdata1:100M:autoextend
## You may want to tune the below depending on number of cores and disk sub
innodb_read_io_threads=4
innodb_write_io_threads=4
innodb-doublewrite=1
innodb_log_file_size=512M
innodb-log-files-in-group=2
innodb-buffer-pool-instances=2
innodb-thread-concurrency=0
#innodb-file-format=barracuda
innodb-flush-method = O_DIRECT
innodb_locks_unsafe_for_binlog=1
innodb_autoinc_lock_mode=2
## avoid statistics update when doing e.g show tables
innodb_stats_on_metadata=0
engine-condition-pushdown=1

key_buffer_size = 24M
tmp_table_size = 32M
max_heap_table_size = 32M
max-allowed-packet = 16M
#sort-buffer-size = 512K
#read-buffer-size = 256K
#read-rnd-buffer-size = 512K
myisam-sort-buffer_size = 8M
skip-name-resolve
memlock=0
sysdate-is-now=1
max-connections=500
thread-cache-size=50
query-cache-type = 0
query-cache-size = 0
table-open_cache=4096
lower-case-table-names=0
open_files_limit=65535

Are the transactions too large?
What is the suggested maximum transaction size?

Thanks in advance for any suggestion!

Hi,
Before each transaction can be replicated and applied on the cluster nodes, it has to be fully stored in memory as a writeset. I suppose those 20,000 rows actually mean a lot of data?
And yes, the bigger the transaction, the worse it is for XtraDB Cluster, so the best solution is to split this delete into smaller chunks, or add more RAM. Lowering the InnoDB buffer pool too much will only hurt overall performance.
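
Just to illustrate the idea, here is a minimal sketch of chunked deletes, assuming a hypothetical table orders with an indexed created_at column and autocommit enabled (adapt the names and the cutoff to your schema):

-- delete at most 5000 rows per statement, i.e. per writeset
DELETE FROM orders
 WHERE created_at < '2014-01-01'
 LIMIT 5000;
-- repeat from the application or a stored procedure until
-- ROW_COUNT() returns 0, committing between iterations

With autocommit on, each chunk is certified and replicated as its own small writeset instead of one 20,000-row transaction.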

Hi,
we have split the delete into chunks of 5,000 rows and it seems that the amount of data no longer causes memory problems.
But recently another script, which only inserts 1,500 records (roughly 480KB in total) and has never had memory problems before, made the server swap during a longer-than-normal run and briefly pushed it out of the cluster.

We are trying to analyze the wsrep and Galera variables to check whether we are hitting some limitation, but we can’t find anything that helps us.
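
For reference, the listing below can be reproduced with the standard commands (this is also where the runtime wsrep counters live):

SHOW GLOBAL VARIABLES LIKE 'wsrep%'\G
SHOW GLOBAL STATUS LIKE 'wsrep%';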

The current state of the cluster is:


*************************** 1. row ***************************
Variable_name: wsrep_OSU_method
Value: TOI
*************************** 2. row ***************************
Variable_name: wsrep_auto_increment_control
Value: OFF
*************************** 3. row ***************************
Variable_name: wsrep_causal_reads
Value: OFF
*************************** 4. row ***************************
Variable_name: wsrep_certify_nonPK
Value: ON
*************************** 5. row ***************************
Variable_name: wsrep_cluster_address
Value: gcomm://xxx.xxx.xxx,xxx.xxx.xxx,xxx.xxx.xxx
*************************** 6. row ***************************
Variable_name: wsrep_cluster_name
Value: cluster01
*************************** 7. row ***************************
Variable_name: wsrep_convert_LOCK_to_trx
Value: OFF
*************************** 8. row ***************************
Variable_name: wsrep_data_home_dir
Value: /var/lib/mysql/
*************************** 9. row ***************************
Variable_name: wsrep_dbug_option
Value:
*************************** 10. row ***************************
Variable_name: wsrep_debug
Value: OFF
*************************** 11. row ***************************
Variable_name: wsrep_desync
Value: OFF
*************************** 12. row ***************************
Variable_name: wsrep_drupal_282555_workaround
Value: OFF
*************************** 13. row ***************************
Variable_name: wsrep_forced_binlog_format
Value: NONE
*************************** 14. row ***************************
Variable_name: wsrep_load_data_splitting
Value: ON
*************************** 15. row ***************************
Variable_name: wsrep_log_conflicts
Value: OFF
*************************** 16. row ***************************
Variable_name: wsrep_max_ws_rows
Value: 131072
*************************** 17. row ***************************
Variable_name: wsrep_max_ws_size
Value: 1073741824
*************************** 18. row ***************************
Variable_name: wsrep_mysql_replication_bundle
Value: 0
*************************** 19. row ***************************
Variable_name: wsrep_node_address
Value: xxx.xxx.xxx.xxx
*************************** 20. row ***************************
Variable_name: wsrep_node_incoming_address
Value: AUTO
*************************** 21. row ***************************
Variable_name: wsrep_node_name
Value: server01.domain.com
*************************** 22. row ***************************
Variable_name: wsrep_notify_cmd
Value:
*************************** 23. row ***************************
Variable_name: wsrep_on
Value: ON
*************************** 24. row ***************************
Variable_name: wsrep_provider
Value: /usr/lib64/libgalera_smm.so
*************************** 25. row ***************************
Variable_name: wsrep_provider_options
Value: base_host = xxx.xxx.xxx.xxx; base_port = 4567; cert.log_conflicts = no; evs.causal_keepalive_period = PT1S; evs.debug_log_mask = 0x1; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.info_log_mask = 0; evs.install_timeout = PT15S; evs.join_retrans_period = PT1S; evs.keepalive_period = PT1S; evs.max_install_timeouts = 1; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.use_aggregate = true; evs.user_send_window = 2; evs.version = 0; evs.view_forget_timeout = PT5M; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 2048M; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; gmcast.listen_addr = tcp://0.0.0.0:4567; gmcast.mcast_addr = ; gmcast.mcast_ttl = 1; gmcast.peer_timeout = PT3S; gmcast.time_wait = PT5S; gmcast.version = 0; ist.recv_addr = 10.10.10.91; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.linger = PT20S; pc.npvo = false; pc.version = 0; pc.weight = 1; protonet.backend = asio; protonet.version = 0; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
*************************** 26. row ***************************
Variable_name: wsrep_recover
Value: OFF
*************************** 27. row ***************************
Variable_name: wsrep_reject_queries
Value: NONE
*************************** 28. row ***************************
Variable_name: wsrep_replicate_myisam
Value: ON
*************************** 29. row ***************************
Variable_name: wsrep_retry_autocommit
Value: 1
*************************** 30. row ***************************
Variable_name: wsrep_slave_threads
Value: 4
*************************** 31. row ***************************
Variable_name: wsrep_sst_auth
Value: ********
*************************** 32. row ***************************
Variable_name: wsrep_sst_donor
Value:
*************************** 33. row ***************************
Variable_name: wsrep_sst_donor_rejects_queries
Value: OFF
*************************** 34. row ***************************
Variable_name: wsrep_sst_method
Value: xtrabackup
*************************** 35. row ***************************
Variable_name: wsrep_sst_receive_address
Value: AUTO
*************************** 36. row ***************************
Variable_name: wsrep_start_position
Value: 6dee3772-695c-11e3-9964-df91991477dd:9227456

We can’t understand why the Galera variable gcs.recv_q_hard_limit is so big compared to the examples we have found on the Internet, and why, with wsrep_max_ws_size and wsrep_max_ws_rows at their current values, we still hit memory problems with XtraDB.

Can we tune some variables?
Are we missing some configuration we should check?
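
In case we need to change any of these, this is the syntax we believe is used for them in my.cnf (the values below are simply the current ones from the output above, repeated only to show the syntax, not proposals; as far as we understand, most provider options only take effect after a mysqld restart):

# [mysqld] section -- current values repeated, only to show the syntax
wsrep_max_ws_rows=131072
wsrep_max_ws_size=1073741824
wsrep_provider_options="gcache.size=2048M; gcs.fc_limit=16; gcs.recv_q_hard_limit=9223372036854775807"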

Thanks,
Bye

EDIT:

Last Monday we added 2GB of RAM to the server, so now it has a total of 10GB.

Our software runs every day, but the “problematic” query runs only on Sunday, so over the last 2 days XtraDB has only handled queries that have never caused problems, yet top on the server shows high memory use:



top - 10:31:33 up 1 day, 22:36, 1 user, load average: 0.00, 0.00, 0.00
Tasks: 117 total, 1 running, 116 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.3%us, 0.3%sy, 0.0%ni, 99.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 10130576k total, 9988960k used, 141616k free, 155896k buffers
Swap: 4194296k total, 0k used, 4194296k free, 4912120k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2880 mysql 20 0 7296m 6.3g 2.0g S 0.0 65.4 48:22.56 mysqld

Looking at the XtraDB Cluster 5.6 release notes, we saw that some bugs related to memory use and leaks were fixed.

Could we be affected by one of these issues? We run XtraDB Cluster: Server version: 5.5.34-55-log Percona XtraDB Cluster (GPL), wsrep_25.9.r3928

Thanks
Bye