Hi people,
I’m trying to get a XtraDB Cluster 5.7 to perform similar to the standalone 5.5 version. I realise there is some performance degradation to be expected due to the replication/sync of the cluster nodes, but the difference between the 2 is so much (and even on the cluster between types of queries) that I can not imagine that this is the difference to be expected… Plus, it’s not a write action, but a READ action…
When I gather the same set of data using UNION the difference with a standalone server is negligible, but when I try to do so with an OR in the WHERE it slows down badly… from 6 seconds on the original standalone server and the (cluster) union version to 141 seconds and upwards… The source query I was trying to debug makes ‘MySQL gone away’ results every time on the cluster, while the original MySQL 5.5 is doing the same query on (relatively) inferior hardware within 9 seconds… (exactly the same dataset).
I have read so much on the blog about performance optimizations that I probably have a near perfect config for the cluster nodes as they are now, which is much better in specs then the original server.
The config:
query_cache_limit = 512M
query_cache_size =0
query_cache_type = 1
query_cache_min_res_unit = 40960
query_prealloc_size= 81920
max_connections = 5000
thread_cache_size = 16384
max_allowed_packet = 1024M
read_buffer_size = 2048M
max_connect_errors = 90
myisam_sort_buffer_size = 512M
bulk_insert_buffer_size = 512M
tmp_table_size = 2048M
max_heap_table_size = 1024M
innodb_file_per_table=1
innodb_buffer_pool_size=17179869184
innodb_flush_log_at_trx_commit=1
innodb_sort_buffer_size=10485760
innodb_flush_method=O_DIRECT
innodb_buffer_pool_instances=32
innodb_log_file_size=10G
innodb_doublewrite=1
innodb_change_buffering=none
innodb_adaptive_hash_index=OFF
innodb_flush_neighbors=0
innodb_read_io_threads=16
innodb_write_io_threads=16
innodb_lru_scan_depth=8192
innodb_io_capacity=15000
innodb_io_capacity_max=25000
loose-innodb-page-cleaners=4
table_open_cache_instances=64
table_open_cache=5000
loose-innodb-log_checksum-algorithm=crc32
loose-innodb-checksum-algorithm=strict_crc32
loose-performance_schema=ON
loose-performance-schema-instrument='wait/synch/%=ON'
The 141 second query:
SELECT
`pdu`.`id` AS `dcpdu_id`,
`pdu`.`name` AS `dcpdu_name`,
`cabinet`.`id` AS `dccabinet_id`,
CONCAT(row.name, '', cabinet.name) AS `dccabinet_name`,
`socketlocation_cabinet`.`id` AS `socketlocation_dccabinet_id`,
CONCAT(socketlocation_row.name,
'',
socketlocation_cabinet.name) AS `socketlocation_dccabinet_name`,
`location`.`locationtag` AS `dclocationtag`,
`suite`.`name` AS `dcsuite_name`,
`device`.`id` AS `dcenergy_device_id`,
`meter`.`id` AS `dcenergy_meter_id`,
`measurement`.`id` AS `dcenergy_measurement_id`,
`device`.`device_name`,
`device`.`device_local_id`,
`device`.`device_type_id`,
`device`.`device_type_name`,
`device`.`role`,
`meter`.`measurement_group_id`,
`meter`.`measurement_group_name`,
`meter`.`group_switch`,
`meter`.`measurement_name`,
`meter`.`measurement_unit`,
`meter`.`company_details_id`,
`meter`.`socket_type_id`,
`meter`.`cable_type_id`,
`meter`.`phase_id`,
`meter`.`trigger_load`,
`meter`.`internalnotes`,
`company`.`name` AS `company_name`,
`measurement`.`collected`,
(SUM(measurement.value) / COUNT(DISTINCT measurement.id)) AS `value`,
COUNT(DISTINCT measurement.id) AS `values`,
(SUM(msrmnt.value) / COUNT(DISTINCT msrmnt.id)) AS `actualload`
FROM
`dcenergy_meter` AS `meter`
LEFT JOIN
`dcenergy_measurement` AS `measurement` ON measurement.dcenergy_meter_id = meter.id
LEFT JOIN
`dcenergy_device` AS `device` ON device.id = meter.dcenergy_device_id
LEFT JOIN
`dcpdu_parts` AS `pduparts` ON pduparts.dcenergy_device_id = device.id
LEFT JOIN
`dcpdu` AS `pdu` ON pdu.id = pduparts.dcpdu_id
LEFT JOIN
`dccabinets` AS `cabinet` ON cabinet.id = meter.dccabinet_id
LEFT JOIN
`dcrow` AS `row` ON row.id = cabinet.dcrow_id
LEFT JOIN
`dccabinets` AS `socketlocation_cabinet` ON socketlocation_cabinet.id = meter.socketlocation_dccabinet_id
LEFT JOIN
`dcrow` AS `socketlocation_row` ON socketlocation_row.id = socketlocation_cabinet.dcrow_id
LEFT JOIN
`dcsuite` AS `suite` ON suite.id = cabinet.dcsuite_id
LEFT JOIN
`dclocation` AS `location` ON location.id = pdu.dclocation_id
LEFT JOIN
`company_details` AS `company` ON company.id = meter.company_details_id
LEFT JOIN
`dcenergy_meter` AS `mtr` ON `mtr`.group_switch = `meter`.`group_switch`
AND `mtr`.dcenergy_device_id = `device`.`id`
AND `mtr`.measurement_name = 'real_power_total'
LEFT JOIN
`dcenergy_measurement` AS `msrmnt` ON `msrmnt`.dcenergy_meter_id = `mtr`.id
AND `msrmnt`.collected = `measurement`.`collected`
WHERE
(device.role = 's')
AND (meter.measurement_name = 'real_energy_kwh')
AND (measurement.collected = '2016-01-28 00:00:00'
OR measurement.collected = '2016-02-28 00:00:00'
OR measurement.collected = '2016-03-28 00:00:00'
OR measurement.collected = '2016-04-28 00:00:00'
OR measurement.collected = '2016-05-28 00:00:00'
OR measurement.collected = '2016-06-28 00:00:00'
OR measurement.collected = '2016-07-28 00:00:00'
OR measurement.collected = '2016-08-28 00:00:00'
OR measurement.collected = '2016-09-28 00:00:00'
OR measurement.collected = '2016-10-28 00:00:00'
OR measurement.collected = '2016-11-28 00:00:00'
OR measurement.collected = '2016-12-28 00:00:00')
AND (meter.active = '1')
AND (meter.group_switch > '0')
GROUP BY `measurement`.`dcenergy_meter_id` , DATE_FORMAT(`measurement`.`collected`, '%m')
ORDER BY `pdu`.`id` ASC , `meter`.`group_switch` ASC , `measurement`.`collected` ASC;
The original query which ran mostly with less then 9 seconds on the standalone 5.5 server had this measurement.collected instead of the above list, the rest is identical to the one above:
(measurement.collected >= '2016-01-28 00:00:00'
AND measurement.collected <= '2016-01-28 00:30:00')
OR (measurement.collected >= '2016-02-28 00:00:00'
AND measurement.collected <= '2016-02-28 00:30:00')
OR (measurement.collected >= '2016-03-28 00:00:00'
AND measurement.collected <= '2016-03-28 00:30:00')
OR (measurement.collected >= '2016-04-28 00:00:00'
AND measurement.collected <= '2016-04-28 00:30:00')
OR (measurement.collected >= '2016-05-28 00:00:00'
AND measurement.collected <= '2016-05-28 00:30:00')
OR (measurement.collected >= '2016-06-28 00:00:00'
AND measurement.collected <= '2016-06-28 00:30:00')
OR (measurement.collected >= '2016-07-28 00:00:00'
AND measurement.collected <= '2016-07-28 00:30:00')
OR (measurement.collected >= '2016-08-28 00:00:00'
AND measurement.collected <= '2016-08-28 00:30:00')
OR (measurement.collected >= '2016-09-28 00:00:00'
AND measurement.collected <= '2016-09-28 00:30:00')
OR (measurement.collected >= '2016-10-28 00:00:00'
AND measurement.collected <= '2016-10-28 00:30:00')
OR (measurement.collected >= '2016-11-28 00:00:00'
AND measurement.collected <= '2016-11-28 00:30:00')
OR (measurement.collected >= '2016-12-28 00:00:00'
AND measurement.collected <= '2016-12-28 00:30:00')
The 1 second UNION version changes:
(SELECT
(SNIP)
AND (measurement.collected = '2016-01-28 00:00:00')
(SNIP)
GROUP BY `measurement`.`dcenergy_meter_id` , DATE_FORMAT(`measurement`.`collected`, '%m')
UNION
(Repeats for all the months in the year on the same day).
ORDER BY `dcpdu_id` ASC , `group_switch` ASC , `collected` ASC;
The cluster server storage benchmark (simple version), which holds the database files:
$ dd if=/dev/zero of=/storage/test bs=1k count=102400000
102400000+0 records in
102400000+0 records out
104857600000 bytes (105 GB, 98 GiB) copied, 297,296 s, 353 MB/s
$ dd of=/dev/null if=/storage/test bs=1k count=102400000
102400000+0 records in
102400000+0 records out
104857600000 bytes (105 GB, 98 GiB) copied, 66,4824 s, 1,6 GB/s
Ask me for anything else if needed. I want this issue to be resolved, it keeps a 12.000 euro investment unused at the moment, which I want to rectify as soon as possible…