Difference in data space utilisation between the nodes of a PXC cluster

We are using PXC with compressed tables.

The cluster consists of three nodes, and writes always take place on node one.

We discovered a difference in data space utilisation between node 1 and the other two nodes in the cluster.
Nodes 2 and 3 use 6% more disk space than node 1.
Is there a way we can reduce this difference between the nodes?
Is this expected behavior?

Compare each table's free space across all nodes:

SELECT table_schema, table_name, CONCAT(ROUND((data_free / 1024 / 1024), 2), 'MB') AS data_free
  FROM information_schema.tables
  WHERE table_schema NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
    AND ENGINE = 'InnoDB';

I have checked all the nodes; data_free is the same for every table:

+--------------+---------------+-----------+
| TABLE_SCHEMA | TABLE_NAME    | data_free |
+--------------+---------------+-----------+
| sysbench     | payers        | 5.50MB    |
| sysbench     | events        | 5.00MB    |
| sysbench     | instances     | 5.00MB    |
| sysbench     | psps          | 4.50MB    |
| sysbench     | transactions  | 4.50MB    |
| sysbench     | mapping       | 4.50MB    |
| sysbench     | accounts      | 4.00MB    |
| sysbench     | references_v2 | 4.00MB    |
| sysbench     | metrics       | 3.50MB    |

If all the data is the same, I would next use something like du -c /path/datadir to compare individual directories and files to find where the size differences are. Check man du and read up on --apparent-size as well.
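For example, a rough sketch of that comparison, assuming a default datadir of /var/lib/mysql and the sysbench schema from this thread (substitute your own paths):

# Per-file sizes of the schema's .ibd files, largest first; run on each node and diff the output.
du -sh /var/lib/mysql/sysbench/*.ibd | sort -rh | head -20

# Apparent size and allocated size can differ, e.g. for sparse files (see man du).
du -sh --apparent-size /var/lib/mysql/sysbench/*.ibd | sort -rh | head -20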

Hi Matthew,

Below are the sizes of some .ibd files on the writer node (node1) and on a non-writer node (node2):

Node1
5.4G events.ibd
9.3G psps.ibd
24G counts.ibd

Node2
6.1G events.ibd
13G psps.ibd
29G counts.ibd

There is a difference in tablespace size between node1 and node2.

Run this new query for those specific tables:

SELECT table_schema, table_name,
       CONCAT(ROUND((data_length / 1024 / 1024), 2), 'MB') dataMB,
       CONCAT(ROUND((index_length / 1024 / 1024), 2), 'MB') indexMB,
       CONCAT(ROUND((data_free / 1024 / 1024), 2), 'MB') data_free
  FROM information_schema.tables
  WHERE table_schema NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
    AND table_name IN ('events', 'psps', 'counts');

Then look at things like the file segment (INODE) entries inside the files. If you really want to know, use this tool to count how many pages of each type are in each .ibd file.
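The specific tool linked above is not reproduced in this excerpt. As one illustration only, a publicly available alternative is innodb_space from the innodb_ruby gem, which can break an .ibd file down by page type and by index/file segment; note that its handling of ROW_FORMAT=COMPRESSED tablespaces may be limited, and the paths below are placeholders.

# Illustrative sketch (not the tool referenced in the thread); run against a copy of the tablespace.
gem install innodb_ruby

# Pages per page type in the tablespace.
innodb_space -f /var/lib/mysql/sysbench/counts.ibd space-page-type-summary

# Per-index used vs. allocated pages and fill factor.
innodb_space -f /var/lib/mysql/sysbench/counts.ibd space-indexes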

Hi Matthew,

Thank you for your update. We will run the script and inform you if we find anything.

Hi Matthew,

I have run the script on one of the tables where we observed a difference in disk space utilisation.

On node1:

+--------------+------------+---------+---------+-----------+
| TABLE_SCHEMA | TABLE_NAME | dataGB  | indexGB | data_free |
+--------------+------------+---------+---------+-----------+
| sysbench     | contexts   | 17.81GB | 3.54GB  | 3.50MB    |
+--------------+------------+---------+---------+-----------+
1 row in set (0.00 sec)

INODE fseg_id=1, pages=1, frag=1, full=0, not_full=0, free=0
INODE fseg_id=2, pages=0, frag=0, full=0, not_full=0, free=0
INODE fseg_id=3, pages=7968, frag=32, full=120, not_full=4, free=0
INODE fseg_id=4, pages=4624224, frag=32, full=63903, not_full=8350, free=0
INODE fseg_id=5, pages=2272, frag=32, full=29, not_full=6, free=0
INODE fseg_id=6, pages=314848, frag=32, full=4293, not_full=626, free=0
INODE fseg_id=7, pages=224, frag=32, full=2, not_full=1, free=0
INODE fseg_id=8, pages=113056, frag=32, full=1536, not_full=230, free=0
INODE fseg_id=9, pages=2400, frag=32, full=32, not_full=5, free=0
INODE fseg_id=10, pages=377696, frag=32, full=5150, not_full=751, free=0
INODE fseg_id=11, pages=2400, frag=32, full=30, not_full=7, free=0
INODE fseg_id=12, pages=377376, frag=32, full=5147, not_full=749, free=0

On node2:

+--------------+------------+---------+---------+-----------+
| TABLE_SCHEMA | TABLE_NAME | dataGB  | indexGB | data_free |
+--------------+------------+---------+---------+-----------+
| sysbench     | contexts   | 25.10GB | 4.03GB  | 5.50MB    |
+--------------+------------+---------+---------+-----------+
1 row in set (0.00 sec)

INODE fseg_id=1, pages=1, frag=1, full=0, not_full=0, free=0
INODE fseg_id=2, pages=0, frag=0, full=0, not_full=0, free=0
INODE fseg_id=3, pages=13920, frag=32, full=188, not_full=29, free=0
INODE fseg_id=4, pages=6830240, frag=32, full=91753, not_full=14969, free=0
INODE fseg_id=5, pages=2272, frag=32, full=29, not_full=6, free=0
INODE fseg_id=6, pages=314848, frag=32, full=4293, not_full=626, free=0
INODE fseg_id=7, pages=352, frag=32, full=4, not_full=1, free=0
INODE fseg_id=8, pages=157024, frag=32, full=2122, not_full=331, free=0
INODE fseg_id=9, pages=2399, frag=31, full=32, not_full=5, free=0
INODE fseg_id=10, pages=377760, frag=32, full=5151, not_full=751, free=0
INODE fseg_id=11, pages=2399, frag=31, full=31, not_full=6, free=0
INODE fseg_id=12, pages=377760, frag=32, full=5151, not_full=751, free=0

The difference I observed is that the number of pages per INODE entry (file segment) and the not_full counts are higher on node2 than on node1.
Since extents on the not_full list still contain free pages, I think that is the reason for the difference in disk space utilisation between node1 and node2. Not sure if my understanding is correct.

This is normal; the nodes will have slightly different file sizes. It depends on many factors, the most common being MVCC, purging, and change buffering. This is not an issue at all. The only way to get the same sizes is to stop writing and OPTIMIZE all the tables.
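A minimal sketch of that rebuild, using the table names mentioned earlier in this thread; run it in a maintenance window, since in PXC OPTIMIZE TABLE is normally replicated via TOI and therefore blocks writes cluster-wide while each table is rebuilt:

# Rebuild the tables that diverged the most (for InnoDB, OPTIMIZE maps to recreate + analyze).
mysql -e "OPTIMIZE TABLE sysbench.events, sysbench.psps, sysbench.counts;"

# Or rebuild everything; the sizes will start to drift again as soon as writes resume.
mysqlcheck --optimize --all-databases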