Difference in data space utilisation between the nodes of a PXC cluster

We are using PXC with compressed tables.

The cluster consists of three nodes, and writes always take place on node one.

We discovered a difference in data space utilisation between node 1 and the other two nodes in the cluster.
Nodes 2 and 3 use 6% more disk space than node 1.
Is there a way we can reduce this difference between the nodes?
Is this expected behavior?

Compare each table's free space across all nodes:

SELECT table_schema, table_name, CONCAT(ROUND((data_free / 1024 / 1024), 2), 'MB') AS data_free
  FROM information_schema.tables
  WHERE table_schema NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
    AND ENGINE = 'InnoDB';

I have checked all the nodes; data_free is the same for every table:

+--------------+---------------+-----------+
| TABLE_SCHEMA | TABLE_NAME    | data_free |
+--------------+---------------+-----------+
| sysbench     | payers        | 5.50MB    |
| sysbench     | events        | 5.00MB    |
| sysbench     | instances     | 5.00MB    |
| sysbench     | psps          | 4.50MB    |
| sysbench     | transactions  | 4.50MB    |
| sysbench     | mapping       | 4.50MB    |
| sysbench     | accounts      | 4.00MB    |
| sysbench     | references_v2 | 4.00MB    |
| sysbench     | metrics       | 3.50MB    |

If all the data is the same, I would next use something like du -c /path/datadir to compare individual directories and files to find where the size differences are. Check man du and read up on --apparent-size as well.
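For example, a rough sketch of that comparison, assuming a default datadir of /var/lib/mysql and the sysbench schema from this thread (substitute your own paths):

# Per-file sizes of the schema's .ibd files, largest first; run on each node and diff the output.
du -sh /var/lib/mysql/sysbench/*.ibd | sort -rh | head -20

# Apparent size and allocated size can differ, e.g. for sparse files (see man du).
du -sh --apparent-size /var/lib/mysql/sysbench/*.ibd | sort -rh | head -20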

Hi Matthew,

Below are the sizes of some .ibd files on the writer node (node1) and on a non-writer node (node2):

Node1
5.4G events.ibd
9.3G psps.ibd
24G counts.ibd

Node2
6.1G events.ibd
13G psps.ibd
29G counts.ibd

There is a difference in tablespace size between node1 and node2.

Run this new query for those specific tables:

SELECT table_schema, table_name,
       CONCAT(ROUND((data_length / 1024 / 1024), 2), 'MB') dataMB,
       CONCAT(ROUND((index_length / 1024 / 1024), 2), 'MB') indexMB,
       CONCAT(ROUND((data_free / 1024 / 1024), 2), 'MB') data_free
  FROM information_schema.tables
  WHERE table_schema NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
    AND table_name IN ('events', 'psps', 'counts');

Then look at things like the file segment (INODE) entries inside the files. If you really want to know, use this tool to count how many pages of each type are in each .ibd file.
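The specific tool linked above is not reproduced in this excerpt. As one illustration only, a publicly available alternative is innodb_space from the innodb_ruby gem, which can break an .ibd file down by page type and by index/file segment; note that its handling of ROW_FORMAT=COMPRESSED tablespaces may be limited, and the paths below are placeholders.

# Illustrative sketch (not the tool referenced in the thread); run against a copy of the tablespace.
gem install innodb_ruby

# Pages per page type in the tablespace.
innodb_space -f /var/lib/mysql/sysbench/counts.ibd space-page-type-summary

# Per-index used vs. allocated pages and fill factor.
innodb_space -f /var/lib/mysql/sysbench/counts.ibd space-indexes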

Hi Matthew,

Thank you for your update. We will run the script and inform you if we find anything.

Hi Matthew,

I have run the script on one of the tables where we observed a difference in disk space utilisation.

On node1:

+--------------+------------+---------+---------+-----------+
| TABLE_SCHEMA | TABLE_NAME | dataGB  | indexGB | data_free |
+--------------+------------+---------+---------+-----------+
| sysbench     | contexts   | 17.81GB | 3.54GB  | 3.50MB    |
+--------------+------------+---------+---------+-----------+
1 row in set (0.00 sec)

INODE fseg_id=1, pages=1, frag=1, full=0, not_full=0, free=0
INODE fseg_id=2, pages=0, frag=0, full=0, not_full=0, free=0
INODE fseg_id=3, pages=7968, frag=32, full=120, not_full=4, free=0
INODE fseg_id=4, pages=4624224, frag=32, full=63903, not_full=8350, free=0
INODE fseg_id=5, pages=2272, frag=32, full=29, not_full=6, free=0
INODE fseg_id=6, pages=314848, frag=32, full=4293, not_full=626, free=0
INODE fseg_id=7, pages=224, frag=32, full=2, not_full=1, free=0
INODE fseg_id=8, pages=113056, frag=32, full=1536, not_full=230, free=0
INODE fseg_id=9, pages=2400, frag=32, full=32, not_full=5, free=0
INODE fseg_id=10, pages=377696, frag=32, full=5150, not_full=751, free=0
INODE fseg_id=11, pages=2400, frag=32, full=30, not_full=7, free=0
INODE fseg_id=12, pages=377376, frag=32, full=5147, not_full=749, free=0

On node2:

+--------------+------------+---------+---------+-----------+
| TABLE_SCHEMA | TABLE_NAME | dataGB  | indexGB | data_free |
+--------------+------------+---------+---------+-----------+
| sysbench     | contexts   | 25.10GB | 4.03GB  | 5.50MB    |
+--------------+------------+---------+---------+-----------+
1 row in set (0.00 sec)

INODE fseg_id=1, pages=1, frag=1, full=0, not_full=0, free=0
INODE fseg_id=2, pages=0, frag=0, full=0, not_full=0, free=0
INODE fseg_id=3, pages=13920, frag=32, full=188, not_full=29, free=0
INODE fseg_id=4, pages=6830240, frag=32, full=91753, not_full=14969, free=0
INODE fseg_id=5, pages=2272, frag=32, full=29, not_full=6, free=0
INODE fseg_id=6, pages=314848, frag=32, full=4293, not_full=626, free=0
INODE fseg_id=7, pages=352, frag=32, full=4, not_full=1, free=0
INODE fseg_id=8, pages=157024, frag=32, full=2122, not_full=331, free=0
INODE fseg_id=9, pages=2399, frag=31, full=32, not_full=5, free=0
INODE fseg_id=10, pages=377760, frag=32, full=5151, not_full=751, free=0
INODE fseg_id=11, pages=2399, frag=31, full=31, not_full=6, free=0
INODE fseg_id=12, pages=377760, frag=32, full=5151, not_full=751, free=0

The difference I observed is that the number of pages per INODE entry (file segment) and the not_full counts are higher on node2 than on node1.
Since extents on the not_full list still contain free pages, I think that is the reason for the difference in disk space utilisation between node1 and node2. Not sure if my understanding is correct.

This is normal; the nodes will have slightly different file sizes. It depends on many factors, the most common being MVCC, purging, and change buffering. This is not an issue at all. The only way to get the same sizes is to stop writing and OPTIMIZE all the tables.
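A minimal sketch of that rebuild, using the table names mentioned earlier in this thread; run it in a maintenance window, since in PXC OPTIMIZE TABLE is normally replicated via TOI and therefore blocks writes cluster-wide while each table is rebuilt:

# Rebuild the tables that diverged the most (for InnoDB, OPTIMIZE maps to recreate + analyze).
mysql -e "OPTIMIZE TABLE sysbench.events, sysbench.psps, sysbench.counts;"

# Or rebuild everything; the sizes will start to drift again as soon as writes resume.
mysqlcheck --optimize --all-databases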