Snapshot exporting in Percona MySQL

I have a question about multithreaded reading from MySQL. We need to read large tables in parallel from a MySQL database. To do this, we want to create several sessions to the database cluster and read the tables in parts, based on the primary key.

For example, we could have two threads:

# Stream 1
SELECT * FROM table WHERE key > 0 AND key <= K1

# Stream 2:
SELECT * FROM table WHERE key > K1 AND key <= K2

# Stream N:
SELECT * FROM table WHERE key > KN-1 AND key <= KN

The question is whether different reading transactions will be able to see the same state of the table at the same time. In PostgreSQL, there is a snapshot export procedure called pg_export_snapshot. Different transactions may specify the snapshot they want to read.

I couldn’t find a similar feature in vanilla MySQL, but maybe this (or something similar) is possible in Percona?

Hi Vitaly_Isaev,
It will depend on multiple factors, the most important is if you are using a transactional engine like InnoDB for your table. On vainilla MySQL, the simple options is to execute a FLUSH TABLE WITH READ LOCK, establish the connection for each thread and send a START TRANSACTION WITH CONSISTENT SNAPSHOT and release the FTWRL with UNLOCK TABLES. If it is just a single table you can lock the table. On Percona Server, if you have enabled GTID, you could avoid FTWRL (as we know that it is an expensive operation) and use the status variable binlog_snapshot_gtid_executed after sending a STWCS and compare it with all the threads.
Take into account that this options are implemented by MyDumper which also split the tables as you are trying to do.

Hi David, thanks for your response. I managed to find the code you mentioned, but could you please provide a little bit more details about thread interaction. Did I understand it correctly that:

On the first thread I just get the GTID:

START TRANSACTION WITH CONSISTENT SNAPSHOT;
SHOW VARIABLES LIKE "binlog_snapshot_gtid_executed"; -- preserve it for the other threads

Therefore on the second (and all the other) threads I can:

START TRANSACTION WITH CONSISTENT SNAPSHOT;
SHOW VARIABLES LIKE "binlog_snapshot_gtid_executed";
-- compare this value with the value obtained from the first thread, if they are matched, continue reading:
SELECT * FROM table WHERE pk > 0 AND pk < K1;
COMMIT;

If they doesn’t match, I retry creating transaction until they are matched.

Then, when all the reads are finished, I can release the first coordinating thread:

COMMIT;

Am I getting this right? If so, are there guarantees that binlog_snapshot_gtid_executed will eventually match in all the sessions I’ve opened without explicit table locking? Thank you.

Hi @Vitaly_Isaev ,
Just to be clear, you will be using Percona Server with GTID enable and binlog_snapshot_gtid_executed will be the mechanism to sync all the threads. With this solution, there is no need to have a “coordinating thread”, all the threads can work independently but they need to be on sync.
I think that you got the idea but there is no need of coordinator. Be careful during binlog_snapshot_gtid_executed comparison, as you will have multiple threads, you will need a mutex to avoid inconsistent reads and overwrites.

Be careful during binlog_snapshot_gtid_executed comparison, as you will have multiple threads, you will need a mutex to avoid inconsistent reads and overwrites.

Thanks again! Unfortunately, I still don’t understand which part of the code needs to be protected by a mutex. Could you please clarify?