Issue with 999999 ID limit in pt-archiver

jimmy0x52 · September 10, 2014, 4:36pm

I posted this in the SO DBA forum and it was recommended I try here:

http://dba.stackexchange.com/questions/76256/percona-toolkit-pt-archiver-how-to-get-rid-of-999999-id-limit

Basically, I have a 1.9 billion row DB that I need to do an initial large archive of and then I’ll keep a rolling 3-month window of data using pt-archiver. I’ve written a script to run pt-archiver 1 day at a time using the code referenced in that link.

I’ve noticed using --dry-run that it limits results to id < 999999 and with a table this large I far exceed that limit. How do I modify this limit? I see no option in the pt-archiver documentation that explains how I can remove it. When I try running my script I get data for some dates and then it just falls off when I know I have data in the DB. If I run the --dry-run queries without that ID limitation they return data.

Thanks in advance for your assistance.

scott.nemes · September 10, 2014, 6:01pm

Someone from Percona who has worked on this project might know more details, but below is my $.02 as someone who has used the tool quite a bit at least.

pt-archiver chunks the rows, so I believe what you are seeing with the id < 999999 where clause is the first chunk printed by dry-run, which is then split up further by your 1000 limit. To test that theory, just run your pt-archiver command with --no-delete and then kill it after you are comfortable with what it is doing, as you can watch the SELECT queries it is generating until you see the id < 999999 where clause change to something larger. The --no-delete option will “archive” the rows, but not delete them from the source so you can then re-run pt-archiver later once you are ready to start deleting the records.

jimmy0x52 · September 10, 2014, 6:31pm

That’s what I was saying - I ran it and it failed to pick up data after a certain date. The data is chronological and it is odd that it just plain stopped getting data after that date. The only explanation I could think of was the ID being > 999999.

How does it know how high to chunk those ranges? I assume from the next PK?

I’ll do a little more debugging tomorrow. Thanks, Scott.

scott.nemes · September 10, 2014, 7:20pm

Not sure on the chunking specifics, so hopefully an engineer from Percona can chime in on that.

So when you say it stops getting data, does that mean that if you run a SELECT yourself where date_id=$dateid, it pulls back more data than pt-archiver copied doing the same thing?

Do you have any logging for this as well? Could be worth running pt-archiver in a screen session with --progress enabled so you can see what it’s doing and where it stops.

Basically I’d remove the script altogether (you may have already done that), and just test a single pt-archiver statement and verify the results that way.
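
For anyone who wants to try the single-statement test described above, here is a minimal sketch in Python that only builds and prints one pt-archiver command line. The host, database and table names are placeholders, and treating date_id as an integer is an assumption; the thread does not show the real schema or the original script. The flags themselves (--source, --dest, --where, --limit, --no-delete, --progress, --dry-run) are standard pt-archiver options.

```python
import shlex


def build_archiver_command(date_id, dry_run=True):
    """Build one pt-archiver invocation for a single date_id value."""
    # Hypothetical DSNs: host, database and table names are placeholders.
    source = "h=localhost,D=real_db,t=events"
    dest = "h=localhost,D=archive_db,t=events"
    cmd = [
        "pt-archiver",
        "--source", source,
        "--dest", dest,
        # Assumption: date_id is an integer column.
        "--where", f"date_id = {int(date_id)}",
        "--limit", "1000",      # rows fetched per SELECT, as mentioned in the thread
        "--no-delete",          # copy rows but leave them in the source table
        "--progress", "10000",  # print a progress line every 10000 rows
    ]
    if dry_run:
        # Print the generated queries and exit without touching any data.
        cmd.append("--dry-run")
    return cmd


# Print a shell-safe command line to review before running it by hand.
print(shlex.join(build_archiver_command(20140901)))
```

Running it with dry_run=True first and checking which host and database appear in the printed command is a cheap way to confirm the statement points at the intended server before any rows are copied.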

The other variable is the ascend-first option, as I’ve never used that. There could be a bug there that might limit it, so that is entirely possible.

jimmy0x52 · September 11, 2014, 7:54am

I figured it out. I had imported 1M rows from my real DB into a test DB, and the script was pointing at the test DB by accident, so the id < 999999 limit was just the highest ID pt-archiver found in the test DB. When I point it at the real DB I get a much larger number.

Thanks for the help, Scott.
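
This resolution fits the documented --safe-auto-increment behavior of pt-archiver, which is on by default and adds a WHERE clause so that the row with the highest AUTO_INCREMENT value is not archived. The sketch below is a simulation of that idea and not pt-archiver's own code: it uses an in-memory SQLite table with made-up names (events, id, date_id) to show that the bound tracks the largest ID in whichever table the tool is connected to.

```python
import sqlite3


def auto_increment_bound(conn, table, column="id"):
    """Return a WHERE fragment that excludes the row with the largest ID."""
    # Hypothetical helper: mimics the idea of bounding by MAX(id),
    # it is not how pt-archiver is implemented internally.
    max_id = conn.execute(f"SELECT MAX({column}) FROM {table}").fetchone()[0]
    return f"{column} < {max_id}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, date_id INTEGER)")

# A small "test DB": 1000 rows with ids 1..1000.
conn.executemany(
    "INSERT INTO events (id, date_id) VALUES (?, ?)",
    [(i, 20140901) for i in range(1, 1001)],
)
print(auto_increment_bound(conn, "events"))  # id < 1000

# A table with a much larger top ID yields a much larger bound.
conn.execute("INSERT INTO events (id, date_id) VALUES (?, ?)", (1900000000, 20140910))
print(auto_increment_bound(conn, "events"))  # id < 1900000000
```

So a bound that looks suspiciously small is a hint to check which database the connection actually reaches, not a limit that needs to be configured away.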

scott.nemes · September 11, 2014, 9:31am

Ah, glad you figured it out! The simple answers tend to be the hardest to find, but great when you finally find them and it’s an easy fix anyway.

I’d be interested to hear how your archive initiative turns out overall once you get the table cut down in size. I deal with 2TB+ tables myself (and around 15k MySQL instances), and it’s rare to find other real world examples of people dealing with truly large data sizes / volumes in MySQL. =)

Vu_Nguyen · July 24, 2024, 5:40pm

Hi Jimmy,
Do you mind sharing how long it took you to finish archiving 1.8 billion rows using pt-archiver? Did you run pt-archiver every day until it finished?
Thank you.

jimmy0x52 · July 24, 2024, 5:54pm

Wow, it’s been 10 years! I can tell you that it did eventually finish, but I have no clue now how long it took. Good luck!
