The problem is that I have a big table and need to delete a substantial part of the data from it, say 60%.
It would be nice to do it the way pt-online-schema-change applies DDL changes: create a new table, copy over only the records that should remain, then swap the tables.
The gains are:
- The data is moved in small chunks automatically, so I don’t need to worry about replication lag;
- Only 40% of the rows are written (copied to the new table) instead of deleting 60% in place;
- All the unused disk space goes back to the OS, since the new table consumes space only for the 40% of records that were actually copied.
The solution could be to add an option that appends a WHERE condition to every chunk of data copied to the new table. Each chunk would then be bounded both by the primary key (or another column if --chunk-index is used) and by this arbitrary WHERE condition.
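A minimal sketch of that chunked copy-and-swap idea, using SQLite purely for illustration (pt-online-schema-change targets MySQL and also handles triggers, throttling, and replication lag, all omitted here; the `KEEP` condition and chunk size are made-up placeholders, not real tool options):

```python
import sqlite3

CHUNK = 1000          # rows per chunk, analogous to a --chunk-size setting
KEEP = "status = 0"   # hypothetical WHERE condition selecting the rows to keep

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, status INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(i, i % 5) for i in range(10000)])

# New table with the same schema (no actual DDL change in this use case).
conn.execute("CREATE TABLE t_new (id INTEGER PRIMARY KEY, status INTEGER)")

# Copy in small chunks bounded by the primary key, applying the extra
# WHERE condition on top of the chunk boundaries.
last_id = -1
while True:
    rows = conn.execute(
        f"SELECT id, status FROM t WHERE id > ? AND {KEEP} "
        "ORDER BY id LIMIT ?", (last_id, CHUNK)).fetchall()
    if not rows:
        break
    conn.executemany("INSERT INTO t_new VALUES (?, ?)", rows)
    last_id = rows[-1][0]

# Swap: drop the old table and rename the new one into place.
conn.execute("DROP TABLE t")
conn.execute("ALTER TABLE t_new RENAME TO t")
print(conn.execute("SELECT COUNT(*) FROM t").fetchone()[0])
```

Every pass selects at most `CHUNK` rows past the last copied primary key, so the extra WHERE condition simply narrows each chunk rather than changing the chunking algorithm itself.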
It might seem strange that this use case of pt-online-schema-change does not involve a schema change at all, but why not, if the algorithm is almost the same? =)