I’m running Percona XtraDB Cluster 5.5 with all InnoDB tables.
I need to replicate data from a couple of databases with several hundred table to a separate server (that is not part of the cluster), but only some of the data (related to a particular “user”) along with non-user-specific data.
I don’t think I can use normal master/slave replication for this sort of thing.
I could write a script to go through the different tables and extract the relevant data, but I would like to do this incrementally once per day, so is there a way to find out what rows were modified in the last 24 hours in order to copy the relevant ones?
I could perhaps add triggers to every table, but wouldn’t that cause problems? e.g. performance?
Yeah this sounds like it will require some sort of custom solution, as it seems more like an ETL task than replication. You could use an after insert trigger on each table to insert into another table and then have your slave only replicate those new tables. That would be the most “automated” way, but having all the triggers and replication filters is not ideal. Triggers do have performance implications, as they require extra processing (and in your case extra inserts), but depending on your work load and hardware, it may not be an issue. Another thing to consider there is that tools like pt-online-schema-change require their own triggers, so if you put triggers on all your base tables, you may hurt yourself in the future when trying to run such tools that want to create their own triggers.
Depending on what you are actually trying to do, some sort of ETL process to extract the data you want and load it into your slave daily might be a better solution. This method would be more straight forward, and would not further complicate your masters schema. To find out what rows were modified, you would need a “last updated” type column. Short of that, you would have no way of knowing what was updated. For new records, you could potentially store the last primary key you copied over, and then start from there on the next run, but that only would work for new records and not old records that may have been updated.