processing large result sets...

Hello,

I have a large table with about half a million rows (potentially a few million) - I need to make a snapshot of the data at a specific moment and process it for export from the database (into an XML file, but that’s not that important)…

I have some ideas on how to process it, but each has its drawbacks (I’m using PHP to process the data, if that is of any importance)…

  • one could just do a ‘select *’ on this table and then process it one row at a time at the application level - the problem is that the data might grow so large that the result set no longer fits in the available RAM (see the first sketch after this list)…

  • one could ‘select id from…’ and then process it one id at a time, fetching the rows and then processing them… but that would require me to first make a copy of the table, so it doesn’t change during the whole operation… and, theoretically, it still doesn’t stop the script from taking too much memory if there are enough rows (maybe not a few million, but it’s certainly possible)…

  • I was also thinking along the lines of processing it part by part using ‘select … limit …’, which might (I’m not sure) let me process arbitrarily large result sets, but I don’t think it’s a wise idea from a performance point of view… of course, I would still need a copy of the table first in this case (see the second sketch after this list)…
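
For the first option, as far as I understand it, the memory problem comes from the PHP client buffering the whole result set by default; an unbuffered query hands rows over one at a time instead, at the cost of keeping the connection busy until the last row is read. A rough sketch of what I mean (the connection details and the table name snapshot_data are just placeholders):

[code]
<?php
// Rough sketch: stream rows without buffering the whole result set in PHP.
// Host, credentials and the table `snapshot_data` are placeholders.
$mysqli = new mysqli('localhost', 'user', 'password', 'mydb');
if ($mysqli->connect_errno) {
    die('Connect failed: ' . $mysqli->connect_error);
}

// MYSQLI_USE_RESULT asks the driver not to buffer the result set client-side,
// so memory use stays roughly constant no matter how many rows there are.
$result = $mysqli->query('SELECT * FROM snapshot_data', MYSQLI_USE_RESULT);

while ($row = $result->fetch_assoc()) {
    // process / export a single row here
}

$result->free();
$mysqli->close();
[/code]

The obvious drawback is that nothing else can run on that connection until every row has been fetched, so it would only fit a dedicated export connection.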
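
For the limit idea, my worry is that a plain ‘limit offset, count’ gets slower and slower as the offset grows, because MySQL still has to walk past all the skipped rows; paging by the primary key instead should keep each chunk cheap. Another rough sketch (again, the table and column names are made up):

[code]
<?php
// Rough sketch: fetch fixed-size chunks ordered by the primary key,
// remembering the last id seen instead of using a growing OFFSET.
// Assumes an integer primary key `id` on the placeholder table `snapshot_data`.
$mysqli = new mysqli('localhost', 'user', 'password', 'mydb');

$lastId = 0;
$chunk  = 10000;

do {
    $sql = sprintf(
        'SELECT * FROM snapshot_data WHERE id > %d ORDER BY id LIMIT %d',
        $lastId,
        $chunk
    );
    $result = $mysqli->query($sql);

    $rows = 0;
    while ($row = $result->fetch_assoc()) {
        // process / export a single row here
        $lastId = (int) $row['id'];
        $rows++;
    }
    $result->free();
} while ($rows === $chunk);

$mysqli->close();
[/code]

This still doesn’t give me a consistent snapshot by itself, so the copy-the-table-first step would remain.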

Any other ideas? Maybe some obvious way I’m somehow missing? Some standard way of processing large amounts of data from a MySQL database?

Thanks in advance…

How often do you need to do such database exports? If not very often, then if MySQL is on an LVM partition, you can take an LVM snapshot, mount the snapshot, copy the data to another server, start MySQL with the data directory from this snapshot, and export it to an XML file.

[B]debug wrote on Mon, 28 April 2008 13:47[/B]
How often do you need to do such database exports? If not very often, then if MySQL is on an LVM partition, you can take an LVM snapshot, mount the snapshot, copy the data to another server, start MySQL with the data directory from this snapshot, and export it to an XML file.

Thanks for your answer - the exports will need to be done once or twice a day…

As for the LVM option, I’d have to think about that - the database is currently on a regular filesystem, so that would require some shuffling about… But this only solves the “snapshot” part of the problem - the other part of my question still remains unanswered: what is the best way to process and export a large result set (possibly exceeding available memory) from a database to an XML file (or some other format)?
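
One direction I’m toying with for the export side is to combine an unbuffered query with XMLWriter, which writes the XML incrementally to a file instead of building the whole document in memory - a rough sketch only, with placeholder names:

[code]
<?php
// Rough sketch: stream rows straight into an XML file with XMLWriter, so
// neither the result set nor the XML document is held fully in memory.
// Connection details, the table `snapshot_data` and the output path are
// placeholders; column names are assumed to be valid XML element names.
$mysqli = new mysqli('localhost', 'user', 'password', 'mydb');

$xml = new XMLWriter();
$xml->openURI('/tmp/export.xml');      // write directly to disk
$xml->startDocument('1.0', 'UTF-8');
$xml->startElement('rows');

$result = $mysqli->query('SELECT * FROM snapshot_data', MYSQLI_USE_RESULT);
while ($row = $result->fetch_assoc()) {
    $xml->startElement('row');
    foreach ($row as $column => $value) {
        $xml->writeElement($column, (string) $value);
    }
    $xml->endElement();                // </row>
    $xml->flush();                     // push buffered output out to the file
}
$result->free();

$xml->endElement();                    // </rows>
$xml->endDocument();
$xml->flush();
$mysqli->close();
[/code]

Whether that performs acceptably on a few million rows I don’t know yet, and it still leaves the snapshot/consistency question open.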