This forum looks like a great resource - thanks for creating it.
I have a challenge that no doubt has been faced by others before. I have developed a rather complex visitor/event/click tracking script that logs to multiple tables. This is not just simple stat tracking – so I must normalize the database and have each table related by primary key IDs, in many-to-one and many-to-many relationships … hence lots of JOINS.
The problem is, the tables over time have gotten rather large: up to 1.4 GB (22 million rows) and increasing relatively quickly right now. When I try to view some stats, my connection with the server just gives up or takes forever and chews up a lot of CPU processing power. I have INDEXED everything I can. Buffers have been increased. I’ve used EXPLAIN and even queries that are fully indexed take too long or don’t run at all if it involves a huge table. This is on a dedicated server with 2 GB of RAM and Intel Celeron 1.7GHz (not fast, I know).
Summary tables are an option for some stat reports but not for others.
So it appears that this is a “big table” problem. Although from what I’ve seen here, a 1GB/12 mil row table shouldn’t be any problem for mysql? However, I can’t just let the tables grow forever…
I see two solutions, aside from getting better hardware:
- Try to split all the tables up by date, ie, start logging to a new table when it gets to a certain size.
or
- Better optimize my queries somehow.
The problem with #1 (splitting tables), is that I’ve got primary key ID’s starting in row #1 (date of first log) that may need to be referenced today by one or more other tables. (One reason why archiving isn’t a solution). Assuming I can figure that out, there’s the problem of SELECT having to span multiple tables if they contain data needed for the date range requested. UNION may work here but again, we’re talking multiple split tables being UNIONed, joined and selected from.
Regarding #2 (better queries), would it be better to try to get a subset of each huge table first? Like by date range … and create some temp tables, then JOIN those? Or is that what mysql does anyway if part of the SELECT includes a WHERE datey?
Hope this makes some sense.
Any ideas?