I’ve gone over this great forum and noticed several posts about poor MySQL performance when handling what users call “large databases” (a few million rows and up).
In this day and age, a few million rows of raw data is not considered large, especially since the web boom put MySQL in the belly of countless web applications recording user behavior. So why is MySQL so slow to return queries that summarize/group information and/or join data from different tables?
If your first thought is that it’s because MySQL is open source, you’d be mistaken. Commercial databases like SQL Server or Oracle don’t perform much better under these conditions. So what is the real answer?
The answer lies in the fact that MySQL, like SQL Server and Oracle, is an RDBMS (relational database management system) designed to optimize disk performance for inserting/updating records. So while MySQL performs well as a transaction processing system, it performs horribly when queried for reporting/analytic purposes.
Fortunately, the 21st century offers us an alternative. There are databases out there that do not suffer from this problem. Most of them use a technique called “Vertical Fragmentation,” a technical term for storing data as separate columns instead of as whole rows. These databases can handle TONS of data (hundreds of millions of rows) on desktop-class hardware, with query responses measured in seconds.
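To make the idea concrete, here is a minimal sketch (in Python, with invented data) of the difference between the two layouts. A row store has to touch every field of every row just to sum one column, while a vertically fragmented (columnar) store reads only the single column the query actually needs:

```python
# Hypothetical example: row-oriented vs. column-oriented ("vertically
# fragmented") storage of the same three records. All names are invented.

# Row-oriented layout: one tuple per record, all fields stored together.
row_store = [
    (1, "alice", "2009-03-01", 120),
    (2, "bob",   "2009-03-01", 75),
    (3, "carol", "2009-03-02", 230),
]

# Column-oriented layout: one list per field.
column_store = {
    "id":       [1, 2, 3],
    "user":     ["alice", "bob", "carol"],
    "date":     ["2009-03-01", "2009-03-01", "2009-03-02"],
    "purchase": [120, 75, 230],
}

# Summing purchases from the row store scans entire rows (every field)...
total_rows = sum(row[3] for row in row_store)

# ...while the column store scans only the one column it needs, which on
# disk translates to far less I/O for wide tables.
total_cols = sum(column_store["purchase"])

assert total_rows == total_cols == 425
```

Both queries return the same answer, but on disk the columnar layout means an analytic query over a 50-column table can read 1/50th of the data, which is where the speedup comes from.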
These databases do not replace MySQL but rather complement it. Common architectures using this type of technology replicate the data stored in the MySQL operational database just for reporting/analytics purposes.
Some examples of databases like this:
ElastiCube by SiSense
Vertica by Vertica