Why is MySQL is Slow when Handling Millions of Rows

Hi everyone,

I’ve gone over this great forum and I’ve noticed several posts talking about bad MySQL performance when handling what users call “large databases” (= a few million rows and up).

This day and age, a few million rows of raw data is not considered large or even big. Especially since the web boom where MySQL is widely used in the belly of web applications recording user behavior. So why is MySQL so slow in returning queries that summarize/group information and/or join data from different tables?

If your first thought is because MySQL is open source, you would be mistaken. Commercial databases like SQL Server or Oracle don’t perform much better under these conditions. So, what is the real answer?

The answer lies in the fact that MySQL, like SQL Server and Oracle, is an RDBMS (relational database management system) that was designed for optimized disk performance when inserting/updating records. So while MySQL performs well as a transaction processing system, it works horrible when trying to query it for reporting/analytic purposes.
t.
Fortunately, the 21st century offers us an alternative. There are databases out there that do not suffer from this problem. Most of them use a technique called “Vertical Fragmentation” that is a technical term for saving data as separate fields instead of as a bulk of rows. These databases can handle TONS of data (hundreds of millions of rows) and desktop-class hardware with query responses measured in seconds.

These databases do not replace MySQL but rather complement it. Common architectures using this type of technology replicate the data stored in the MySQL operational database just for reporting/analytics purposes.

Some examples of databases like this:

  • ElastiCube by SiSense

  • Vertica by Vertica

Good luck

Actually, the more common term is not “vertical fragmentation” but “column-oriented storage.” And such databases are not general-purpose, but very special-purpose.

H’ya

Vertical Fragmentation is more than simple column-oriented storage. It has to do with how buffer managers handle queries as well, not just the actual CDBMS.

Anyway, you’re right that these databases are not “general purpose”, but that is true for all databases (including MySQL). Some databases are geared towards transactional processing, some towards reporting and analytics, etc.

It actually impossible to have a single storage mechanism optimized for both. At least, impossible at this point in time.

Thanks