One Very Large Table vs Many Little Tables

Hi everyone!

Sorry if this sounds like a repeat question. I’ve searched the forums and found some threads, but nothing quite matches what I want to ask. This is pretty important to me, so I just want to make sure I get it right.

I’ll illustrate my question with an example:

Let’s say you have 100 million users registered on your site who can create “records.” By a record I mean: write some text, pick a date between 1 A.D. and 2010 A.D., and save it. The catch is that I need to be able to find records both by who created them and by the date attached to them. For example:

  1. List all records created by user_875
  2. List all records that are within 2 days of April 24th, 1983.
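In SQL terms, I imagine those two lookups would be roughly this (the table and column names are just placeholders I made up):

  -- 1. All records created by user_875
  SELECT * FROM records WHERE user_id = 875;

  -- 2. All records within 2 days of April 24th, 1983
  SELECT * FROM records
  WHERE record_date BETWEEN '1983-04-22' AND '1983-04-26';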

The most important thing is that the SELECTs are extremely fast, so I don’t think one big table for the records will work. Let’s say there are 100 SELECT queries (to find records) per second, and 5 INSERT queries (to create new records) per second.


I was thinking:

  • Create one table for the 100 million users, storing user information as well as login/password info, with an index on login/password.

  • Create ~104,000 tables for the records, where each table stores
    the records for a certain week, e.g. table_for_week_5_in_1564. That way, to find records within a certain date range, I could just query the handful of tables covering that range, which should be much faster than querying one gigantic table?

However, I’m not sure what to do about quickly finding all the records posted by a certain user. I guess I could create one table per user that lists each record’s time and number (as a reference into one of the weekly record tables), and find them that way?

Sorry if this is a dumb question; I’m rather new at this. I have also read about table partitioning, so I guess I could put all the records into one large table and partition it by week?
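From what I’ve read in the docs, that would look roughly like the sketch below, though I’m not sure weekly partitions are even realistic, since MySQL caps the number of partitions per table (this is a by-year sketch with made-up names):

  CREATE TABLE records (
      record_id   BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
      user_id     INT UNSIGNED NOT NULL,
      record_date DATE NOT NULL,  -- note: DATE officially covers years 1000-9999
      body        TEXT,
      -- MySQL requires the partitioning column in every unique key,
      -- so it has to be part of the primary key
      PRIMARY KEY (record_id, record_date)
  )
  PARTITION BY RANGE (YEAR(record_date)) (
      PARTITION p1000 VALUES LESS THAN (1000),
      PARTITION p1500 VALUES LESS THAN (1500),
      PARTITION p2000 VALUES LESS THAN (2000),
      PARTITION pmax  VALUES LESS THAN MAXVALUE
  );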

Any help is greatly appreciated, thanks in advance!

General rule:
Don’t make it more complicated than it needs to be.

You should stick to one table!

Reasons:
1. 100 selects/5 inserts per second isn’t actually that much.

2. As you can see yourself, once you start splitting data into separate tables, selecting by anything other than the key you split on gets very complicated.
As you said, if you split by week, how are you going to get all rows for one user across the entire time range?
The answer is a very large, ugly UNION.
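For example, fetching one user’s records across just three of your weekly tables would already look something like this (using your naming scheme):

  SELECT * FROM table_for_week_5_in_1564 WHERE user_id = 875
  UNION ALL
  SELECT * FROM table_for_week_6_in_1564 WHERE user_id = 875
  UNION ALL
  SELECT * FROM table_for_week_7_in_1564 WHERE user_id = 875;
  -- ...and so on, for up to ~104,000 weekly tables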

3. Having 100,000 tables with 100 rows each is not faster than one table with 10,000,000 rows.
It is mainly the total amount of data that degrades performance, not the size of any single table.
And most filesystems start to get unhappy when you create that many files in one directory. Since a table in MySQL is stored as at least 1 file (InnoDB) and up to 3 files (MyISAM), you will end up with a lot of files, and finding and opening the right one puts the strain on the OS instead of on the MySQL process.
So you are essentially shifting the burden from one place to another.

As for partitioning: granted, there are times when it can be useful. But when you are just starting out, you should instead focus on learning the intricate workings of indexes and how to optimize queries with EXPLAIN.
That is much more important for getting performance out of an application than anything else.
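To make your two example queries fast on a single table, ordinary secondary indexes are all you need. A minimal sketch (the column names are my own invention):

  CREATE TABLE records (
      record_id   BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
      user_id     INT UNSIGNED NOT NULL,
      record_date DATE NOT NULL,
      body        TEXT,
      KEY idx_user (user_id),     -- serves "all records by one user"
      KEY idx_date (record_date)  -- serves the date-range lookup
  );

  -- Check that the optimizer actually uses the indexes:
  EXPLAIN SELECT * FROM records WHERE user_id = 875;
  EXPLAIN SELECT * FROM records
  WHERE record_date BETWEEN '1983-04-22' AND '1983-04-26';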

Thank you very much for the reply!! That helped tremendously. I will take your advice and first learn everything I can about indexing and query optimization and go from there.