Optimal database structure


I want to understand some special internals of MySQL.
My task: I need to create a large database (about 50 million rows).

I have one map table and 5 satellite tables.
Map table (this is where the 50 million rows live):
first_table_id (1,000 distinct values)
second_table_id (100 distinct values)
third_table_id (1,000,000 distinct values)
fourth_table_id (300,000 distinct values)
fifth_table_id (1,000 distinct values)

Structures of the satellites:
first: id + many char(255) fields (the characteristics of this record type)
second: id + blob (data)
third: id + char(255)
fourth: id + char(255)
(!) fifth: id + 13 SET columns, each with 2-15 values.
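For concreteness, the schema described above might look roughly like this. All table and column names here are my own invention, and MyISAM is assumed only because MYI/MYD files are mentioned later in the thread:

```sql
-- Hypothetical sketch of the schema (names invented)
CREATE TABLE map (
  first_table_id  INT NOT NULL,  -- 1,000 distinct values
  second_table_id INT NOT NULL,  -- 100 distinct values
  third_table_id  INT NOT NULL,  -- 1,000,000 distinct values
  fourth_table_id INT NOT NULL,  -- 300,000 distinct values
  fifth_table_id  INT NOT NULL   -- 1,000 distinct values
) ENGINE=MyISAM;

CREATE TABLE fifth (
  id    INT NOT NULL PRIMARY KEY,
  -- 13 SET columns in the real table; two shown as placeholders
  set01 SET('a','b','c','d'),
  set02 SET('x','y')
) ENGINE=MyISAM;
```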

So I have a lot of join queries: joining the map table with the satellites (especially the third, fourth and fifth), and map-table-to-map-table self-joins (on the first and second columns). In those joins I need a result set with 2-3 fourth-table columns and 2-3 fifth-table columns.
I know for certain that no join will involve more than 3 fourth-table and 3 third-table columns.
But I also have queries that use just the map table with the fourth table, the map table with the third table, with the fifth table, and all combinations (3, 4, 5, 3-4, 3-5, 4-5, 3-4-5).
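A typical query of the kind described might look like this (column names are hypothetical, matching the sketch schema above):

```sql
-- Map table joined with the third, fourth and fifth satellites
SELECT m.first_table_id, t.val, f4.val, f5.set01
FROM map m
JOIN third  t  ON t.id  = m.third_table_id
JOIN fourth f4 ON f4.id = m.fourth_table_id
JOIN fifth  f5 ON f5.id = m.fifth_table_id
WHERE t.val = 'something';
```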

4 questions:

  1. Would it be better if I altered the map table to have 3 fourth-table and 3 fifth-table id columns?
    (The MYI and MYD files would become much larger, so maybe I would run into HDD I/O problems? Right now the map table is about 2.5 GB.)

  2. Should I merge all the SET columns in the fifth table into one? I would then have one index for all of them, so maybe searching in one big SET would be faster?

  3. A SET is just a binary mask, so it is very small. Maybe I should place one big SET column in the map table? (If the answer to the first question is "yes", then my question is: should I place 3 big SET columns? )
    Would this trick save me the join with the fifth table?
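To make the "binary mask" point concrete: MySQL stores a SET column internally as a bitmap, so it can be searched either by name or by bit value. A sketch, using the invented column names from above:

```sql
-- Both forms match rows whose set01 contains 'b'
-- ('b' is the 2nd member, so its bit value is 2)
SELECT COUNT(*) FROM fifth WHERE FIND_IN_SET('b', set01);
SELECT COUNT(*) FROM fifth WHERE set01 & 2;
```

Note that neither form can use an ordinary index on the SET column for this kind of membership test; the column is small to store, but membership searches still scan.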

  4. If the answer to the first question is "yes": how should I create indexes to make record searches fast? The full set of likely search-field combinations:
    3 - 4
    3 - 5
    4 - 5
    3 - 3_1
    3 - 4_1
    3 - 5_1
    3 - 3_2
    3 - 4_2
    3 - 5_2
    and so on - all a-b and a-b-c combinations.
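If the map table does get the extra columns, covering the two-column combinations would mean composite indexes along these lines (a sketch; column names invented). One thing that keeps the count down: MySQL can use the leftmost prefix of a composite index, so an index on (a, b) also serves searches on a alone.

```sql
-- Composite indexes for some of the a-b combinations listed above
ALTER TABLE map
  ADD INDEX idx_3_4 (third_table_id,  fourth_table_id),
  ADD INDEX idx_3_5 (third_table_id,  fifth_table_id),
  ADD INDEX idx_4_5 (fourth_table_id, fifth_table_id),
  ADD INDEX idx_4_3 (fourth_table_id, third_table_id);
-- ...and so on. Even with the leftmost-prefix rule, covering every
-- a-b and a-b-c combination multiplies the index count quickly,
-- and each index grows the MYI file and slows writes.
```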

In that case the best thing would be a B-tree of indexes )))
But MySQL doesn't allow that…

Thank you very much for your answers; this is really important to me. I'm glad I found this site, because nobody on the internet knows the answer, and the documentation doesn't answer these questions either…

There are very many variables to consider, so the best approach in this case is to run your own benchmarks and see which structure works best for your queries.

I don't know your memory availability, for example - does this 2.5GB table fit in memory, or will it need to be read from disk? This may make a difference.

In general, sequential I/O is pretty fast, so the most important thing is to avoid random I/O; if you join tables, it is best if they fit in memory. In many cases you can get plenty of extra speed by pre-loading them into a HEAP table.
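Pre-loading a satellite into a HEAP (MEMORY) table can be as simple as the following sketch, using the invented `fourth` table from above. One caveat: the MEMORY engine cannot hold BLOB columns, so this works for the char(255) satellites but not for the second table:

```sql
-- Copy a lookup table into memory for fast joins
CREATE TABLE fourth_heap ENGINE=MEMORY
  SELECT * FROM fourth;
ALTER TABLE fourth_heap ADD INDEX (id);
```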

I would also try denormalizing and see if it helps.

Thank you very much Peter for your answer!

We now have a server with 4GB of RAM available.
We thought we would put everything in a HEAP table (or better, physically put the MYD and MYI files on a RAM disk) and get much better performance.

But after configuring FreeBSD 5.4 we found that this OS doesn't allow a RAM disk larger than 512MB and doesn't allow allocating more than 2.5GB of memory to a single process.

So for now we have allocated 2.5GB of memory to mysqld.
But the map table is actually about 2.8GB, so we have no way to fit it in memory.

Another question: how can I calculate the MySQL memory parameters for the best performance?

I have read that they vary from task to task, but on the other hand I know that some parameters can be too big (for example, read_buffer_size = 512M is too large for any machine - am I right?).

Is there a formula to calculate (approximately) these parameters given the key_buffer, for example 1GB?
(I mean something like sort_buffer_size = 1/4 of key_buffer; read_buffer_size = 1/2 of key_buffer, and so on.)
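There is no single formula of that kind, mainly because the buffers behave differently: key_buffer is one shared cache for all MyISAM index blocks, while sort_buffer_size and read_buffer_size are allocated per connection, so they must be far smaller. A purely illustrative starting point for a MyISAM-only box (the numbers here are assumptions to benchmark against, not recommendations):

```ini
# Illustrative starting values only - benchmark and adjust
[mysqld]
key_buffer_size      = 1G   # shared cache for MyISAM index blocks
sort_buffer_size     = 2M   # allocated per connection that sorts
read_buffer_size     = 1M   # per connection, sequential scans
read_rnd_buffer_size = 2M   # per connection, sorted-row reads
```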

I rebuilt the table along denormalization lines.
As it turned out, it gave much worse results (

I think I have found an answer to my question.

If I use a MERGE table as the map table and split the 50-million-row table into 10 tables of 5 million rows each, then every 5-million-row piece can fit on the RAM disk.
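A MERGE table over ten identical MyISAM pieces would be declared roughly like this (a sketch; the piece names are invented, and all underlying tables must have exactly the same structure):

```sql
-- Ten identical MyISAM tables map_01 ... map_10, each ~5M rows,
-- presented as one 50M-row table through the MERGE engine
CREATE TABLE map_all (
  first_table_id  INT NOT NULL,
  second_table_id INT NOT NULL,
  third_table_id  INT NOT NULL,
  fourth_table_id INT NOT NULL,
  fifth_table_id  INT NOT NULL
) ENGINE=MERGE
  UNION=(map_01, map_02, map_03, map_04, map_05,
         map_06, map_07, map_08, map_09, map_10)
  INSERT_METHOD=LAST;
```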

What do you think - will I get a performance increase?