Multiple databases vs. multiple tables

Hi!

I often see on this [excellent!] blog recommendations to follow a multiple-table design (horizontal partitioning, in effect) when working with large data sets. In this way, for example, mailboxes for different users are kept in several tables: inbox_001, inbox_002, sent_01, sent_02, etc.

The question is: why not use multiple databases for the same purpose?

Imagine maildb01.inbox, maildb01.sent, maildb02.inbox, maildb02.sent, etc.

This way, the partitioned data (spanning multiple tables) is more logically structured.

With this approach you can keep the data that is not meant to be partitioned in, say, a ‘main’ database, and the partitioned data in multiple databases.

As far as I know, at least in MySQL, databases and tables are pretty much the same thing in technical terms, i.e. you can issue cross-database queries without any additional concern.

When you need to work with a particular data set, you can just issue ‘use db’ before the real query. This may be especially useful if you need to implement partitioning for a production system: you don’t have to rewrite much code to substitute table names (i.e., inbox_01) in your queries. The price is an additional query round trip (although you can issue ‘use db1; select col from table;’ in a single call).
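For illustration (the database, table and column names below are just made up by me):

USE maildb01;
SELECT subject, sent_at FROM inbox WHERE user_id = 42;  -- maildb01 / inbox / user_id are example names

The same thing without the extra USE, by fully qualifying the table name:

SELECT subject, sent_at FROM maildb01.inbox WHERE user_id = 42;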

It must also be easier to manage: a single ‘drop database’ instead of multiple ‘drop table’ statements.

I can’t see any major ‘cons’ here, except that this approach may not work with other RDBMSes.

Am I missing anything?

[B]pastk wrote on Mon, 22 January 2007 02:07[/B]

Imagine maildb01.inbox, maildb01.sent, maildb02.inbox, maildb02.sent, etc.

This way, the partitioned data (spanning multiple tables) is more logically structured. […]

Am I missing anything?

You’re right. MySQL doesn’t care about joining against tables in different databases, and in technical terms you are creating roughly the same number of files on the filesystem, just organized differently.
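For instance, a cross-database join only needs the table names qualified (the database, table and column names here are only illustrative):

SELECT u.username, i.subject
FROM main.users AS u
JOIN maildb01.inbox AS i ON i.user_id = u.id  -- main.users and maildb01.inbox are example names
WHERE u.id = 42;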

You may even pick up a few advantages, since many filesystems start to get sluggish somewhere past 1000+ files in the same directory.

The only downside I’m aware of is that maintenance can be harder, and that you need to figure out a good naming convention for your databases.

I am a newbie to MySQL and want to know: if I were to create multiple tables instead of one giant table, how would I perform a search? I know how to perform a search on a single table or on multiple tables, but not how to search across an unknown number of tables.

Let’s say I create a users table for every 10,000 users…
users_01
users_02
users_03
etc.

Since there is no way I can hard-code this (the set of tables is dynamic), how would I perform a search for users?
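For what it’s worth, if the list of tables were fixed, I guess the search would be a UNION ALL over them (the column names are only my example):

SELECT id, username FROM users_01 WHERE username = 'bob'
UNION ALL
SELECT id, username FROM users_02 WHERE username = 'bob'
UNION ALL
SELECT id, username FROM users_03 WHERE username = 'bob';

But since the number of tables keeps growing, the statement would have to be built dynamically, which is exactly the part I am unsure about.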

Thank you,
kappytown

I am a newbie to MySQL, and I have a few confusions and problems.
Here is the problem.

Suppose I have 100 million records in a table with the fields
id | username | categoryid

There are about 10 categories, and each category has about 10 million records (i.e., for a category with id=5, there are about 10 million rows in the table with categoryid = 5).

I have to select the users having categoryid = 5.

SELECT * FROM table WHERE categoryid = '5'

Is this a good method?
Or can having a separate table for each category increase the performance?
If creating multiple tables can increase performance, how many tables can I create when each holds millions of records?

[B]kumaranoop83 wrote on Sat, 14 July 2007 13:07[/B]
Suppose I have 100 million records in a table with the fields [B]id | username | categoryid[/B]

You’d typically normalise this to 3NF, which means a separate user table with (userid, username).

Then simply store the userid instead of the username.
Also, question whether you really need an id field on this table; an alternative is to simply have a composite primary key (and index) on (userid, categoryid), which should help things tick along much more nicely.
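A rough sketch of what that could look like (the column types and sizes, and the extra secondary index, are my own assumptions, not a definitive schema):

CREATE TABLE users (
  userid INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  username VARCHAR(64) NOT NULL
);

CREATE TABLE user_categories (
  userid INT UNSIGNED NOT NULL,
  categoryid TINYINT UNSIGNED NOT NULL,
  PRIMARY KEY (userid, categoryid),
  KEY (categoryid)  -- my addition: helps the categoryid = 5 lookup
);

-- the original category lookup then becomes a join:
SELECT u.userid, u.username
FROM user_categories AS uc
JOIN users AS u ON u.userid = uc.userid
WHERE uc.categoryid = 5;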