Database design for a location-specific information site

Hi,

I am designing the database for a highly content driven, dynamic, location based site.

Whenever a user visits the site, he sees most of the data specific only to his city. There is very little generalized data to be shown.

For this, I have 2 designs in mind :-
(1) Have a single database encompassing all cities
In this case, for every page, I will have to do extra JOINS to pull out data corresponding to a particular city and the tables will become very large.
But then, the database will not have redundancy and will be optimized.

(2) Have 1 common database for tables storing GENERAL INFORMATION and user specific information AND have 1 database PER CITY (city specific databases having the same design). i.e. :-
common-database
city1-database
city2-database
city3-database…and so on
In this case, the number of JOINS being performed is reduced, and wherever they occur, they may be between tables belonging to different databases (is this good?). But this design introduces redundancy and makes it harder to scale features across cities.

Please give your views on which of the above 2 designs would be better for this city-specific site. Good reasoning would be much appreciated. [I am using CakePHP as the framework, in case that helps.]

Thanks.

Hi,

I am confused by your question.
Are you talking about databases or tables?

For MySQL, it makes no difference whether a table is stored in the same database or in another one, as long as both are in the same DBMS (the same server).
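
To illustrate what a join across two databases looks like, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in, with ATTACH DATABASE playing the role of a second database on the same server; in MySQL you would qualify the tables as db_name.table_name instead. The table and column names (users, events, user_id) are made up for the example.

```python
import sqlite3

# The connection's default database ("main") plays the role of the common database.
conn = sqlite3.connect(":memory:")
# A second, separate database attached under the (hypothetical) name "city1".
conn.execute("ATTACH DATABASE ':memory:' AS city1")

conn.execute("CREATE TABLE main.users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE city1.events ("
    "id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, title TEXT NOT NULL)"
)
conn.execute("INSERT INTO main.users (id, name) VALUES (1, 'alice')")
conn.execute("INSERT INTO city1.events (user_id, title) VALUES (1, 'Street fair')")

# The join reads like any other join; only the table names carry a database qualifier.
rows = conn.execute(
    "SELECT u.name, e.title "
    "FROM main.users AS u JOIN city1.events AS e ON e.user_id = u.id"
).fetchall()
print(rows)
```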

My second question is: do you want to split your data vertically (some columns go into a different table) or horizontally (some rows go into a different table with the same column structure)?

If the number of rows to store is very high, and you can identify clusters (the cities in your case) such that your access pattern stays (mostly) within one cluster, you can benefit from splitting the data by these clusters into different tables.
You can store the tables in different DBs, or just give them a prefix in their name. (This would be horizontal splitting.)
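
A rough sketch of horizontal splitting with a per-city table prefix could look like this. It again uses Python's sqlite3 only so that it runs anywhere; the city list, the listings table, and the helper names (table_for, add_listing, listings) are assumptions for the example, not part of your design.

```python
import sqlite3

# Hypothetical set of cities; every city gets its own table with identical columns.
CITIES = ("london", "paris")

def table_for(city):
    """Map a city to its table name, rejecting anything not whitelisted.

    Table names cannot be bound as query parameters, so the whitelist is what
    keeps the string formatting below safe.
    """
    if city not in CITIES:
        raise ValueError(f"unknown city: {city}")
    return f"{city}_listings"

conn = sqlite3.connect(":memory:")
for city in CITIES:
    conn.execute(
        f"CREATE TABLE {table_for(city)} (id INTEGER PRIMARY KEY, title TEXT NOT NULL)"
    )

def add_listing(city, title):
    conn.execute(f"INSERT INTO {table_for(city)} (title) VALUES (?)", (title,))

def listings(city):
    # Only the table of the requested city is touched; other cities' rows are never scanned.
    rows = conn.execute(f"SELECT title FROM {table_for(city)} ORDER BY id")
    return [row[0] for row in rows]

add_listing("london", "Thames walk")
add_listing("paris", "Louvre tour")
add_listing("london", "Borough Market")
```

The price is visible in the sketch as well: every schema change has to be repeated for each city's table, and any query spanning several cities needs a UNION over them.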

You would split vertically if you have many columns but typically need only a few of them. Then you can split your data into one table with the frequently accessed columns and another with the remaining columns.
This is done to reduce the cache space used: you no longer fill your cache with the rarely accessed columns' data each time you read the frequently accessed data.
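
Vertical splitting can be sketched like this, once more with Python's sqlite3 as a stand-in. The tables place_core and place_detail and their columns are invented for the example: the short, frequently read columns live in one table, and the bulky description sits in a second table that shares the same primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Frequently accessed, small columns.
CREATE TABLE place_core (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    city TEXT NOT NULL
);
-- Rarely accessed, large column, keyed by the same id.
CREATE TABLE place_detail (
    place_id    INTEGER PRIMARY KEY REFERENCES place_core(id),
    description TEXT NOT NULL
);
""")
conn.execute("INSERT INTO place_core (id, name, city) VALUES (1, 'City Museum', 'london')")
conn.execute(
    "INSERT INTO place_detail (place_id, description) VALUES (1, 'A long description ...')"
)

def list_names(city):
    # Listing pages read only the narrow table.
    rows = conn.execute(
        "SELECT name FROM place_core WHERE city = ? ORDER BY id", (city,)
    )
    return [row[0] for row in rows]

def full_place(place_id):
    # The detail page pays for the join only when the long text is actually needed.
    return conn.execute(
        "SELECT c.name, c.city, d.description "
        "FROM place_core AS c JOIN place_detail AS d ON d.place_id = c.id "
        "WHERE c.id = ?",
        (place_id,),
    ).fetchone()
```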

And a general word on redundancy. When you normalize your database ([URL]http://en.wikipedia.org/wiki/Database_normalization[/URL]), one of your goals is to reduce redundancy.
But after you have finished the normalization process, you will need to tune your design for usability and performance. At that point, you may add some redundancy back in. Each time, you will have to decide whether it is worth it. And there are plenty of cases where it does make sense to have redundant data.

cheers

artur