Any reason I shouldn't use one great big table?

Hi there,

This is my first post here so Hello World :slight_smile:

I’m aware the whole point of relational databases is to have ‘normalized’ data so as to prevent insert/delete/update anomalies but I’ve got a job where the data suggests one great big flat file might be faster & easier, it’s definitely simpler for me right now. I would appreciate any advice you could offer…

Basically it’s a listings site with LOAD of wanted and offered items, each with their own (boolean) columns. Users can offer or request any combination of items and the system will find all the appropriate matches. On top of that each listing has some other associated data e.g. description, posters e-mail, time of posting etc but similar to gumtree.com in that there are no persistent accounts, each item is wholly independent of all the others.

Now, while I can see how I might begin to normalize this data I don’t know if it’s worth the extra trouble. The one idea I had was to maybe split the booleans and listing data into 2 seperate tables so the smaller boolean table (which is the basis of all non-keyword searches and used every query) might easier fit into memory/cache.

Would the above a) Work in the way I envisage? and b) Yield me significantly better results than the big flat file version?

There will be about 60 boolean fields & 5 or 6 Kb of other stuff per record say in another 10 fields. I don’t anticipate more than 50,000 records on the production site but hopefully there will be a lot of queries every day (hopefully upwards of 100,000 eventually) I am using InnoDb tables.

Yours sincerely,

Roger Heathcote.

As usual it is almost impossible to answer a question like yours since it depends so much on the distribution of your data and your applications query pattern.

But some general pointers can be that depending on how you intend to search on the boolean data you get more complex queries when trying to count/sort by appropriate matches if you don’t normalize your data.
But as you yourself has noted you can definitely sometimes live with that due to easier DB structure.

And your idea about splitting out the smaller boolean columns into a separate table is a very good one.

I for example very often suggest that a BLOB column should for example be broken out into it’s own table indexed on the same primary key as the primary table.

So I think a good compromise for you could very well be just two tables. One with all boolean records and one with the larger “payload” so that you get the benefits with boolean table in cache as you said.