I am designing a quite simple web app, but I am trying to design it so that it will scale well.
Basically I have an item table and a user table. A user may have many items and each item has exactly one user.
Every time an item is retrieved, the corresponding user must also be found. Currently, when listing items, the user's username is retrieved using an inner join on the user_id field that is in both tables.
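A minimal sketch of the schema and the listing query described above, using SQLite in Python purely for illustration (the table and column names beyond user_id are my assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (
    user_id  INTEGER PRIMARY KEY,
    username TEXT NOT NULL
);
CREATE TABLE item (
    item_id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES user(user_id),
    title   TEXT
);
""")
conn.execute("INSERT INTO user VALUES (1, 'alice')")
conn.execute("INSERT INTO item VALUES (10, 1, 'first item')")

# The most common query in the app: list items with each owner's
# username, via one inner join on user_id.
rows = conn.execute("""
    SELECT item.item_id, item.title, user.username
    FROM item
    INNER JOIN user ON user.user_id = item.user_id
""").fetchall()
print(rows)  # -> [(10, 'first item', 'alice')]
```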
Some of the reading I've done on high-performance MySQL and website design suggests that de-normalisation is something you should consider.
So I could de-normalise this schema by storing a ‘cached copy’ of the user’s username in the item table.
This will save me a join in the most common query in the application. This is good.
However, I will face the issue of keeping the item.username field up to date: if a user changes their username, all of their items will need to be updated. Given MySQL's table-locking rules (and the fact that I want the app to scale well in a replicated setup), is this a good idea?
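One way to keep the cached column in sync is to wrap the rename and the item update in a single transaction. A sketch of that idea (again SQLite standing in for MySQL; the rename_user helper is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (user_id INTEGER PRIMARY KEY, username TEXT NOT NULL);
CREATE TABLE item (
    item_id  INTEGER PRIMARY KEY,
    user_id  INTEGER NOT NULL,
    username TEXT NOT NULL   -- de-normalised cached copy
);
""")
conn.execute("INSERT INTO user VALUES (1, 'alice')")
conn.execute("INSERT INTO item VALUES (10, 1, 'alice')")

def rename_user(conn, user_id, new_name):
    # Both writes in one transaction so the cached copies never drift.
    # Note the second UPDATE touches every item row for this user,
    # which is exactly the locking/replication cost in question.
    with conn:
        conn.execute("UPDATE user SET username = ? WHERE user_id = ?",
                     (new_name, user_id))
        conn.execute("UPDATE item SET username = ? WHERE user_id = ?",
                     (new_name, user_id))

rename_user(conn, 1, 'alicia')
cached = conn.execute(
    "SELECT username FROM item WHERE item_id = 10").fetchone()
print(cached)  # -> ('alicia',)
```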
Interesting idea. So instead of an inner join, would I be better off pulling the user_ids out of the selected items and running a second query with the IN operator on that list of user_ids?
Although, I guess a query built from a varying list of user_ids and the IN operator might not generate query-cache hits as often?
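The two-query approach could be sketched like this (SQLite for illustration; in real code the placeholder list would be built per request, which is why the resulting SQL text varies and may not cache well):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (user_id INTEGER PRIMARY KEY, username TEXT NOT NULL);
CREATE TABLE item (item_id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL);
""")
conn.executemany("INSERT INTO user VALUES (?, ?)",
                 [(1, 'alice'), (2, 'bob')])
conn.executemany("INSERT INTO item VALUES (?, ?)",
                 [(10, 1), (11, 2), (12, 1)])

# Query 1: fetch the page of items.
items = conn.execute("SELECT item_id, user_id FROM item").fetchall()

# Query 2: fetch only the users those items reference, via IN.
user_ids = sorted({uid for _, uid in items})
placeholders = ",".join("?" * len(user_ids))
users = dict(conn.execute(
    f"SELECT user_id, username FROM user WHERE user_id IN ({placeholders})",
    user_ids).fetchall())

# Stitch the two result sets together in application code.
listing = [(item_id, users[uid]) for item_id, uid in items]
print(listing)  # -> [(10, 'alice'), (11, 'bob'), (12, 'alice')]
```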
So, the de-normalised version is a bit faster, but it's not really going to make THAT much difference. Method 3 is slower than 2, suggesting, once again, that it's best to throw it all at MySQL and let it worry about optimisation…
Database setup: There are 100,000 items, 10,000 users. All records randomly generated with evenly distributed random values. Times are in seconds to run the query 100 times. Query cache disabled (I hope).
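The numbers above came from MySQL, but purely to illustrate the three methods being compared, here is a scaled-down harness using SQLite (row counts reduced from the post's 100,000/10,000; the schema details are my assumptions, and absolute timings will not match MySQL's):

```python
import random
import sqlite3
import time

N_USERS, N_ITEMS, RUNS = 1000, 10000, 10  # scaled down from the post

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (user_id INTEGER PRIMARY KEY, username TEXT NOT NULL);
CREATE TABLE item (
    item_id  INTEGER PRIMARY KEY,
    user_id  INTEGER NOT NULL,
    username TEXT NOT NULL   -- cached copy for the de-normalised method
);
CREATE INDEX idx_item_user ON item(user_id);
""")
conn.executemany("INSERT INTO user VALUES (?, ?)",
                 [(i, f"user{i}") for i in range(N_USERS)])
conn.executemany("INSERT INTO item VALUES (?, ?, ?)",
                 [(i, (uid := random.randrange(N_USERS)), f"user{uid}")
                  for i in range(N_ITEMS)])

def method1_join():
    return conn.execute("""
        SELECT item.item_id, user.username
        FROM item INNER JOIN user USING (user_id)
        ORDER BY item.item_id LIMIT 100""").fetchall()

def method2_denorm():
    return conn.execute("""
        SELECT item_id, username FROM item
        ORDER BY item_id LIMIT 100""").fetchall()

def method3_two_queries():
    items = conn.execute("""
        SELECT item_id, user_id FROM item
        ORDER BY item_id LIMIT 100""").fetchall()
    ids = sorted({uid for _, uid in items})
    qs = ",".join("?" * len(ids))
    users = dict(conn.execute(
        f"SELECT user_id, username FROM user WHERE user_id IN ({qs})",
        ids).fetchall())
    return [(item_id, users[uid]) for item_id, uid in items]

def bench(label, fn):
    start = time.perf_counter()
    for _ in range(RUNS):
        fn()
    print(f"{label}: {time.perf_counter() - start:.3f}s")

bench("method 1 (join)", method1_join)
bench("method 2 (de-normalised)", method2_denorm)
bench("method 3 (two queries)", method3_two_queries)
```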
Your benchmarks, however, oversimplify things. In a typical web application the login step is separate: you read and check the user's access before you start doing other things, which means you should already have the user_id available before you run the query.
But anyway, it may be different in your application.