Break huge table into thousands of smaller ones?

Hey,

Quickie for you, in an app, picture a HUGE table that contains parameter values for subscribers, where each subscriber might have 3,000 parameters associated to them, so we’d have a table like

CREATE TABLE my_large_table (
id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
param_id BIGINT NOT NULL,
pvalue VARCHAR(64000) DEFAULT NULL,
ptype_id BIGINT NOT NULL DEFAULT ‘unknown’,
paccess VARCHAR(64000) DEFAULT NULL,
subscriberid BIGINT NOT NULL,
INDEX idx_subscriberid ( subscriberid )
)
AUTO_INCREMENT = 1
ENGINE = MyISAM
DEFAULT CHARACTER SET utf8
DEFAULT COLLATE utf8_bin;

We could have eg. 100,000 - 1,000,000,000 subscribers

Now … this is MyISAM and as it is, does not suit do to if you have 30,000 subscribers logging in and updating parameters, MyISAM will lock the table on every update/insert blocking others. Everything crawls

Ok … there are 2 solutions, I know the one I’d back, but due to some guy here in work just thinking the other solution is cool or something, we’re going with it… I’d just like to get peoples opinion

Solution 1: Change the table to InnoDB so enable row locking and get rid of the problem with multiple subscribers updating their parameters (not sure if it could be partitioned based on subscriberid, think I looked at this and was shot down due to it not being unique in the table)

Solution 2: Split the table up into thousands ( 100,000 - 1,000,000,000 ) of smaller tables, so it works out 1 per subscriber, it’s still MyISAM but table locking isn’t an issue anymore

IMHO it has to be 1,
Solution 2 isn’t very very very VERY VERY VERY bad, but if you want to do any queries that involve multiple subscribers, you’re screwed. Imagine having to find all subscribers with a particular parameter value! Still, it solves the immediate problem, its just very short sighted. Mysql will handle a database with millions of tables and if all you do is get 1 subscribers details, update them, read 1 etc. its fine

Is there a limit on the number of tables you can throw into a union? Also, the number of tables would be dynamic, a new subscriber comes along, its a new table… how can you write a query to do union an ever changing number of tables!

I’d be very interested in others opinions… I don’t want to colour peoples, so please look at this with an open mind :wink:

Bal

Whoever suggested solution #2 should benchmark how the intended filesystem copes with a directory with 100k files in it.

I suggest getting a copy of our book (second edition of High Performance MySQL). These kinds of questions are answered thoroughly there. There’s no need to reinvent the wheel and rediscover the acute misery of a bad design that is hard to change by the time you discover how bad it is.

Well put, listen to the Baron! :wink:

Shoot down the other guy immediately!
Databases are not built to handle 10 rows in 1,000,000 tables. They are built to handle 1,000,000 rows in 10 tables.

Splitting related data into so many tables will just create a nightmare to work with and you will get bad performance.

In this particular case, there are 50,000 subscribers with about 3,000 paramater values each… but it should be able to scale into the millions of subscribers, the table file is about 160gb with about 150million rows

The problem was, it was a MyISAM table, so locking was a serious issue, one subscriber updates data and the entire table is locked.

Proper layout would have been a partitioned innodb (with ndbcluster option) table… problem solved.

But the guys here read something about some big names like google etc. splitting things up into millions of tables (possibly something like geographical mapping s/w with x/y tables) so I was outvoted :confused:

I’m still just pissed about the whole thing, one guy got it into his head about splitting them and wasn’t really open to listening to anything else. A small bit of knowledge is a very dangerous thing :confused:

Now … the issue of running reports on particular parameters has cropped up … I’ve done the ‘I told you so’ … but was responded with ‘this is what we have and where we are now (ie. split tables) … so find a way to do a fast query on parameter values across all subscribers’. It’s horrible … read all required parameters from all tables and dump into a temp table, do a sql with thousands of unions… the options are all disgusting

I feel I’ve pretty decent MySQL knowledge and am confident enough with it, but not having the power to implement things as I see it frustrating… its a typical ‘youre a code monkey, get back in your box’

Thanks for the replies, I needed a little reassurance I wasn’t insane )