associating user_id with multiple things for 10MM + table

I wanted to ask community for thoughtful answer to our problem. We have to store some data about unique users:

Each user can have multiple things that they wear on their avatar, our calculation that average will be around 10 unique items. We have 10,000,000 (10 million) uses and adding on average 100k per day. Existing users (not new ones that added everyday) also change their avatars by adding and removing items. So read, write AND in first scenario below, DELETE, will happening quite a lot, with read being slightly higher.

We are debating on database table schema and have two suggestions from our team members.

One:

±--------±-----------+ | user_id | item_id | ±--------±-----------+ | 1 | 20021 | ±--------±-----------+ | 1 | 30312 | ±--------±-----------+ | 1 | …n | ±--------±-----------+ | 2 | 301 | ±--------±-----------+ | 2… | …n | ±--------±-----------+

and so on (10M users * ~10 rows ~= 100 million rows)

Second:

±--------±---------------------+ | user_id | item_ids | ±--------±---------------------+ | 1 | 20021,30312,301,…n | ±--------±---------------------+ | 2 | 301,300341,1,4441,…n | ±--------±---------------------+

so in second case 1 row for each unique user id.
(10M users * 1 row = 10 million rows)

Few details about data selection:
We always going to retrieve all avatar items for user and we always going to save
all avatar items for user. In first scenario there will be possible DELETE because users might remove certain items from their avatar and add new, in second case it’s always going to a “blind” UPDATE no matter what was removed or added.

we DO NOT care/or ever will have to find/search for user’s that have particular item_id associated with them, so item_id/item_ids column is simply for the storage.

What is the best implementation in terms of raw speed? Maintenance?

On application level we have to have ability to use master-slave1-slaveN AND sharded database, so it should be good for both worlds.

Method1

  • easily joinable
  • relatively high overhead per row from storage engine (InnoDB adds 19 bytes per row to the 8 bytes you need for storing two integers)

Method2

  • not possible to store as integer; varchar needed
  • not easy to find users with particular item or to remove certain items, but “we DO NOT care/or ever will have to find/search for user’s that have particular item_id associated with them”

I would go with method 2.

You might also consider using a key-value store like memcached or Redis instead of MySQL. It depends on a lot of things, such as whether it is OK to store any updates in the session and queue these changes rather than putting them in permanent storage right away, to reduce the work caused by the frequent updates you mentioned.