I wanted to ask the community for a thoughtful answer to our problem. We need to store some data about unique users:
Each user can have multiple items that they wear on their avatar; we estimate the average will be around 10 unique items per user. We have 10,000,000 (10 million) users and are adding on average 100k per day. Existing users (not just the new ones added every day) also change their avatars by adding and removing items. So reads, writes AND, in the first scenario below, DELETEs will be happening quite a lot, with reads being slightly more frequent.
We are debating the database table schema and have two suggestions from our team members.
One:
+---------+---------+
| user_id | item_id |
+---------+---------+
| 1       | 20021   |
| 1       | 30312   |
| 1       | …n      |
| 2       | 301     |
| 2…      | …n      |
+---------+---------+
and so on (10M users * ~10 rows ~= 100 million rows)
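To make the access pattern for option one concrete, here is a minimal sketch using SQLite purely for illustration; the column names come from the schema above, while the table name `avatar_items` and the helper functions are assumptions:

```python
import sqlite3

# Option one: one row per (user, item) pair.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE avatar_items (
        user_id INTEGER NOT NULL,
        item_id INTEGER NOT NULL,
        PRIMARY KEY (user_id, item_id)
    )
""")

def save_avatar(user_id, item_ids):
    # Saving replaces the user's whole set: DELETE the old rows,
    # then INSERT the new ones, in a single transaction.
    with conn:
        conn.execute("DELETE FROM avatar_items WHERE user_id = ?", (user_id,))
        conn.executemany(
            "INSERT INTO avatar_items (user_id, item_id) VALUES (?, ?)",
            [(user_id, i) for i in item_ids],
        )

def load_avatar(user_id):
    # Reading all items for one user is a single indexed range scan.
    return [r[0] for r in conn.execute(
        "SELECT item_id FROM avatar_items WHERE user_id = ? ORDER BY item_id",
        (user_id,),
    )]

save_avatar(1, [20021, 30312])
save_avatar(1, [20021, 999])   # old rows deleted, new set inserted
print(load_avatar(1))          # [999, 20021]
```

Note that because the save always rewrites the full set, every update pays the cost of a DELETE plus N INSERTs.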
Second:
+---------+----------------------+
| user_id | item_ids             |
+---------+----------------------+
| 1       | 20021,30312,301,…n   |
| 2       | 301,300341,1,4441,…n |
+---------+----------------------+
so in the second case there is 1 row for each unique user id.
(10M users * 1 row = 10 million rows)
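And the equivalent sketch for option two, again using SQLite only for illustration; the table name `avatars`, the comma-separated serialization, and the helper functions are assumptions:

```python
import sqlite3

# Option two: one row per user, all items serialized into one column.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE avatars (
        user_id  INTEGER PRIMARY KEY,
        item_ids TEXT NOT NULL
    )
""")

def save_avatar(user_id, item_ids):
    # A "blind" write: the whole serialized list is replaced no matter
    # which items were added or removed.
    conn.execute(
        "INSERT OR REPLACE INTO avatars (user_id, item_ids) VALUES (?, ?)",
        (user_id, ",".join(map(str, item_ids))),
    )

def load_avatar(user_id):
    row = conn.execute(
        "SELECT item_ids FROM avatars WHERE user_id = ?", (user_id,)
    ).fetchone()
    return [int(i) for i in row[0].split(",")] if row else []

save_avatar(2, [301, 300341, 1, 4441])
save_avatar(2, [301, 4441])    # blind overwrite of the previous list
print(load_avatar(2))          # [301, 4441]
```

Here every save and load touches exactly one row by primary key, at the cost of losing any ability to query individual item_ids inside the database.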
A few details about data selection:
We are always going to retrieve all avatar items for a user, and we are always going to save all avatar items for a user. In the first scenario there will also be DELETEs, because users might remove certain items from their avatar and add new ones; in the second case it is always going to be a "blind" UPDATE no matter what was removed or added.
We DO NOT care about, nor will we ever have to, find/search for users that have a particular item_id associated with them, so the item_id/item_ids column is purely for storage.
What is the best implementation in terms of raw speed? Maintenance?
On the application level we need the ability to use a master-slave1-slaveN setup AND a sharded database, so the schema should work well in both worlds.