Hello~
I have some problems designing a DB schema and queries for a random user search based on simple search conditions.
A user can run a search by selecting some conditions (sex, age, region and price), and the search function displays a random user who meets those conditions.
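For reference, here is a minimal sketch of the searchable user data I have in mind (column names and types are placeholders, not my real schema):

```sql
-- Minimal sketch of the searchable user data (names/types are placeholders).
CREATE TABLE s_data (
    id       INT UNSIGNED NOT NULL AUTO_INCREMENT,
    user_id  INT UNSIGNED NOT NULL,           -- reference to the main user record
    sex      ENUM('m','f') NOT NULL,
    age      TINYINT UNSIGNED NOT NULL,
    region   VARCHAR(32) NOT NULL,            -- e.g. 'ny'
    price    TINYINT UNSIGNED NOT NULL,       -- price bucket, e.g. 1 = p1
    PRIMARY KEY (id),
    KEY idx_cond (sex, region, age, price)    -- composite index for the search conditions
);
```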
I want to design a scalable architecture that can handle a very large number of users,
so I am considering sharding and replication for ease of maintenance and cache efficiency.
If I want to shard this kind of table, which sharding approach is best?
My first idea is to shard the search tables by the combination of search-condition values. For example, all users who are male, live in New York, are aged 15-20 and fall into price range p1 would be kept in a table named s_data_m_ny_15_20_p1.
In other words, the search tables are split according to the users' actual data. A global dictionary table would record which search table on which shard covers each combination of conditions, and queries would be routed to the relevant search table.
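A rough sketch of what I mean by the global dictionary table, assuming shards are identified by a numeric shard id (all names here are hypothetical):

```sql
-- Rough sketch: maps a search-condition combination to the shard and
-- physical table that holds that slice of users.
CREATE TABLE search_table_directory (
    sex        ENUM('m','f')     NOT NULL,
    region     VARCHAR(32)       NOT NULL,
    age_min    TINYINT UNSIGNED  NOT NULL,
    age_max    TINYINT UNSIGNED  NOT NULL,
    price      TINYINT UNSIGNED  NOT NULL,
    shard_id   SMALLINT UNSIGNED NOT NULL,    -- which physical server
    table_name VARCHAR(64)       NOT NULL,    -- e.g. 's_data_m_ny_15_20_p1'
    PRIMARY KEY (sex, region, age_min, age_max, price)
);

-- Routing lookup: where do males in New York, aged 15-20, price p1, live?
SELECT shard_id, table_name
FROM search_table_directory
WHERE sex = 'm' AND region = 'ny'
  AND age_min = 15 AND age_max = 20
  AND price = 1;
```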
If I used the MyISAM engine, I could maintain several MERGE tables, each covering the underlying tables that share part of a search condition (for example, a table covering just men would be named s_data_m).
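Something like the following, assuming the underlying per-condition tables are identical MyISAM tables (table names are hypothetical):

```sql
-- Sketch: a MERGE table that unions all "male" slices so a search on
-- sex alone can hit one logical table. The listed tables must all be
-- MyISAM and have identical structure.
CREATE TABLE s_data_m (
    id      INT UNSIGNED NOT NULL,
    user_id INT UNSIGNED NOT NULL,
    age     TINYINT UNSIGNED NOT NULL,
    region  VARCHAR(32) NOT NULL,
    price   TINYINT UNSIGNED NOT NULL,
    KEY (id)
) ENGINE=MERGE
  UNION = (s_data_m_ny_15_20_p1, s_data_m_ny_21_25_p1 /* , ... */)
  INSERT_METHOD = NO;   -- read-only view; writes go to the underlying tables
```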
I would prefer to use InnoDB, though. With InnoDB, can I use MySQL partitioning for the same purpose as the MERGE engine in the scenario above?
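What I am picturing is roughly this: one InnoDB table per sex, partitioned by region, with partition pruning standing in for the MERGE union. Names and partition values are placeholders, and I am assuming MySQL 5.5+ for LIST COLUMNS on a string column:

```sql
-- Sketch: the same s_data_m, but as one InnoDB table partitioned by
-- region instead of a MERGE over per-region MyISAM tables.
-- Note: MySQL requires the partitioning column to be part of every
-- unique key, hence the composite primary key (id, region).
CREATE TABLE s_data_m (
    id      INT UNSIGNED NOT NULL AUTO_INCREMENT,
    user_id INT UNSIGNED NOT NULL,
    age     TINYINT UNSIGNED NOT NULL,
    region  VARCHAR(32) NOT NULL,
    price   TINYINT UNSIGNED NOT NULL,
    PRIMARY KEY (id, region),
    KEY idx_cond (region, age, price)
) ENGINE=InnoDB
PARTITION BY LIST COLUMNS (region) (
    PARTITION p_ny    VALUES IN ('ny'),
    PARTITION p_la    VALUES IN ('la'),
    PARTITION p_other VALUES IN ('etc')
);

-- A query like WHERE region = 'ny' AND age BETWEEN 15 AND 20 AND price = 1
-- should only touch partition p_ny (partition pruning).
```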
The other reason I keep users' rows in condition-specific search tables is random selection. I want to run an ORDER BY RAND() select against a given search table, but that may become slow as the data grows.
So instead, a random integer is generated in PHP, and the query simply selects the row whose primary key is closest to that random number.
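Roughly this, where :r is a bound parameter (PDO-style placeholder) holding the random integer from PHP, e.g. mt_rand(1, $maxId) with $maxId taken from SELECT MAX(id):

```sql
-- Key-based random pick: :r is a random integer generated in PHP
-- between 1 and MAX(id) of this table.
SELECT *
FROM s_data_m_ny_15_20_p1
WHERE id >= :r          -- "closest" key at or above the random number
ORDER BY id
LIMIT 1;
```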
But with this kind of random select, how do I deal with fragmentation? I plan to define an auto-increment integer column as the primary key of each search table. When a user modifies his information, the search table matching the new information takes over his row, so a delete from the old table and an insert into the new one is required, leaving a gap in the old table. After a while there may be many holes in the primary-key range.
If there are many such holes, the random select may not find a row even though rows exist. How should this be handled for the random select?
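One workaround I am considering is sketched below: the >= lookup already skips holes in the middle of the range, so only a random value that lands past the largest surviving id comes back empty, and that case wraps around to the smallest id.

```sql
-- 1) Normal attempt (same query as above); gaps inside the range are
--    skipped because the next existing id is taken.
SELECT *
FROM s_data_m_ny_15_20_p1
WHERE id >= :r
ORDER BY id
LIMIT 1;

-- 2) Run only if (1) returned no row, i.e. :r fell beyond the last
--    surviving id: wrap around to the smallest existing id.
SELECT *
FROM s_data_m_ny_15_20_p1
ORDER BY id
LIMIT 1;
```

Even so, heavy deletion skews the pick toward rows that sit just after large holes, so I suspect I would still need to compact or renumber the table periodically if the selection has to stay close to uniform.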
If I split the search tables across several shards, I would have to maintain redundant overlapping tables, because neither InnoDB partitioning nor the MERGE engine can span tables that live on different shards.
So the partitioning scheme for the search tables described above is hard to apply across physically separate shards (servers).
Would someone be so kind as to share some insight into this kind of problem?
I think the implementation of my scenario will be too complicated.
For simplicity, old data and data for inactive users need not be kept in the search tables.
My explanation of the problem may be unclear.
Thanks in advance for any good insightful opinion.