Hello~
I am not so experienced on fulltext search and fulltext search engine.
If my understanding is not so good, please forgive me.
Our site is social networking site and I am rebuilding all the database architecture by using partitioning data in multiple server clusters.
We are newcomer, however, we want really optimized and scalable architecture in mind.
I am also optimizing user search function related schema, query and server architecture.
I am terribly confused on the user search where user can select several conditions such as age range, current location, interested gender, social purpose and personal interests.
Multi column normal indexes can be built on the columns regarding age range, current location, interested gender, social purpose and etc. personal interests (music which I like, movie which I like, book which I like and etc) is actually represented as varchar field and inclued values by using comma as delimeter (music which I like: “beatles, madonna, 王菲”). So I consider personal interests related columns can be fulltext indexed.
When user selects some range on normal index and types some keywords in each personal interest, I don’t know the optimal and scalable way to commit the speed of user search within several hundred milli seconds.
Solutions which I seek should guarantee the search speed even if the data set(record number) in search table reaches to 1 billion records.
If you cannot understand well on the requirements of our user search, you can imagine the advanced search of facebook.
MySQL does not allow multiple indexes to be used concurrently in a single query.
One solution for our user search which I am thinking is :
- maintaining two types of table (one with normal index and the other with fulltext index), joining the subqueries using separate index on separate table can use 2 kinds of index(normal, fulltext) together.
This approach can save disk space and improves search speed because each type of table needs to keep required column data for search condition and to keep only required indexes.
Moreover search related tables are updated as batch job periodically. Search related tables are stored in different server cluster.
Can my approach be feasible solutions for our user search requirement even if the number of user reaches to 1 billion?
I cannot make sure on that.
we may need to partition again search related table into several tables? We can maintain search table with normal index into several partitioned tables by the range of several columns whose cardinality is small. For example if we name our search_table_with_normal_index as normal_search, we can partition the data of that table into normal_search_man_beijing_13-20, …, normal_search_woman_beijing_13-20, and so on. Moreover search with broader search conditions can be directed to the merged tables. We consider to use merge storage engine for merging.
This kind of partitioning can contribute the user search with only conditions on normal indexed columns.
But how to handle user search with conditions on normal indexed columns and fulltext indexed columns together???
If we partition the table with fulltext index again, the search related PHP code logic( paging ) may be complicated and we may need several SQL queries for single user search and need to combine the result set.
Moreoever when user can select multiple fulltext related condition such as (music which I like, book which I like and so on), the required concurrent index type should be increased.
For example, let us suppose that user selects age range, interested sex, current location and types beatles for music which I like column, soccer for sports which I like, mysql for topic which I have interest.
In this case, we need several subqueries (one using normal index and the others using separate fulltext indexes). The number of subqueries may be 9 at worst case…
This approach of joining several subqueries with relevant index can be feasible when the data set is so large???
I heard on Sphinx in this site. Sphinx or other fulltext search engine can be helpful for our complicated user search scenario?
Would someone let me know good directions or solutions for our user search?
Thanks in advance.