Self-made index: faster or slower?

Probably a very simple question for real experts, but I doubt I can answer it myself without experiencing it in real life (maybe when it is too late). This will be used on a website that is queried very, very frequently.

The problem:
Having natural language lines in a database (in multiple language columns).
(example: ‘This is a sentence’)
Want to be able to search for this in the fastest way possible.
(example: search for all rows containing ‘this’ and ‘sentence’.)

First conclusion:
- A normal index would not work, as you have to search with %…% here since these are lines, not words.
- A normal fulltext index is not sufficient, as an index has to be made for every possible column combination (and I have some 32 languages…).

My solution:
Create a new table that consists of this:

unique   column 1      column 2
word     id1,id2,idX   id9,id8,idX

and I query this first (after this step I could even use SOUNDEX to help the visitor), and then do SELECT * FROM [table] WHERE id IN (id1, id2, idX).
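A rough sketch of that two-step lookup, using SQLite and made-up table and column names (the real schema is MySQL with one text column per language). One deviation from the table above: instead of comma-separated id lists, it stores one (word, column, id) row per occurrence, so a normal index on the word column can actually be used.

```python
# Two-step lookup sketch: word -> ids, then ids -> rows.
# Table/column names are illustrative, not from the original schema.
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Main table: one row per sentence, one column per language (two here).
cur.execute("CREATE TABLE sentences (id INTEGER PRIMARY KEY, lang1 TEXT, lang2 TEXT)")
rows = [
    (1, "This is a sentence", "Dit is een zin"),
    (2, "Another sentence here", "Nog een zin"),
    (3, "Nothing to see", "Niets te zien"),
]
cur.executemany("INSERT INTO sentences VALUES (?, ?, ?)", rows)

# Lookup table: one (word, column, id) row per occurrence,
# with a normal index on `word` so the lookup is an index seek.
cur.execute("CREATE TABLE word_index (word TEXT, col TEXT, id INTEGER)")
cur.execute("CREATE INDEX idx_word ON word_index (word)")
for rid, lang1, lang2 in rows:
    for col, text in (("lang1", lang1), ("lang2", lang2)):
        for word in text.lower().split():
            cur.execute("INSERT INTO word_index VALUES (?, ?, ?)", (word, col, rid))

# Step 1: find the ids for the search words.
words = ("this", "sentence")
ids = [r[0] for r in cur.execute(
    "SELECT DISTINCT id FROM word_index WHERE word IN (?, ?)", words)]

# Step 2: fetch the matching rows.
placeholders = ",".join("?" * len(ids))
hits = cur.execute(
    f"SELECT id FROM sentences WHERE id IN ({placeholders})", ids).fetchall()
print(sorted(h[0] for h in hits))  # -> [1, 2]
```

The per-occurrence layout is what makes the index usable; a comma-separated id list in one cell would force string parsing in the application anyway.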

At first sight you might think it would be faster (if I combine this into one query), but some premature testing on my machine already shows that something like

SELECT * FROM [table] WHERE column1 LIKE '% someword %' OR column1 LIKE '% someotherword %' OR column2 LIKE '% someword %' OR column2 LIKE '% someotherword %'

is faster than just the SELECT * FROM [table] WHERE id IN (id1, id2, idX) step alone.
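For reference, a runnable version of that LIKE comparison on the same toy data (SQLite again; names are illustrative). The leading % wildcard is what prevents a normal index from being used. I dropped the surrounding spaces from the original patterns: note that '% word %' misses a word at the very start or end of a line, while '%word%' also matches substrings (e.g. 'this' inside 'thistle').

```python
# LIKE-based search over every language column -- forces a full table scan.
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE sentences (id INTEGER PRIMARY KEY, lang1 TEXT, lang2 TEXT)")
cur.executemany("INSERT INTO sentences VALUES (?, ?, ?)", [
    (1, "This is a sentence", "Dit is een zin"),
    (2, "Another sentence here", "Nog een zin"),
    (3, "Nothing to see", "Niets te zien"),
])

# One LIKE term per word per column; LIKE is case-insensitive for ASCII.
like_hits = cur.execute(
    "SELECT id FROM sentences WHERE lang1 LIKE '%this%' OR lang1 LIKE '%sentence%'"
    " OR lang2 LIKE '%this%' OR lang2 LIKE '%sentence%'"
).fetchall()
print(sorted(h[0] for h in like_hits))  # -> [1, 2]
```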

And that is before even talking about first selecting the better results.

My question:

  • Should this in the end prove faster, or am I better off sticking with the %…% search that does not use the index?
    (Is there maybe a better way to do this?)

Depending on the size of the database, your solution might perform reasonably or be way too slow.

I would recommend using Sphinx for this kind of task; with it you can specify which columns you want to search.
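For what it's worth, a minimal sphinx.conf sketch of how that could look. Host, credentials, and the table/column names are placeholders, not taken from the original post.

```
# Placeholder source: Sphinx pulls the rows straight from MySQL.
source sentences_src
{
    type      = mysql
    sql_host  = localhost
    sql_user  = user
    sql_pass  = pass
    sql_db    = mydb
    # id must come first; the following columns become searchable fields
    sql_query = SELECT id, lang1, lang2 FROM sentences
}

index sentences_idx
{
    source = sentences_src
    path   = /var/lib/sphinx/sentences_idx
}
```

You would then query it from the application through one of the Sphinx client APIs or SphinxQL; the extended query syntax lets you restrict a search to specific fields with @fieldname, which covers the per-language-column requirement.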