Creating a custom stoplist from actual data

myraz · October 16, 2008, 3:57am

I have some huge fulltext indexes created with ft_min_word_len=2 and no stoplist. The data is not in English, so the default stoplist cannot be used.

My idea is this: if you could ‘look’ at the existing index (using a custom tool, a MySQL patch, or whatever), you should be able to determine which words would be good candidates for a custom stoplist. Does that make sense? Is it even theoretically possible? If it is, and someone were to create that tool, I guess it would benefit many users with non-English (but space-delimited) data.

As an aside, I did try having a Perl script extract all words from the table (not the index) and count their frequencies. It works fine, but it is very slow and dull. If my idea is doable, I picture it being blazingly fast and usable on huge existing tables.
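For reference, the table-scan approach looks roughly like this in Python. It is only a minimal sketch, not the original Perl script: it assumes the text rows have already been fetched from the table, and the names `stoplist_candidates` and `WORD_RE` are made up for illustration.

```python
import re
from collections import Counter

# Tokens of 2+ characters, mirroring a minimum word length of 2.
# \w is Unicode-aware in Python 3, so non-English letters match
# (it also matches digits and underscores).
WORD_RE = re.compile(r"\w{2,}")

def stoplist_candidates(rows, top_n=50):
    """Return the top_n most frequent words across an iterable of text rows."""
    counts = Counter()
    for text in rows:
        counts.update(WORD_RE.findall(text.lower()))
    return counts.most_common(top_n)

# Illustrative rows only; real input would come from the indexed column(s).
rows = ["och det var en gång", "det var en katt och en hund"]
for word, count in stoplist_candidates(rows, top_n=3):
    print(count, word)
```

Every row has to be read and tokenized, which is why this gets slow on huge tables.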

Ideas? Comments? Thanks.

xaprb · December 25, 2009, 7:03am

Maybe you are looking for myisam_ftdump.
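The utility is run as `myisam_ftdump [options] tbl_name index_num`, and its `-c` (`--count`) option calculates per-word statistics (counts and global weights) from the index itself. Below is a minimal sketch for turning that output into stoplist candidates. It assumes each output line has the form "count, weight, word" separated by whitespace, so check that against what your version actually prints. The function names and the sample lines are made up for illustration.

```python
def parse_ftdump_counts(lines):
    """Parse lines ASSUMED to look like '<count> <weight> <word>'."""
    entries = []
    for line in lines:
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # blank or unexpected line
        count, weight, word = parts
        try:
            entries.append((word.strip(), int(count), float(weight)))
        except ValueError:
            continue  # non-numeric columns, e.g. a header line
    return entries

def most_frequent(entries, top_n=50):
    """Sort parsed entries by count, highest first, and keep top_n."""
    return sorted(entries, key=lambda e: e[1], reverse=True)[:top_n]

# Invented sample lines, NOT real tool output. In practice the lines
# would come from the saved output of: myisam_ftdump -c <table> <index_num>
sample = [
    "      120            0.0012345 och",
    "        3            2.3456789 katt",
    "       95            0.0100000 det",
]
for word, count, weight in most_frequent(parse_ftdump_counts(sample), top_n=2):
    print(count, word)
```

The words with the highest counts are the natural stoplist candidates, though the list is still worth reviewing by hand.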

myraz · December 25, 2009, 9:54am

Doh, it was there all the time? Many thanks. My post was over a year ago, but I still needed it!
