I have some huge FULLTEXT indexes built with ft_min_word_len=2 and no stopword list. The data is not in English, so the default stopword list cannot be used.
My idea: if you could ‘look’ at the existing index (using a custom tool, a MySQL patch or whatever), you should be able to determine which words would be good candidates for a custom stoplist, since the index already records which rows contain each word. Does that make sense? Is it even theoretically possible? If so, and if someone built such a tool, I think it would benefit many users with non-English (but space-delimited) data.
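To make the idea a bit more concrete: suppose some tool could report, for each indexed word, how many rows contain it. That input is hypothetical; I don't know of a specific format for it. Picking stoplist candidates would then come down to ranking words by the fraction of rows they appear in. A minimal Python sketch, where `stopword_candidates`, `min_ratio` and the sample counts are all made up for illustration:

```python
from typing import Dict, List, Tuple


def stopword_candidates(
    doc_counts: Dict[str, int],
    total_rows: int,
    min_ratio: float = 0.2,
) -> List[Tuple[str, float]]:
    """Return (word, ratio) pairs for words found in at least min_ratio
    of all rows, most widespread first.

    doc_counts is assumed to map each indexed word to the number of rows
    containing it (the per-word statistic a hypothetical index-reading
    tool would have to provide).
    """
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    candidates = [
        (word, count / total_rows)
        for word, count in doc_counts.items()
        if count / total_rows >= min_ratio
    ]
    # Highest row ratio first; ties broken alphabetically.
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return candidates


if __name__ == "__main__":
    # Made-up counts for a 1000-row table, for illustration only.
    counts = {"w1": 910, "w2": 640, "w3": 580, "w4": 35, "w5": 12}
    for word, ratio in stopword_candidates(counts, total_rows=1000, min_ratio=0.5):
        print(f"{word}\t{ratio:.2f}")
```

The threshold is a judgment call that depends on the data; the point is only that once per-word row counts exist, the selection step itself is trivial.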
As an aside, I did try a Perl script that extracts all words from the table (not the index) and counts their frequencies. It works, but it is very slow and tedious. If my idea is doable, I imagine it would be blazingly fast and usable on huge existing tables.
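For comparison, the table-scan approach looks roughly like this, sketched in Python rather than Perl. It assumes the column values arrive as an iterable of strings (for example from any database cursor, not shown here), splits on whitespace, lowercases, and drops tokens shorter than 2 characters to mirror ft_min_word_len=2. That only approximates MySQL's own FULLTEXT parser. The function name `count_words` is made up for illustration:

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def count_words(
    rows: Iterable[Optional[str]], min_len: int = 2
) -> Tuple[Counter, Counter, int]:
    """Scan text rows once and return (term_freq, doc_freq, row_count).

    term_freq counts every occurrence of a word; doc_freq counts how many
    rows contain it at least once. Tokens are whitespace-delimited and
    lowercased, which only approximates MySQL's FULLTEXT parser.
    """
    term_freq: Counter = Counter()
    doc_freq: Counter = Counter()
    row_count = 0
    for text in rows:
        row_count += 1
        if not text:
            continue  # NULL or empty column value
        words = [w for w in text.lower().split() if len(w) >= min_len]
        term_freq.update(words)
        doc_freq.update(set(words))  # each word counted once per row
    return term_freq, doc_freq, row_count


if __name__ == "__main__":
    sample = ["ab cd ab", "cd ef", None, "x cd"]
    tf, df, n = count_words(sample)
    print(n, tf.most_common(2), df["cd"])
```

This has to read every row and re-tokenize all the text, which is exactly the cost that reading the already-built index would avoid.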
Ideas? Comments? Thanks.