Creating a custom stoplist from actual data

I have some huge FULLTEXT indexes created with ft_min_word_len=2 and no stoplist. The data is not in English, so the default stoplist cannot be used.

Here is my idea: if you could ‘look’ at the current index (using a custom tool, a MySQL patch or whatever), you should be able to determine which words would be good candidates for a custom stoplist. Does that make sense? Is it even theoretically possible? If it is, and if someone created such a tool, I think it would benefit many users with non-English (but space-delimited) data.

As an aside, I did try having a Perl script extract all words from the table (not the index) and count their frequencies. It works, but it is very slow and tedious. If my idea is doable, I picture it being blazingly fast and usable on huge existing tables.
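For reference, the table-scan approach boils down to counting, for each word, how many rows contain it (document frequency rather than total occurrences, since that is what makes a word useless for searching). A minimal Python sketch, assuming the rows have already been fetched as strings; the function name is hypothetical, and the `\w+` tokenizer only approximates MySQL's own word splitting:

```python
import re
from collections import Counter
from typing import Iterable

# Mirrors ft_min_word_len=2 from the post; shorter words are never indexed.
MIN_WORD_LEN = 2


def document_frequencies(rows: Iterable[str], min_len: int = MIN_WORD_LEN) -> Counter:
    """Count in how many rows each word appears (hypothetical helper).

    Assumption: words are runs of Unicode word characters, lowercased.
    MySQL's real FULLTEXT parser differs in details (e.g. apostrophes).
    """
    df: Counter = Counter()
    for text in rows:
        # A set, so a word repeated within one row is counted only once.
        words = {w for w in re.findall(r"\w+", text.lower()) if len(w) >= min_len}
        df.update(words)
    return df


if __name__ == "__main__":
    sample = [
        "le chat est sur la table",
        "le chien est dans le jardin",
        "la table est rouge",
    ]
    df = document_frequencies(sample)
    print(df["est"], df["le"], df["chat"])  # prints "3 2 1"
```

This still has to read every row, which is why it stays slow on huge tables; reading the already-built index avoids the re-tokenization.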

Ideas? Comments? Thanks.

Maybe you are looking for myisam_ftdump.
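As a sketch of how that could be used: `myisam_ftdump -c` (`--count`) prints per-word statistics straight from the index, so something like `myisam_ftdump -c tbl_name 1 > counts.txt`, run in the database directory with 1 being the index number of the FULLTEXT index, gives the raw data. The Python below assumes each output line holds a row count, a global weight and the word, separated by whitespace (check this against your version's actual output), and keeps words found in at least a chosen fraction of all rows. The function names, the 20% default and the sample lines are hypothetical, not real tool output.

```python
from typing import Dict, List


def parse_ftdump_counts(text: str) -> Dict[str, int]:
    """Parse `myisam_ftdump -c` output into {word: row_count}.

    Assumed line format: "<row_count> <global_weight> <word>".
    Lines that do not match this shape are skipped.
    """
    counts: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split(None, 2)
        if len(parts) != 3 or not parts[0].isdigit():
            continue
        counts[parts[2].strip()] = int(parts[0])
    return counts


def stopword_candidates(counts: Dict[str, int], total_rows: int,
                        min_fraction: float = 0.2) -> List[str]:
    """Words present in at least min_fraction of all rows, most frequent first."""
    threshold = min_fraction * total_rows
    chosen = [w for w, n in counts.items() if n >= threshold]
    return sorted(chosen, key=lambda w: (-counts[w], w))


if __name__ == "__main__":
    # Made-up sample lines in the assumed format, not real tool output.
    sample = (
        "        3            0.1000000 est\n"
        "        2            0.5000000 le\n"
        "        1            0.9000000 chat\n"
    )
    counts = parse_ftdump_counts(sample)
    print(stopword_candidates(counts, total_rows=3, min_fraction=0.5))  # ['est', 'le']
```

The resulting list can be written one word per line and loaded through the `ft_stopword_file` server variable; existing FULLTEXT indexes then have to be rebuilt (for example with `REPAIR TABLE tbl_name QUICK`) before the new stoplist takes effect. It is worth confirming both steps in the manual for your MySQL version.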

D'oh, it was there all along? Many thanks! My post is over a year old, but I still needed it!