Creating a custom stoplist from actual data

myraz
I have some huge fulltext indexes created using ft_min_word_len=2 and no stoplist. The data is not in English, so the default stoplist cannot be used.

My idea: if you could 'look' at the current index (using a custom tool, a MySQL patch, or whatever), you should be able to determine which words would be good candidates for a custom stoplist. Would that make sense? Is it even theoretically possible? If it is, and someone created such a tool, I guess it would benefit many users with non-English (but space-delimited) data.

As an aside, I did try having a Perl script extract all words from the table (not the index) and count their frequencies. It works fine, but is very slow and dull. If my idea is doable, I imagine it would be blazingly fast and usable on huge existing tables.
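For reference, the table-scan approach amounts to something like the sketch below. It is written in Python rather than Perl, and it is only an illustration: the function names, the `\w{2,}` tokenizer (chosen to mirror a minimum word length of 2), and the sample rows are all assumptions, and MySQL's own fulltext tokenizer may split words differently.

```python
import re
from collections import Counter

# Assumption: words are runs of 2+ word characters, roughly mirroring
# a minimum word length of 2. MySQL's real tokenizer may differ.
WORD_RE = re.compile(r"\w{2,}")

def count_words(rows):
    """Count how often each lowercased word occurs across all rows."""
    counts = Counter()
    for text in rows:
        counts.update(word.lower() for word in WORD_RE.findall(text))
    return counts

def stoplist_candidates(counts, top_n=100):
    """Return the top_n most frequent words as stoplist candidates."""
    return [word for word, _ in counts.most_common(top_n)]

# Hypothetical sample data standing in for rows fetched from the table.
rows = [
    "det var en gang en katt",
    "en katt och en hund",
    "det var en hund",
]
counts = count_words(rows)
candidates = stoplist_candidates(counts, top_n=1)
print(candidates)
```

The candidates could then be reviewed by hand and written one word per line to a stopword file. The slow part on huge tables is reading every row, which is exactly what looking at the index instead would avoid.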

Ideas? Comments? Thanks.

Comments

  • xaprb
    Maybe you are looking for myisam_ftdump.
  • myraz
    Doh, it was there all the time? Many thanks, my post was over a year ago but I still needed it!
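
For anyone landing here later: myisam_ftdump has a -c (--count) option that prints per-word statistics for a MyISAM fulltext index. A rough sketch of turning that output into stoplist candidates follows. It assumes each output line has three whitespace-separated fields (count, global weight, word); please check this against your own output. The function names and the sample lines are made up for illustration.

```python
def parse_ftdump_counts(lines):
    """Parse lines assumed to look like '<count> <weight> <word>'.

    Lines that do not match that shape are skipped.
    """
    entries = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue
        try:
            count, weight = int(parts[0]), float(parts[1])
        except ValueError:
            continue
        entries.append((parts[2], count, weight))
    return entries

def most_frequent(entries, top_n):
    """Return the top_n words with the highest counts."""
    ranked = sorted(entries, key=lambda entry: entry[1], reverse=True)
    return [word for word, _, _ in ranked[:top_n]]

# Invented sample lines, not real myisam_ftdump output.
sample = [
    "      120            0.5108256 och",
    "        3            4.1000000 katt",
    "       95            0.7000000 en",
    "this line is not statistics output",
]
entries = parse_ftdump_counts(sample)
top_words = most_frequent(entries, top_n=2)
print(top_words)
```

In practice the lines would come from saving the tool's output to a file and reading it back, rather than from a hard-coded list.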