Setting up Sphinx

I have a problem domain with lots of messages to be indexed, and I want to be able to restrict searches to a set of group IDs.

Essentially, each group is a subgroup of another group. Something like this:

(((WebsiteID,AreaOfWebSiteID),SubAreaOfWebSiteID),ForumID)

And an alternate grouping:

(WebsiteID,UserID)

All of this in addition to keyword search.

I’m fine with just using Sphinx for now, but want to use its MySQL integration at some point as well.

Just not sure how to get started…

Just define several groups:

WebSiteID
AreaOfWebSiteID
etc.

Then you can apply filters on them as needed.
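To make the reply concrete: no composite group index is needed if every level of the hierarchy is stored as its own integer field on the document, and a query filters on whichever levels it cares about. The sketch below is a toy in-memory model, not Sphinx's API; the field names and sample data are hypothetical. In Sphinx itself these fields would be integer attributes, and the filtering would go through the client's filter calls (for example `SetFilter` in the bundled client API).

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Message:
    # Hypothetical record: each hierarchy level is a separate integer field,
    # so the nested (((WebsiteID, AreaID), SubAreaID), ForumID) path and the
    # alternate (WebsiteID, UserID) grouping both become plain equality filters.
    doc_id: int
    website_id: int
    area_id: int
    subarea_id: int
    forum_id: int
    user_id: int
    text: str


def search(messages: Iterable[Message], keyword: str, **filters: int) -> List[int]:
    """Return doc IDs whose text contains `keyword` (case-insensitive) and whose
    fields equal every value passed in `filters`, e.g. website_id=1, area_id=10."""
    hits = []
    kw = keyword.lower()
    for m in messages:
        if kw not in m.text.lower():
            continue
        if all(getattr(m, name) == value for name, value in filters.items()):
            hits.append(m.doc_id)
    return hits


MESSAGES = [
    Message(1, website_id=1, area_id=10, subarea_id=100, forum_id=1000, user_id=7, text="Sphinx setup question"),
    Message(2, website_id=1, area_id=10, subarea_id=101, forum_id=1001, user_id=8, text="Sphinx delta indexes"),
    Message(3, website_id=1, area_id=11, subarea_id=110, forum_id=1100, user_id=7, text="MySQL and Sphinx"),
    Message(4, website_id=2, area_id=20, subarea_id=200, forum_id=2000, user_id=7, text="Sphinx on another site"),
]

# Hierarchy: everything under website 1, area 10.
print(search(MESSAGES, "sphinx", website_id=1, area_id=10))  # [1, 2]
# Alternate grouping (WebsiteID, UserID): user 7 on website 1.
print(search(MESSAGES, "sphinx", website_id=1, user_id=7))  # [1, 3]
```

Filtering on `forum_id` alone is enough when forum IDs are globally unique; the parent levels only matter when you want everything under an area or sub-area.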

Thanks, I wasn’t sure if there was a composite type index for groups.

I guess my other issue is that I don’t have a unique document ID. My unique key is the pair (ForumID, MessageID). Do I need to create a translation table to map this tuple to a single unique integer?
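One way to avoid a translation table is to pack the pair into a single 64-bit integer, with the forum ID in the high 32 bits and the message ID in the low 32 bits. This is a sketch under stated assumptions: both IDs fit in 32 bits, forum IDs are non-zero (so the packed ID is never 0), and your Sphinx build uses 64-bit document IDs.

```python
from typing import Tuple

MESSAGE_BITS = 32
MESSAGE_MASK = (1 << MESSAGE_BITS) - 1  # low 32 bits hold the message ID


def pack_doc_id(forum_id: int, message_id: int) -> int:
    """Combine (ForumID, MessageID) into one 64-bit document ID.

    Assumes forum_id is in [1, 2**32) and message_id is in [0, 2**32).
    """
    if not (0 < forum_id < (1 << 32)):
        raise ValueError("forum_id out of range")
    if not (0 <= message_id <= MESSAGE_MASK):
        raise ValueError("message_id out of range")
    return (forum_id << MESSAGE_BITS) | message_id


def unpack_doc_id(doc_id: int) -> Tuple[int, int]:
    """Recover (ForumID, MessageID) from a packed document ID."""
    return doc_id >> MESSAGE_BITS, doc_id & MESSAGE_MASK


doc_id = pack_doc_id(42, 1234)
print(doc_id)  # 180388627666
print(unpack_doc_id(doc_id))  # (42, 1234)
```

The same expression, `(forum_id << 32) | message_id`, can be computed directly in the MySQL query that feeds the indexer, since MySQL supports `<<` and `|` on integers. If the packed value is stored in a signed `BIGINT` column, keep forum IDs below 2^31; if either ID might outgrow 32 bits, a translation table with an auto-increment key is the safer choice.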

Thanks again!

One other thing: since adding new documents is slow, I was thinking of splitting the data into indexes sort of like this:

Main Index
Delta Index for current month
Delta Index for current day

Then, at the end of each day, merge the daily delta into the monthly one, and at the end of each month, merge the monthly delta into the main index. I want searches to seem as up-to-date, or “live”, as possible.
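The day/month/main scheme can be modeled as three dictionaries: new documents land only in the small daily tier, and the scheduled merges push them down. This is a toy model of the data flow, not of how Sphinx stores indexes; in Sphinx each tier would be a separate index searched together, and the merges would use the indexer's `--merge` option.

```python
from typing import Dict, List


class TieredIndex:
    """Toy model of main + monthly delta + daily delta indexes.

    Each tier maps doc_id -> text. Adding only touches the small daily tier,
    so it stays cheap; scheduled merges move documents into larger tiers.
    """

    def __init__(self) -> None:
        self.main: Dict[int, str] = {}
        self.month: Dict[int, str] = {}
        self.day: Dict[int, str] = {}

    def add(self, doc_id: int, text: str) -> None:
        self.day[doc_id] = text

    def end_of_day(self) -> None:
        # Fold the daily delta into the monthly delta, then empty it.
        self.month.update(self.day)
        self.day.clear()

    def end_of_month(self) -> None:
        # Flush today first, then fold the month into the main index.
        self.end_of_day()
        self.main.update(self.month)
        self.month.clear()

    def search(self, keyword: str) -> List[int]:
        # Query all tiers; a newer tier's copy of a doc_id overrides an older one.
        merged = {**self.main, **self.month, **self.day}
        kw = keyword.lower()
        return sorted(d for d, text in merged.items() if kw in text.lower())


idx = TieredIndex()
idx.add(1, "sphinx in main")
idx.end_of_month()
idx.add(2, "sphinx this month")
idx.end_of_day()
idx.add(3, "sphinx today")
print(idx.search("sphinx"))  # [1, 2, 3]
print(len(idx.main), len(idx.month), len(idx.day))  # 1 1 1
```

Searching every tier at once is what keeps results “live”: a document is visible as soon as the daily tier is rebuilt, regardless of when the larger merges run.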

Is there any estimate of how many documents an index can hold before adding one becomes slow? Splitting by day or month is unscientific; it really should be based on document count (for example, what if I ended up needing to split on a per-hour basis?). I know only testing will tell me, but if there are some basic guidelines, they would be very helpful.
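Rolling over by document count instead of by calendar is a small change to the same idea: each tier has a size limit, and exceeding it triggers a merge into the next tier. The limits below (3 and 10) are placeholders chosen so the example is easy to trace; real values would have to come from measuring how long rebuilding each tier takes on your data.

```python
from typing import Dict, List


class CountBasedTiers:
    """Tiers that roll over by document count rather than by calendar.

    limits[i] is the maximum number of docs tier i may hold before it is
    merged into tier i + 1. The last tier (the main index) has no limit.
    The limits passed in are hypothetical, not measured values.
    """

    def __init__(self, limits: List[int]) -> None:
        self.limits = limits
        self.tiers: List[Dict[int, str]] = [{} for _ in range(len(limits) + 1)]
        self.merges = 0

    def add(self, doc_id: int, text: str) -> None:
        self.tiers[0][doc_id] = text
        self._cascade()

    def _cascade(self) -> None:
        # Walk upward so one overflow can trigger the next tier's merge too.
        for i, limit in enumerate(self.limits):
            if len(self.tiers[i]) > limit:
                self.tiers[i + 1].update(self.tiers[i])
                self.tiers[i].clear()
                self.merges += 1

    def sizes(self) -> List[int]:
        return [len(t) for t in self.tiers]


tiers = CountBasedTiers([3, 10])
for i in range(1, 21):
    tiers.add(i, f"message {i}")
print(tiers.sizes())  # [0, 8, 12]
print(tiers.merges)  # 6
```

Making each limit a fixed multiple of the one below it is the usual log-structured merge idea: large merges become proportionally rarer, and the effective cadence (hourly, daily) follows from the incoming data rate instead of being fixed in advance.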

Thanks!

We usually use two indexes: a global one and an incremental one, which is usually rebuilt per day. We have a number of these index pairs, used for various reasons.
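The global-plus-incremental pair can be sketched as a main index rebuilt occasionally plus a delta that covers only rows newer than a watermark recorded at the last main rebuild; the Sphinx documentation describes a similar main+delta setup that keeps the watermark in a helper counter table. The `seq` field here is a hypothetical monotonically increasing number, such as an auto-increment column or posting timestamp.

```python
from typing import Dict, List, Tuple

# (seq, doc_id, text); seq is a hypothetical, monotonically increasing counter.
Row = Tuple[int, int, str]


class MainDeltaPair:
    def __init__(self) -> None:
        self.watermark = 0  # highest seq covered by the main index
        self.main: Dict[int, str] = {}
        self.delta: Dict[int, str] = {}

    def rebuild_main(self, rows: List[Row]) -> None:
        # Full, slow rebuild (e.g. nightly): index everything, remember max seq.
        self.main = {doc_id: text for _, doc_id, text in rows}
        self.watermark = max((seq for seq, _, _ in rows), default=self.watermark)
        self.delta = {}

    def rebuild_delta(self, rows: List[Row]) -> None:
        # Cheap, frequent rebuild: only rows newer than the watermark.
        self.delta = {doc_id: text for seq, doc_id, text in rows if seq > self.watermark}

    def search(self, keyword: str) -> List[int]:
        # Search both; the delta's copy of a doc_id wins over main's.
        merged = {**self.main, **self.delta}
        kw = keyword.lower()
        return sorted(d for d, text in merged.items() if kw in text.lower())


rows: List[Row] = [(1, 101, "sphinx intro"), (2, 102, "mysql tips")]
pair = MainDeltaPair()
pair.rebuild_main(rows)
rows.append((3, 103, "sphinx delta"))
pair.rebuild_delta(rows)
print(pair.watermark, sorted(pair.delta))  # 2 [103]
print(pair.search("sphinx"))  # [101, 103]
```

Because document IDs derived from (ForumID, MessageID) do not grow in posting order, the watermark should come from a separate increasing column rather than from the document ID itself.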