This analyzer is supposed to incorporate bigrams (pairs of adjacent words) into the search query. This is useful because part of the meaning of a sentence comes from its word order, for example: "book a driving test for someone else" vs "driving someone else for a test book".

However, this code never worked as intended because the analyzer was only applied to queries. A query was broken down into single words and bigrams, but those tokens were compared against analyzed text that didn't contain any bigrams at all. This means the bigram part of the query only ever functions as a single-word match. This is very confusing when trying to understand what Rummager is doing.

I've changed it to use the normal query analyzer. This will change results slightly, because the shingles analyzer didn't include synonyms, but the new analyzer does.

Bigram matching was implemented properly as part of the 'new weighting' code a couple of years ago, but it never went live. This is something that could be revisited in future.
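The mismatch between query-side and index-side analysis can be illustrated with a small sketch. This is a simplified model, not Rummager's or Elasticsearch's actual analysis chain: the function names (`unigrams`, `shingles`, `matching_tokens`) are hypothetical, tokens are produced by lowercasing and splitting on whitespace, and bigrams are assumed to be joined with a single space.

```python
def unigrams(text):
    """Index-side analysis in this sketch: single words only."""
    return text.lower().split()


def shingles(text):
    """Query-side analysis in this sketch: single words plus bigrams."""
    words = unigrams(text)
    bigrams = [" ".join(pair) for pair in zip(words, words[1:])]
    return words + bigrams


def matching_tokens(query, document):
    """Return the query tokens that also exist in the indexed document."""
    indexed = set(unigrams(document))  # the index never contains bigrams
    return [token for token in shingles(query) if token in indexed]


document = "book a driving test for someone else"

in_order = matching_tokens("book a driving test for someone else", document)
scrambled = matching_tokens("driving someone else for a test book", document)

# Only single words can match, so word order makes no difference.
print(sorted(in_order) == sorted(scrambled))
```

Under these assumptions, the query produces bigram tokens such as "driving test", but none of them can be found in the index, so both example queries match the document on exactly the same single words.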
Showing 4 changed files with 3 additions and 15 deletions.