Matching whole words in the middle of a longer string #13

Open
ibrierley opened this issue Apr 13, 2021 · 3 comments

@ibrierley

Hi, I have seen the issue at #4

But ExactSearch only seems to compare a single word against the whole input. It does not find a word in the middle of a longer string at a word boundary, which is what the original problem reported.

I.e. with ExactSearch, "abc" will NOT match "abcde abc zabc" at all, but it will match if the string is exactly "abc" (so it's basically acting like a Map).
With MultiPatternSearch, however, "abc" will match 3 times in that string.

It would be good to have an option where it can match inside an arbitrarily long string, but only at word boundaries on either side (e.g. if there is whitespace or the end of the line next to the match). I'd be happy to add a specific boundary character between words if it helps.

Hope that makes sense!
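The boundary-character idea above can be sketched without touching the matcher: pad both the content and the pattern with the separator, then run an ordinary substring search. This is only an illustration under stated assumptions. `paddedWholeWordOffsets` is a hypothetical helper, `strings.Index` stands in for the multi-pattern matcher, and the content is assumed to use one single separator between words.

```go
package main

import (
	"fmt"
	"strings"
)

// paddedWholeWordOffsets returns the byte offsets in content where word
// occurs with sep (or the start/end of content) on both sides.
// Hypothetical helper: strings.Index stands in for the real matcher.
func paddedWholeWordOffsets(content, word, sep string) []int {
	padded := sep + content + sep
	needle := sep + word + sep
	var offsets []int
	for from := 0; from <= len(padded)-len(needle); {
		i := strings.Index(padded[from:], needle)
		if i < 0 {
			break
		}
		// The leading separator sits at from+i in padded, which is
		// exactly the offset of word in the unpadded content.
		offsets = append(offsets, from+i)
		// Advance by one byte only, so the trailing separator of this
		// match can serve as the leading separator of the next one.
		from += i + 1
	}
	return offsets
}

func main() {
	fmt.Println(paddedWholeWordOffsets("abcde abc zabc", "abc", " ")) // [6]
	fmt.Println(paddedWholeWordOffsets("abc abc", "abc", " "))        // [0 4]
}
```

The trade-off is that every boundary has to be the same separator, so tabs or newlines would need normalising first.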

@ibrierley
Author

Just to give an idea of a hacky test that gets me closer, in the middle of MultiPatternSearch if I do...

// func (m *Machine) MultiPatternSearch(content []rune, returnImmediately bool) [](*Term) {
// ... start of func
// ... for _, word := range val {
// ... then add this inside the loop.
// Accept the match only if the char before the word is whitespace (or the word
// starts the content) and the char after it is whitespace (or the content ends).
// Code points below 34 count as separators: space, tab, newline (and also '!').
        start := pos - len(word) + 1 // index of the first char of the word
        if (start == 0 || content[start-1] < 34) && (pos+1 == contentLength || content[pos+1] < 34) {

            term := new(Term)
            term.Pos = start
            term.Word = word
            terms = append(terms, term)
            if returnImmediately {
                return terms
            }
        }

It naturally won't work for languages beyond simple ASCII, and it would need a switch in the func to decide whether to use it or not, but it's the sort of thing I meant.
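For text beyond ASCII, one possible variation is to apply the boundary test as a post-filter over the reported matches and use `unicode.IsSpace` instead of a numeric threshold. The sketch below makes some assumptions: `match` is a hypothetical stand-in for the library's `Term` (start position in runes plus the matched word), and `isBoundary` and `wholeWordsOnly` are illustrative names, not part of the library.

```go
package main

import (
	"fmt"
	"unicode"
)

// match is a hypothetical stand-in for the library's Term: Pos is the rune
// index where the word starts, Word is the matched pattern.
type match struct {
	Pos  int
	Word []rune
}

// isBoundary reports whether index i lies outside content or holds a
// Unicode whitespace rune.
func isBoundary(content []rune, i int) bool {
	return i < 0 || i >= len(content) || unicode.IsSpace(content[i])
}

// wholeWordsOnly keeps only the matches that have a boundary on both sides.
func wholeWordsOnly(content []rune, matches []match) []match {
	var kept []match
	for _, m := range matches {
		end := m.Pos + len(m.Word) // index of the first rune after the word
		if isBoundary(content, m.Pos-1) && isBoundary(content, end) {
			kept = append(kept, m)
		}
	}
	return kept
}

func main() {
	// U+3000 (ideographic space) separates the last two words.
	content := []rune("abcde abc\u3000zabc")
	word := []rune("abc")
	// Positions a plain substring matcher would report for "abc".
	all := []match{{0, word}, {6, word}, {11, word}}
	for _, m := range wholeWordsOnly(content, all) {
		fmt.Println(m.Pos, string(m.Word)) // 6 abc
	}
}
```

Filtering after the search leaves the search loop untouched, at the cost of first collecting matches that are then discarded.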

@petar-dambovaliev

@ibrierley https://github.com/petar-dambovaliev/aho-corasick/tree/master
I implemented it, if this is what you were referring to.

@ibrierley
Author

Thanks for this! I've just added a comment/issue on your repo about a problem I'm having getting it going.
