add new CentralEuropeanStreetNameClassifier #88
You are using the section classifier and forcing the length to 2, which definitely reduces side effects 👍. But we should be careful with words and phrases. In your PR the Alpha member should not be classified with a public classification, which is good IMO. But the section is composed by
$ node bin/cli.js Paris 75000, France
master:
central_european_streets:
Yeah agreed, it should ensure that the tokens have no public classifications at all.
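To make that check concrete, here is a minimal sketch of what "no public classifications at all" could look like. The token shape (`body`, plus a `classifications` map whose entries carry a `public` flag) and the helper name `hasNoPublicClassifications` are assumptions for illustration, not the parser's actual API.

```javascript
// Hypothetical token shape (an assumption for this sketch):
//   { body: string, classifications: { [label]: { public: boolean } } }

// Returns true only when none of the tokens carries a public classification.
function hasNoPublicClassifications(tokens) {
  return tokens.every((token) =>
    Object.values(token.classifications || {}).every((c) => !c.public)
  );
}

// Made-up example tokens; the labels are illustrative only.
const alpha = { body: 'Esplanade', classifications: { alpha: { public: false } } };
const paris = { body: 'Paris', classifications: { locality: { public: true } } };

console.log(hasNoPublicClassifications([alpha]));        // true
console.log(hasNoPublicClassifications([alpha, paris])); // false
```

With a guard like this, a token such as "Paris" that already has a public classification would not be reinterpreted as a street name.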
It's a really tricky case to handle without a gazetteer and/or a geocoder. There is a street I cycle past quite often called |
Maybe we also add a check that the |
Nice, your PR seems to work for Esplanade too! (Which is a street prefix in French)
$ node bin/cli.js Esplanade 17, 13187 Berlin, Germany
39e3d29 to ae0aa7b
adds a new CentralEuropeanStreetNameClassifier which is able to handle the cases mentioned in #83
it's still fairly basic, but relatively safe.
in the future we may consider expanding this to cover:
1 xxx instead of xxx 1 (although this might be dangerous?)
closes: #83
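
As a rough illustration of the `xxx 1` rule described above, the following sketch treats a two-token section as a street when the first token is purely alphabetic, the second looks like a house number, and the name token has no public classifications. It is not the code from this PR: the function name `isCentralEuropeanStreet`, the token shape, and the house number pattern are all assumptions.

```javascript
// Hypothetical sketch; not the actual classifier from this PR.
// Assumed token shape: { body: string, classifications: { [label]: { public: boolean } } }

const HOUSE_NUMBER = /^\d+[a-z]?$/i; // assumed format, e.g. "17" or "12a"
const ALPHA = /^\p{L}+$/u;           // letters only, in any script

function isCentralEuropeanStreet(section) {
  // the section must contain exactly two tokens: "xxx 1"
  if (section.length !== 2) return false;
  const [name, number] = section;

  // first token alphabetic, second token a house number
  if (!ALPHA.test(name.body) || !HOUSE_NUMBER.test(number.body)) return false;

  // the name token must not already carry a public classification
  return Object.values(name.classifications || {}).every((c) => !c.public);
}

// Made-up example sections; the classification labels are illustrative only.
const esplanade = [
  { body: 'Esplanade', classifications: { alpha: { public: false } } },
  { body: '17', classifications: {} }
];
const parisPostcode = [
  { body: 'Paris', classifications: { locality: { public: true } } },
  { body: '75000', classifications: {} }
];

console.log(isCentralEuropeanStreet(esplanade));     // true
console.log(isCentralEuropeanStreet(parisPostcode)); // false
```

Supporting `1 xxx` would mean accepting the reversed token order as well, which is where the extra risk mentioned above comes in, since a leading number followed by a word is a much more common pattern.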