Append unclassified tokens to the street #28

Open. Joxit wants to merge 2 commits into master from joxit/fra/street_name.

Joxit
Member

@Joxit Joxit commented May 25, 2019

I created a solver that can fill in the blanks (only for StreetPrefixClassification).
We have some very long street names, and it is not simple to match all of them safely.
I think the best way to handle this is to append unclassified tokens to the street (when the token comes right after the end of the street).
Maybe it could also be used for venues.

Paris is always used as a locality, so I removed it from regions.
I also added cité to street_types.

@Joxit Joxit force-pushed the joxit/fra/street_name branch from 68ca511 to 47b5a61 Compare May 25, 2019 06:02
@Joxit Joxit changed the title Add some cases for street_name Append unclassified tokens to the street May 25, 2019
@Joxit Joxit force-pushed the joxit/fra/street_name branch from 47b5a61 to 41535f0 Compare June 5, 2019 13:21
@missinglink
Member

One thing I worry about with this is how it will affect Pelias queries generated from autocomplete input....

@missinglink
Member

I'm not sure about rewriting the span body, is this really required?

The combination of these tokens should already be present in the 'phrases' for that section.

It should be possible to find the phrase you are looking for and then classify it directly to avoid editing any of the existing spans.
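
The suggestion above can be sketched roughly as follows. This is only an illustration in Python, not the parser's actual code: the `Span` class, the `find_phrase` helper, the character offsets, and the `"street"` label are all hypothetical names chosen for the example. It assumes every run of consecutive tokens already exists as a phrase span, so the wider street can be selected and classified without touching the body of any existing span.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Span:
    """Hypothetical stand-in for a span: text plus [start, end) character offsets."""
    body: str
    start: int
    end: int
    classifications: Set[str] = field(default_factory=set)


def find_phrase(phrases: List[Span], start: int, end: int) -> Optional[Span]:
    """Return the existing phrase covering exactly [start, end), or None."""
    for phrase in phrases:
        if phrase.start == start and phrase.end == end:
            return phrase
    return None


# Assumed input "Rue du 8 Mai Ermont" with a few of its pre-computed phrases.
phrases = [
    Span("Rue du 8", 0, 8),
    Span("Rue du 8 Mai", 0, 12),
    Span("Rue du 8 Mai Ermont", 0, 19),
]

street = phrases[0]                    # assume this one was classified as the street
street.classifications.add("street")
unclassified = Span("Mai", 9, 12)      # token right after the street, not in the solution

# Look up the phrase that runs from the street's start to the token's end
# and classify that phrase directly; no existing span body is rewritten.
wider = find_phrase(phrases, street.start, unclassified.end)
if wider is not None:
    wider.classifications.add("street")
```

With this approach the original `Rue du 8` span keeps its body, and the solver only has to swap which span the solution points to.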

@Joxit
Member Author

Joxit commented Jun 5, 2019

> One thing I worry about with this is how it will affect Pelias queries generated from autocomplete input....

Hmm, you're totally right, the last token shouldn't be appended.
Rue Saint-Germains Ermon (the real locality is Ermont) should not return Boulevard Saint-Germains Ermon as a street... It's safer if we already have something like Rue du 8 Mai Ermont (Mai isn't in the solution).

> I'm not sure about rewriting the span body, is this really required?
>
> The combination of these tokens should already be present in the 'phrases' for that section.
>
> It should be possible to find the phrase you are looking for and then classify it directly to avoid editing any of the existing spans.

I wanted to have your opinion on this PR. There is also something that bothers me in what I did....
I will try what you said. 😄

@Joxit Joxit changed the title Append unclassified tokens to the street [DO NOT MERGE] Append unclassified tokens to the street Jun 11, 2019
…the street

This will be used only when StreetPrefixClassification is used.

Remove Paris from regions and add cité in street_types.
Paris is always used as a locality
@Joxit Joxit force-pushed the joxit/fra/street_name branch from 41535f0 to f7c155f Compare July 8, 2019 13:34
Now I replace the solution with the correct phrase
@Joxit Joxit changed the title [DO NOT MERGE] Append unclassified tokens to the street Append unclassified tokens to the street Jul 10, 2019
@Joxit
Member Author

Joxit commented Jul 15, 2019

I've updated this PR.

  • I update the solution with an existing span
  • I don't fill the solution with an end-token span
  • This works only with street prefix classification
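
The three rules above could look roughly like this. It is a minimal sketch in Python, not the code in this PR: `extend_street`, its parameters, and the index-based representation are assumptions made for illustration. It also assumes that only the unclassified tokens directly following the street are candidates, and that the returned token range is then used to select an existing phrase span rather than to edit the street span.

```python
from typing import List, Set, Tuple


def extend_street(
    tokens: List[str],
    street: Tuple[int, int],
    classified: Set[int],
    by_street_prefix: bool,
) -> Tuple[int, int]:
    """Return the (first, last) token indices the street should cover.

    tokens           -- the input split into tokens
    street           -- inclusive token range of the street in the current solution
    classified       -- indices of tokens already claimed by the solution
    by_street_prefix -- True if the street came from a street prefix classification
    """
    first, last = street

    # Rule 3: only act on streets found through a street prefix.
    if not by_street_prefix:
        return street

    # Rule 2: never absorb the end token; in autocomplete it may be incomplete.
    end_token = len(tokens) - 1

    # Absorb the unclassified tokens that directly follow the street.
    nxt = last + 1
    while nxt < end_token and nxt not in classified:
        last = nxt
        nxt += 1

    # Rule 1: the caller uses this range to pick an existing span.
    return (first, last)


# "Mai" is unclassified and sits before the locality "Ermont": it is absorbed.
full = extend_street(["Rue", "du", "8", "Mai", "Ermont"], (0, 2), {0, 1, 2, 4}, True)

# "Ermon" is the end token (possibly a partial "Ermont"): it is left alone.
partial = extend_street(["Rue", "Saint-Germains", "Ermon"], (0, 1), {0, 1}, True)
```

Here `full` covers `Rue du 8 Mai`, while `partial` still covers only `Rue Saint-Germains`, which matches the autocomplete concern raised earlier in the thread.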

@Joxit Joxit requested a review from missinglink July 15, 2019 09:22
2 participants