Street number prefixes #71
Comments
I think we're heading towards exposing a |
Is this the desired result? [ { unit: 'Unit 12' }, { housenumber: '345' }, { street: 'Main St' } ] |
@Joxit it's probably fine to just have unitnumber:12 and eliminate the word for the purposes of parsing an address. None of the source data for Australia has a word prefix, but it's not uncommon to have it in an address that someone writes down. "Lots" are slightly different as this indicates a new development where no real address exists yet. |
I always get confused about how to interpret these unit delimiters, although I think this excerpt from https://en.wikipedia.org/wiki/Address#Australia explains it pretty well (for slashes):
When it comes to hyphens it's a lot more inconsistent internationally, so for something like
There are also weird exceptions like:
|
Haha oh boy that stuff sounds fun. Yep that Wikipedia article agrees with what I see. |
Hi there, I started PR #87 if you want to test it, @NickStallman :) The PR should fix your issue. |
Awesome thanks @Joxit |
I was doing some mucking around with street number prefixes.
For example:
Unit 12/345 Main St
Apt 12/345 Main St
Lot 12/345 Main St
U12/345 Main St
In most cases the prefix is simply ignored and classified as alpha, start_token.
This doesn't work for "lot", which gets detected as a place, or for "U12", which breaks parsing entirely: the whole string 'U12/345' is classified as alphanumeric, start_token, so both the unit number and the street number are lost in the phrases step.
Is it worthwhile adding a classifier for these unit number prefixes so they can be detected explicitly?
"Unit" and "Lot" are very common in my datasets, but there are a few other alternatives which pop up from time to time.
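To make the question concrete, here is a rough standalone sketch of what explicit detection could look like for the four examples above. It does not use the parser's classifier API; `UNIT_PREFIXES`, `UNIT_PATTERN` and `splitUnitPrefix` are hypothetical names invented for illustration, and the prefix list only covers the cases mentioned in this issue.

```javascript
// Hypothetical prefix list (an assumption, not taken from the parser);
// extend it with whatever alternatives show up in your own data.
const UNIT_PREFIXES = ['unit', 'apt', 'lot', 'u'];

// Matches "<prefix>[.][space]<unit>/<housenumber> <street>", case-insensitively.
// Unit and house numbers may carry a single trailing letter, e.g. "12A".
const UNIT_PATTERN = new RegExp(
  '^(' + UNIT_PREFIXES.join('|') + ')\\.?\\s*(\\d+[a-z]?)\\s*/\\s*(\\d+[a-z]?)\\s+(.+)$',
  'i'
);

// Returns the separated parts, or null when the input has no recognised prefix.
function splitUnitPrefix(input) {
  const match = UNIT_PATTERN.exec(input.trim());
  if (!match) return null;
  const [, prefix, unit, housenumber, street] = match;
  return { prefix: prefix.toLowerCase(), unit, housenumber, street };
}
```

With this sketch, `splitUnitPrefix('U12/345 Main St')` returns `{ prefix: 'u', unit: '12', housenumber: '345', street: 'Main St' }`, and an input without a prefix such as `'12 Main St'` returns `null`. Keeping the prefix as a separate field leaves room to treat "lot" differently from the others later on.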