Street number prefixes #71

Closed
NickStallman opened this issue Oct 7, 2019 · 7 comments · Fixed by #87

@NickStallman

I was doing some mucking around with street number prefixes.

For example:
Unit 12/345 Main St
Apt 12/345 Main St
Lot 12/345 Main St
U12/345 Main St

In most cases the prefix is simply ignored and classified as alpha, start_token.
This doesn't work for "Lot", which gets detected as a place, or for "U12", which breaks parsing entirely: the whole string 'U12/345' is classified as alphanumeric, start_token, so both the unit number and the street number are lost from the phrases step.

Is it worthwhile adding a classifier for these unit number prefixes so they can be detected explicitly?
"Unit" and "Lot" are very common in my datasets, but there are a few other alternatives which pop up from time to time.
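
To make the idea concrete, here is a minimal sketch of how such prefixes could be detected explicitly. It is not the parser's actual classifier API: `splitUnitPrefix`, the returned field names, and the prefix list are all assumptions made for illustration, and the list is deliberately incomplete.

```javascript
// Hypothetical helper, not part of the Pelias parser API.
// Optional word prefix, then "<unit>/<housenumber> <street>".
// The prefix list is illustrative only.
const UNIT_PREFIX =
  /^(?:(unit|apt|apartment|flat|lot|u)\s*)?(\d+[a-z]?)\s*\/\s*(\d+[a-z]?)\s+(.+)$/i;

function splitUnitPrefix(input) {
  const m = UNIT_PREFIX.exec(input.trim());
  if (!m) return null; // not in "<prefix> <unit>/<number> <street>" form
  const [, prefix, unit, housenumber, street] = m;
  return {
    prefix: prefix ? prefix.toLowerCase() : null, // the word itself, kept separately
    unit,
    housenumber,
    street,
  };
}

console.log(splitUnitPrefix('U12/345 Main St'));
```

With this approach "U12/345" is split before any token classification happens, so the unit number and the street number both survive.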

@missinglink
Member

I think we're heading towards exposing a unit classification, I had a play with it in #69 but it needs more work.

@Joxit
Member

Joxit commented Oct 9, 2019

Is this the desired result?

[ { unit: 'Unit 12' }, { housenumber: '345' }, { street: 'Main St' } ]

@NickStallman
Author

@Joxit it's probably fine to just have unitnumber:12 and eliminate the word for the purposes of parsing an address.

None of the source data for Australia has a word prefix, but it's not uncommon to have it in an address that someone writes down.

"Lots" are slightly different as this indicates a new development where no real address exists yet.
But I don't think this changes anything from a parsing point of view: typically you still just want the number to try and match, although in Pelias these usually fall back to the street.

@missinglink
Member

I always get confused about how to interpret these unit delimiters, although I think this excerpt from https://en.wikipedia.org/wiki/Address#Australia explains it pretty well (for slashes):

Apartment, flat and unit numbers, if necessary, are shown immediately prior to the street number (which might
be a range), and, as noted above, are separated from the street number by a forward slash. These conventions 
can cause confusion. 

To clarify, 3/17 Adam Street would mean Apartment 3 (before the slash) at 17 Adam Street (in the case of a 
residential address) or Unit 3 at 17 Adam St (in the case of a business park).

On the other hand, 3–17 Adam Street would specify a large building (or cluster of related buildings) occupying 
the lots spanning street numbers 3 to 17 on one side of Adam St (without specifying any particular place within 
the building(s)).

These forms can be combined, so 3/5–9 Eve Street signifies Apartment 3 (before the slash) in a building which 
spans street numbers 5 to 9 on one side of Eve Street.
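
The slash convention described in the excerpt can be sketched roughly as follows. This is only an illustration under assumptions: `parseSlashNumber` and its return shape are hypothetical names, not anything from the parser, and it handles only the number part of the address.

```javascript
// Hypothetical helper; the name and return shape are assumptions.
function parseSlashNumber(token) {
  // unit, slash, then a street number or a hyphen / en-dash range
  const m = /^(\d+[a-z]?)\/(\d+[a-z]?)(?:[-–](\d+[a-z]?))?$/i.exec(token);
  if (!m) return null; // no slash, so no unit can be read off
  const [, unit, from, to] = m;
  // A range is returned as { from, to }, a single number as a string.
  return { unit, housenumber: to ? { from, to } : from };
}

console.log(parseSlashNumber('3/17'));  // unit 3 at number 17
console.log(parseSlashNumber('3/5–9')); // unit 3 in a building spanning 5 to 9
```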

When it comes to hyphens it's a lot more inconsistent internationally, so for something like 3-17 we might need to have the country in order to correctly parse it, because it could mean:

  • a range of street numbers 3 to 17
  • apartment 3, house number 17
  • house number 3, apartment 17
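
As a rough sketch of why extra context is needed, the three readings above could be selected by a convention flag. The function `interpretHyphen` and the convention names `'range'`, `'unit-first'` and `'number-first'` are made up for illustration; how a country would map to a convention is left open here.

```javascript
// Hypothetical sketch: function and convention names are illustrative only.
function interpretHyphen(token, convention) {
  const m = /^(\d+)\s*[-–]\s*(\d+)$/.exec(token);
  if (!m) return null; // not a "<number>-<number>" token
  const [, first, second] = m;
  switch (convention) {
    case 'range':        // street numbers first to second
      return { housenumber: `${first}-${second}` };
    case 'unit-first':   // apartment first, house number second
      return { unit: first, housenumber: second };
    case 'number-first': // house number first, apartment second
      return { housenumber: first, unit: second };
    default:             // without a known convention, don't guess
      return { ambiguous: token };
  }
}

console.log(interpretHyphen('3-17', 'unit-first'));
```

Returning an explicit ambiguous result when no convention is known avoids silently picking one of the three meanings.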

There are also weird exceptions like:

In New York City, Hawaii, and Southern California, some addresses have a hyphen in the street number, which cannot be removed without loss of information; for example "112–10 BRONX RD".

@NickStallman
Author

Haha oh boy that stuff sounds fun. Yep that Wikipedia article agrees with what I see.
The slash is almost exclusively used to delimit the "unit number" portion of the address. I've never seen dash used that way.
Occasionally a space is used instead, e.g. "Apartment 123 456 Main St" vs "123/456 Main St", but that's mainly the occasional lazy person rather than a convention.

@Joxit
Member

Joxit commented Apr 17, 2020

Hi there, I started PR #87, if you want to test it @NickStallman :)

The PR should fix your issue

@Joxit Joxit closed this as completed in #87 May 4, 2020
@NickStallman
Author

Awesome thanks @Joxit
I'm due to do a new build soon so I'll give it a go then.
