WARN invalid UTF-8 at transliterate (transliterate.c:790) errno: Resource temporarily unavailable #101
Hi John - from my hazy recollection of Erlang, strings are represented as linked lists, and there's a more efficient type called a binary, which is a pointer to a character array plus its size, similar to strings in C++, Python, etc. That's the type you're using (bravo). I'll assume that the original string is already UTF-8 encoded (if not, that's what libpostal expects, so you should check/ensure its encoding on the way in). The problem, I would guess, is that the Erlang string is not NUL-terminated, so you'll want to create a NUL-terminated C string from the Erlang binary before passing it to libpostal. Haven't tested this, but the change would go at https://github.com/johnhamelink/postie/blob/master/src/postie.c#L78.
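As a rough illustration of the NUL-termination step described above, here is a minimal, untested-in-a-NIF sketch in plain C. The helper name `binary_to_cstring` is hypothetical and is not part of postie or libpostal; in a real NIF the `data` and `size` arguments would presumably come from the fields of the same names on the `ErlNifBinary` struct.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (an assumption for illustration, not postie's code):
 * copy `size` bytes that carry no terminator into a freshly allocated,
 * NUL-terminated C string. Returns NULL if allocation fails.
 * The caller owns the result and must free() it. */
static char *binary_to_cstring(const unsigned char *data, size_t size)
{
    char *out = malloc(size + 1);   /* one extra byte for the terminator */
    if (out == NULL) {
        return NULL;
    }
    memcpy(out, data, size);        /* copy exactly `size` bytes, no more */
    out[size] = '\0';               /* stop C string functions at the end */
    return out;
}
```

The returned string can then be handed to libpostal and freed once the call returns. One caveat with this approach: if the binary itself contains a zero byte, any C string function will treat the input as ending there.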
@thatdatabaseguy Thank you for such a clear and definitive explanation! Adding that line did the trick, and because no extra random data makes its way into the libpostal call any more, the responses have become much less erratic as well, which also makes my unit tests work better. I will keep working on it, and then perhaps I can submit a PR to add postie to your list of unofficial libs?
No problem, and yes, happy to accept pull requests!
Hi there,
I'm working on an Elixir NIF for libpostal (mainly just to learn how to build NIFs, to be honest). I retrieve the binary string data from the Erlang VM, copy it into a signed char, and pass it through to libpostal to parse/expand the address input. It seems to work perfectly around 20% of the time; the rest of the time I instead get the following response:
WARN invalid UTF-8 at transliterate (transliterate.c:790) errno: Resource temporarily unavailable
I would've assumed that the problem was in my code (it probably still is), but the "errno: Resource temporarily unavailable", as well as the fact that it *sometimes* does work, has thrown me off. Would you be able to provide any insight?
You can check the code out here: https://github.com/johnhamelink/postie