I don't really think this is a "bug"; it's more of an extreme edge case.
While using the library I had to parse millions of names and encountered this user input:
"<first_name> van der van der van der van der van der van der van der van der van der van der van der van der van der van der van der van der van der van der van der van der van der van der van der van der van der van der van der van der van der van der van der van der van der van der van der <last_name>"
This name quickly caused a MemoryError on a PC with 60+ GB of RAM. More specifically, this list: https://github.com/derek73/python-nameparser/blob/master/nameparser/parser.py#L799 grows exponentially in size very fast.
Again, I'm not expecting you to fix this, since it is obviously a user input error (which I bypassed by setting a maximum size for the string), but I thought you might be interested to know about this edge case.
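A minimal sketch of that kind of guard, assuming you just want to reject pathological inputs before handing them to the parser (the 255-character cutoff is an arbitrary choice, not something the library mandates):

```python
from typing import Optional

from nameparser import HumanName

# Arbitrary cutoff: legitimate names are far shorter than this.
MAX_NAME_LENGTH = 255

def parse_name_safely(raw: str) -> Optional[HumanName]:
    """Skip inputs long enough to trigger the pathological parsing path."""
    if len(raw) > MAX_NAME_LENGTH:
        return None  # treat as invalid user input rather than parsing it
    return HumanName(raw)
```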
Thanks for the bug report. I wondered if this would ever be an issue when I wrote it that way.
When the parser encounters a new combination of titles joined with a conjunction, it saves the complete string as a new title in the module's shared config (by default) and takes another pass. So each pass would result in a title with one additional conjunction or title added to the end. That somewhat explains the exponential nature, but it might also depend on how you're using the parser. I wonder if you would have the same problem with something like this:
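Something along these lines, as a rough sketch; HumanName uses the shared module-level CONSTANTS by default, so the point is simply not to create a separate Constants object per name:

```python
from nameparser import HumanName
from nameparser.config import CONSTANTS

names = ["Juan van der Berg", "Maria de la Cruz van der Berg"]

# Relying on the default shared module-level CONSTANTS means any
# conjunction-joined title discovered while parsing one name is reused
# for subsequent names instead of being rebuilt per instance.
for raw in names:
    parsed = HumanName(raw)
    assert parsed.C is CONSTANTS  # instances share the module-level config
    print(parsed.first, parsed.last)
```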
This should ensure that the module-level config is shared across all the instances. I'm still not clear on why that list would grow large enough to throw a memory error; as I understand it, it should only be storing fewer than 50 different versions of that very long title.
Anyway, it would be nice if the library didn't throw ambiguous memory errors, so maybe we can give it a better exception. Here is where the new titles with conjunctions are saved to the module-level config:
We could test for some maximum around there, with a default that can be overridden via the config object. I'm not sure exactly where it would need to go, though. I wonder if the problem is in that group_contiguous_integers(conj_index) call?
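Purely as an illustration of the idea, something like the sketch below. To be clear, MaxTitleLengthError, max_length, and save_conjunction_title are all hypothetical names, not part of the library, and the check is shown standalone rather than at the actual save site in parser.py:

```python
# Hypothetical sketch: none of these names exist in nameparser today.
class MaxTitleLengthError(Exception):
    """A clearer failure than an opaque MemoryError."""

DEFAULT_MAX_TITLE_LENGTH = 100  # imagined config default, overridable

def save_conjunction_title(config, new_title,
                           max_length=DEFAULT_MAX_TITLE_LENGTH):
    """Guard the point where a conjunction-joined title is saved to the
    shared config, refusing titles that have grown past the limit."""
    if len(new_title) > max_length:
        raise MaxTitleLengthError(
            "generated title of %d characters exceeds the %d-character "
            "limit; the input may be malformed" % (len(new_title), max_length))
    config.titles.add(new_title)
```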
If you are able to poke around or have any ideas, let me know. I haven't fired up my dev environment yet to try that name string, but when I do, I'll try to find somewhere to put in a maximum and then maybe throw a more informative exception, or maybe a warning?