Add int_able() and float_able()? #1775
It seems like the Julia parser should already have something like this? Or is it too general to be fast enough for a special case like this?
Check out |
As mentioned on julia-users,
That name is definitely a little obscure. Why not |
In real use you will want to factor out the |
That makes it only 3x slower:
Oh, it is important to remember that the function also gives you the converted value if it is valid, so it would probably end up being faster.
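For readers on a recent Julia, the "validate and get the value in one call" idea can be sketched with `Base.tryparse`, which returns `nothing` on failure. This is not the code discussed in this thread; it assumes Julia 1.x, and `parse_or_nothing` is a hypothetical helper name used only for illustration.

```julia
# Sketch only: assumes Julia 1.x, where tryparse returns `nothing` on failure.
# int_able / float_able mirror the names in the issue title; they are not Base functions.
int_able(s::AbstractString) = tryparse(Int, s) !== nothing
float_able(s::AbstractString) = tryparse(Float64, s) !== nothing

# Hypothetical helper: validation and conversion in a single pass.
# The caller gets the parsed value, or `nothing` if the string is not valid.
function parse_or_nothing(::Type{T}, s::AbstractString) where {T<:Real}
    return tryparse(T, s)
end

v = parse_or_nothing(Int, "42")
if v !== nothing
    println("parsed value: ", v)
end
```

Returning the value avoids scanning the string a second time once it is known to be valid, which is the saving described above.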
Ok. I'll have to see if we can exploit that to combine our type inference and conversion steps. Right now the data conversion step is killing us performance-wise. Is there a similar |
Modifying int_able to also return the value makes it about 2x slower. Note that this is now just an inferior implementation of the function parse_int() that is already in Base (although this version is faster by a factor of 3, probably because it doesn't handle arbitrary bases). But Jeff might be correct that combining the type inference and conversion steps is a win (the difference in times is expected, at least). You might still end up with many cases where the first set of rows was a bad predictor of the column type, so you then need to keep track of which rows have to be reconverted and reallocate storage for the new type.
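A minimal sketch of combining inference and conversion for a single column, again assuming Julia 1.x and `tryparse`. `convert_column` is a hypothetical name, and this is not the DataFrames implementation. It takes the simplest fallback: when a value does not fit the current type, it re-parses the whole column as the next wider type instead of tracking which rows need reconversion.

```julia
# Sketch only: try Int first, then Float64, else keep the strings.
# On a failed parse the column is re-read from the start as the wider type,
# which costs extra passes and a new allocation for the wider storage.
function convert_column(strs::Vector{String})
    ints = Vector{Int}(undef, length(strs))
    all_int = true
    for (i, s) in enumerate(strs)
        v = tryparse(Int, s)
        if v === nothing
            all_int = false   # early rows were a bad predictor of the type
            break
        end
        ints[i] = v
    end
    all_int && return ints

    floats = Vector{Float64}(undef, length(strs))
    for (i, s) in enumerate(strs)
        v = tryparse(Float64, s)
        v === nothing && return strs   # not numeric: leave as strings
        floats[i] = v
    end
    return floats
end
```

For example, `convert_column(["1", "2"])` yields a `Vector{Int}`, while a column containing `"2.5"` is widened to `Vector{Float64}`.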
I'll try to see if we can take advantage of the fact that validation and conversion are completely intertwined. Our difficulty is that we need to make a first pass to remove any missing value indicator like |
Subsumed by #3631.
In trying to resolve some performance issues for DataFrames I/O, @vtjnash very kindly wrote code to determine whether a string could be parsed as an integer or float. I would suggest that functions like these go into Base because they've come up before in other contexts, such as the JSON parser. Having fast functions for doing this would be very helpful, especially given that Jameson showed that his functions are orders of magnitude faster than using regular expressions. His code is below: