Arithmetic operators on Nullable #111
Comments
Thanks for looking into this. We should fix it in Julia Base as part of JuliaLang/julia#16080. I guess the |
This kind of change needs to be made with some caution. As is, the code is already unsafe since we're implicitly assuming that it's always safe to operate on the raw bits associated with a value: you can create nonsensical immutables this way and it's not impossible to imagine scenarios in which our current strategy introduces bugs. We will almost surely introduce segfaults if we remove this restriction without substantial code changes -- but those code changes could prohibit SIMD optimizations, so care will be required. The current approach is a way of working around the lack of return types. Making these things both fast and safe requires that we know what type of nullable we should return even when the arguments are nulls.
You mean that adding anything to the
Isn't this fixed with JuliaLang/julia#16432? Anyway, I don't understand why this is needed: can't we just return |
I mean that this line ( NullableArrays.jl/src/operators.jl Line 38 in d0b94ae
But the introduction of that branch will often prevent SIMD operations. (It definitely prevented SIMD when we wrote that code, although it's conceivable that some compiler magic would resolve that.)
I'm hopeful that we'll be able to do a better job now that 16432 has landed. I haven't had time to try, but this is one of the things I hope David and I will look into this summer.
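To make the trade-off discussed above concrete, here is a minimal sketch in current Julia. Everything in it is an assumption for illustration: `Maybe` is a hypothetical stand-in for `Nullable`, `SafeTypes` is a made-up whitelist, and `add_branchfree` / `add_branching` are illustrative names rather than the package's actual definitions. It also assumes `+` on two values of type `T` returns a `T`.

```julia
# Hypothetical stand-in for Nullable{T}; not the Base or NullableArrays.jl definition.
struct Maybe{T}
    hasvalue::Bool
    value::T
    Maybe{T}() where {T} = new{T}(false)        # null: `value` is left uninitialized
    Maybe{T}(v) where {T} = new{T}(true, v)     # present value
    Maybe{T}(hasvalue::Bool, v) where {T} = new{T}(hasvalue, v)
end
Maybe(v::T) where {T} = Maybe{T}(v)

# Hypothetical whitelist of types whose raw bits can be combined blindly.
const SafeTypes = Union{Int, Float64}

# Branch-free: always reads both payloads, even when one is null (garbage bits).
# Only meaningful for bits types, hence the restriction in the signature.
add_branchfree(x::Maybe{T}, y::Maybe{T}) where {T<:SafeTypes} =
    Maybe{T}(x.hasvalue & y.hasvalue, x.value + y.value)

# Branching: touches the payloads only when both are present, so it is also
# safe for non-bits types, at the cost of a branch per operation.
function add_branching(x::Maybe{T}, y::Maybe{T}) where {T}
    if x.hasvalue && y.hasvalue
        return Maybe{T}(x.value + y.value)
    else
        return Maybe{T}()
    end
end
```

With these definitions, `add_branchfree(Maybe(big(1)), Maybe{BigInt}())` has no applicable method, while `add_branching` accepts the same arguments and returns a null. Whether either version actually vectorizes depends on the compiler, so the sketch makes no performance claim.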
That's a very, very good point. I was trying to push for maximum generality, but for the core arithmetic operations |
I tried to benchmark the various solutions, but hit JuliaLang/julia#16709 on 0.5. On 0.4.5, the safest/most general version branching on
See https://gist.github.com/nalimilan/94a3dc790bc592e8b2b561d9dcca9a9b
@johnmyleswhite @davidagold In which cases is SIMD enabled with the current definitions? In my tests it isn't, which makes it hard to check for possible regressions. Anyway, a reasonable strategy seems to be How does that sound? I'd like to get as many functions as possible into Julia 0.5 so that we're not blocked for another cycle. |
I don't have any immediate objections to the above, but I also haven't done much in-depth thinking about this issue. I'm landing at MIT next Monday, so that should be the start of when I can dedicate my full time and energy to JuliaStats projects.
Great to hear that, I had no idea. What are your plans? Would you be willing to try moving the few essential |
OK, JuliaLang/julia#16709 is fixed now, so the most general strategy with a fast path for |
@nalimilan Plans are still evolving, but they seem to be settling on developing data manipulation facilities for DataFrames. Which features in particular are you thinking of? And when is 0.5 expected to land?
I don't know when 0.5 is supposed to be released, but I thought it was going to happen around JuliaCon. As regards improving the data management facilities, I think getting arithmetic operators for Nullable into Base is a priority. Support for automatic lifting via a macro would also be very useful: in particular, it could be used inside DataFramesMeta macros to make it easy to work with data.
I'd like to work on the automatic lifting. Still not sure where it should live. But that's a discussion for elsewhere. For now, do you envision moving everything in |
Probably in a package for experimentation (possibly in DataFramesMeta), but IMHO it could live in Base once we've settled on a design. That sounds like essential functionality.
Yes, that's my plan, modulo what I noted at #116 (comment). What do you think of it? |
I'm still working to grok all the relevant issues, including SIMD-ness. In general, I am assessing proposed changes in terms of the amount of progress they would allow and the difficulty in reverting them later. These operations should be in Base, and should be defined for a (limited) set of non-bits types. If we decide that this particular implementation is suboptimal, then it can probably be patched in a sub-release of 0.5, as long as doing so doesn't change anything user-facing; at least that's my understanding. Given these considerations, I'm in favor of this plan. Pinging @johnmyleswhite for his thoughts on such reasoning.
Yes, I think the semantics are well defined now, so we can always improve the implementation later.
Fixed by #119.
The line:
https://github.com/JuliaStats/NullableArrays.jl/blob/master/src/operators.jl#L37
has the following consequences:
a) operations on values whose type is not a bits type throw an error:
This probably should be fixed.
b) operations on a missing value whose type is not a bits type fail
I am not sure if this should be fixed. The implementation of `convert` for `Nullable` implies that a missing value of `Nullable{T}` can always be converted to `Nullable{S}`, independent of what `T` and `S` are, but I am not sure whether this conversion should be performed automatically in operations on nullables.
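The point about `convert` can be illustrated with a small sketch in current Julia. The names are assumptions: `Opt` is a hypothetical stand-in for `Nullable`, and `retag` is an illustrative helper, not Base's `convert` method. It shows why a missing value can change its type parameter freely while a present value cannot.

```julia
# Hypothetical stand-in for Nullable{T}; not the Base definition.
struct Opt{T}
    hasvalue::Bool
    value::T
    Opt{T}() where {T} = new{T}(false)      # missing: payload left uninitialized
    Opt{T}(v) where {T} = new{T}(true, v)   # present value
end

# A missing value carries no payload, so changing its parameter from T to S
# never touches `value`; a present value must go through convert(S, ...).
retag(::Type{Opt{S}}, x::Opt{T}) where {S,T} =
    x.hasvalue ? Opt{S}(convert(S, x.value)) : Opt{S}()
```

Here `retag(Opt{String}, Opt{Int}())` yields a missing `Opt{String}`, whereas `retag(Opt{String}, Opt{Int}(3))` fails because no `Int` to `String` conversion exists. Whether operators should apply such a conversion implicitly is the open question raised above.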