Reading missings is twice as slow as reading values #129
```julia
julia> typeof(Feather.materialize("test2.feather").x)
Array{Float32,1}
```

I'd forgotten that Feather forgets about missings when there aren't any. That explains it... Reading a 50% mix is 30% slower than reading all-missing:

```julia
julia> df5 = DataFrame(x=Union{Float32, Missing}[rand() < 0.5 ? missing : 1.1 for _ in 1:N]);

julia> Feather.write("test5.feather", df5);

julia> @btime Feather.materialize("test5.feather");
  1.434 s (436 allocations: 953.69 MiB)
```

However, that could be explained by poor branch prediction, I suppose? Maybe there isn't anything concrete to be done; I know your code is already highly optimized.
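The branch-prediction hypothesis can be probed outside Feather entirely. The sketch below (pure Julia, no packages; `decode` is an illustrative name, not part of Feather.jl) builds a `Union{Float32, Missing}` vector through a per-element branch, once with an all-missing mask (perfectly predictable) and once with a random 50% mask (unpredictable):

```julia
# Illustrative decode: one branch per element, as when applying a
# missingness mask while reading a column.
function decode(values::Vector{Float32}, mask::Vector{Bool})
    out = Vector{Union{Float32,Missing}}(undef, length(values))
    @inbounds for i in eachindex(values)
        out[i] = mask[i] ? missing : values[i]   # branch per element
    end
    return out
end

N = 1_000_000
vals = rand(Float32, N)
all_missing = fill(true, N)      # branch always taken: predictable
half_random = rand(Bool, N)      # ~50% mix: hard to predict

decode(vals, all_missing); decode(vals, half_random)   # warm up / compile
t_all  = @elapsed decode(vals, all_missing)
t_half = @elapsed decode(vals, half_random)
println("all-missing: ", t_all, " s,  50% mix: ", t_half, " s")
```

If the 50% case is consistently slower here too, that would support branch misprediction (rather than anything Feather-specific) as the explanation for the gap between the mixed and all-missing files.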
There is a lot more overhead for reading and writing arrays with missings than for arrays without; this is just a consequence of how the Arrow format works. I wouldn't say anything here is "highly optimized", but I have done lots of basic performance sanity checks (for reading, at least). Reading arrays without missings is extremely simple, and is therefore pretty much guaranteed to be maximally efficient. Reading arrays with missings is a lot more complicated, so it's much harder for me to state with any confidence whether it's close to saturating the theoretical upper limit on performance. I'm not entirely sure why reading all missings is faster, but it may have something to do with the Julia type system (since the …).

Of course, I'd always be happy to improve performance if possible; specific suggestions and PRs are welcome. That said, reading arrays with missings will never be as fast as reading those without, so I don't actually see an issue here. Feel free to re-open this if there is a specific performance problem.
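The structural reason the two paths differ can be sketched in a few lines. In the Arrow format, a nullable column carries a separate bit-packed validity bitmap (one bit per element, least-significant bit first, 1 = valid), so every element read involves a bit test; a non-nullable column is just its value buffer. This is an illustrative sketch, not Feather.jl's actual internals:

```julia
# Reading with a validity bitmap: a byte lookup, a shift, and a branch
# for every element.
function read_with_validity(values::Vector{Float32}, bitmap::Vector{UInt8})
    out = Vector{Union{Float32,Missing}}(undef, length(values))
    @inbounds for i in eachindex(values)
        byte = bitmap[((i - 1) >> 3) + 1]
        bit  = (byte >> ((i - 1) & 7)) & 0x01    # 1 = valid, 0 = missing
        out[i] = bit == 0x01 ? values[i] : missing
    end
    return out
end

# Reading without missings: the column is just its buffer, so there is
# no per-element work at all.
read_plain(values::Vector{Float32}) = copy(values)

vals = Float32[1.5, 2.5, 3.5, 4.5]
bitmap = UInt8[0b00000101]            # elements 1 and 3 valid
read_with_validity(vals, bitmap)      # 1.5, missing, 3.5, missing
```

The per-element bit test (plus the wider `Union` element type of the output) is why the nullable path can never match a plain buffer copy, independent of how well either is optimized.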