RFC: Optionally set FZ/DAZ on SSE(2) processors #789
I was thinking |
Also, I assume you don't mean "thunk" in the technical sense of an unevaluated blob o'computation? Otherwise this would have to be a macro. |
A julia program could accept and use such a command line switch. But I'd also be fine with a global switch. No reason not to have tons of switches like every other compiler :) And there can also be switches for error-on-overflow, removing bounds checks, etc. |
No switch yet, but just to give some idea of the performance difference (here, on Sandy Bridge):
julia> ccall(:jl_zero_denormals, Bool, (Bool,), false)
true
julia> @time begin
           a = 3e-308
           for i=1:100000000
               a = 0.9999999a
           end
       end
elapsed time: 8.600224018096924 seconds
julia> ccall(:jl_zero_denormals, Bool, (Bool,), true)
true
julia> @time begin
           a = 3e-308
           for i=1:100000000
               a = 0.9999999a
           end
       end
elapsed time: 3.156193971633911 seconds
EDIT: Better benchmark |
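For anyone wondering where the speedup comes from: once `a` drops below the smallest normal double, every multiply produces a subnormal result unless flush-to-zero is on, in which case the result is replaced by zero. A minimal C sketch of that semantic difference follows. The helper name `sketch_half_of_min_normal` and the `SKETCH_SSE_MATH` macro are made up for illustration, and the sketch assumes an x86-64 build where double arithmetic goes through SSE; elsewhere the flag is simply ignored.

```c
#include <float.h>            /* DBL_MIN: smallest positive normal double */

#if defined(__x86_64__) || defined(_M_X64)
#include <xmmintrin.h>        /* _MM_GET_FLUSH_ZERO_MODE, _MM_SET_FLUSH_ZERO_MODE */
#define SKETCH_SSE_MATH 1
#else
#define SKETCH_SSE_MATH 0     /* assumption: no SSE double math, flag has no effect */
#endif

/* Hypothetical helper: compute DBL_MIN * 0.5 (a subnormal result) with
 * flush-to-zero switched on or off, then restore the previous mode. */
double sketch_half_of_min_normal(int flush)
{
    /* volatile keeps the compiler from folding the product at compile time
     * or moving it outside the region where the mode is changed */
    volatile double x = DBL_MIN;
    volatile double half = 0.5;
    volatile double result;
#if SKETCH_SSE_MATH
    unsigned int saved = _MM_GET_FLUSH_ZERO_MODE();
    _MM_SET_FLUSH_ZERO_MODE(flush ? _MM_FLUSH_ZERO_ON : _MM_FLUSH_ZERO_OFF);
    result = x * half;
    _MM_SET_FLUSH_ZERO_MODE(saved);
#else
    (void)flush;
    result = x * half;
#endif
    return result;
}
```

With the flag off the product is a small positive subnormal; with it on (x86-64) the product is exactly zero, which is the accuracy being traded for speed.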
We need a C function that uses compiler intrinsics to set or clear the flush-to-zero and (if SSE2 is available) denormals-are-zero flags in the SSE MXCSR register. An appropriate interface from Julia will follow.
TODO: support _M_IX86_FP macro for VC++; possibly abstract SSE support info into its own function? |
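To make the shape of such a function concrete, here is a rough sketch and not the code on the branch. It assumes GCC/Clang-style `__SSE__`/`__SSE2__` macros plus the `_M_IX86_FP`/`_M_X64` macros mentioned in the TODO; the names `sketch_zero_denormals`, `sketch_ftz_enabled` and the `SKETCH_*` macros are invented for illustration. It uses `_mm_getcsr`/`_mm_setcsr` from `xmmintrin.h` with the MXCSR bit positions for FTZ (bit 15) and DAZ (bit 6).

```c
/* Compile-time SSE detection; the _M_* macros cover VC++. */
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>                 /* _mm_getcsr, _mm_setcsr */
#define SKETCH_HAVE_SSE 1
#else
#define SKETCH_HAVE_SSE 0
#endif

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#define SKETCH_HAVE_SSE2 1
#else
#define SKETCH_HAVE_SSE2 0
#endif

#define SKETCH_MXCSR_DAZ 0x0040u       /* denormals-are-zero, MXCSR bit 6  */
#define SKETCH_MXCSR_FTZ 0x8000u       /* flush-to-zero,      MXCSR bit 15 */

/* Hypothetical helper: set (enable != 0) or clear the flags.
 * Returns 1 if the flags were changed, 0 if SSE is not available. */
int sketch_zero_denormals(int enable)
{
#if SKETCH_HAVE_SSE
    unsigned int mask = SKETCH_MXCSR_FTZ;
    unsigned int csr = _mm_getcsr();
#if SKETCH_HAVE_SSE2
    /* Assumption: a compile-time SSE2 check is good enough here. A more
     * careful version would confirm DAZ support at run time through the
     * MXCSR_MASK field written by FXSAVE. */
    mask |= SKETCH_MXCSR_DAZ;
#endif
    csr = enable ? (csr | mask) : (csr & ~mask);
    _mm_setcsr(csr);
    return 1;
#else
    (void)enable;
    return 0;
#endif
}

/* Hypothetical helper: report whether FTZ is currently set. */
int sketch_ftz_enabled(void)
{
#if SKETCH_HAVE_SSE
    return (_mm_getcsr() & SKETCH_MXCSR_FTZ) != 0;
#else
    return 0;
#endif
}
```

Note that MXCSR is per-thread state, so a call like this only affects the thread that makes it.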
It seems harmless to just have this in the build so that it can be called if someone wants to. Should we just merge this into master and continue to work on it from there? Or does having it here provide more impetus to resolve the matter? |
It's not critical to anything anyone is doing right now. Would like someone to check the C style is consistent with what people expect; it's more You can treat this as part of my workflow management right now. I'll get it closed soon enough. |
The indentation looks a little funky, but other than that, it looks fine to me. This seems like just the kind of thing that needs a lot of #ifdefs. |
That seems to be how GitHub displays tabs in files that use mixed space/tab indenting. |
Oh, there really shouldn't be tabs...I really wasn't done with that yet, though. |
Well, that's ok. You can carry on making changes to this on master. As long as it compiles and no one is using it yet, you're safe :-) |
For what it's worth, I think this is very important functionality that we ought to have, even if it's not particularly easy to use. This is advanced stuff for people who really, really need performance. They ought to be able to have that. |
@pao I have but one question: Why did you name this branch |
@StefanKarpinski Fair enough. But as of now this issue is the only documentation. |
I really like that. Let's use that terminology going forward. |
A potentially serious performance bottleneck on Intel processors is getting stuck in microcode for handling subnormals, which can kill code such as IIR filters and feedback controllers. A quick source grep seems to indicate we're not doing this by default, which is good for absolute correctness, but we should have a way to set either the flush-to-zero or denormals-are-zero flags as appropriate for the x86 variant. The intrinsics are defined in xmmintrin.h. Any ideas on the Right Way to make this available?
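To illustrate why feedback structures are the usual victims, here is a small C sketch (the function name is hypothetical). It runs the zero-input response of a one-pole IIR filter, y[n] = a * y[n-1], and counts how many states land in the subnormal range. It assumes default IEEE 754 double arithmetic with neither flag set.

```c
#include <math.h>     /* fpclassify, FP_SUBNORMAL */
#include <stddef.h>   /* size_t */

/* Hypothetical helper: iterate y[n] = a * y[n-1] from y0 for n_steps steps
 * and count the states that are subnormal doubles. */
size_t sketch_count_subnormal_states(double a, double y0, size_t n_steps)
{
    volatile double y = y0;   /* volatile: force each state to be rounded to double */
    size_t count = 0;
    size_t i;
    for (i = 0; i < n_steps; i++) {
        double cur = a * y;
        y = cur;
        if (fpclassify(cur) == FP_SUBNORMAL)
            count++;
    }
    return count;
}
```

With a = 0.5 and y0 = 1.0 the state passes through 52 subnormal values (2^-1023 down to 2^-1074) before reaching zero. A pole closer to 1, as in the benchmark in this thread, keeps the state subnormal for far longer, and each of those steps is one that can take the slow path.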