RFC: Optionally set FZ/DAZ on SSE(2) processors #789

Merged
merged 1 commit into from
May 3, 2012


pao
Member

pao commented May 3, 2012

A potentially serious performance bottleneck on Intel processors is getting stuck in microcode when handling subnormals, which can kill code such as IIR filters and feedback controllers. The usual remedy is to set the flush-to-zero (FZ) or denormals-are-zero (DAZ) flag. A quick source grep seems to indicate we're not setting either by default, which is good for absolute correctness, but we should have a way to set them as appropriate for the x86 variant. The intrinsics are defined in xmmintrin.h.

Any ideas on the Right Way to make this available?
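
For readers unfamiliar with the flags, here is a minimal C sketch of what setting them involves. It is not the code from this PR: `set_zero_denormals` and `flush_to_zero_enabled` are hypothetical names, and the sketch assumes DAZ is usable whenever the compiler targets SSE2. It uses `_mm_getcsr`/`_mm_setcsr` from xmmintrin.h with the raw MXCSR bit positions (FZ is bit 15, DAZ is bit 6).

```c
#include <assert.h>
#include <stdbool.h>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>          /* _mm_getcsr, _mm_setcsr */
#define HAVE_MXCSR 1
#else
#define HAVE_MXCSR 0
#endif

#define MXCSR_DAZ 0x0040u       /* bit 6: treat subnormal inputs as zero */
#define MXCSR_FZ  0x8000u       /* bit 15: flush subnormal results to zero */

/* Hypothetical helper. Sets or clears FZ (and DAZ when SSE2 is targeted).
 * Returns true if the MXCSR register was written, false if unsupported. */
bool set_zero_denormals(bool enable)
{
#if HAVE_MXCSR
    unsigned int flags = MXCSR_FZ;
#if defined(__SSE2__) || defined(_M_X64)
    /* Assumption: DAZ is available whenever SSE2 is the compile target. */
    flags |= MXCSR_DAZ;
#endif
    unsigned int csr = _mm_getcsr();
    csr = enable ? (csr | flags) : (csr & ~flags);
    _mm_setcsr(csr);
    /* Postcondition: the requested bits now read back as written. */
    assert((_mm_getcsr() & flags) == (enable ? flags : 0u));
    return true;
#else
    (void)enable;
    return false;
#endif
}

/* True if the FZ bit is currently set; always false without MXCSR access. */
bool flush_to_zero_enabled(void)
{
#if HAVE_MXCSR
    return (_mm_getcsr() & MXCSR_FZ) != 0;
#else
    return false;
#endif
}
```

MXCSR is per-thread state, so a call like `set_zero_denormals(true)` only affects the thread that makes it.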

@JeffBezanson
Member

with_flush_to_zero(thunk)? :)

@pao
Member Author

pao commented May 2, 2012

I was thinking @makemycodefasterbutslightlylessaccurate begin...end. More seriously, I'm not sold on the functional (or macro, which is the same idea) approach to this, because you'll end up wrapping your entry point with it. Plus the magic only works on some processors--in fact, on PowerPC (for instance) the impact is minimal. When we do this at work we do it with a command-line switch to the program. I can be convinced, though, which is why I'm asking before I try to do anything about it.

@pao
Member Author

pao commented May 2, 2012

Also, I assume you don't mean "thunk" in the technical sense of an unevaluated blob o'computation? Otherwise this would have to be a macro.
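
To make the scoped idea concrete, this is roughly what a `with_flush_to_zero` wrapper could look like at the C level. It is only a sketch of the functional approach being discussed, not something in the PR: the callback type `thunk_fn`, the `probe` struct, and the helper names are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>          /* _mm_getcsr, _mm_setcsr */
#define HAVE_MXCSR 1
#else
#define HAVE_MXCSR 0
#endif

#define MXCSR_FZ 0x8000u        /* bit 15: flush-to-zero */

typedef void (*thunk_fn)(void *ctx);

/* Reads MXCSR, or returns 0 where the register is not available. */
static unsigned int read_mxcsr(void)
{
#if HAVE_MXCSR
    return _mm_getcsr();
#else
    return 0u;
#endif
}

/* Runs thunk(ctx) with FZ set, then restores the caller's MXCSR. */
void with_flush_to_zero(thunk_fn thunk, void *ctx)
{
    assert(thunk != NULL);
#if HAVE_MXCSR
    unsigned int saved = _mm_getcsr();
    _mm_setcsr(saved | MXCSR_FZ);
    thunk(ctx);
    /* Restoring the saved value also discards any exception status
     * bits that were raised inside the thunk. */
    _mm_setcsr(saved);
#else
    thunk(ctx);                 /* no MXCSR: just run the computation */
#endif
}

/* Example thunk: records whether FZ was active while it ran. */
struct probe { bool called; bool fz_seen; };

static void probe_thunk(void *ctx)
{
    struct probe *p = ctx;
    p->called = true;
    p->fz_seen = (read_mxcsr() & MXCSR_FZ) != 0;
}
```

Calling `with_flush_to_zero(probe_thunk, &p)` leaves the caller's mode untouched afterwards, which is the main attraction over a global switch. The cost is the one raised above: the entry point ends up wrapped.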

@JeffBezanson
Member

A Julia program could accept and use such a command-line switch. But I'd also be fine with a global switch. No reason not to have tons of switches like every other compiler :) And there can also be switches for error-on-overflow, removing bounds checks, etc.
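
As a rough illustration of the switch approach, the C sketch below scans `argv` for a flag and reports whether it was given. The flag name `--zero-denormals` and the function `wants_zero_denormals` are invented for this example; no such switch exists in the thread. The result would be passed once at startup to whatever routine sets FZ/DAZ.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical flag name, chosen only for this sketch. */
#define ZERO_DENORMALS_FLAG "--zero-denormals"

/* Returns true if the flag appears among the options in argv. */
bool wants_zero_denormals(int argc, char **argv)
{
    assert(argc == 0 || argv != NULL);
    for (int i = 1; i < argc; i++) {        /* argv[0] is the program name */
        if (strcmp(argv[i], "--") == 0)
            break;                          /* conventional end of options */
        if (strcmp(argv[i], ZERO_DENORMALS_FLAG) == 0)
            return true;
    }
    return false;
}
```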

@ghost ghost assigned pao May 3, 2012
@pao
Member Author

pao commented May 3, 2012

No switch yet, but just to give some idea of the performance difference (here, on Sandy Bridge):

julia> ccall(:jl_zero_denormals, Bool, (Bool,), false)
true

julia> @time begin                                    
       a = 3e-308
       for i=1:100000000
       a = 0.9999999a
       end
       end
elapsed time: 8.600224018096924 seconds

julia> ccall(:jl_zero_denormals, Bool, (Bool,), true)
true

julia> @time begin 
       a = 3e-308
       for i=1:100000000
       a = 0.9999999a
       end
       end
elapsed time: 3.156193971633911 seconds

EDIT: Better benchmark
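
For anyone who wants to reproduce the effect outside Julia, here is a C sketch of the same recurrence. Assumptions: double arithmetic goes through SSE (true on x86-64), DAZ is supported, and the helper names are hypothetical. The `volatile` keeps the compiler from folding the loop away. Timings depend heavily on the processor, so none are quoted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

#if defined(__x86_64__) || defined(_M_X64) || defined(__SSE2_MATH__)
#include <xmmintrin.h>          /* _mm_getcsr, _mm_setcsr */
#define HAVE_SSE_MATH 1
#else
#define HAVE_SSE_MATH 0
#endif

#define MXCSR_DAZ 0x0040u       /* bit 6 */
#define MXCSR_FZ  0x8000u       /* bit 15 */

/* Hypothetical helper: sets or clears FZ and DAZ together. */
static bool set_zero_denormals(bool enable)
{
#if HAVE_SSE_MATH
    unsigned int csr = _mm_getcsr();
    unsigned int flags = MXCSR_FZ | MXCSR_DAZ;
    _mm_setcsr(enable ? (csr | flags) : (csr & ~flags));
    return true;
#else
    (void)enable;
    return false;
#endif
}

/* Same recurrence as the REPL loop: a starts just above the smallest
 * normal double and decays into the subnormal range. */
double decay(long iters)
{
    assert(iters >= 0);
    volatile double a = 3e-308;
    for (long i = 0; i < iters; i++)
        a = 0.9999999 * a;
    return a;
}

/* Runs decay() with the flags on or off, stores CPU seconds in *seconds,
 * and leaves the flags cleared afterwards. */
double timed_decay(long iters, bool zero_denormals, double *seconds)
{
    set_zero_denormals(zero_denormals);
    clock_t start = clock();
    double result = decay(iters);
    *seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
    set_zero_denormals(false);
    return result;
}

/* Prints both runs side by side. */
void report(long iters)
{
    double secs;
    double r = timed_decay(iters, false, &secs);
    printf("flags off: result %g, %.3f s\n", r, secs);
    r = timed_decay(iters, true, &secs);
    printf("flags on:  result %g, %.3f s\n", r, secs);
}
```

Calling `report(100000000L)` matches the iteration count used above. With the flags on, the value is flushed to exactly zero as soon as it would become subnormal, which is the accuracy cost being traded for speed.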

We need to use a C function which can use compiler intrinsics to set or clear
the flush-to-zero and (if SSE2 is available) denormals-are-zero flags in the SSE
MXCSR register. An appropriate interface from Julia will follow.
@pao
Member Author

pao commented May 3, 2012

TODO: support _M_IX86_FP macro for VC++; possibly abstract SSE support info into its own function?
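
A possible shape for that abstraction, sketched in C. The function names are hypothetical. It reports the SSE level the code was compiled for, using `__SSE__`/`__SSE2__` on GCC and Clang, and `_M_IX86_FP` (1 for /arch:SSE, 2 for /arch:SSE2 or higher) or `_M_X64` on VC++. This is a compile-time answer only; it does not query the CPU at run time.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper. Compile-time SSE target: 0 = none, 1 = SSE,
 * 2 = SSE2 or later. */
int sse_level(void)
{
#if defined(__SSE2__) || defined(_M_X64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
    return 2;
#elif defined(__SSE__) || (defined(_M_IX86_FP) && _M_IX86_FP == 1)
    return 1;
#else
    return 0;
#endif
}

/* FZ needs SSE; DAZ is assumed here to need SSE2, as in the commit text. */
bool can_set_fz(void)  { return sse_level() >= 1; }
bool can_set_daz(void) { return sse_level() >= 2; }

/* Human-readable name for a level returned by sse_level(). */
const char *sse_level_name(int level)
{
    static const char *const names[] = { "none", "SSE", "SSE2" };
    assert(level >= 0 && level <= 2);
    return names[level];
}
```

Keeping the preprocessor checks in one function like this would let the flag-setting code branch on `can_set_daz()` instead of repeating the #ifdefs.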

@StefanKarpinski
Member

It seems harmless to just have this in the build so that it can be called if someone wants to. Should we just merge this into master and continue to work on it from there? Or does having it here provide more impetus to resolve the matter?

@pao
Member Author

pao commented May 3, 2012

It's not critical to anything anyone is doing right now. Would like someone to check the C style is consistent with what people expect; it's more #ifdef-heavy than anything else I've seen in the sources. And still need to account for the VC++ versions of the macros for good measure.

You can treat this as part of my workflow management right now. I'll get it closed soon enough.

@StefanKarpinski
Member

The indentation looks a little funky, but other than that, it looks fine to me. This seems like just the kind of thing that needs a lot of #ifdefs.

JeffBezanson added a commit that referenced this pull request May 3, 2012
RFC: Optionally set FZ/DAZ on SSE(2) processors
@JeffBezanson JeffBezanson merged commit 887d701 into JuliaLang:master May 3, 2012
@JeffBezanson
Member

that seems to be how github displays tabs in files that use mixed space/tab indenting.

@pao
Member Author

pao commented May 4, 2012

Oh, there really shouldn't be tabs...I really wasn't done with that yet, though.

@StefanKarpinski
Member

Well, that's ok. You can carry on making changes to this on master. As long as it compiles and no one is using it yet, you're safe :-)

@StefanKarpinski
Member

For what it's worth, I think this is very important functionality that we ought to have, even if it's not particularly easy to use. This is advanced stuff for people who really, really need performance. They ought to be able to have that.

@Keno
Member

Keno commented May 4, 2012

@pao I have but one question: Why did you name this branch zebras?

@pao
Member Author

pao commented May 4, 2012

@StefanKarpinski Fair enough. But as of now this issue is the only documentation.
@loladiro At work, setting these flags (which we do to get some of our sims running realtime) has become known as "flushing the zebras" (in lieu of "flush to zero"). I'm not exactly sure who came up with that, but it's fun to say, and it stuck.

@StefanKarpinski
Member

I really like that. Let's use that terminology going forward.

Labels
performance Must go faster
4 participants