Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Insufficient inlining of PartialEq::eq #59375

Closed
tdaede opened this issue Mar 23, 2019 · 10 comments
Closed

Insufficient inlining of PartialEq::eq #59375

tdaede opened this issue Mar 23, 2019 · 10 comments
Labels
I-slow Issue: Problems and improvements with respect to performance of generated code.

Comments

@tdaede
Copy link
Contributor

tdaede commented Mar 23, 2019

In rav1e, an enum that implements PartialEq fails to get its eq() method inlined at any opt-level lower than 3, despite being a one-instruction test. This destroys performance on anything other than opt-level 3.

I made a godbolt test case, however it only exhibits the behavior between 1 and 2, not 2 and 3. Not sure what makes rav1e different, but regardless even for opt-level 1 it probably should be inlined. Possibly the standard library should have #[inline(always)] for this derive implementation?

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=0ca77d0df01dc401fec111d64aa9eb0f

@jonas-schievink
Copy link
Contributor

#[inline(always)] is respected even in unoptimized debug builds, you really don't want that on every derived PartialEq impl since it would slow down compile times for all programs that #[derive(PartialEq)].

Unconditionally putting #[inline] on all derived PartialEq impls also seems very suboptimal, since it's only rarely needed, so having a way of configuring it would be helpful (AFAIK no builtin derive currently has any configuration attributes).

Is there a reason you can't use opt-level 0 for development and opt-level 3 (aka Cargo's release profile) for benchmarks and tests? (I'm assuming you want this because your tests take a long time?)

@jonas-schievink jonas-schievink added the I-slow Issue: Problems and improvements with respect to performance of generated code. label Mar 23, 2019
@jonas-schievink
Copy link
Contributor

Okay nevermind me, apparently it already puts #[inline] on the methods:

let inline = cx.meta_word(span, Symbol::intern("inline"));

@tdaede
Copy link
Contributor Author

tdaede commented Mar 23, 2019

Yup, likewise the libcore implementations are also #[inline]. The implementation of partialeq is always just byte compare (usually 1 instruction), so I can't think of any case you would not want it inlined.

I found this bug by converting an integer field to an enum in the project, and suddenly seeing CI take over 2x the time (as it runs in debug mode). It actually makes it impractical to use an enum unless we use opt-level 3 for debug (and that inlines so aggressively otherwise that it's not desired for debugging).

@tdaede
Copy link
Contributor Author

tdaede commented Mar 27, 2019

I tried changing the implementation to the following:

            let always = cx.meta_list_item_word(span, Symbol::intern("always"));
            let inline = cx.meta_list(span, Symbol::intern("inline"), vec![always]);

Unfortunately the resulting compiler really struggles to compile the syn crate - I'm not sure if it's just really slow or it's actually hung, I'll let it run overnight.

@tdaede
Copy link
Contributor Author

tdaede commented Mar 27, 2019

I think this actually might be caused by neq, not eq. I seem to get faster runtimes if I also derive Eq - the problem is that without Eq, an implementation of neq is made that calls eq, which requires double inlining and the compiler fails to do this.

Mixed artifacts from two different compilers, deriving Eq doesn't solve the problem.

@pnkfelix
Copy link
Member

It might be interesting to try to change the derive(PartialEq) to generate two methods: the first, marked #[inline(always)], just inspects the two discriminants (or for integers, the values themselves), and then dispatches to the second method. The second method, marked #[inline] (as today) would hold the remainder of the comparison...

@dotdash
Copy link
Contributor

dotdash commented Apr 24, 2019

I played around with this and have a WIP branch (dotdash@a5dc7e3) that teaches the deriving code about trivial cases, so the generated code turns from:

    fn eq(&self, other: &RefType) -> bool {
        {
            let __self_vi =
                unsafe { ::std::intrinsics::discriminant_value(&*self) } as
                    isize;
            let __arg_1_vi =
                unsafe { ::std::intrinsics::discriminant_value(&*other) } as
                    isize;
            if true && __self_vi == __arg_1_vi {
                match (&*self, &*other) { _ => true, }
            } else { false }
        }
    }

into

    fn eq(&self, other: &RefType) -> bool {
        {
            let __self_vi =
                unsafe { ::std::intrinsics::discriminant_value(&*self) } as
                    isize;
            let __arg_1_vi =
                unsafe { ::std::intrinsics::discriminant_value(&*other) } as
                    isize;
            if true && __self_vi == __arg_1_vi { true } else { false }
        }
    }

Which saves 32 bytes of stack in the eq function in this case (it depends on the repr type for the enum). Additionally, the derive for PartialEq emits #[inline(always)] when none of the enum variants have fields, which seems almost reasonable with the improved codegen.

But there's one problem left: the discriminant intrinsic. The codegen for intrinsics still requires a stack slot for the return value, even for immediate values. Thus both discriminants get spilled onto the stack for no good reason. If we could get that fixed we'd have good enough codegen to always inline these simple cases, even without optimizations.

Unfortunately the codegen code has changed quite a bit since I last touched it, so I didn't quite get that part done, and I'm not sure how long it might take.

@dotdash
Copy link
Contributor

dotdash commented Apr 24, 2019

@tdaede could you try and see whether this helps with your code in rav1e?

@tdaede
Copy link
Contributor Author

tdaede commented Apr 25, 2019

I've been trying to get it to reproduce, and now despite using the same compiler version and code, I see it suddenly decide to inline now :( I wonder if it's dependent on what codegen unit boundaries rustc decides to use at a given time? My smaller testcases in the bug still reproduce.

FWIW in rav1e I was using the following to try to reproduce it:
$ perf record cargo test --features=decode_test test_encode_decode::bitrate_aom
And then watching for eq to show up in the profile.

@saethlin
Copy link
Member

https://godbolt.org/z/rKb5vYqWb

It looks to me like this was fixed in 1.59, the PartialEq implementation is now inlined at -Copt-level=1.

cc @alice-i-cecile

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I-slow Issue: Problems and improvements with respect to performance of generated code.
Projects
None yet
Development

No branches or pull requests

6 participants