Insufficient inlining of PartialEq::eq #59375

tdaede · 2019-03-23T00:28:57Z

In rav1e, an enum that implements PartialEq fails to get its eq() method inlined at any opt-level lower than 3, despite being a one-instruction test. This destroys performance on anything other than opt-level 3.

I made a godbolt test case, however it only exhibits the behavior between 1 and 2, not 2 and 3. Not sure what makes rav1e different, but regardless even for opt-level 1 it probably should be inlined. Possibly the standard library should have #[inline(always)] for this derive implementation?

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=0ca77d0df01dc401fec111d64aa9eb0f

jonas-schievink · 2019-03-23T12:23:44Z

#[inline(always)] is respected even in unoptimized debug builds, you really don't want that on every derived PartialEq impl since it would slow down compile times for all programs that #[derive(PartialEq)].

Unconditionally putting #[inline] on all derived PartialEq impls also seems very suboptimal, since it's only rarely needed, so having a way of configuring it would be helpful (AFAIK no builtin derive currently has any configuration attributes).

Is there a reason you can't use opt-level 0 for development and opt-level 3 (aka Cargo's release profile) for benchmarks and tests? (I'm assuming you want this because your tests take a long time?)

jonas-schievink · 2019-03-23T12:26:17Z

Okay nevermind me, apparently it already puts #[inline] on the methods:

rust/src/libsyntax_ext/deriving/cmp/partial_eq.rs

Line 65 in d51a437

let inline = cx.meta_word(span, Symbol::intern("inline"));

tdaede · 2019-03-23T18:49:13Z

Yup, likewise the libcore implementations are also #[inline]. The implementation of partialeq is always just byte compare (usually 1 instruction), so I can't think of any case you would not want it inlined.

I found this bug by converting an integer field to an enum in the project, and suddenly seeing CI take over 2x the time (as it runs in debug mode). It actually makes it impractical to use an enum unless we use opt-level 3 for debug (and that inlines so aggressively otherwise that it's not desired for debugging).

tdaede · 2019-03-27T01:05:14Z

I tried changing the implementation to the following:

            let always = cx.meta_list_item_word(span, Symbol::intern("always"));
            let inline = cx.meta_list(span, Symbol::intern("inline"), vec![always]);

Unfortunately the resulting compiler really struggles to compile the syn crate - I'm not sure if it's just really slow or it's actually hung, I'll let it run overnight.

tdaede · 2019-03-27T08:40:55Z

I think this actually might be caused by neq, not eq. I seem to get faster runtimes if I also derive Eq - the problem is that without Eq, an implementation of neq is made that calls eq, which requires double inlining and the compiler fails to do this.

Mixed artifacts from two different compilers, deriving Eq doesn't solve the problem.

pnkfelix · 2019-03-29T15:02:58Z

It might be interesting to try to change the derive(PartialEq) to generate two methods: the first, marked #[inline(always)], just inspects the two discriminants (or for integers, the values themselves), and then dispatches to the second method. The second method, marked #[inline] (as today) would hold the remainder of the comparison...

dotdash · 2019-04-24T18:15:26Z

I played around with this and have a WIP branch (dotdash@a5dc7e3) that teaches the deriving code about trivial cases, so the generated code turns from:

    fn eq(&self, other: &RefType) -> bool {
        {
            let __self_vi =
                unsafe { ::std::intrinsics::discriminant_value(&*self) } as
                    isize;
            let __arg_1_vi =
                unsafe { ::std::intrinsics::discriminant_value(&*other) } as
                    isize;
            if true && __self_vi == __arg_1_vi {
                match (&*self, &*other) { _ => true, }
            } else { false }
        }
    }

into

    fn eq(&self, other: &RefType) -> bool {
        {
            let __self_vi =
                unsafe { ::std::intrinsics::discriminant_value(&*self) } as
                    isize;
            let __arg_1_vi =
                unsafe { ::std::intrinsics::discriminant_value(&*other) } as
                    isize;
            if true && __self_vi == __arg_1_vi { true } else { false }
        }
    }

Which saves 32 bytes of stack in the eq function in this case (it depends on the repr type for the enum). Additionally, the derive for PartialEq emits #[inline(always)] when none of the enum variants have fields, which seems almost reasonable with the improved codegen.

But there's one problem left: the discriminant intrinsic. The codegen for intrinsics still requires a stack slot for the return value, even for immediate values. Thus both discriminants get spilled onto the stack for no good reason. If we could get that fixed we'd have good enough codegen to always inline these simple cases, even without optimizations.

Unfortunately the codegen code has changed quite a bit since I last touched it, so I didn't quite get that part done, and I'm not sure how long it might take.

dotdash · 2019-04-24T18:52:46Z

@tdaede could you try and see whether this helps with your code in rav1e?

tdaede · 2019-04-25T22:55:44Z

I've been trying to get it to reproduce, and now despite using the same compiler version and code, I see it suddenly decide to inline now :( I wonder if it's dependent on what codegen unit boundaries rustc decides to use at a given time? My smaller testcases in the bug still reproduce.

FWIW in rav1e I was using the following to try to reproduce it:
$ perf record cargo test --features=decode_test test_encode_decode::bitrate_aom
And then watching for eq to show up in the profile.

saethlin · 2022-11-12T20:11:00Z

https://godbolt.org/z/rKb5vYqWb

It looks to me like this was fixed in 1.59, the PartialEq implementation is now inlined at -Copt-level=1.

cc @alice-i-cecile

tdaede mentioned this issue Mar 23, 2019

Use optimization level 3 for dev builds. xiph/rav1e#1145

Closed

jonas-schievink added the I-slow Issue: Problems and improvements with respect to performance of generated code. label Mar 23, 2019

aweinstock314 mentioned this issue Nov 21, 2019

Consider supporting non-Copy scalar types. dimforge/nalgebra#282

Open

alice-i-cecile closed this as completed Nov 12, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Insufficient inlining of PartialEq::eq #59375

Insufficient inlining of PartialEq::eq #59375

tdaede commented Mar 23, 2019 •

edited by pnkfelix

Loading

jonas-schievink commented Mar 23, 2019

jonas-schievink commented Mar 23, 2019

tdaede commented Mar 23, 2019

tdaede commented Mar 27, 2019

tdaede commented Mar 27, 2019 •

edited

Loading

pnkfelix commented Mar 29, 2019

dotdash commented Apr 24, 2019

dotdash commented Apr 24, 2019

tdaede commented Apr 25, 2019

saethlin commented Nov 12, 2022

Insufficient inlining of PartialEq::eq #59375

Insufficient inlining of PartialEq::eq #59375

Comments

tdaede commented Mar 23, 2019 • edited by pnkfelix Loading

jonas-schievink commented Mar 23, 2019

jonas-schievink commented Mar 23, 2019

tdaede commented Mar 23, 2019

tdaede commented Mar 27, 2019

tdaede commented Mar 27, 2019 • edited Loading

pnkfelix commented Mar 29, 2019

dotdash commented Apr 24, 2019

dotdash commented Apr 24, 2019

tdaede commented Apr 25, 2019

saethlin commented Nov 12, 2022

tdaede commented Mar 23, 2019 •

edited by pnkfelix

Loading

tdaede commented Mar 27, 2019 •

edited

Loading