Loops with inclusive ranges produce horrible assembly #75035

MSxDOS · 2020-08-02T03:28:42Z

This:

```rust
extern {
    fn f();
}

pub unsafe fn test_loop() {
    for _ in 1 ..= 7 {
        f();
    }
}
```

On 1.42:

```asm
example::test_loop:
        push    rbx
        mov     rbx, qword ptr [rip + f@GOTPCREL]
        call    rbx
        call    rbx
        call    rbx
        call    rbx
        call    rbx
        call    rbx
        mov     rax, rbx
        pop     rbx
        jmp     rax
```

On 1.43 and above:

```asm
example::test_loop:
        push    rbp
        push    r15
        push    r14
        push    rbx
        push    rax
        mov     ebx, 1
        mov     r14d, 7
        mov     r15, qword ptr [rip + f@GOTPCREL]
.LBB0_1:
        lea     ebp, [rbx + 1]
        cmp     ebx, 7
        cmovge  ebp, r14d
        call    r15
        cmp     ebp, 7
        jg      .LBB0_3
        cmp     ebx, 7
        mov     ebx, ebp
        jl      .LBB0_1
.LBB0_3:
        add     rsp, 8
        pop     rbx
        pop     r14
        pop     r15
        pop     rbp
        ret
```

https://godbolt.org/z/n63seh
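Some context on why the loop shape is harder to optimize: `start..=end` cannot simply be rewritten as `start..end + 1`, because `end + 1` overflows when `end` is the type's maximum, so the iterator needs extra state to remember that the last element has already been yielded. The sketch below is a simplified, hypothetical re-implementation (`FlagInclusive` is not a std type) that uses a boolean exhaustion flag, similar in spirit to the private `exhausted` field in current std. It only illustrates why `next` has to test two conditions per step, which is plausibly where the extra compares in the 1.43+ loop come from.

```rust
/// Hypothetical, simplified inclusive range over `u32` (not the std type).
/// `exhausted` records that `end` itself has already been yielded, which is
/// needed because `start` cannot be advanced past `end == u32::MAX`.
struct FlagInclusive {
    start: u32,
    end: u32,
    exhausted: bool,
}

impl FlagInclusive {
    fn new(start: u32, end: u32) -> Self {
        FlagInclusive { start, end, exhausted: false }
    }
}

impl Iterator for FlagInclusive {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        // First check: empty range or already finished.
        if self.exhausted || self.start > self.end {
            return None;
        }
        let n = self.start;
        // Second check: can we step forward without passing `end`?
        if n < self.end {
            self.start = n + 1;
        } else {
            // n == end: mark the range finished instead of computing end + 1.
            self.exhausted = true;
        }
        Some(n)
    }
}
```

Collecting `FlagInclusive::new(1, 7)` yields `1` through `7`, and `FlagInclusive::new(u32::MAX - 1, u32::MAX)` yields two items without overflowing.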

bugadani · 2020-08-02T07:31:32Z

Probably caused by #68835

As a workaround, `(1..=7).for_each(|_| { f(); })` seems to optimize well.
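A plausible reason the `for_each` workaround fares better is that it goes through internal iteration (`fold`/`try_fold`), where the range can loop while `start < end` and then handle the final element once, outside the loop, so no per-step exhaustion check is needed. The function below is a hypothetical stand-in (`for_each_inclusive` is not a std API) written in that shape; std's actual specialization may differ in detail.

```rust
/// Hypothetical helper: visits every value in `start..=end` using the
/// "loop while `<`, then handle the last element once" structure.
fn for_each_inclusive<F: FnMut(u32)>(start: u32, end: u32, mut f: F) {
    if start > end {
        return; // empty range
    }
    let mut i = start;
    // Hot loop: one comparison per iteration, and `i + 1` cannot
    // overflow because `i < end`.
    while i < end {
        f(i);
        i += 1;
    }
    // Final element (i == end), visited without stepping past `end`.
    f(end);
}
```

For example, `for_each_inclusive(1, 7, |i| seen.push(i))` visits the same values as `(1..=7).for_each(...)`, and ranges ending at `u32::MAX` need no special handling.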

leonardo-m · 2020-08-02T08:35:32Z

See also #75024

ds84182 · 2020-08-05T05:12:43Z

It seems like the following generates good assembly:

```rust
extern {
    fn f();
}

// `.0` is the current value, `.1` the number of remaining steps
// (`None` once exhausted, or if the range is empty).
struct Inclusive(usize, Option<usize>);

impl Inclusive {
    fn new(start: usize, end: usize) -> Self {
        if start > end {
            Inclusive(start, None)
        } else {
            Inclusive(start, Some(end - start))
        }
    }
}

impl Iterator for Inclusive {
    type Item = usize;
    #[inline]
    fn next(&mut self) -> Option<Self::Item> {
        if let Some(count) = self.1 {
            self.1 = count.checked_sub(1);
            let n = self.0;
            // Only advance while steps remain, so `end` is never stepped past.
            if count != 0 {
                self.0 += 1;
            }
            Some(n)
        } else {
            None
        }
    }
}

pub unsafe fn test_loop() {
    for _ in Inclusive::new(1, 7) {
        f();
    }
}
```

I don't think this matches the behavior of inclusive ranges 1:1, but it seems that using a current value + counter pair optimizes correctly.
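One way to probe the "1:1" question, at least for plain forward iteration, is a brute-force comparison against `start..=end`. The sketch below repeats the `Inclusive` type from the comment above in a slightly condensed form and adds a hypothetical `matches_std` helper. It only checks the yielded sequence, not other `RangeInclusive` behavior such as `DoubleEndedIterator`, `contains`, or `is_empty`.

```rust
/// Counter-based inclusive range from the comment above:
/// `.0` is the current value, `.1` the number of remaining steps.
struct Inclusive(usize, Option<usize>);

impl Inclusive {
    fn new(start: usize, end: usize) -> Self {
        if start > end {
            Inclusive(start, None)
        } else {
            Inclusive(start, Some(end - start))
        }
    }
}

impl Iterator for Inclusive {
    type Item = usize;

    #[inline]
    fn next(&mut self) -> Option<usize> {
        let count = self.1?; // `None` means empty or exhausted
        self.1 = count.checked_sub(1);
        let n = self.0;
        if count != 0 {
            self.0 += 1; // never steps past `end`, so no overflow at usize::MAX
        }
        Some(n)
    }
}

/// Hypothetical helper: does `Inclusive` yield the same items as `start..=end`?
fn matches_std(start: usize, end: usize) -> bool {
    Inclusive::new(start, end).eq(start..=end)
}
```

Looping `matches_std` over small `start`/`end` pairs, plus a range ending at `usize::MAX`, is a cheap sanity check before comparing assembly.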

bugadani · 2020-09-20T13:00:39Z

This problem is not there any more in 1.47 beta: https://godbolt.org/z/eTPETc

leonardo-m · 2020-09-20T16:35:40Z

> This problem is not there any more in 1.47 beta

It needs wider testing, with doubly and triply nested loops too.

MSxDOS · 2020-10-11T05:17:52Z

Still very bad with a nested loop: https://godbolt.org/z/djY1YW

bugadani · 2020-10-11T09:02:21Z

@ds84182

> I don't think this matches the behavior of inclusive ranges 1:1, but it seems that using a current value + counter pair optimizes correctly.

Unfortunately, to do this, the values in the range must be at least `PartialOrd`, which the current `Range` API does not require.

The simplest solution I found while keeping the type signatures the same is this, which actually performs worse than the current impl 🤔

MSxDOS · 2021-03-05T13:20:17Z

Unfortunately, the upgrade to LLVM 12 didn't fix the nested loop example; it made it even worse.

cc @nikic Could you have a look at this?

nikic · 2021-03-05T17:41:38Z

Cleaned-up IR test case:

```llvm
define void @test() {
start:
  br label %loop

loop:
  %iv = phi i32 [ 1, %start ], [ %iv.next, %loop ]
  %iv.inc = add nsw i32 %iv, 1
  tail call void @f()
  %cmp1 = icmp slt i32 %iv, 2
  %iv.next = select i1 %cmp1, i32 %iv.inc, i32 2
  %cmp2 = icmp slt i32 %iv.next, 3
  %and = and i1 %cmp1, %cmp2
  br i1 %and, label %loop, label %exit

exit:
  ret void
}

declare void @f()
```

I believe CVP (correlated value propagation) should be capable of inferring that `%cmp2` is always true, but it ends up with a range of `constantrange<-2147483647, 4>`, which is one element too large...
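To make "one element too large" concrete: `%iv.next` is `%iv + 1` when `%iv < 2` (so at most 2, and `nsw` rules out wrapping) and `2` otherwise, so its exact signed range is [-2147483647, 2], i.e. `constantrange<-2147483647, 3>`. An upper bound of 4 admits the value 3, for which `icmp slt %iv.next, 3` would be false, so the compare cannot be folded. The Rust transcription below is only an illustrative model of the IR (the helper names `iv_next` and `run_loop` are hypothetical), not anything LLVM runs.

```rust
/// Models `%iv.next = select (%iv < 2), %iv + 1, 2`.
/// When `iv < 2`, `iv <= 1`, so `iv + 1 <= 2` and cannot overflow.
fn iv_next(iv: i32) -> i32 {
    if iv < 2 { iv + 1 } else { 2 }
}

/// Runs the reduced loop starting from `%iv = 1` and returns how many
/// times the body (`call @f`) executes.
fn run_loop() -> u32 {
    let mut calls = 0;
    let mut iv: i32 = 1;
    loop {
        calls += 1; // tail call void @f()
        let cmp1 = iv < 2;
        let next = iv_next(iv);
        let cmp2 = next < 3; // the compare CVP should prove is always true
        if !(cmp1 && cmp2) {
            break;
        }
        iv = next;
    }
    calls
}
```

`run_loop()` returns 2, and `iv_next(i32::MIN)` is -2147483647, which matches the lower bound CVP computed; only the upper bound is off by one.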

nikic · 2021-03-05T18:06:16Z

Something like this would do it: https://gist.github.com/nikic/1beaff28e66771d39ec949814428005b

nikic · 2021-03-06T09:46:52Z

Fixed upstream by llvm/llvm-project@a917fb8.

nikic · 2021-08-22T13:53:56Z

After #87570, the nested loop generates good code as well:

```llvm
define void @_ZN5test29test_loop17he2725c5c3163608aE() unnamed_addr #0 personality i32 (i32, i32, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*)* @rust_eh_personality {
start:
  tail call void @f()
  tail call void @f()
  tail call void @f()
  tail call void @f()
  tail call void @f()
  tail call void @f()
  ret void
}
```

It's worth noting that inclusive-range loop optimization is very hit and miss, but at least this particular case is handled well now.

leonardo-m · 2021-08-23T11:17:50Z

One case still isn't solved by the new LLVM. Is it worth reopening this issue, or something else?

```rust
fn euler76() -> u32 {
    const N: usize = 100; // Input.
    let mut ways = [0; N + 1];
    ways[0] = 1;
    for j in 1 .. N {
        //for i in j ..= N { // Adds a bound test.
        for i in j .. N + 1 {
            ways[i] += ways[i - j];
        }
    }
    ways[N]
}

fn main() { assert_eq!(euler76(), 190_569_291); }
```
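For anyone re-testing this on a newer compiler, the three inner-loop spellings below compute the same value, so their assembly can be compared directly (e.g. on godbolt). The function names are hypothetical; the `for_each` variant applies the workaround suggested earlier in the thread, and whether it avoids the extra bound test here is an assumption to verify, not a measured result.

```rust
const N: usize = 100;

/// Half-open inner range, as in the report above.
fn euler76_exclusive() -> u32 {
    let mut ways = [0u32; N + 1];
    ways[0] = 1;
    for j in 1..N {
        for i in j..N + 1 {
            ways[i] += ways[i - j];
        }
    }
    ways[N]
}

/// Inclusive inner range: the spelling reported to add a bound test.
fn euler76_inclusive() -> u32 {
    let mut ways = [0u32; N + 1];
    ways[0] = 1;
    for j in 1..N {
        for i in j..=N {
            ways[i] += ways[i - j];
        }
    }
    ways[N]
}

/// Inclusive inner range driven through internal iteration (`for_each`).
fn euler76_for_each() -> u32 {
    let mut ways = [0u32; N + 1];
    ways[0] = 1;
    for j in 1..N {
        (j..=N).for_each(|i| ways[i] += ways[i - j]);
    }
    ways[N]
}
```

All three return 190_569_291 (the number of partitions of 100 into at least two parts), so any difference in generated code comes from the loop form alone.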

berghetti · 2023-12-22T14:37:26Z

Problem remains, even for a simple case.
https://godbolt.org/z/K44MTqb7n

LingMan · 2023-12-22T16:11:35Z

@berghetti The original example still optimizes well. I don't think reusing this ticket for similar but separate issues will get them much attention, especially years after it was closed.

You'd be better off opening a new report.

Btw, `(0..=*num).for_each(|_| { i += *num; });` optimizes as expected.

rustbot added C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code. labels Aug 2, 2020

nikic self-assigned this Mar 5, 2021

nikic closed this as completed Aug 22, 2021

matthieu-m mentioned this issue Jan 4, 2024

RFC: New range types for Edition 2024 rust-lang/rfcs#3550

Merged
