Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize ObligationForest's NodeState handling. #36993

Merged
merged 1 commit into from
Nov 3, 2016

Conversation

nnethercote
Copy link
Contributor

This commit does the following.

  • Changes NodeState from an enum to a bitflags. This makes it
    possible to check against multiple possible values in a single bitwise
    operation.
  • Replaces all the hot matches involving NodeState with if/else
    chains that ensure that cases are handled in the order of frequency.
  • Partially inlines two functions, find_cycles_from_node and
    mark_as_waiting_from, at two call sites in order to avoid function
    unnecessary function calls on hot paths.
  • Fully inlines and removes is_popped.

These changes speeds up rustc-benchmarks/inflate-0.1.0 by about 7% when
doing debug builds with a stage1 compiler.

r? @arielb1

@arielb1
Copy link
Contributor

arielb1 commented Oct 7, 2016

That sounds like Totally The Wrong Way to handle an O(n^2) algorithm.

@arielb1
Copy link
Contributor

arielb1 commented Oct 7, 2016

For example, you should be able to get a better perf improvement by skipping over the entire middle "GC" steps if all obligations hit the Ok(None) case.

@nnethercote
Copy link
Contributor Author

That sounds like Totally The Wrong Way to handle an O(n^2) algorithm.

I found this comment to be curt and unhelpful. It made me feel angry and bad. It probably wasn't your intent, but that is the effect it had. Please think about how you express yourself when criticizing other people's code.

@nnethercote
Copy link
Contributor Author

For example, you should be able to get a better perf improvement by skipping over the entire middle "GC" steps if all obligations hit the Ok(None) case.

Can you elaborate more? Are you talking about process_obligations?

@arielb1
Copy link
Contributor

arielb1 commented Oct 8, 2016

I found this comment to be curt and unhelpful. It made me feel angry and bad. It probably wasn't your intent, but that is the effect it had. Please think about how you express yourself when criticizing other people's code.

I'm sorry.

I just think that the best way to deal with an O(n^2) case is to well, make it stop being O(n^2), rather than adding low-level optimizations to the specific O(n^2) part.

The main entry point of the obligation forest is process_obligations, and it is basically updates all obligations and then performs a "mark-and-sweep GC", where the running of the "finalizers" advances type inference. The mark-and-sweep GC takes O(n) time.

If nothing changes in the process_obligation loop, we can skip GCing altogether. Also, because cycles are rare (and shouldn't be present in the inflate example), we may want to use some hybrid of garbage collection and reference counting to avoid the sweeps.

@nnethercote
Copy link
Contributor Author

I just think that the best way to deal with an O(n^2) case is to well, make it stop being O(n^2), rather than adding low-level optimizations to the specific O(n^2) part.

To be clear: this is a reasonable opinion. The problem was not that you criticized my code, the problem was the way you criticized it. I suggest you avoid using the phrase "Totally The Wrong Way" in future reviews. Also, when suggesting alternative approaches, please describe them in reasonable detail. (Your follow-up comment about GCing was much better in this regard.)

@Manishearth
Copy link
Member

Manishearth commented Oct 16, 2016

Mod note: As Nick said, please be constructive and nice in your criticism. It seems like you've already acknowledged that, but just in case, leaving this comment 😄

@zatherz
Copy link

zatherz commented Oct 16, 2016

Why have my constructive and nice comments been removed? I'm very well acquainted with Rust's code of conduct and I think @nnethercote's reply could very well be considered harassment. He tried to make @arielb1 feel bad for his reply. We shouldn't reply to harassment with harassment. I am so angry that I'm considering writing a Medium blog post about it and posting it to HackerNews.

@Manishearth
Copy link
Member

Manishearth commented Oct 17, 2016

Mod note: @zatherz Your comment was deleted for containing profanity and blatantly violating the code of conduct. If you have further questions about moderation or the code of conduct, please ask [email protected] or on one of the forums. This is not the place for such discussion [further offtopic comments will be deleted]

@aturon
Copy link
Member

aturon commented Oct 17, 2016

cc @nikomatsakis

@BurntSushi
Copy link
Member

Moderator note: As @Manishearth said, if you have a specific question or objection with our moderation, then please ask [email protected]. This isn't the place for it.

@nikomatsakis nikomatsakis self-assigned this Oct 21, 2016
@nikomatsakis
Copy link
Contributor

Adding myself as another assigneee. Sorry I haven't taken a look yet. My initial reaction was somewhat similar to @arielb1 (i.e., I'd rather solve this from the top), but on the other hand the speed improvements that @nnethercote showed are promising! I'd like to give this a closer read. Will try to do ASAP! Apologies for the delay @nnethercote!

@nnethercote
Copy link
Contributor Author

Adding myself as another assigneee

Oh, please wait a bit! #37231 landed recently and that changed the profiles significantly, so I've started reworking this PR but it's not ready yet.

@nikomatsakis
Copy link
Contributor

@nnethercote ok ping me

@nnethercote
Copy link
Contributor Author

nnethercote commented Oct 24, 2016

I've updated. The commit no longer converts NodeState to bitflags, which was probably the most controversial part of the change. Due to #37231 this is now only a 2% win.

r? @nikomatsakis

It feels like there is still plenty of room for improvement in this benchmark, though I'm struggling to find those improvements myself. Here's Cachegrind's summary of the hottest functions (measurements are instruction counts).

1,743,729,697  src/librustc/traits/fulfill.rs:<rustc::traits::fulfill::FulfillProcessor<'a, 'b, 'gcx, 'tcx> as rustc_data_structures::obligation_forest::ObligationProcessor>::process_obligation
1,470,943,055  src/librustc_data_structures/unify/mod.rs:<rustc_data_structures::unify::UnificationTable<K>>::get
1,291,117,019  src/librustc_data_structures/obligation_forest/mod.rs:<rustc_data_structures::obligation_forest::ObligationForest<O>>::process_obligations
1,098,926,283  src/librustc/infer/mod.rs:rustc::infer::InferCtxt::shallow_resolve
  982,723,014  src/libcore/result.rs:<rustc::traits::fulfill::FulfillProcessor<'a, 'b, 'gcx, 'tcx> as rustc_data_structures::obligation_forest::ObligationProcessor>::process_obligation
  819,129,999  src/libcore/iter/iterator.rs:<rustc::traits::fulfill::FulfillProcessor<'a, 'b, 'gcx, 'tcx> as rustc_data_structures::obligation_forest::ObligationProcessor>::process_obligation
  608,066,958  src/librustc_data_structures/obligation_forest/mod.rs:<rustc_data_structures::obligation_forest::ObligationForest<O>>::compress
  465,370,000  /build/glibc-GKVZIf/glibc-2.23/string/::/sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S:__memcpy_avx_unaligned
  446,992,304  /build/glibc-GKVZIf/glibc-2.23/malloc/malloc.c:_int_malloc
  433,218,885  src/librustc_data_structures/unify/mod.rs:<rustc_data_structures::unify::UnificationTable<K>>::get
  397,811,572  src/libcore/iter/range.rs:<rustc_data_structures::obligation_forest::ObligationForest<O>>::process_obligations
  397,030,502  ???:???
  368,654,892  src/libcore/option.rs:rustc::infer::InferCtxt::shallow_resolve
  321,693,196  src/libcore/iter/range.rs:<rustc_data_structures::obligation_forest::ObligationForest<O>>::compress

You have to squint a bit when interpreting them due to inlining but they give you a good idea. In particular, this call chain is very hot:

process_obligations -> process_obligation -> process_predicate -> shallow_resolve -> UnificationTable::probe -> UnificationTable::get

Any changes that can speed up any of those functions or reduce the number of calls in this chain are likely to help.

Copy link
Contributor

@nikomatsakis nikomatsakis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r=me with the helper extracted (maybe tagged #[inline]?)


for dependent in &node.dependents {
self.mark_as_waiting_from(&self.nodes[dependent.get()]);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather than making two copies of this code, maybe pull out a helper like mark_neighbors_as_waiting_from(node)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

This commit partially inlines two functions, `find_cycles_from_node` and
`mark_as_waiting_from`, at two call sites in order to avoid function
unnecessary function calls on hot paths.

It also fully inlines and removes `is_popped`.

These changes speeds up rustc-benchmarks/inflate-0.1.0 by about 2% when
doing debug builds with a stage1 compiler.
@nnethercote
Copy link
Contributor Author

Comments addressed.

@aturon
Copy link
Member

aturon commented Nov 2, 2016

@bors r=nikomatsakis

@nikomatsakis
Copy link
Contributor

@bors r+

@bors
Copy link
Contributor

bors commented Nov 2, 2016

📌 Commit 7b33f7e has been approved by nikomatsakis

@nikomatsakis
Copy link
Contributor

Not sure why @bors wasn't listening to @aturon. =)

@bors
Copy link
Contributor

bors commented Nov 3, 2016

⌛ Testing commit 7b33f7e with merge f9f45c6...

bors added a commit that referenced this pull request Nov 3, 2016
Optimize ObligationForest's NodeState handling.

This commit does the following.
- Changes `NodeState` from an enum to a `bitflags`. This makes it
  possible to check against multiple possible values in a single bitwise
  operation.
- Replaces all the hot `match`es involving `NodeState` with `if`/`else`
  chains that ensure that cases are handled in the order of frequency.
- Partially inlines two functions, `find_cycles_from_node` and
  `mark_as_waiting_from`, at two call sites in order to avoid function
  unnecessary function calls on hot paths.
- Fully inlines and removes `is_popped`.

These changes speeds up rustc-benchmarks/inflate-0.1.0 by about 7% when
doing debug builds with a stage1 compiler.

r? @arielb1
@bors bors merged commit 7b33f7e into rust-lang:master Nov 3, 2016
@nnethercote nnethercote deleted the obligation branch November 3, 2016 06:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants