Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Component Vector -> Map ECM Optimization #416

Merged
merged 14 commits into from
Nov 2, 2020

Conversation

JShep1
Copy link

@JShep1 JShep1 commented Oct 16, 2020

EDIT: The results shown below are, unfortunately, not as great as initially thought in the write up. See the comments below for an explanation.


Explanation and Reasoning

As found in the "Profile Ignition Gazebo Scenebroadcaster" task, the SceneBroadcaster::PostUpdate call accounts for roughly 60-65% of the simulation timestep running with a taxing example (shapes_population), the UpdateState portion accounts for about 80% of the PostUpdate call. Diving deeper reveals that the State function in the Entity Component Manager is the culprit.

So naturally, I began looking at the AddEntityToMessage function to see if there was potential for optimization. I ran some profiling on the function, segmenting out the function into 7 key sections - this optimization focuses on the findings from Section 1. The section itself seems unassuming, but after profiling, I found that it accounted for ~16% of the function's total time taken. The bool check and the map find are quick, but this continue case occurs roughly 85% of the time for an example like shapes_population.sdf. This is due to the fact that on every iteration, the pose component is updated across all entitys, but each entity has 7-9 components in this example, and an iteration is done through all of these components to find the pose component we want. This can become exceedingly costly as number of components per entity increases.

So @iche033 and I proposed that we change this vector of component keys to an unordered_map so that that iteration will not need to take place, and we will be able to access that component directly without needing to iterate.

This PR adds that functionality as well as updating some sets and maps to be unordered as they are provably faster, below are tables comparing the before and after. Note that Sections 2 - 7 can be disregarded, I used them as the baseline to find an equivalent timestep (if the before Sections 2 - 7 run times are roughly equivalent to the after Section 2 - 7 run times, we can more or less conclude that this was a similar timestep, since none of the profiled Section 2 - 7 code was changed and shouldn't be affected by the optimizations of Section 1). The following profiling samples are taken from running the shapes_population.sdf example.

Before Section 1 Optimizations

Name Time (ms) Self (ms) Calls Time per call (s)
UpdateState 47.374 0.006 1 47.374
Section 1 7.715 7.715 90142 8.56 x 10-5
Section 2 4.667 1.085 12017 38.84 x 10-5
Section 3 1.711 1.711 12017 14.24 x 10-5
Section 4 3.518 3.518 3003 117.15 x 10-5
Section 5 1.569 0.288 3003 52.25 x 10-5
Section 6 9.403 9.403 3003 313.12 x 10-5
Section 7 1.449 1.449 12020 12.05 x 10-5

After Section 1 Optimizations

Name Time (ms) Self (ms) Calls Time per call (s)
UpdateState 32.111 0.005 1 32.111
Section 1 0.436 0.436 12020 3.63 x 10-5
Section 2 4.527 0.929 12017 37.67 x 10-5
Section 3 1.629 1.629 12017 13.56 x 10-5
Section 4 3.677 3.677 3003 122.44 x 10-5
Section 5 1.616 0.317 3003 53.81 x 10-5
Section 6 9.408 9.408 3003 313.29 x 10-5
Section 7 1.204 1.204 12020 10.02 x 10-5

Results

As can be seen by the above data, in the case of the shapes_population example (and likely any other scene with a significant number of entities), Section 1 experiences about a 94.35% speedup from the previous implementation. Overall, there is about a 32.22% speedup of this UpdateState portion of the SceneBroadcaster. All other sections of the EntityComponentManager that previously looped through an entity's components now benefit from this optimization as well. I don't expect the improvements to be this drastic for smaller examples, but it should certainly still help.

Where to next

@iche033 and I discussed the remaining sections 2 - 7 for potential optimizations as well as parallelization potential and eventually a possible higher level architecture redesign.

John Shepherd added 4 commits October 14, 2020 12:22
Signed-off-by: John Shepherd <[email protected]>
Signed-off-by: John Shepherd <[email protected]>
Signed-off-by: John Shepherd <[email protected]>
@github-actions github-actions bot added the 🏰 citadel Ignition Citadel label Oct 16, 2020
@JShep1 JShep1 self-assigned this Oct 16, 2020
@JShep1 JShep1 added enhancement New feature or request 🏰 citadel Ignition Citadel and removed 🏰 citadel Ignition Citadel labels Oct 16, 2020
John Shepherd added 4 commits October 16, 2020 12:08
Signed-off-by: John Shepherd <[email protected]>
Signed-off-by: John Shepherd <[email protected]>
Signed-off-by: John Shepherd <[email protected]>
Signed-off-by: John Shepherd <[email protected]>
@JShep1 JShep1 marked this pull request as ready for review October 16, 2020 20:01
@JShep1 JShep1 requested a review from chapulina as a code owner October 16, 2020 20:01
@JShep1 JShep1 requested a review from iche033 October 16, 2020 20:01
@codecov
Copy link

codecov bot commented Oct 16, 2020

Codecov Report

Merging #416 into ign-gazebo3 will increase coverage by 0.02%.
The diff coverage is 96.82%.

Impacted file tree graph

@@               Coverage Diff               @@
##           ign-gazebo3     #416      +/-   ##
===============================================
+ Coverage        77.25%   77.27%   +0.02%     
===============================================
  Files              205      205              
  Lines            11018    11011       -7     
===============================================
- Hits              8512     8509       -3     
+ Misses            2506     2502       -4     
Impacted Files Coverage Δ
src/EntityComponentManager.cc 81.78% <96.82%> (-0.67%) ⬇️
src/SimulationRunner.cc 94.75% <0.00%> (+1.16%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4da55f8...e7acbc8. Read the comment docs.

@chapulina chapulina requested a review from nkoenig October 19, 2020 18:47
Copy link
Contributor

@iche033 iche033 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to reproduce the perf improvement using Remotery by looking at the SceneBroadcast::PostUpdate UpdateState timing in shapes_population.sdf world. But I'm currently getting roughly ~30ms both before and after the change. Not sure what I'm doing differently. Here's my remotery snapshot

scenebroadcaster_after

Also, do you see any noticeable difference in RTF? It's hovering around 0.5% for me.

src/EntityComponentManager.cc Outdated Show resolved Hide resolved
this->dataPtr->entityComponents[_entity].end(), _key) !=
this->dataPtr->entityComponents[_entity].end();
this->dataPtr->entityComponents[_entity].find(_key.first) !=
this->dataPtr->entityComponents[_entity].end();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor optimization to remove duplicate lookups:

  if (!this->HasEntity(_entity))
    return false;
  auto &compMap = this->dataPtr->entityComponents[_entity];
  return compMap.find(_key.first) != compMap.end();

Copy link
Author

@JShep1 JShep1 Oct 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made the updates here afee595 . Do you think we even need to call HasEntity here? I seem to remember it taking some time when I was profiling. The description for it states that it detects if the entity exists, can we forego the HasEntity and just check that the entity exists as a key within the entityComponent map?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so. I kept it there without thinking too much about it. Looks like an extra safety check to make sure it also exists in the entity graph. Not sure how expensive that operation is.

@JShep1
Copy link
Author

JShep1 commented Oct 21, 2020

Also, do you see any noticeable difference in RTF? It's hovering around 0.5% for me.

Hmm, it seems Remotery has deceived me. After some investigation, I suppose that I saw such a drastic increase of performance given I made the incorrect assumption that the overhead caused by Remotery and IGN_PROFILE is not counted within the profiled time points. To test this theory, I profiled the same Section 1 mentioned above again. After 25 randomly collected samples, I found that the non-optimized UpdateState took on average 39.99 ms and the UpdateState with optimization took on average 28.36 ms. I removed those profiling lines and ran both branches regularly. I found that on average, the non-optimized UpdateState call took about 28.229 ms and the optimized UpdateState took about 26.726.

The drastic performance increase I saw came from the fact that the IGN_PROFILE was being called 90000 times in the non-optimized version and its equivalent in the optimized version was only called about 12000 times. Perhaps to get a better comparison, I should've calculated time per each Section 1 call. In that case, the time per call for the non-optimized Section 1 would be 8.317 x 10-5 ms and the optimized Section 1 would be 3.851 x 10-5 ms with the Remotery overhead included, for a time increase of 4.466 x 10-5 ms.

This brings me to three points/questions:

  1. Should we find a way to display and/or download the comprehensive data from a run with Remotery? It was a little of a time sink to have to pause 25 times and manually record the data. If we could just output some sort of csv with the data, it would go a long way.
    2.Should we remove profiling altogether or perhaps have a flag to enable it (or is there something like this already existing that I don't know about)? If the profiling is taking up runtime, this may introduce an easy point of optimization. I wouldn't expect it to hinder the runtime if it's only called once, but in cases where it's called thousands of times every iteration, it may be worth re-evaluating if we need that profiling point to be there or if we can disable it somehow.
  2. Is there a way to ignore the overhead of the Remotery calls?

So this isn't as great an optimization as I initially thought, but it's still a step in the right direction.

John Shepherd added 2 commits October 20, 2020 17:37
Signed-off-by: John Shepherd <[email protected]>
@mjcarroll
Copy link
Contributor

  1. Should we find a way to display and/or download the comprehensive data from a run with Remotery? It was a little of a time sink to have to pause 25 times and manually record the data. If we could just output some sort of csv with the data, it would go a long way.

This would be great, but is unsupported by Remotery. I opened an issue on ign-common, but haven't really looked into other alternatives: gazebosim/gz-common#85

One thing that would be nice would be to output in a format that Google Chrome's devtools can consume: https://docs.google.com/document/d/1CvAClvFfyA5R-PhYUmn5OOQtYMH4h6I0nSsKchNAySU/edit

2.Should we remove profiling altogether or perhaps have a flag to enable it (or is there something like this already existing that I don't know about)? If the profiling is taking up runtime, this may introduce an easy point of optimization. I wouldn't expect it to hinder the runtime if it's only called once, but in cases where it's called thousands of times every iteration, it may be worth re-evaluating if we need that profiling point to be there or if we can disable it somehow.

This is primarily controlled via the IGN_PROFILER_ENABLE preprocessor directive. This will disable it at compile time, and each profiler call will be turned into a noop and compiled out.

  1. Is there a way to ignore the overhead of the Remotery calls?

Not really. It may ultimately be that you want to instrument one level higher. It sounds like the function that you were looking at was particularly "hot" and probably doesn't need per-cycle instrumentation, but rather an aggregate.

@mjcarroll
Copy link
Contributor

An alternative to using the ign-common profiler is to use something like perf and flamegraphs. I thought that I had written a tutorial on it, but I'm not recalling where:

This is more of a sampling based profiler, so it will be at the mercy of how many perf events you manage to catch, but it can also give you an idea of where the really hot spots are.

Finally, if you can get this down to a particularly pathological case, I would highly recommend using benchmark. There are some other examples here: https://github.com/ignitionrobotics/ign-gazebo/tree/main/test/benchmark

@JShep1 JShep1 requested a review from iche033 October 23, 2020 02:50
@JShep1
Copy link
Author

JShep1 commented Oct 23, 2020

@mjcarroll thanks for the detailed response. Yeah, I used flamegraphs awhile back and remember it being decent for profiling specific functions, perhaps I'll try it out again come next optimization PR. I'll take a closer look into possible alternative ways next week when I get some more time.

All this being said, I think this PR is ready to go and proven to at least be some sort of optimization although not as great as initially thought.

Copy link
Contributor

@iche033 iche033 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look fine. Too bad about the remotery overhead but good to know about this for future profiling work.

@JShep1
Copy link
Author

JShep1 commented Oct 28, 2020

@osrf-jenkins run tests please

1 similar comment
@scpeters
Copy link
Member

@osrf-jenkins run tests please

Copy link
Contributor

@chapulina chapulina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well done! The ABI checker is a false positive, so I'll merge even though it's failing.

@chapulina chapulina merged commit 1bb32b0 into ign-gazebo3 Nov 2, 2020
@chapulina chapulina deleted the jshep1/scenebroadcaster_optimizations branch November 2, 2020 20:17
doisyg pushed a commit to wyca-robotics/ign-gazebo that referenced this pull request Dec 13, 2020
Also use unordered types.

Signed-off-by: John Shepherd <[email protected]>
Signed-off-by: Guillaume Doisy <[email protected]>
@chapulina chapulina added the performance Runtime performance label Apr 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏰 citadel Ignition Citadel enhancement New feature or request performance Runtime performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants