Component Vector -> Map ECM Optimization #416

JShep1 · 2020-10-16T06:22:58Z

EDIT: The results shown below are, unfortunately, not as great as initially thought in the write up. See the comments below for an explanation.

Explanation and Reasoning

As found in the "Profile Ignition Gazebo Scenebroadcaster" task, the SceneBroadcaster::PostUpdate call accounts for roughly 60-65% of the simulation timestep running with a taxing example (shapes_population), the UpdateState portion accounts for about 80% of the PostUpdate call. Diving deeper reveals that the State function in the Entity Component Manager is the culprit.

So naturally, I began looking at the AddEntityToMessage function to see if there was potential for optimization. I ran some profiling on the function, segmenting out the function into 7 key sections - this optimization focuses on the findings from Section 1. The section itself seems unassuming, but after profiling, I found that it accounted for ~16% of the function's total time taken. The bool check and the map find are quick, but this continue case occurs roughly 85% of the time for an example like shapes_population.sdf. This is due to the fact that on every iteration, the pose component is updated across all entitys, but each entity has 7-9 components in this example, and an iteration is done through all of these components to find the pose component we want. This can become exceedingly costly as number of components per entity increases.

So @iche033 and I proposed that we change this vector of component keys to an unordered_map so that that iteration will not need to take place, and we will be able to access that component directly without needing to iterate.

This PR adds that functionality as well as updating some sets and maps to be unordered as they are provably faster, below are tables comparing the before and after. Note that Sections 2 - 7 can be disregarded, I used them as the baseline to find an equivalent timestep (if the before Sections 2 - 7 run times are roughly equivalent to the after Section 2 - 7 run times, we can more or less conclude that this was a similar timestep, since none of the profiled Section 2 - 7 code was changed and shouldn't be affected by the optimizations of Section 1). The following profiling samples are taken from running the shapes_population.sdf example.

Before Section 1 Optimizations

Name	Time (ms)	Self (ms)	Calls	Time per call (s)
UpdateState	47.374	0.006	1	47.374
Section 1	7.715	7.715	90142	8.56 x 10^-5
Section 2	4.667	1.085	12017	38.84 x 10^-5
Section 3	1.711	1.711	12017	14.24 x 10^-5
Section 4	3.518	3.518	3003	117.15 x 10^-5
Section 5	1.569	0.288	3003	52.25 x 10^-5
Section 6	9.403	9.403	3003	313.12 x 10^-5
Section 7	1.449	1.449	12020	12.05 x 10^-5

After Section 1 Optimizations

Name	Time (ms)	Self (ms)	Calls	Time per call (s)
UpdateState	32.111	0.005	1	32.111
Section 1	0.436	0.436	12020	3.63 x 10^-5
Section 2	4.527	0.929	12017	37.67 x 10^-5
Section 3	1.629	1.629	12017	13.56 x 10^-5
Section 4	3.677	3.677	3003	122.44 x 10^-5
Section 5	1.616	0.317	3003	53.81 x 10^-5
Section 6	9.408	9.408	3003	313.29 x 10^-5
Section 7	1.204	1.204	12020	10.02 x 10^-5

Results

As can be seen by the above data, in the case of the shapes_population example (and likely any other scene with a significant number of entities), Section 1 experiences about a 94.35% speedup from the previous implementation. Overall, there is about a 32.22% speedup of this UpdateState portion of the SceneBroadcaster. All other sections of the EntityComponentManager that previously looped through an entity's components now benefit from this optimization as well. I don't expect the improvements to be this drastic for smaller examples, but it should certainly still help.

Where to next

@iche033 and I discussed the remaining sections 2 - 7 for potential optimizations as well as parallelization potential and eventually a possible higher level architecture redesign.

Signed-off-by: John Shepherd <[email protected]>

codecov · 2020-10-16T20:03:47Z

Codecov Report

Merging #416 into ign-gazebo3 will increase coverage by 0.02%.
The diff coverage is 96.82%.

@@               Coverage Diff               @@
##           ign-gazebo3     #416      +/-   ##
===============================================
+ Coverage        77.25%   77.27%   +0.02%     
===============================================
  Files              205      205              
  Lines            11018    11011       -7     
===============================================
- Hits              8512     8509       -3     
+ Misses            2506     2502       -4

Impacted Files	Coverage Δ
src/EntityComponentManager.cc	`81.78% <96.82%> (-0.67%)`	⬇️
src/SimulationRunner.cc	`94.75% <0.00%> (+1.16%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4da55f8...e7acbc8. Read the comment docs.

iche033

I'm trying to reproduce the perf improvement using Remotery by looking at the SceneBroadcast::PostUpdate UpdateState timing in shapes_population.sdf world. But I'm currently getting roughly ~30ms both before and after the change. Not sure what I'm doing differently. Here's my remotery snapshot

Also, do you see any noticeable difference in RTF? It's hovering around 0.5% for me.

src/EntityComponentManager.cc

iche033 · 2020-10-20T18:44:45Z

src/EntityComponentManager.cc

-        this->dataPtr->entityComponents[_entity].end(), _key) !=
-    this->dataPtr->entityComponents[_entity].end();
+    this->dataPtr->entityComponents[_entity].find(_key.first) !=
+      this->dataPtr->entityComponents[_entity].end();


minor optimization to remove duplicate lookups:

if (!this->HasEntity(_entity)) return false; auto &compMap = this->dataPtr->entityComponents[_entity]; return compMap.find(_key.first) != compMap.end();

I made the updates here afee595 . Do you think we even need to call HasEntity here? I seem to remember it taking some time when I was profiling. The description for it states that it detects if the entity exists, can we forego the HasEntity and just check that the entity exists as a key within the entityComponent map?

I think so. I kept it there without thinking too much about it. Looks like an extra safety check to make sure it also exists in the entity graph. Not sure how expensive that operation is.

JShep1 · 2020-10-21T00:24:54Z

Also, do you see any noticeable difference in RTF? It's hovering around 0.5% for me.

Hmm, it seems Remotery has deceived me. After some investigation, I suppose that I saw such a drastic increase of performance given I made the incorrect assumption that the overhead caused by Remotery and IGN_PROFILE is not counted within the profiled time points. To test this theory, I profiled the same Section 1 mentioned above again. After 25 randomly collected samples, I found that the non-optimized UpdateState took on average 39.99 ms and the UpdateState with optimization took on average 28.36 ms. I removed those profiling lines and ran both branches regularly. I found that on average, the non-optimized UpdateState call took about 28.229 ms and the optimized UpdateState took about 26.726.

The drastic performance increase I saw came from the fact that the IGN_PROFILE was being called 90000 times in the non-optimized version and its equivalent in the optimized version was only called about 12000 times. Perhaps to get a better comparison, I should've calculated time per each Section 1 call. In that case, the time per call for the non-optimized Section 1 would be 8.317 x 10^-5 ms and the optimized Section 1 would be 3.851 x 10^-5 ms with the Remotery overhead included, for a time increase of 4.466 x 10^-5 ms.

This brings me to three points/questions:

Should we find a way to display and/or download the comprehensive data from a run with Remotery? It was a little of a time sink to have to pause 25 times and manually record the data. If we could just output some sort of csv with the data, it would go a long way.
2.Should we remove profiling altogether or perhaps have a flag to enable it (or is there something like this already existing that I don't know about)? If the profiling is taking up runtime, this may introduce an easy point of optimization. I wouldn't expect it to hinder the runtime if it's only called once, but in cases where it's called thousands of times every iteration, it may be worth re-evaluating if we need that profiling point to be there or if we can disable it somehow.
Is there a way to ignore the overhead of the Remotery calls?

So this isn't as great an optimization as I initially thought, but it's still a step in the right direction.

Signed-off-by: John Shepherd <[email protected]>

…b.com/ignitionrobotics/ign-gazebo into jshep1/scenebroadcaster_optimizations Signed-off-by: John Shepherd <[email protected]>

mjcarroll · 2020-10-21T16:05:02Z

Should we find a way to display and/or download the comprehensive data from a run with Remotery? It was a little of a time sink to have to pause 25 times and manually record the data. If we could just output some sort of csv with the data, it would go a long way.

This would be great, but is unsupported by Remotery. I opened an issue on ign-common, but haven't really looked into other alternatives: gazebosim/gz-common#85

One thing that would be nice would be to output in a format that Google Chrome's devtools can consume: https://docs.google.com/document/d/1CvAClvFfyA5R-PhYUmn5OOQtYMH4h6I0nSsKchNAySU/edit

2.Should we remove profiling altogether or perhaps have a flag to enable it (or is there something like this already existing that I don't know about)? If the profiling is taking up runtime, this may introduce an easy point of optimization. I wouldn't expect it to hinder the runtime if it's only called once, but in cases where it's called thousands of times every iteration, it may be worth re-evaluating if we need that profiling point to be there or if we can disable it somehow.

This is primarily controlled via the IGN_PROFILER_ENABLE preprocessor directive. This will disable it at compile time, and each profiler call will be turned into a noop and compiled out.

Is there a way to ignore the overhead of the Remotery calls?

Not really. It may ultimately be that you want to instrument one level higher. It sounds like the function that you were looking at was particularly "hot" and probably doesn't need per-cycle instrumentation, but rather an aggregate.

mjcarroll · 2020-10-21T16:09:01Z

An alternative to using the ign-common profiler is to use something like perf and flamegraphs. I thought that I had written a tutorial on it, but I'm not recalling where:

https://github.com/brendangregg/FlameGraph

This is more of a sampling based profiler, so it will be at the mercy of how many perf events you manage to catch, but it can also give you an idea of where the really hot spots are.

Finally, if you can get this down to a particularly pathological case, I would highly recommend using benchmark. There are some other examples here: https://github.com/ignitionrobotics/ign-gazebo/tree/main/test/benchmark

JShep1 · 2020-10-23T02:51:24Z

@mjcarroll thanks for the detailed response. Yeah, I used flamegraphs awhile back and remember it being decent for profiling specific functions, perhaps I'll try it out again come next optimization PR. I'll take a closer look into possible alternative ways next week when I get some more time.

All this being said, I think this PR is ready to go and proven to at least be some sort of optimization although not as great as initially thought.

iche033

The changes look fine. Too bad about the remotery overhead but good to know about this for future profiling work.

JShep1 · 2020-10-28T17:20:54Z

@osrf-jenkins run tests please

scpeters · 2020-10-28T20:35:06Z

@osrf-jenkins run tests please

chapulina

Well done! The ABI checker is a false positive, so I'll merge even though it's failing.

Also use unordered types. Signed-off-by: John Shepherd <[email protected]> Signed-off-by: Guillaume Doisy <[email protected]>

John Shepherd added 4 commits October 14, 2020 12:22

draft

3904c21

Signed-off-by: John Shepherd <[email protected]>

vector to map

c526f78

Signed-off-by: John Shepherd <[email protected]>

Finish conversion

3976133

Signed-off-by: John Shepherd <[email protected]>

Finish implementation of optimization

67a3351

Signed-off-by: John Shepherd <[email protected]>

github-actions bot added the 🏰 citadel Ignition Citadel label Oct 16, 2020

JShep1 self-assigned this Oct 16, 2020

JShep1 added enhancement New feature or request 🏰 citadel Ignition Citadel and removed 🏰 citadel Ignition Citadel labels Oct 16, 2020

Merge branch 'ign-gazebo3' into jshep1/scenebroadcaster_optimizations

93cdb81

John Shepherd added 4 commits October 16, 2020 12:08

Clean up

b5e86b6

Signed-off-by: John Shepherd <[email protected]>

Clean up again

31812ff

Signed-off-by: John Shepherd <[email protected]>

Another pass

b74426d

Signed-off-by: John Shepherd <[email protected]>

Another one

5adc0c6

Signed-off-by: John Shepherd <[email protected]>

JShep1 marked this pull request as ready for review October 16, 2020 20:01

JShep1 requested a review from chapulina as a code owner October 16, 2020 20:01

JShep1 requested a review from iche033 October 16, 2020 20:01

chapulina requested a review from nkoenig October 19, 2020 18:47

iche033 reviewed Oct 20, 2020

View reviewed changes

Merge branch 'ign-gazebo3' into jshep1/scenebroadcaster_optimizations

8eda00c

John Shepherd added 2 commits October 20, 2020 17:37

requested fixes

afee595

Signed-off-by: John Shepherd <[email protected]>

Merge branch 'jshep1/scenebroadcaster_optimizations' of https://githu…

b521bed

…b.com/ignitionrobotics/ign-gazebo into jshep1/scenebroadcaster_optimizations Signed-off-by: John Shepherd <[email protected]>

Merge branch 'ign-gazebo3' into jshep1/scenebroadcaster_optimizations

95cf961

JShep1 requested a review from iche033 October 23, 2020 02:50

iche033 approved these changes Oct 26, 2020

View reviewed changes

Merge branch 'ign-gazebo3' into jshep1/scenebroadcaster_optimizations

e7acbc8

scpeters mentioned this pull request Oct 29, 2020

False positive in ABI checker gazebo-tooling/release-tools#336

Open

chapulina approved these changes Nov 2, 2020

View reviewed changes

chapulina merged commit 1bb32b0 into ign-gazebo3 Nov 2, 2020

chapulina deleted the jshep1/scenebroadcaster_optimizations branch November 2, 2020 20:17

doisyg pushed a commit to wyca-robotics/ign-gazebo that referenced this pull request Dec 13, 2020

Component Vector -> Map ECM Optimization (gazebosim#416)

17984d3

Also use unordered types. Signed-off-by: John Shepherd <[email protected]> Signed-off-by: Guillaume Doisy <[email protected]>

chapulina added the performance Runtime performance label Apr 9, 2021

chapulina mentioned this pull request Apr 9, 2021

Don't store duplicate ComponentTypeId in ECM #751

Merged

7 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Component Vector -> Map ECM Optimization #416

Component Vector -> Map ECM Optimization #416

JShep1 commented Oct 16, 2020 •

edited

Loading

codecov bot commented Oct 16, 2020 •

edited

Loading

iche033 left a comment •

edited

Loading

iche033 Oct 20, 2020

JShep1 Oct 21, 2020 •

edited

Loading

iche033 Oct 26, 2020

JShep1 commented Oct 21, 2020

mjcarroll commented Oct 21, 2020

mjcarroll commented Oct 21, 2020

JShep1 commented Oct 23, 2020

iche033 left a comment

JShep1 commented Oct 28, 2020

scpeters commented Oct 28, 2020

chapulina left a comment

Component Vector -> Map ECM Optimization #416

Component Vector -> Map ECM Optimization #416

Conversation

JShep1 commented Oct 16, 2020 • edited Loading

Explanation and Reasoning

Before Section 1 Optimizations

After Section 1 Optimizations

Results

Where to next

codecov bot commented Oct 16, 2020 • edited Loading

Codecov Report

iche033 left a comment • edited Loading

Choose a reason for hiding this comment

iche033 Oct 20, 2020

Choose a reason for hiding this comment

JShep1 Oct 21, 2020 • edited Loading

Choose a reason for hiding this comment

iche033 Oct 26, 2020

Choose a reason for hiding this comment

JShep1 commented Oct 21, 2020

mjcarroll commented Oct 21, 2020

mjcarroll commented Oct 21, 2020

JShep1 commented Oct 23, 2020

iche033 left a comment

Choose a reason for hiding this comment

JShep1 commented Oct 28, 2020

scpeters commented Oct 28, 2020

chapulina left a comment

Choose a reason for hiding this comment

JShep1 commented Oct 16, 2020 •

edited

Loading

codecov bot commented Oct 16, 2020 •

edited

Loading

iche033 left a comment •

edited

Loading

JShep1 Oct 21, 2020 •

edited

Loading