Component Vector -> Map ECM Optimization #416
Signed-off-by: John Shepherd <[email protected]>
Codecov Report
@@             Coverage Diff              @@
##           ign-gazebo3     #416   +/-   ##
===============================================
+ Coverage        77.25%   77.27%   +0.02%
===============================================
  Files              205      205
  Lines            11018    11011       -7
===============================================
- Hits              8512     8509       -3
+ Misses            2506     2502       -4
I'm trying to reproduce the perf improvement using Remotery by looking at the SceneBroadcaster::PostUpdate UpdateState timing in the shapes_population.sdf world, but I'm currently getting roughly 30 ms both before and after the change. Not sure what I'm doing differently. Here's my remotery snapshot
Also, do you see any noticeable difference in RTF? It's hovering around 0.5% for me.
src/EntityComponentManager.cc
Outdated
-      this->dataPtr->entityComponents[_entity].end(), _key) !=
-      this->dataPtr->entityComponents[_entity].end();
+      this->dataPtr->entityComponents[_entity].find(_key.first) !=
+      this->dataPtr->entityComponents[_entity].end();
Minor optimization to remove duplicate lookups:
if (!this->HasEntity(_entity))
  return false;
auto &compMap = this->dataPtr->entityComponents[_entity];
return compMap.find(_key.first) != compMap.end();
I made the updates here: afee595. Do you think we even need to call HasEntity here? I seem to remember it taking some time when I was profiling. Its description states that it detects whether the entity exists; can we forgo the HasEntity call and just check that the entity exists as a key in the entityComponents map?
I think so. I kept it there without thinking too much about it. Looks like an extra safety check to make sure it also exists in the entity graph. Not sure how expensive that operation is.
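To make the trade-off in this thread concrete, here is a minimal sketch of the two lookup styles under discussion. It is not the ign-gazebo implementation: the type aliases and the free functions `HasComponentUnguarded` and `HasComponentFind` are hypothetical stand-ins, and the real `HasEntity` is described above as also consulting the entity graph, which this sketch does not model. The point it illustrates is standard-library behavior: `operator[]` on an `std::unordered_map` inserts a default-constructed value for a missing key, so dropping the `HasEntity` guard is only safe if the outer lookup switches to `find`.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-ins for the ECM types; names are assumptions.
using Entity = std::uint64_t;
using ComponentTypeId = std::uint64_t;
using ComponentId = int;
using ComponentMap = std::unordered_map<ComponentTypeId, ComponentId>;
using EntityComponents = std::unordered_map<Entity, ComponentMap>;

// Unguarded operator[]: for an unknown entity this silently inserts an
// empty ComponentMap, which is why a guard (or find) is needed.
bool HasComponentUnguarded(EntityComponents &_ec, Entity _entity,
                           ComponentTypeId _type)
{
  auto &compMap = _ec[_entity];
  return compMap.find(_type) != compMap.end();
}

// find-only variant: one lookup in the outer map, one in the inner map,
// no insertion, and the container can be passed as const.
bool HasComponentFind(const EntityComponents &_ec, Entity _entity,
                      ComponentTypeId _type)
{
  const auto entityIt = _ec.find(_entity);
  if (entityIt == _ec.end())
    return false;
  return entityIt->second.find(_type) != entityIt->second.end();
}
```

The find-only variant is equivalent to the guarded one only if every entity in the graph also has an entry in the component map, which is an assumption of this sketch rather than something confirmed in the thread.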
Hmm, it seems Remotery has deceived me. After some investigation, I suppose that I saw such a drastic increase of performance given I made the incorrect assumption that the overhead caused by Remotery and The drastic performance increase I saw came from the fact that the This brings me to three points/questions:
So this isn't as great an optimization as I initially thought, but it's still a step in the right direction.
…b.com/ignitionrobotics/ign-gazebo into jshep1/scenebroadcaster_optimizations Signed-off-by: John Shepherd <[email protected]>
This would be great, but is unsupported by Remotery. I opened an issue on ign-common, but haven't really looked into other alternatives: gazebosim/gz-common#85. One thing that would be nice would be to output in a format that Google Chrome's devtools can consume: https://docs.google.com/document/d/1CvAClvFfyA5R-PhYUmn5OOQtYMH4h6I0nSsKchNAySU/edit
This is primarily controlled via the
Not really. It may ultimately be that you want to instrument one level higher. It sounds like the function that you were looking at was particularly "hot" and probably doesn't need per-cycle instrumentation, but rather an aggregate.
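As a rough illustration of what aggregate instrumentation could look like, the sketch below accumulates elapsed time across many calls and reports only a count, total, and mean, instead of emitting one profiler sample per call. `AggregateTimer` and `ScopedSample` are hypothetical names invented for this example; they are not part of Remotery or the ign-common profiler.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical helper: accumulates durations instead of recording each one.
class AggregateTimer
{
  public: using Clock = std::chrono::steady_clock;

  // Add one measured duration to the running total.
  public: void Add(Clock::duration _elapsed)
  {
    assert(_elapsed >= Clock::duration::zero());
    this->total += _elapsed;
    ++this->count;
  }

  public: std::size_t Count() const { return this->count; }

  public: double TotalMs() const
  {
    return std::chrono::duration<double, std::milli>(this->total).count();
  }

  // Mean time per sample in milliseconds, or 0 if nothing was recorded.
  public: double MeanMs() const
  {
    return this->count == 0 ? 0.0 : this->TotalMs() / this->count;
  }

  private: Clock::duration total{Clock::duration::zero()};
  private: std::size_t count{0};
};

// RAII guard: measures the enclosing scope and adds it to the timer.
class ScopedSample
{
  public: explicit ScopedSample(AggregateTimer &_timer)
    : timer(_timer), start(AggregateTimer::Clock::now()) {}

  public: ~ScopedSample()
  {
    this->timer.Add(AggregateTimer::Clock::now() - this->start);
  }

  private: AggregateTimer &timer;
  private: AggregateTimer::Clock::time_point start;
};
```

A hot loop would create a `ScopedSample` around the section of interest and read `MeanMs()` once per reporting interval. The clock calls still add some per-call cost, so this reduces instrumentation overhead rather than eliminating it.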
An alternative to using the ign-common profiler is to use something like This is more of a sampling based profiler, so it will be at the mercy of how many perf events you manage to catch, but it can also give you an idea of where the really hot spots are. Finally, if you can get this down to a particularly pathological case, I would highly recommend using
@mjcarroll thanks for the detailed response. Yeah, I used flamegraphs a while back and remember them being decent for profiling specific functions; perhaps I'll try them again for the next optimization PR. I'll take a closer look at possible alternatives next week when I have more time. All that said, I think this PR is ready to go: it has proven to be at least some optimization, although not as great as initially thought.
The changes look fine. Too bad about the Remotery overhead, but it's good to know about for future profiling work.
@osrf-jenkins run tests please
Well done! The ABI checker is a false positive, so I'll merge even though it's failing.
Also use unordered types. Signed-off-by: John Shepherd <[email protected]> Signed-off-by: Guillaume Doisy <[email protected]>
EDIT: The results shown below are, unfortunately, not as great as initially thought in the write up. See the comments below for an explanation.
Explanation and Reasoning
As found in the "Profile Ignition Gazebo Scenebroadcaster" task, the SceneBroadcaster::PostUpdate call accounts for roughly 60-65% of the simulation timestep when running a taxing example (shapes_population), and the UpdateState portion accounts for about 80% of the PostUpdate call. Diving deeper reveals that the State function in the Entity Component Manager is the culprit.
So naturally, I began looking at the AddEntityToMessage function to see if there was potential for optimization. I profiled the function, segmenting it into 7 key sections; this optimization focuses on the findings from Section 1. The section itself seems unassuming, but after profiling, I found that it accounted for ~16% of the function's total time. The bool check and the map find are quick, but this continue case occurs roughly 85% of the time for an example like shapes_population.sdf. This is because, on every iteration, the pose component is updated across all entities, but each entity has 7-9 components in this example, and all of these components are iterated over to find the pose component we want. This can become exceedingly costly as the number of components per entity increases.
So @iche033 and I proposed changing this vector of component keys to an unordered_map, so that the iteration no longer needs to take place and the component can be accessed directly.
This PR adds that functionality and also makes some sets and maps unordered, since unordered containers offer faster average-case lookups. Below are tables comparing the before and after. Note that Sections 2 - 7 can be disregarded; I used them as a baseline to find an equivalent timestep (if the before and after run times of Sections 2 - 7 are roughly equal, we can more or less conclude that the timesteps were similar, since none of the profiled Section 2 - 7 code was changed and it shouldn't be affected by the Section 1 optimizations). The following profiling samples were taken from running the shapes_population.sdf example.
Before Section 1 Optimizations
After Section 1 Optimizations
Results
As can be seen from the above data, in the case of the shapes_population example (and likely any other scene with a significant number of entities), Section 1 experiences about a 94.35% speedup over the previous implementation. Overall, there is about a 32.22% speedup of this UpdateState portion of the SceneBroadcaster. All other sections of the EntityComponentManager that previously looped through an entity's components now benefit from this optimization as well. I don't expect the improvements to be this drastic for smaller examples, but it should certainly still help.
Where to next
@iche033 and I discussed the remaining Sections 2 - 7 for potential optimizations, as well as parallelization and, eventually, a possible higher-level architecture redesign.
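For readers who want the vector-to-map change from the explanation in miniature, here is a sketch contrasting the two lookups. The aliases and the functions `HasTypeInVector`, `HasTypeInMap`, and `ToMap` are hypothetical names for illustration, not the ECM's actual members. The sketch also assumes at most one component of each type per entity, which keying the map by type implies.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the ECM types; names are assumptions.
using ComponentTypeId = std::uint64_t;
using ComponentId = int;
using ComponentKey = std::pair<ComponentTypeId, ComponentId>;
using ComponentMap = std::unordered_map<ComponentTypeId, ComponentId>;

// Before: a linear scan over the entity's component keys, O(n) per query.
bool HasTypeInVector(const std::vector<ComponentKey> &_keys,
                     ComponentTypeId _type)
{
  return std::any_of(_keys.begin(), _keys.end(),
      [_type](const ComponentKey &_k) { return _k.first == _type; });
}

// After: a hashed lookup keyed by component type, O(1) on average.
bool HasTypeInMap(const ComponentMap &_comps, ComponentTypeId _type)
{
  return _comps.find(_type) != _comps.end();
}

// Convert the old representation into the new one.
ComponentMap ToMap(const std::vector<ComponentKey> &_keys)
{
  ComponentMap result;
  for (const auto &key : _keys)
  {
    const bool inserted = result.emplace(key.first, key.second).second;
    // Assumption of this sketch: one component per type per entity.
    assert(inserted);
    (void)inserted;
  }
  return result;
}
```

With only 7-9 components per entity, a single scan is cheap. The cost described above comes from repeating that scan for every entity on every iteration.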