Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhancing log playback performance #351

Merged
merged 8 commits into from
Sep 17, 2020

Conversation

adlarkin
Copy link
Contributor

@adlarkin adlarkin commented Sep 15, 2020

I've changed how log playback works in order to increase RTF and remove jumpy playback.

Previously, when setting updated entity poses in the log file as a PeriodicChange, log playback was jumpy. This was because the SceneBroadcaster updates at a different rate than when the log file is parsed (60 Hz), so, if log parsing didn't occur during the same cycle as a publication from the SceneBroadcaster, changed poses from the latest log parse call were not captured in the rendering update. In order to get around this, the updated poses were saved as a OneTimeChange.

The issue with using OneTimeChange is that playback performance is noticeably slower for worlds with a lot of entities. The reason for this is because whenever there is a OneTimeChange, the SceneBroadcaster will publish the full state, even if only one pose was changed.

In order to maintain the performance that comes with PeriodicChange while removing the jumpy playback that arises from different parsing and scene update rates, I cached the changed poses from the latest log parse call as a PeriodicChange. Then, whenever it is time for a scene update, only the cached pose updates are rendered instead of the full scene.

Below shows the RTF difference when playing back a log at regular speed for a SubT world, which has a lot of entities. The first video is before caching (OneTimeChange), and the second video uses caching (PeriodicChange). A few things to note:

  • The RTF varies significantly using OneTimeChange, when it should stay close to 100% since we omit physics during log playback.
  • The RTF stays close to 100% when using caching and PeriodicChange, which is expected for log playback at regular speed since we have the pose updates for every step without having to run any physics.

log_playbacl_before_update
log_playback_after_update

Here is an example of log playback at double speed with the simple double pendulum world. We still achieve smooth playback and keep an RTF close to the theoretical expectation of 200% for 2x playback speed (on my machine, at least).

dbl_pendulum_double_speed

Signed-off-by: Ashton Larkin [email protected]

@github-actions github-actions bot added the 📜 blueprint Ignition Blueprint label Sep 15, 2020
@adlarkin adlarkin marked this pull request as ready for review September 16, 2020 17:03
@adlarkin adlarkin requested a review from chapulina as a code owner September 16, 2020 17:03
@adlarkin adlarkin requested review from nkoenig and iche033 September 16, 2020 17:03
@adlarkin
Copy link
Contributor Author

After taking a look at the failed tests in Jenkins, I believe that the failed tests can be ignored for the following reasons:

  1. I copied the Docker setup from Jenkins over to my machine, and I could not reproduce the failed test from LogSystemTest.RecordAndPlayback. Perhaps the failure of this test in Jenkins had something to do with the CI machine?
  2. There was a segmentation fault for INTEGRATION_triggered_publisher.test_ran, which also appears to be problematic in previous builds (see build 4138).
  3. There was a test failure for DiffDriveTest.SkidPublishCmd, which also occurred in the previous build.

Considering that the DiffDrive test failure is an issue in other builds, I'm assuming that we can ignore the failure of this test in the GitHub action Ubuntu CI as well.

@chapulina @iche033 what do you guys think?

@adlarkin adlarkin force-pushed the adlarkin/log_playback_improvements branch from 852fc9e to 06ba47b Compare September 17, 2020 03:03
Signed-off-by: Nate Koenig <[email protected]>
@nkoenig
Copy link
Contributor

nkoenig commented Sep 17, 2020

I'm noticing some jerky playback. I tested with a log file that has five robot models. Can you confirm?

@nkoenig
Copy link
Contributor

nkoenig commented Sep 17, 2020

The problem appears to be on the GUI side. The GUI process is running at over 200% CPU, and the server is running at 48% while maintaining a constant RTF.

@adlarkin
Copy link
Contributor Author

adlarkin commented Sep 17, 2020

I'm noticing some jerky playback. I tested with a log file that has five robot models. Can you confirm?

@nkoenig @iche033 I believe the jerky playback has been re-introduced by commit b07ebdd.

We cannot call clear on every Update, because if we clear cached poses but never parse the log file on a given Update cycle (which can happen depending on which timestep we are on - sometimes, the batch received from the log file is empty), then we have no updated poses to share with the SceneBroadcaster, which means that no rendering update will occur until a future log file Parse call aligns with the next update from the SceneBroadcaster (recall that the SceneBroadcaster publishes changes at 60Hz, NOT at every PreUpdate -> Update -> PostUpdate step - so, log file parsing and rendering updates don't always line up).

So, we must keep the poses that are currently cached until we know that we have new poses to overwrite the cache with, or else we run the risk of losing track of recent changes since Parse is not called every Update cycle (we know that we have new poses to overwrite the cache with whenever Parse is called).

Does that make sense? If not, I can try to explain things another way, but essentially, we have to remember that the EntityComponentManager keeps track of changed entities for a single PreUpdate -> Update -> PostUpdate step only - so, in order to make sure that pose updates from a step at time t are captured in a rendering update that occurs at time t + dt, we must keep these updates cached and only clear them if we have something to replace them with since they are a PeriodicChange, not a OneTimeChange. By calling clear() on the cached poses each step, we are assuming that we always have new pose updates to replace the cache with, which isn't necessarily true.

@adlarkin
Copy link
Contributor Author

@nkoenig @iche033 as a follow up, I tested the code before and after commit b07ebdd, and noticed that jumpy playback occurs when calling clear every Update instead of in the first Parse call - so I think it's safe to say that jumpy playback is occurring because Parse isn't necessarily called every Update, as I described in my previous comment.

In order to achieve smooth playback, should we revert b07ebdd and go back to 06ba47b, or do either of you have any other preferences?

@adlarkin adlarkin mentioned this pull request Sep 17, 2020
@nkoenig
Copy link
Contributor

nkoenig commented Sep 17, 2020

@nkoenig @iche033 as a follow up, I tested the code before and after commit b07ebdd, and noticed that jumpy playback occurs when calling clear every Update instead of in the first Parse call - so I think it's safe to say that jumpy playback is occurring because Parse isn't necessarily called every Update, as I described in my previous comment.

In order to achieve smooth playback, should we revert b07ebdd and go back to 06ba47b, or do either of you have any other preferences?

Thanks for the explanation. That makes sense. I'll revert my changes.

Signed-off-by: Nate Koenig <[email protected]>
@nkoenig
Copy link
Contributor

nkoenig commented Sep 17, 2020

Reverted.

@iche033
Copy link
Contributor

iche033 commented Sep 17, 2020

changes look good. Waiting for one last round of CI builds

@nkoenig
Copy link
Contributor

nkoenig commented Sep 17, 2020

Can I get one more review of this?

Copy link
Contributor

@chapulina chapulina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, @adlarkin ! Solves 2 problems at once 👍

@codecov
Copy link

codecov bot commented Sep 17, 2020

Codecov Report

Merging #351 into ign-gazebo2 will increase coverage by 24.20%.
The diff coverage is 76.01%.

Impacted file tree graph

@@               Coverage Diff                @@
##           ign-gazebo2     #351       +/-   ##
================================================
+ Coverage        53.78%   77.99%   +24.20%     
================================================
  Files              121      186       +65     
  Lines             5838    10184     +4346     
================================================
+ Hits              3140     7943     +4803     
+ Misses            2698     2241      -457     
Impacted Files Coverage Δ
include/ignition/gazebo/EntityComponentManager.hh 100.00% <ø> (ø)
include/ignition/gazebo/Server.hh 100.00% <ø> (ø)
include/ignition/gazebo/ServerConfig.hh 100.00% <ø> (ø)
...nclude/ignition/gazebo/components/Serialization.hh 61.29% <0.00%> (+61.29%) ⬆️
.../plugins/component_inspector/ComponentInspector.cc 9.41% <0.00%> (ø)
src/systems/breadcrumbs/Breadcrumbs.hh 100.00% <ø> (ø)
src/systems/diff_drive/DiffDrive.hh 100.00% <ø> (ø)
src/systems/joint_controller/JointController.hh 100.00% <ø> (ø)
...stems/joint_state_publisher/JointStatePublisher.hh 100.00% <ø> (ø)
src/systems/user_commands/UserCommands.hh 100.00% <ø> (ø)
... and 136 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fe7b726...7663a6d. Read the comment docs.

@nkoenig nkoenig merged commit b3a4719 into ign-gazebo2 Sep 17, 2020
@nkoenig nkoenig deleted the adlarkin/log_playback_improvements branch September 17, 2020 23:53
doisyg pushed a commit to wyca-robotics/ign-gazebo that referenced this pull request Dec 13, 2020
* enhanced log playback performance

Signed-off-by: Ashton Larkin <[email protected]>

* Handling multiple Parse(...) calls in a single Update

Signed-off-by: Ashton Larkin <[email protected]>

* Change clear behavior

Signed-off-by: Nate Koenig <[email protected]>

* Revert changes

Signed-off-by: Nate Koenig <[email protected]>

* explain cached pose clearing behavior

Signed-off-by: Ashton Larkin <[email protected]>

* fixed whitespacing for codecheck

Signed-off-by: Ashton Larkin <[email protected]>

Co-authored-by: Nate Koenig <[email protected]>
Co-authored-by: Ian Chen <[email protected]>
Co-authored-by: Louise Poubel <[email protected]>
Signed-off-by: Guillaume Doisy <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
📜 blueprint Ignition Blueprint
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants