Update precombine benchmark to better represent varied workloads #24343

Merged
merged 1 commit into from
Nov 29, 2022

Member

@lukecwik lukecwik commented Nov 23, 2022

  1. Represent more data distributions (hot key, uniform, normal, unique)
  2. Run longer, giving the JIT time to warm up
  3. Have a random ordering of data
  4. Use a blackhole to prevent the JIT from optimizing away the data
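
For readers unfamiliar with these workload shapes, here is a minimal sketch of how the four key distributions and the random ordering could be generated with only the Java standard library. It is not the code from this PR: the class name `KeyDistributions`, the method `generateKeys`, the 50% hot-key ratio, and the normal-distribution spread are all assumptions made for illustration. Only the distribution names (`uniform`, `normal`, `hotKey`, `uniqueKeys`) are taken from the benchmark parameters shown in the results.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;

/** Hypothetical helper, not part of Beam: builds key sequences with different skew. */
public class KeyDistributions {

    /**
     * Returns {@code count} integer keys drawn from the named distribution,
     * shuffled so that the processing order is random.
     * For "uniqueKeys" the keySpace argument is ignored.
     */
    static List<Integer> generateKeys(String distribution, int count, int keySpace, long seed) {
        Random random = new Random(seed);
        List<Integer> keys = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            switch (distribution) {
                case "uniform": {
                    // Every key in [0, keySpace) is equally likely.
                    keys.add(random.nextInt(keySpace));
                    break;
                }
                case "normal": {
                    // Assumed shape: centred on keySpace / 2 with std dev keySpace / 8,
                    // clamped to the valid key range.
                    double g = random.nextGaussian() * (keySpace / 8.0) + keySpace / 2.0;
                    long clamped = Math.max(0L, Math.min(keySpace - 1L, Math.round(g)));
                    keys.add((int) clamped);
                    break;
                }
                case "hotKey": {
                    // Assumed ratio: roughly half of all elements hit key 0,
                    // the remainder are spread uniformly.
                    keys.add(random.nextBoolean() ? 0 : random.nextInt(keySpace));
                    break;
                }
                case "uniqueKeys": {
                    // Every element gets its own key, so nothing can be combined.
                    keys.add(i);
                    break;
                }
                default:
                    throw new IllegalArgumentException("Unknown distribution: " + distribution);
            }
        }
        // Random ordering of the data (item 3 in the list above).
        Collections.shuffle(keys, random);
        return keys;
    }

    public static void main(String[] args) {
        for (String d : new String[] {"uniform", "normal", "hotKey", "uniqueKeys"}) {
            List<Integer> keys = generateKeys(d, 10_000, 1_000, 42L);
            System.out.println(d + ": " + new HashSet<>(keys).size() + " distinct keys");
        }
    }
}
```

The number of distinct keys is the property that matters for a combiner table: few distinct keys mean most elements merge into an existing entry, while unique keys force a new entry for every element.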

Updated benchmark numbers are (note that I renamed the class before running):

Benchmark                       (distribution)  (globallyWindowed)   Mode  Cnt   Score   Error  Units
CombinerTableBenchmark.combine         uniform                true  thrpt   15  12.838 ± 0.314  ops/s
CombinerTableBenchmark.combine         uniform               false  thrpt   15   5.633 ± 0.283  ops/s
CombinerTableBenchmark.combine          normal                true  thrpt   15   6.869 ± 0.196  ops/s
CombinerTableBenchmark.combine          normal               false  thrpt   15   4.165 ± 0.271  ops/s
CombinerTableBenchmark.combine          hotKey                true  thrpt   15  13.697 ± 0.320  ops/s
CombinerTableBenchmark.combine          hotKey               false  thrpt   15   6.143 ± 0.458  ops/s
CombinerTableBenchmark.combine      uniqueKeys                true  thrpt   15   2.346 ± 0.063  ops/s
CombinerTableBenchmark.combine      uniqueKeys               false  thrpt   15   1.676 ± 0.055  ops/s

@github-actions github-actions bot added the java label Nov 23, 2022
@lukecwik
Member Author

R: @bhisevishal

@github-actions
Contributor

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control

@lukecwik lukecwik changed the title Update precombine bencmark to better represent varied workloads Update precombine benchmark to better represent varied workloads Nov 23, 2022
@bhisevishal
Contributor

Thanks @lukecwik. Looks good. Is it possible to have a multi-threaded benchmark as well?

@lukecwik lukecwik merged commit 135007e into apache:master Nov 29, 2022
@lukecwik
Member Author

> Thanks @lukecwik. Looks good. Is it possible to have a multi-threaded benchmark as well?

I'm not sure it will provide much value, but you can always run the benchmark with the additional flag -t 4 to execute it concurrently on 4 threads, or add the annotation @Threads(4) to the benchmark itself (example).

Note that this benchmark only checks the combiner table and doesn't represent a full transform graph.

ruslan-ikhsan pushed a commit to ruslan-ikhsan/beam that referenced this pull request Nov 30, 2022
…he#24343)