Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add metrics for external calls for the workflow ID specific rate limits. #5684

Merged
merged 18 commits into from
Feb 28, 2024

Conversation

jakobht
Copy link
Member

@jakobht jakobht commented Feb 22, 2024

What changed?
Added a wrapper around the history handler which logs rate limit metrics the external calls we want to rate limit with the workflow specific rate limiter.

Why?
We want to investigate the rate limiting for all external calls that can cause hot shards, before we deploy.

How did you test it?
Unit tests and local tests

Potential risks

Release notes

Documentation Changes

This generates a rate limiter around the external calls in the
history hander, generated using this command:

.build/bin/gowrap gen -p ./service/history -i Handler -t ./service/history/templates/ratelimited.tmpl -o ./service/history/wrappers/ratelimited/handler_generated.go -v handler=History
When service and handler are in the same package we cannot introduce
wrappers around handler in another package as that will create a
circular dependency:
history (service) -> wrapper -> history (handler)

This moves handler to the handler package.

Also moves the engine implementation to it's own package as there was a
circular dependency

history (service) -> handler -> history (engine)

The engine implementation cannot be in the same package as the engine
interface, as there is a dependency
engineimpl -> shard -> engine (Interface)

Moved all the error values in handler to a constants package as at least
one of them (ErrRunIDNotValid) is referenced from two different packages
(handler and engine tests)
@coveralls
Copy link

coveralls commented Feb 27, 2024

Pull Request Test Coverage Report for Build 018deeed-f764-410d-be86-f1b4e7b1312f

Details

  • 0 of 15 (100.0%) changed or added relevant lines in 2 files are covered.
  • 44 unchanged lines in 12 files lost coverage.
  • Overall coverage increased (+0.05%) to 62.928%

Files with Coverage Reduction New Missed Lines %
service/history/execution/mutable_state_task_refresher.go 1 61.39%
common/persistence/historyManager.go 2 66.67%
common/task/fifo_task_scheduler.go 2 85.57%
common/util.go 2 91.69%
service/history/shard/context.go 2 67.3%
service/history/task/transfer_active_task_executor.go 2 72.38%
service/history/task/transfer_standby_task_executor.go 2 87.01%
tools/cli/admin_db_decode_thrift.go 2 70.13%
common/persistence/nosql/nosqlplugin/cassandra/workflow.go 3 61.93%
service/history/queue/timer_gate.go 3 95.83%
Totals Coverage Status
Change from base Build 018ded5d-859e-4be3-ac17-40fecfa03371: 0.05%
Covered Lines: 93010
Relevant Lines: 147805

💛 - Coveralls

Copy link
Member

@davidporter-id-au davidporter-id-au left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would consider using a dynaimc-config switch in the rate-limit to rollout rather than just emitting metrics and then adding a second follow-up PR here, as having two PRs might be fairly slow to rollout, but no concerns

@3vilhamster
Copy link
Contributor

+1 on David's point. Might be better to introduce a dry run mode and control it with dynamic config.

"ratelimitTypeUserPerID"
}}

{{ $workflowID := ( get ( get $ratelimitTypeMap $method.Name ) "workflowID" ) }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: call the fields with Field suffix:
$workflowIDField
So it is easier to read the code.

service/history/templates/ratelimited.tmpl Show resolved Hide resolved
@jakobht jakobht merged commit 6a39682 into cadence-workflow:master Feb 28, 2024
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants