Global ratelimiter, part 2: Any-typed RPCs, mappers, and stub handler #5817
Conversation
@@ -363,3 +365,22 @@ func TestHistoryGetTaskInfoResponse(t *testing.T) {
		assert.Equal(t, item, ToHistoryGetFailoverInfoResponse(FromHistoryGetFailoverInfoResponse(item)))
	}
}

func TestRatelimitUpdate(t *testing.T) {
Any specific reason not to follow the table-based approach that is used in other tests in this file? They are much shorter and easier to add more cases.
it'd actually be moderately larger after accounting for the data in the other files.
but that's not really relevant here; this type probably shouldn't ever gain other fields.
those tests are also simpler because they're only round-trip checks, not "input X produces output Y" checks. bug-for-bug compatibility between mappers won't be caught, nor will missed fields (which is why I spent the time to figure out a fuzzing pattern too).
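For contrast, a round-trip check in the style of the other tests in this file would look roughly like the sketch below (the To/From mapper names are illustrative); it still passes if both mappers drop the same field, which is the gap described above:

```go
func TestRatelimitUpdateRequestRoundTrip(t *testing.T) {
	item := &types.RatelimitUpdateRequest{
		// ... populated test data ...
	}
	// only checks that From/To are inverses of each other, not that either
	// one maps every field correctly on its own.
	assert.Equal(t, item, ToRatelimitUpdateRequest(FromRatelimitUpdateRequest(item)))
}
```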
	request *types.RatelimitUpdateRequest,
) (*types.RatelimitUpdateResponse, error) {
	// TODO: wire up to real global-ratelimit aggregator
	return nil, nil
nit: return unimplemented err
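A minimal sketch of that suggestion, keeping the parameter and return types from the diff above (the receiver name, error message, and `errors` usage are illustrative additions, not from this PR):

```go
func (h *handlerImpl) RatelimitUpdate(
	ctx context.Context,
	request *types.RatelimitUpdateRequest,
) (*types.RatelimitUpdateResponse, error) {
	// TODO: wire up to real global-ratelimit aggregator.
	// Returning an explicit error makes accidental callers fail loudly
	// instead of silently receiving (nil, nil).
	return nil, errors.New("RatelimitUpdate is not implemented yet")
}
```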
networkBytes := encode(thriftAny) // yarpc does this
// ^ this is what's sent over the network.

// the final data is not double-encoded, so this "encode -> wrap -> encode" process is reasonably efficient.
why is it not double-encoded? I may have missed that part
(also via chat but for public purposes...)
semantically: it doesn't matter, no behavior depends on it.
practically: because thrift (and I'm pretty sure proto) store binary blobs as size, data... so there's no need to e.g. escape `\0` as `\\0` or base64-encode it or other costly stuff. it's just a memcpy.
my main goal here is to point out that anything you'd do with normal thrift is reasonable to do when wrapped by this `Any`-container, because it doesn't meaningfully increase any of the runtime costs. push gigabytes / million-field things through it, it's entirely fine and no different than using the non-wrapped type.
I explained this more in the test comments now btw
// DO NOT include, tools dependencies are intentionally separate.
// ./internal/tools
// DO NOT include, tools dependencies are intentionally separate.
// ./internal/tools
`go work sync` on go 1.20 did this for some reason.
no idea why, but it wants it, and it seems harmless.
lgtm, can't fault anything there
Docs are broadly the same as cadence-workflow/cadence-idl#168, but are repeated here for visibility with some additions at the end for stuff unique to this repo:
# Any type:

This API is purely for inter-Cadence-service use, and the Thrift equivalent is in the github.com/uber/cadence-idl repository: cadence-workflow/cadence-idl#168
Design-wise: this has gone through several rounds of adjustment and experimentation, and nearly all top-level keys I can come up with are either totally insufficient or totally unnecessary for one ratelimit design or another, so I've leaned more and more towards "define nothing, allow passing anything".
In the extreme, this means: just one `Any` field. We can add more later if needed / if we discover some kind of truly universal data that is always worth including.

Obviously this is not super ergonomic, but as the ratelimiter is intended to be pluggable by outside implementations it cannot *wholly* be defined in internal protocol definitions. There needs to be an extendable type of some kind, and arbitrarily requiring e.g. a `list<?>` or `map<string,?>` doesn't actually lend any beneficial semantics to the system, nor reduce space on the wire.

Implementers will need to maintain in-process definitions on both ends of the protocol, and check type IDs before decoding. Generally speaking this is probably best done with either a shared protobuf-or-something definition (anywhere) or something schema-free like JSON, so cross-version communication can be done safely, e.g. during server upgrades.
# Intended use
Eventually this will be part of a pluggable global-ratelimit-data-exchange system, for anything that can make use of the high level "periodically check in to sharded hosts, share data, update limits" pattern that github.com/uber/cadence/common/quotas/global is building up.
The first version will end up using inside-`Any` structures like the ones below, which will probably just use JSON for simplicity for starters.
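```go
type Request struct {
	Caller  string        `json:"caller"`
	Elapsed time.Duration `json:"elapsed"`

	// a compact map[ratelimit key][allowed, rejected] structure.
	// this could be a regular proto/thrift map<string, struct> instead,
	// but that'd have lots of repeated field names in JSON.
	Data map[string][2]int `json:"data"`
}

type Response struct {
	// target RPS per ratelimit key
	RPS map[string]float64 `json:"rps"`
}
```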
Or I'll make a new proto struct and shove that in there instead, using proto-encoded data for both thrift and proto transports.
It doesn't particularly matter.
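As a loose illustration of the "check type IDs before decoding" point above, a plugin-side decoder for that JSON-in-`Any` payload could look like the sketch below (the `Value` field name and the type-ID string are assumptions made for the sketch; only `ValueType` is mentioned in this PR):

```go
// hypothetical type ID; anything unique and versioned works, since both ends
// of the protocol agree on it out-of-band.
const ratelimitRequestV1 = "cadence:ratelimit-request:v1:json"

func decodeRatelimitRequest(a *types.Any) (*Request, error) {
	// check the type ID before decoding, so unknown or newer payloads fail
	// cleanly (e.g. mid-upgrade) instead of being mis-decoded.
	if a.ValueType != ratelimitRequestV1 {
		return nil, fmt.Errorf("unrecognized Any value type: %q", a.ValueType)
	}
	var req Request
	if err := json.Unmarshal(a.Value, &req); err != nil {
		return nil, fmt.Errorf("decoding %q payload: %w", a.ValueType, err)
	}
	return &req, nil
}
```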
These request/response structures match the API defined in github.com/uber/cadence/common/quotas/global/algorithm/requestweighted.go, and there will be a fairly simple mapper to convert to/from the `Any` types defined here to make things work.

# Why `Any` instead of strict types?

Because:
- Types we define in these internally-controlled APIs cannot be modified by anyone running Cadence without maintaining a full fork of multiple repos.
- Rate-limiting details are assumed to be relatively unique to internal networking setups, and are therefore not something we intend to fix centrally in every possible variation that server-hosts encounter.
So instead this is loosely typed, and the API will allow Cadence-wrapping developers to add wholly externally defined types as needed, and use DynamicConfig to shadow/enable/disable/etc between any registered algorithms. If your company needs to e.g. share custom info from Kubernetes to decide how many requests host X can allow (say it's based on number of cores allocated), you will have a relatively simple way to build that and safely try it out.
# Why not `google.protobuf.Any`?

Covered in code comments, but mostly: because that would _require_ using both protobuf _and_ the same protobuf code-generator we use, so the types can be registered and decoded automatically.

That auto-decoding isn't necessary (nor even particularly useful) though; it can be deferred to the ratelimiter-plugin, and then it can be entirely encoding-agnostic.

So that's what this does. It's not a `google.protobuf.Any` because it doesn't _have_ to be, and you're free to put a `google.protobuf.Any` inside your `types.Any` if ~~yo dawg I heard you like loose typing~~ you want to do so.

# Mappers
As the RPC specs and the internal type do essentially no interpreting of their contents, the mappers are both trivial and complete.
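For a sense of what "trivial" means here, each mapper is essentially a nil check plus a field copy in each direction; the generated wire-type name (`adminv1.Any`) and its field names below are placeholders, not the real generated code from this PR:

```go
// ToAny maps the (placeholder) wire type to the internal type.
func ToAny(in *adminv1.Any) *types.Any {
	if in == nil {
		return nil
	}
	return &types.Any{ValueType: in.ValueType, Value: in.Value}
}

// FromAny maps the internal type back to the (placeholder) wire type.
func FromAny(in *types.Any) *adminv1.Any {
	if in == nil {
		return nil
	}
	return &adminv1.Any{ValueType: in.ValueType, Value: in.Value}
}
```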
To actually use an `Any` requires custom code per `ValueType`, which will come in a later PR.

I've written far more mapping-test-code than these mappers really deserve, to try to figure out what some fuzzed round-trip tests might look like for this and other types.
As part of that, I discovered that https://github.com/google/gofuzz has some rather severe issues when generating protobuf types (e.g. it unavoidably panics if it finds any interface fields, which `oneof`s use), and was recently archived, so I think we might need to come up with something else before going too much farther. Maybe just a fork?

# Handler
Fully wiring this up will come later, and at a very high level will require:
- a `common/quota` wrapper to collect usage metrics in the "limiting" hosts (frontend), which will:
  - periodically send bulk `RatelimitUpdateRequest`s to "aggregating" hosts (history)
  - use the responses to update quotas as needed
- dynamicconfig to enable/disable/shadow/roll-back all this

All of which is essentially out of scope for this PR.
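Just to sketch the shape of that future wiring (every name below is hypothetical; none of it exists in this PR):

```go
// hypothetical frontend-side loop: periodically flush usage data, send it to
// the aggregating hosts, and apply whatever updated limits come back.
func runRatelimitUpdateLoop(ctx context.Context, c usageCollector, h historyClient, interval time.Duration) {
	tick := time.NewTicker(interval)
	defer tick.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-tick.C:
			usage := c.Flush() // allowed/rejected counts per ratelimit key since last flush
			resp, err := h.RatelimitUpdate(ctx, buildUpdateRequest(usage))
			if err != nil {
				continue // skip this round; limits keep their previous values
			}
			c.UpdateTargets(parseUpdateResponse(resp)) // new RPS per ratelimit key
		}
	}
}
```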