-
Notifications
You must be signed in to change notification settings - Fork 805
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Global ratelimiter: everything else #6141
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files
... and 53 files with indirect coverage changes Continue to review full report in Codecov by Sentry.
|
Pull Request Test Coverage Report for Build 019037db-5969-4cb6-a83d-760851588f21Details
💛 - Coveralls |
Pull Request Test Coverage Report for Build 01904c1d-8ac5-47d2-8f14-b71d02715363Details
💛 - Coveralls |
"github.com/uber/cadence/common/types" | ||
) | ||
|
||
func TestMapping(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: What about round-trip tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
they only map one way, no round trips involved.
more concretely: I believe our rpc-type-round-trip tests are for things like replication and queries, where the frontend both receives a request and sends that same request out to history/matching, possibly on a different protocol. that isn't happening in this code.
return &client{ | ||
history: historyClient, | ||
resolver: resolver, | ||
thisHost: uuid.NewString(), // TODO: would descriptive be better? but it works, unique ensures correctness. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not entirely sure what the value of a random string is there over a host/container instance name? I find this a little odd, do we not have a way to get runtime identity for our other clients?
I don't feel super strongly, but I would just thing that you'd want consitency between service restarts (with the obv caveat that 'restart' doesn't make a lot of sense in a containerized world, but that's a different issue)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we not have a way to get runtime identity for our other clients?
I'm honestly not sure. I couldn't find anything conclusive or even describing an attempt at a per-process unique ID.
I suspect there is one in the membership/peer/etc ringpop stuff, but I don't know the intricacies there well enough to figure out what it would be.
re container restarts: it is potentially fine since that will potentially result in the new process receiving the same pattern of requests it did before...... but there really isn't any way to know if that's true in general.
though the consequences either way are quite minor, and look pretty similar whether the host-aggregated-data is reused or lost. the only bit that's actually important is that two active hosts never choose the same value, because they won't weight fairly.
that said: this can be changed at any time, since it'll also change on every deploy. so if we do find a nicer value, it'll be trivial to adopt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid confusion you can use the hostname that's available in service resource object. Similar to uuid that hostname will change with restarts/deployments which is fine. it's just a unique identifier of the service instance.
Existing implementation lacks debug logs but when/if we introduce them it would be nice to avoid a random uuid and see hostname there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't actually see that value being set anywhere at all 🤔 it's read, but never assigned.
If it's literally the host name, like os.Hostname()
, that's not good enough because it would mean multiple instances on the same machine share the same name. That's somewhat common in a dockerized environment, as well as dev/CI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to stick IRL chat in here:
gonna stick with UUID for now. but I agree something from the ring would probably be preferable, if we can find something that is truly unique and stable.
there should be something in there, our rings depend on it, but I'm not confident that it's host:port
(though that is currently unique internally) instead of some other field we might not currently be exposing. if/when we find it, switching makes plenty of sense as it should be a more shared / identifiable value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For users who're not particularly interested in this problem, who'll not attempt to roll out flipr config for the global rate-limit feature:
- Are there any meaningful changes they should know about
- Directionally, can they just do nothing and it'll remain-as-is for them?
Pull Request Test Coverage Report for Build 01905ac0-75d9-412d-be38-63dbc29251eaDetails
💛 - Coveralls |
Pull Request Test Coverage Report for Build 01905aec-2f03-4488-81f1-7aff8cdf3c00Details
💛 - Coveralls |
Added deployment steps near the top of the commit message. Look good? |
) Mostly-prerequisite for the final major step of building the global ratelimiter system in cadence-workflow/cadence#6141 This Thrift addition _does not_ need to be done, the system could instead exchange Protobuf / gob / JSON data. But I've done it in Thrift because: 1. We already use Thrift rather heavily in service code, for long-term-stable data, like many database types. 2. We do not use Protobuf like ^ this _anywhere_. This PR could begin to change that, but I feel like that has some larger ramifications to discuss before leaping for it. 3. Gob is _significantly_ larger than Thrift, and no more human-readable than Thrift or Protobuf, and it doesn't offer quite as strong protection against unintended changes (IDL files/codegen make that "must be stable" contract very explicit). Notes otherwise include: - i32 because more than 2 million operations within an update cycle (~3s planned) on a single host is roughly 1,000x beyond the size of ALL of our current ratelimits, and it uses half of the space of an i64. - To avoid roll-around issues even if this happens, the service code saturates at max-i32 rather than rolling around. We'll just lose precise weight information across beyond-2m hosts if that happens. - `double` is returned because it's scale-agnostic and not particularly worth squeezing further, and it allows the aggregator to be completely target-RPS-agnostic (it doesn't need limiters or that dynamic config _at all_ as it just tracks weight). - This could be adjusted to a... pair of ints? Local/global RPS used, so callers can determine their weight locally? I'm not sure if that'd be clearer or more useful, but it's an option, especially since I don't think we care about accurately tracking <1RPS (so ints are fine). - If we decide we care a lot about data size, key strings are by far the majority of the bytes. There are a lot of key-compaction options (most simply: a map per collection name), we could experiment a bit. And last but not least, if we change our mind and want to move away from Thrift here: we just need to make a new `any.ValueType` string to identify that new format, and maintain this thrift impl for as long as we want to allow transparent server upgrades. And when we remove it, if someone still hasn't upgraded yet, they'll just fall back to local-only behavior (which is what we have used for the past several years) until the deploy finishes. Risk is extremely low.
Pull Request Test Coverage Report for Build 01905b7c-fcd3-46a3-9e31-6318be207dbcDetails
💛 - Coveralls |
func (s *configSuite) SetupSuite() { | ||
func (s *configSuite) SetupTest() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
key changes were leaking between tests. doesn't seem like that's even remotely desired in these, so now each test gets a new value.
Pull Request Test Coverage Report for Build 01905bc1-d3c5-43ef-b8b1-17df18d6a6daDetails
💛 - Coveralls |
// unknown filter string is likely safe to change and then should be updated here, but otherwise this ensures the logic isn't entirely position-dependent. | ||
require.Equalf(t, "unknownFilter", filterString, "expected first filter to be 'unknownFilter', but it was %v", filterString) | ||
} else { | ||
assert.NotEqualf(t, UnknownFilter, ParseFilter(filterString), "failed to parse filter: %s, make sure it is in ParseFilter's switch statement", filterString) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a weak validation that only checks it's not parsed as UnknownFilter but better than nothing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a test that checks that the mapping is what we expect would just be a re-implementation of the func itself :\ though I can check that it's unique, since nothing enforces that currently.
I think this would all probably be better done with a map rather than a slice, so the pairing can be built up in a single hardcoded location, I just kinda don't want to make major changes here in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added a uniqueness check too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's fair. without changing filters
into a map I don't see a future proof way. it can be handled separately since this PR is quite big already
Pull Request Test Coverage Report for Build 019060c6-2bde-406c-bf4d-55ee12eb19feDetails
💛 - Coveralls |
…adence-workflow#172) Mostly-prerequisite for the final major step of building the global ratelimiter system in cadence-workflow/cadence#6141 This Thrift addition _does not_ need to be done, the system could instead exchange Protobuf / gob / JSON data. But I've done it in Thrift because: 1. We already use Thrift rather heavily in service code, for long-term-stable data, like many database types. 2. We do not use Protobuf like ^ this _anywhere_. This PR could begin to change that, but I feel like that has some larger ramifications to discuss before leaping for it. 3. Gob is _significantly_ larger than Thrift, and no more human-readable than Thrift or Protobuf, and it doesn't offer quite as strong protection against unintended changes (IDL files/codegen make that "must be stable" contract very explicit). Notes otherwise include: - i32 because more than 2 million operations within an update cycle (~3s planned) on a single host is roughly 1,000x beyond the size of ALL of our current ratelimits, and it uses half of the space of an i64. - To avoid roll-around issues even if this happens, the service code saturates at max-i32 rather than rolling around. We'll just lose precise weight information across beyond-2m hosts if that happens. - `double` is returned because it's scale-agnostic and not particularly worth squeezing further, and it allows the aggregator to be completely target-RPS-agnostic (it doesn't need limiters or that dynamic config _at all_ as it just tracks weight). - This could be adjusted to a... pair of ints? Local/global RPS used, so callers can determine their weight locally? I'm not sure if that'd be clearer or more useful, but it's an option, especially since I don't think we care about accurately tracking <1RPS (so ints are fine). - If we decide we care a lot about data size, key strings are by far the majority of the bytes. There are a lot of key-compaction options (most simply: a map per collection name), we could experiment a bit. And last but not least, if we change our mind and want to move away from Thrift here: we just need to make a new `any.ValueType` string to identify that new format, and maintain this thrift impl for as long as we want to allow transparent server upgrades. And when we remove it, if someone still hasn't upgraded yet, they'll just fall back to local-only behavior (which is what we have used for the past several years) until the deploy finishes. Risk is extremely low.
After too many attempts to break this apart and build different portions in self-contained ways, and running into various inter-dependent roadblocks... I just gave up and did it all at once.
Rollout plan for people who don't want or need this system
Do nothing :)
As of this PR, you'll use "disabled" and that should be as close to "no changes at all" as possible.
Soon, you'll get "local", and then you'll have some new metrics you can use (or ignore) but otherwise no behavior changes.
And that'll be it. The "global" load-balanced stuff is likely to remain opt-in.
Rollout plan for us
For deployment: any order is fine / should not behave (too) badly. Even if "global" or either shadow mode is selected on the initial deploy. Frontends will have background
RatelimitUpdate
request failures until History is deployed, but that'll just mean it continues to use the "local" internal fallback and that's in practice the same behavior as "local" or "disabled", just slightly noisier.The smoothest deployment is: deploy everything on "disabled" or "local" (the default(s), so no requests are sent until deploy is done), then switch to "local-shadow-global" to warm global limiters / check that it's working, then "global" to use the global behavior.
Rolling back is just the opposite. Ideally disable things first to stop the requests, but even if you don't it should be fine.
In more detail:
frontend.globalRatelimiterMode
) to "disabled", which gets as close as is reasonably possible to acting exactly like it did before this PR."global"
and lowering their RPS back to where we intend, rather than their current artificially-raised-to-mitigate-load-imbalance values.frontend.globalRatelimiterMode
return "global" for keys like.*:my-domain
(to catchuser:my-domain
,worker:my-domain
, etc).constraints: {ratelimitKey: "user:my-domain"}
The changes in a nutshell
(... I guess it's a coconut, given the size)
This PR includes:
quotas.Collection
usage (and are used as a drop-in replacement, though this needed a change to use interfaces).shared.GlobalKey
s are namespaced).disabled
(old code fallthrough),local
(old code with metrics),global
, orx-shadow-y
to use x while shadowing y.shared.GlobalKey
), so in practice we will see things likeuser:domain
<- this is the limiter for "user" requests for that domain, e.g. StartWorkflow RPS. This allows us to roll this out per domain (suffix matches or just compare against all 4 values) and/or per type (prefix matches), so we can adjust to surprises reasonably precisely.quotas
-related code to the newclock.Ratelimiter
APIs as much as possible, which allows some simple wrappers and sharing more logic with otherquotas
package code.rpc
package, and that drives some of this setup, but I'm pretty sure that doesn't make sense with a full plugin-friendly system. So this will almost certainly be moved later.Testing
Aside from the unit tests here, I've locally run all this with the new
development_instance2.yaml
file, made some domains / sent some requests, watched where requests went / how weights changed / when GC occurred / etc. After some bug fixes and the "GC locally after 5 idle periods" change, it seems to be doing exactly what I want it to do, including adjusting as I start and stop the extra instance(s).I would like to build a multi-instance cluster test (or a docker-compose.yaml at the very least) for a variety of kinds of tests, but I wasn't able to find anything that looked promising to build off, and I didn't want to spend a week figuring one out from scratch :\ I'm open to trying if someone has concrete ideas though.
Future changes, roughly in priority order
RatelimitUpdate
for a migrated key will receive all the weight, which is both unfair and may allow exceeding the target RPS. Having aggregators not return data until [update interval] or similar has passed since the first update may be enough to resolve this.golang.org/x/time/rate
needs a PR for its flawed locking.clock.Ratelimiter
likely should not change at all.x/time/rate
will likely still allow time-rewinding, and we'll still need to wrap it and controlreservation.CancelAt
calls / for mocking, and that'll need essentially everything that's currently there.github.com/jonboulle/clockwork
is tough to use with time.Tickers and contexts, and that seems fix-able.ticker.OneShotChan()
API would let us know when a "receive tick -> do something -> go back to waiting" cycle completes, rather than having to sleep and hope it's long enough. Currently we have no real way to work around this.clock.WithTimeout(ctx, dur)
and similar seems rather obviously needed in retrospect, LOTS of time-based stuff uses context timeouts. I have a prototype built but I'm not confident that it's "good enough" to serve as a precise replacement, and we'd need to do something to ensure prod costs are either low enough to accept, or start using build tags to exclude it from prod entirely.membership.Peer
arg tohistory/client.RatelimitUpdate
seems ideal, and is hopefully not too difficult.ValueType
", and some changes to therpc
package to make it generic.