-
Notifications
You must be signed in to change notification settings - Fork 465
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DNM: Try to reproduce 8798 #30811
base: main
Are you sure you want to change the base?
DNM: Try to reproduce 8798 #30811
Conversation
Reopening so I can iterate on this via CI. So far I've rebased and pushed up a change to run the failing 0dt test 20x during the test PR build (and no other tests). |
5254bda
to
3f847fc
Compare
This is not a fix for a bug that we observed, but I think it's still incorrect to check using these combined `(ts, subtime)` timestamps. I believe there is a general problem with the subtime approach, in a way the outside code is "lying" to the operators in side the subtime scope. A pattern in timely operators is to stash updates of a given timestamp `t` until we know that we have seen all updates for that timestamp, at which time we process them all together. The way operators usually check that timestamp `t` is ready for processing is something like `!input_frontier.less_equal(t)`. Within the Subtime scope this is broken. For example, say we have updates at timestamp `(5, Subtime(0))`. Now the input frontier advances to `(5, Subtime(1))`. The operator will now naively assume that updates at time `(5, Subtime(0))` are ready for processing, but in fact we haven't yet seen all the updates of the "full" outer timestamp `5`. It feels incorrect that the operator has to know about this special timestamp and have special "unwrapping" code that needs to be applied before checking for readiness.
Before, while working off the "upsert commands" (aka. input) we were taking the existing value out of `command_state` when processing and we were only putting in a new value when processing commands with `DrainStyle::AtTime`. This is problematic when processing commands of multiple timestamps in one invocation of `drain_staged_input`: processing the first update for a key `k` at timestamp `t` takes away any value that we might have retrieved from state, then when processing another update for key `k` at timestamp `t+1`, we think there is no previous value and don't emit a retraction. Meaning we suddely have multiple values for the same key in our collection, which is not legal. Now, we just don't take away the state value without putting something back. The astute reader might now wonder how this can be correct. Say our upsert state contains: ``` k1 -> v2 ``` The input frontier is at at a time that allows processing and the persist input frontier is at `[11]`. We're processing these upsert commands, tuples of `(key, timestamp, value)`: ``` (k1, 10, v2) (k1, 11, v3) ``` Naively (and somewhat correctly), you would assume the correct updates to emit when processing this, one timestamp after another, would be (now additionally with diffs): ``` ((k1, 10, v2), -1) ((k1, 10, v2), 1) ((k1, 11, v2), -1) ((k1, 11, v3), 1) ``` (Those first two updates might seem nonsensical, but the code doesn't actually check to see if the values differ, it simply retracts and updates.) The answer lies in the fact that we're _only_ allowed to process updates with timestamp `11` when they are not beyond the persist frontier anymore, that is the frontier is at say `11`. Meaning what we have in state is the _global view_ of upsert state as of that persist frontier, meaning it is correct to emit updates at `11`. And anything we emit at "lower" timestamps (timestamps that are not beyond the persist frontier, for sticklers) will not be written down because the output shards upper is already past that. All that will be written down is: ``` ((k1, 11, v2), -1) ((k1, 11, v3), 1) ``` Which is the correct updates to write down given the current contents of the shard at upper `11`. This _is_ subtle, and I'm note sure I like it, but I also think it's correct. As a follow-up, we could say that we don't process updates that are not beyond the persist frontier, because we know that they won't be written down anymore. But this is not needed to fix this bug.
The code was a bit clunky before, and obscuring what was happening. This is a pure refactor with no change in behavior.
This reverts commit e418f32.
This reverts commit 77fa69a.
This reverts commit 456680f.
This reverts commit e0cc81d.
This reverts commit b9e5280.
Rebased on top of #30977: https://buildkite.com/materialize/test/builds/96884
|
Got it once locally:
With this:
Still took ~1 hour.
Checklist
$T ⇔ Proto$T
mapping (possibly in a backwards-incompatible way), then it is tagged with aT-proto
label.