DNM: Try to reproduce 8798 #30811

Draft · wants to merge 37 commits into main
Conversation

@def- (Contributor) commented Dec 12, 2024

Got it once locally:

```
services.log:parallel-workload-materialized-1     | 2024-12-12T20:10:07.450881459Z thread 'timely:work-1' panicked at src/storage/src/upsert/types.rs:751:30:
```

With this:

```
while true; do
  bin/mzcompose --find parallel-workload down && \
    bin/mzcompose --find parallel-workload run default --runtime=1500 --scenario=0dt-deploy --threads=16 --seed=1732754241
  bin/mzcompose --find parallel-workload logs >> services.log
  mv services.log services-$(date +%s).log
done
```

Still took ~1 hour.

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@benesch (Member) commented Dec 25, 2024

Reopening so I can iterate on this via CI. So far I've rebased and pushed up a change to run the failing 0dt test 20x during the test PR build (and no other tests).

@benesch force-pushed the pr-pw-repro branch 3 times, most recently from 5254bda to 3f847fc on December 31, 2024 at 04:18
aljoscha and others added 21 commits on January 8, 2025 at 21:17
This is not a fix for a bug that we observed, but I think it's still
incorrect to check using these combined `(ts, subtime)` timestamps.

I believe there is a general problem with the subtime approach: in a way,
the outside code is "lying" to the operators inside the Subtime scope.
A pattern in timely operators is to stash updates of a given timestamp
`t` until we know that we have seen all updates for that timestamp, at
which time we process them all together. The way operators usually check
that timestamp `t` is ready for processing is something like
`!input_frontier.less_equal(t)`.

Within the Subtime scope this is broken. For example, say we have
updates at timestamp `(5, Subtime(0))`. Now the input frontier advances
to `(5, Subtime(1))`. The operator will now naively assume that updates
at time `(5, Subtime(0))` are ready for processing, but in fact we
haven't yet seen all the updates of the "full" outer timestamp `5`.

It feels incorrect that the operator has to know about this special
timestamp and have special "unwrapping" code that needs to be applied
before checking for readiness.
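
For illustration, here is a minimal, self-contained sketch of the problem. The `Ts` type and the plain slice are stand-ins for the real timely timestamp and frontier/antichain types, not the actual API:

```
// Hypothetical stand-in for the `(ts, subtime)` product timestamp.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
struct Ts(u64, u64); // (outer timestamp, subtime)

/// The usual readiness check: `t` is ready once no frontier element is <= `t`,
/// i.e. the moral equivalent of `!input_frontier.less_equal(t)`.
fn ready(input_frontier: &[Ts], t: Ts) -> bool {
    !input_frontier.iter().any(|f| *f <= t)
}

fn main() {
    // Updates stashed at (5, Subtime(0)); the frontier then advances to
    // (5, Subtime(1)).
    let frontier = [Ts(5, 1)];

    // The naive check reports (5, Subtime(0)) as ready, even though more
    // updates for the "full" outer timestamp 5 may still arrive at larger
    // subtimes.
    assert!(ready(&frontier, Ts(5, 0)));
    println!("(5, Subtime(0)) reported ready while outer time 5 is still open");
}
```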
Before, while working off the "upsert commands" (a.k.a. the input), we were
taking the existing value out of `command_state` when processing and we
were only putting in a new value when processing commands with
`DrainStyle::AtTime`. This is problematic when processing commands of
multiple timestamps in one invocation of `drain_staged_input`:
processing the first update for a key `k` at timestamp `t` takes away
any value that we might have retrieved from state, then when processing
another update for key `k` at timestamp `t+1`, we think there is no
previous value and don't emit a retraction. This means we suddenly have
multiple values for the same key in our collection, which is not legal.

Now, we just don't take away the state value without putting something
back.
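
A rough before/after sketch of that change, with hypothetical helper names, a plain `HashMap` standing in for the real upsert state, and a `bool` standing in for `DrainStyle`:

```
use std::collections::HashMap;

type Key = String;
type Val = String;

// Before (sketch): the previous value was taken out of `command_state`, and a
// new value was only put back for one drain style. A later command for the
// same key within the same drain then sees `None` and emits no retraction.
fn process_before(
    state: &mut HashMap<Key, Val>,
    key: Key,
    new: Val,
    at_time: bool, // stand-in for `DrainStyle::AtTime`
) -> Option<Val> {
    let previous = state.remove(&key);
    if at_time {
        state.insert(key, new);
    }
    previous
}

// After (sketch): never take the value out of state without putting the new
// value back, so the next command for the same key still sees a previous
// value to retract.
fn process_after(state: &mut HashMap<Key, Val>, key: Key, new: Val) -> Option<Val> {
    state.insert(key, new) // returns the previous value, if any
}

fn main() {
    // Two commands for `k1` at consecutive timestamps, drained together.
    let mut buggy = HashMap::from([("k1".to_string(), "v1".to_string())]);
    let first = process_before(&mut buggy, "k1".into(), "v2".into(), false);
    let second = process_before(&mut buggy, "k1".into(), "v3".into(), false);
    // `second` is `None`: no retraction for "v2" would be emitted.
    println!("buggy: first={first:?}, second={second:?}");

    let mut fixed = HashMap::from([("k1".to_string(), "v1".to_string())]);
    let first = process_after(&mut fixed, "k1".into(), "v2".into());
    let second = process_after(&mut fixed, "k1".into(), "v3".into());
    // `second` is `Some("v2")`, so the retraction for "v2" is emitted.
    println!("fixed: first={first:?}, second={second:?}");
}
```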

The astute reader might now wonder how this can be correct. Say our
upsert state contains:

```
k1 -> v2
```

The input frontier is at a time that allows processing and the
persist input frontier is at `[11]`.

We're processing these upsert commands, tuples of `(key, timestamp,
value)`:

```
(k1, 10, v2)
(k1, 11, v3)
```

Naively (and somewhat correctly), you would assume the correct updates
to emit when processing this, one timestamp after another, would be (now
additionally with diffs):

```
((k1, 10, v2), -1)
((k1, 10, v2), 1)
((k1, 11, v2), -1)
((k1, 11, v3), 1)
```
(Those first two updates might seem nonsensical, but the code doesn't
actually check to see if the values differ, it simply retracts and
updates.)

The answer lies in the fact that we're _only_ allowed to process updates
with timestamp `11` when they are not beyond the persist frontier
anymore, that is, when the frontier is at, say, `11`. What we have in
state is then the _global view_ of upsert state as of that persist
frontier, so it is correct to emit updates at `11`. And anything we emit
at "lower" timestamps (timestamps that are not beyond the persist
frontier, for sticklers) will not be written down, because the output
shard's upper is already past that. All that will be written down is:

```
((k1, 11, v2), -1)
((k1, 11, v3), 1)
```

These are the correct updates to write down given the current contents of
the shard at upper `11`. This _is_ subtle, and I'm not sure I like it,
but I also think it's correct.

As a follow-up, we could say that we don't process updates that are not
beyond the persist frontier, because we know that they won't be written
down anymore. But this is not needed to fix this bug.
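
That follow-up could look roughly like the following hypothetical sketch, where `beyond` mimics the antichain `less_equal` check and plain integers stand in for the real timestamps:

```
// A time `t` is beyond the frontier if some frontier element is <= `t`
// (the moral equivalent of `frontier.less_equal(t)`).
fn beyond(frontier: &[u64], t: u64) -> bool {
    frontier.iter().any(|f| *f <= t)
}

fn main() {
    let persist_frontier = [11u64];
    let commands = [("k1", 10u64, "v2"), ("k1", 11u64, "v3")];

    for (key, ts, val) in commands {
        if !beyond(&persist_frontier, ts) {
            // (k1, 10, v2) is skipped: the output shard's upper is already
            // past 10, so nothing emitted at 10 would be written down anyway.
            continue;
        }
        println!("process ({key}, {ts}, {val})");
    }
}
```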
The code was a bit clunky before and obscured what was happening. This
is a pure refactor with no change in behavior.
This reverts commit e418f32.
This reverts commit 77fa69a.
This reverts commit 456680f.
This reverts commit b9e5280.
@def- (Contributor, Author) commented Jan 9, 2025

Rebased on top of #30977: https://buildkite.com/materialize/test/builds/96884
Edit: Seems good; at least the panic is not occurring. Startup is taking too long for 0dt, but that is probably just because we have too many sources:

```
Timed out waiting for materialized2 to reach Mz deployment status ReadyToPromote, still in status Initializing
```
