DNM: Try to reproduce 8798 #30811

Draft · wants to merge 37 commits into main
Conversation

@def- (Contributor) commented Dec 12, 2024

Got it once locally:

```
services.log:parallel-workload-materialized-1     | 2024-12-12T20:10:07.450881459Z thread 'timely:work-1' panicked at src/storage/src/upsert/types.rs:751:30:
```

With this:

```
while true; do
  bin/mzcompose --find parallel-workload down && \
    bin/mzcompose --find parallel-workload run default --runtime=1500 --scenario=0dt-deploy --threads=16 --seed=1732754241
  bin/mzcompose --find parallel-workload logs >> services.log
  mv services.log services-$(date +%s).log
done
```

Still took ~1 hour.

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@benesch (Member) commented Dec 25, 2024

Reopening so I can iterate on this via CI. So far I've rebased and pushed up a change to run the failing 0dt test 20x during the test PR build (and no other tests).

@benesch force-pushed the pr-pw-repro branch 3 times, most recently from 5254bda to 3f847fc on December 31, 2024 at 04:18
aljoscha and others added 21 commits on January 8, 2025 at 21:17
This is not a fix for a bug that we observed, but I think it's still
incorrect to check using these combined `(ts, subtime)` timestamps.

I believe there is a general problem with the subtime approach: in a way,
the outside code is "lying" to the operators inside the Subtime scope.
A pattern in timely operators is to stash updates of a given timestamp
`t` until we know that we have seen all updates for that timestamp, at
which time we process them all together. The way operators usually check
that timestamp `t` is ready for processing is something like
`!input_frontier.less_equal(t)`.

Within the Subtime scope this is broken. For example, say we have
updates at timestamp `(5, Subtime(0))`. Now the input frontier advances
to `(5, Subtime(1))`. The operator will now naively assume that updates
at time `(5, Subtime(0))` are ready for processing, but in fact we
haven't yet seen all the updates of the "full" outer timestamp `5`.

It feels incorrect that the operator has to know about this special
timestamp and have special "unwrapping" code that needs to be applied
before checking for readiness.
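
For illustration, here is a minimal, self-contained sketch of the problem. The `Ts` type and the plain slice are stand-ins for the real timely timestamp and frontier/antichain types, not the actual API:

```
// Hypothetical stand-in for the `(ts, subtime)` product timestamp.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
struct Ts(u64, u64); // (outer timestamp, subtime)

/// The usual readiness check: `t` is ready once no frontier element is <= `t`,
/// i.e. the moral equivalent of `!input_frontier.less_equal(t)`.
fn ready(input_frontier: &[Ts], t: Ts) -> bool {
    !input_frontier.iter().any(|f| *f <= t)
}

fn main() {
    // Updates stashed at (5, Subtime(0)); the frontier then advances to
    // (5, Subtime(1)).
    let frontier = [Ts(5, 1)];

    // The naive check reports (5, Subtime(0)) as ready, even though more
    // updates for the "full" outer timestamp 5 may still arrive at larger
    // subtimes.
    assert!(ready(&frontier, Ts(5, 0)));
    println!("(5, Subtime(0)) reported ready while outer time 5 is still open");
}
```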
Before, while working off the "upsert commands" (a.k.a. the input), we were
taking the existing value out of `command_state` when processing and we
were only putting in a new value when processing commands with
`DrainStyle::AtTime`. This is problematic when processing commands of
multiple timestamps in one invocation of `drain_staged_input`:
processing the first update for a key `k` at timestamp `t` takes away
any value that we might have retrieved from state, then when processing
another update for key `k` at timestamp `t+1`, we think there is no
previous value and don't emit a retraction. This means we suddenly have
multiple values for the same key in our collection, which is not legal.

Now, we just don't take away the state value without putting something
back.
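
A rough before/after sketch of that change, with hypothetical helper names, a plain `HashMap` standing in for the real upsert state, and a `bool` standing in for `DrainStyle`:

```
use std::collections::HashMap;

type Key = String;
type Val = String;

// Before (sketch): the previous value was taken out of `command_state`, and a
// new value was only put back for one drain style. A later command for the
// same key within the same drain then sees `None` and emits no retraction.
fn process_before(
    state: &mut HashMap<Key, Val>,
    key: Key,
    new: Val,
    at_time: bool, // stand-in for `DrainStyle::AtTime`
) -> Option<Val> {
    let previous = state.remove(&key);
    if at_time {
        state.insert(key, new);
    }
    previous
}

// After (sketch): never take the value out of state without putting the new
// value back, so the next command for the same key still sees a previous
// value to retract.
fn process_after(state: &mut HashMap<Key, Val>, key: Key, new: Val) -> Option<Val> {
    state.insert(key, new) // returns the previous value, if any
}

fn main() {
    // Two commands for `k1` at consecutive timestamps, drained together.
    let mut buggy = HashMap::from([("k1".to_string(), "v1".to_string())]);
    let first = process_before(&mut buggy, "k1".into(), "v2".into(), false);
    let second = process_before(&mut buggy, "k1".into(), "v3".into(), false);
    // `second` is `None`: no retraction for "v2" would be emitted.
    println!("buggy: first={first:?}, second={second:?}");

    let mut fixed = HashMap::from([("k1".to_string(), "v1".to_string())]);
    let first = process_after(&mut fixed, "k1".into(), "v2".into());
    let second = process_after(&mut fixed, "k1".into(), "v3".into());
    // `second` is `Some("v2")`, so the retraction for "v2" is emitted.
    println!("fixed: first={first:?}, second={second:?}");
}
```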

The astute reader might now wonder how this can be correct. Say our
upsert state contains:

```
k1 -> v2
```

The input frontier is at a time that allows processing and the
persist input frontier is at `[11]`.

We're processing these upsert commands, tuples of `(key, timestamp,
value)`:

```
(k1, 10, v2)
(k1, 11, v3)
```

Naively (and somewhat correctly), you would assume the correct updates
to emit when processing this, one timestamp after another, would be (now
additionally with diffs):

```
((k1, 10, v2), -1)
((k1, 10, v2), 1)
((k1, 11, v2), -1)
((k1, 11, v3), 1)
```
(Those first two updates might seem nonsensical, but the code doesn't
actually check to see if the values differ, it simply retracts and
updates.)

The answer lies in the fact that we're _only_ allowed to process updates
with timestamp `11` when they are not beyond the persist frontier
anymore, that is, when the frontier is at, say, `11`. What we have in
state is then the _global view_ of upsert state as of that persist
frontier, so it is correct to emit updates at `11`. And anything we emit
at "lower" timestamps (timestamps that are not beyond the persist
frontier, for sticklers) will not be written down, because the output
shard's upper is already past that. All that will be written down is:

```
((k1, 11, v2), -1)
((k1, 11, v3), 1)
```

These are the correct updates to write down given the current contents of
the shard at upper `11`. This _is_ subtle, and I'm not sure I like it,
but I also think it's correct.

As a follow-up, we could say that we don't process updates that are not
beyond the persist frontier, because we know that they won't be written
down anymore. But this is not needed to fix this bug.
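
That follow-up could look roughly like the following hypothetical sketch, where `beyond` mimics the antichain `less_equal` check and plain integers stand in for the real timestamps:

```
// A time `t` is beyond the frontier if some frontier element is <= `t`
// (the moral equivalent of `frontier.less_equal(t)`).
fn beyond(frontier: &[u64], t: u64) -> bool {
    frontier.iter().any(|f| *f <= t)
}

fn main() {
    let persist_frontier = [11u64];
    let commands = [("k1", 10u64, "v2"), ("k1", 11u64, "v3")];

    for (key, ts, val) in commands {
        if !beyond(&persist_frontier, ts) {
            // (k1, 10, v2) is skipped: the output shard's upper is already
            // past 10, so nothing emitted at 10 would be written down anyway.
            continue;
        }
        println!("process ({key}, {ts}, {val})");
    }
}
```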
The code was a bit clunky before and obscured what was happening. This
is a pure refactor with no change in behavior.
This reverts commit e418f32.
This reverts commit 77fa69a.
This reverts commit 456680f.
This reverts commit b9e5280.
@def- (Contributor, Author) commented Jan 9, 2025

Rebased on top of #30977: https://buildkite.com/materialize/test/builds/96884
Edit: Seems good; at least the panic is not occurring. Startup is taking too long for 0dt, but that is probably just because we have too many sources:

```
Timed out waiting for materialized2 to reach Mz deployment status ReadyToPromote, still in status Initializing
```
