Bytewax materialization can run infinitely #3788

james-crabtree-sp · 2023-10-09T20:39:43Z

Expected Behavior

The Bytewax materialization job should run every pod to success once and then set the job status to succeeded.

Current Behavior

If a node crashes, the records of pods that have already succeeded can be lost, and the job reruns every one of those lost pods. If node crashes occur often enough, the job can keep rerunning pods that already succeeded and never complete.

Steps to reproduce

Run a materialization job against a multi-node Kubernetes cluster. Terminate one of the nodes and observe that pods are lost and rerun.

Specifications

Version: 0.31
Platform: Fedora Linux
Subsystem: bytewax batch_engine

Possible Solution

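As a safeguard, the job should have a configurable activeDeadlineSeconds so that it is terminated after a bounded amount of time. It should also be possible to split a large job into smaller batches, which limits how much already-completed work a single node crash can force the job to redo.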
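Both ideas can be sketched under the assumption that the engine renders a standard Kubernetes batch/v1 Job manifest. `activeDeadlineSeconds`, `completionMode: Indexed`, `completions`, and `parallelism` are real JobSpec fields. The helper names `split_into_batches` and `build_job_manifest`, the container image, the example paths, and the default values are hypothetical and are not Feast configuration options.

```python
from typing import Any, Dict, List


def split_into_batches(items: List[str], batch_size: int) -> List[List[str]]:
    """Split the work items into consecutive chunks of at most batch_size (hypothetical helper)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def build_job_manifest(
    name: str,
    image: str,
    num_pods: int,
    active_deadline_seconds: int,
    parallelism: int = 10,
) -> Dict[str, Any]:
    """Return a batch/v1 Job manifest whose total runtime is bounded (hypothetical helper)."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # Kubernetes terminates the Job once it has been active this long,
            # so repeated reruns cannot keep it going indefinitely.
            "activeDeadlineSeconds": active_deadline_seconds,
            # Indexed mode gives each pod a stable JOB_COMPLETION_INDEX.
            "completionMode": "Indexed",
            "completions": num_pods,
            "parallelism": min(parallelism, num_pods),
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{"name": "materialize", "image": image}],
                }
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical input files; one smaller Job per batch instead of one large Job.
    paths = [f"s3://example-bucket/part-{i}.parquet" for i in range(25)]
    for n, batch in enumerate(split_into_batches(paths, batch_size=10)):
        manifest = build_job_manifest(
            f"materialize-batch-{n}", "example/materializer:latest", len(batch), 3600
        )
        print(manifest["metadata"]["name"], manifest["spec"]["completions"])
```

Submitting one Job per batch means a node crash can only cause reruns within the batch that was running, and the deadline keeps each Job from running forever.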

james-crabtree-sp added kind/bug priority/p2 labels Oct 9, 2023

james-crabtree-sp mentioned this issue Oct 9, 2023

fix: Redundant feature materialization and premature incremental materialization timestamp updates #3789

Merged

achals closed this as completed in #3789 Oct 25, 2023

