perf(streaming): compact useless intermediate result in stream chunk #10949

Closed
2 tasks done
st1page opened this issue Jul 14, 2023 · 1 comment · Fixed by #11070 or #14652

st1page (Contributor) commented Jul 14, 2023

Background

Some streaming operators produce useless intermediate results in a stream chunk. Under our processing model, ops with the same stream key in the same epoch can be compacted, so these intermediate results can be removed. Even a small reduction matters when it happens upstream of operators that amplify their input, such as join and HopWindow (2x amplification does not matter much, but 100x amplification is far better than 200x). Here are some examples.

  1. Project, when an update only touches a column that is projected away
create table t(a int, b int, k int primary key);
create materialized view mv as select a, k from t;
insert into t values (1, 1, 1), (2, 2, 2);
update t set b = 10;

Even though the update does not change any value in the mv, the changes are still sent downstream (IIRC this was found by @TennyZhuang).

  1. In discussions with some users, the first case also shows up with some source or CDC connectors (an update touches an extra field, and the connector emits an update whose old value and new value are the same)

  2. Outer join
    If the outer join does not do any optimization, it amplifies its input regardless. The outer side's two ops (U-, old), (U+, new) become (U-, old), (U+, NULL), (U-, NULL), (U+, new).
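The first case can be sketched in a few lines. This is only an illustration under simplifying assumptions, not RisingWave's actual executor code: `Op`, `Row`, `project` and `project_and_drop_noop_updates` are hypothetical names, rows are plain vectors of nullable integers, and an update is assumed to arrive as an adjacent `U-`/`U+` pair.

```rust
/// Change operations carried by a stream chunk (simplified, hypothetical).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Op {
    Insert,
    Delete,
    UpdateDelete, // U-
    UpdateInsert, // U+
}

/// A row is modelled as a vector of nullable integers for this sketch.
type Row = Vec<Option<i32>>;

/// Keep only the columns listed in `indices`.
fn project(row: &[Option<i32>], indices: &[usize]) -> Row {
    indices.iter().map(|&i| row[i]).collect()
}

/// Project every row, and drop an adjacent `U-`/`U+` pair when the two
/// projected rows are identical, because such a pair changes nothing downstream.
fn project_and_drop_noop_updates(chunk: &[(Op, Row)], indices: &[usize]) -> Vec<(Op, Row)> {
    let mut out: Vec<(Op, Row)> = Vec::with_capacity(chunk.len());
    for (op, row) in chunk {
        let projected = project(row, indices);
        // A `U+` that equals the `U-` emitted right before it is a no-op update.
        let is_noop = *op == Op::UpdateInsert
            && matches!(out.last(), Some((Op::UpdateDelete, old)) if *old == projected);
        if is_noop {
            out.pop(); // discard the matching `U-` as well
        } else {
            out.push((*op, projected));
        }
    }
    out
}
```

With the table `t(a, b, k)` from the example, `update t set b = 10` yields two `U-`/`U+` pairs. Projecting them onto `(a, k)` with `indices = [0, 2]` leaves nothing to send downstream, while projecting onto all three columns keeps all four rows.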

Feature

We should have common utilities to compact ops with the same stream key in the same epoch, even if only on a per-chunk basis. They should only change the visibility and the ops, to avoid unnecessary overhead.
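A minimal sketch of such a per-chunk utility could look like the following. It is an assumption-laden illustration rather than the real implementation: `Chunk`, `KeyState` and `compact_chunk` are hypothetical names, the chunk is a plain struct with a boolean visibility vector, and the input is assumed to be consistent (at most one live row per stream key at any time). It also assumes downstream accepts a `U-`/`U+` pair that is separated by invisible rows; if it does not, the surviving pair would have to be rewritten as `Delete`/`Insert` instead.

```rust
use std::collections::HashMap;

/// Change operations carried by a stream chunk (simplified, hypothetical).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Op {
    Insert,
    Delete,
    UpdateDelete, // U-
    UpdateInsert, // U+
}

type Row = Vec<Option<i32>>;

/// Simplified stand-in for a stream chunk: row data plus ops and visibility.
#[derive(Debug)]
struct Chunk {
    ops: Vec<Op>,
    rows: Vec<Row>,
    visibility: Vec<bool>,
}

/// Per-key bookkeeping: indices of the rows that still have a net effect.
#[derive(Default)]
struct KeyState {
    pending_delete: Option<usize>, // delete of a row that existed before the chunk
    pending_insert: Option<usize>, // latest insert not yet cancelled by a delete
}

/// Compact ops that share a stream key. Row data is never copied or moved;
/// only `visibility` and `ops` are modified.
fn compact_chunk(chunk: &mut Chunk, key_indices: &[usize]) {
    let mut states: HashMap<Row, KeyState> = HashMap::new();
    for i in 0..chunk.rows.len() {
        if !chunk.visibility[i] {
            continue;
        }
        let key: Row = key_indices.iter().map(|&c| chunk.rows[i][c]).collect();
        let state = states.entry(key).or_default();
        match chunk.ops[i] {
            Op::Insert | Op::UpdateInsert => state.pending_insert = Some(i),
            Op::Delete | Op::UpdateDelete => match state.pending_insert.take() {
                // The delete removes a row inserted earlier in this chunk:
                // both are intermediate results, so hide them.
                Some(j) => {
                    chunk.visibility[j] = false;
                    chunk.visibility[i] = false;
                }
                None => state.pending_delete = Some(i),
            },
        }
    }
    // Fix up the ops of whatever survived for each key.
    for state in states.values() {
        match (state.pending_delete, state.pending_insert) {
            // Old and new rows are identical: the whole change is a no-op.
            (Some(d), Some(j)) if chunk.rows[d] == chunk.rows[j] => {
                chunk.visibility[d] = false;
                chunk.visibility[j] = false;
            }
            (Some(d), Some(j)) => {
                chunk.ops[d] = Op::UpdateDelete;
                chunk.ops[j] = Op::UpdateInsert;
            }
            (Some(d), None) => chunk.ops[d] = Op::Delete,
            (None, Some(j)) => chunk.ops[j] = Op::Insert,
            (None, None) => {}
        }
    }
}
```

Applied to the outer join sequence (U-, old), (U+, NULL), (U-, NULL), (U+, new) with the first column as the stream key, this hides the two NULL-padded rows and leaves (U-, old), (U+, new) visible. A `U-`/`U+` pair with identical rows is hidden entirely.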

Tracking

github-actions bot added this to the release-1.0 milestone Jul 14, 2023
fuyufjh modified the milestones: release-1.0, release-1.1 Jul 14, 2023
xx01cyx (Contributor) commented Jul 14, 2023

#10853 is actually a case of 2 and can be resolved by this. cc @fuyufjh
