Some streaming operators can produce useless intermediate results in a stream chunk. Under our processing model, ops with the same stream key in the same epoch can be compacted, so we can reduce them. Even a slight reduction can matter when it happens before operators that amplify their input, such as join and HopWindow (a 2x amplification does not matter much, but 100x is far better than 200x). Here are some examples.
Project: an update touches only some fields
create table t (a int, b int, k int primary key);
create materialized view mv as select a, k from t;
insert into t values (1, 1, 1), (2, 2, 2);
update t set b = 10;
Even though the update does not change the value of the mv, the changes are still sent downstream (IIRC this was found by @TennyZhuang).
In discussions with some users, the first case can also appear with some sources or CDC connectors (an extra field is updated, so the connector emits an update whose old and new values are identical).
outer join
If the outer join does not do any optimization, it amplifies the update anyway: the two ops (U-, old), (U+, new) turn into (U-, old), (U+, NULL), (U-, NULL), (U+, new).
feature
We should have common utils to compact ops with the same stream key in the same epoch, even if only on a per-chunk basis. They should only change the visibility and the ops, to avoid unnecessary overhead.
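A minimal sketch of what such a chunk-level util could look like, not the actual implementation. `Chunk`, `Op`, `KeyState` and `compact_chunk` are hypothetical, simplified stand-ins (rows are plain `Vec<Option<i64>>`, `None` meaning NULL), and the sketch assumes a consistent stream, i.e. a delete that follows an insert for the same key carries the same row. It nets out the ops per stream key and touches only `vis` and `ops`, never the row data.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Op {
    Insert,
    Delete,
    UpdateDelete,
    UpdateInsert,
}

/// Hypothetical, simplified stream chunk: one op, one row and one
/// visibility flag per position.
pub struct Chunk {
    pub ops: Vec<Op>,
    pub rows: Vec<Vec<Option<i64>>>,
    pub vis: Vec<bool>,
}

/// Per-key bookkeeping while scanning the chunk.
#[derive(Default)]
struct KeyState {
    /// First delete of a row that existed before this chunk.
    first_delete: Option<usize>,
    /// Latest insert that has not been cancelled by a later delete.
    last_insert: Option<usize>,
}

/// Compacts ops that share a stream key (columns `key_idx`) inside one chunk.
/// Only `vis` and `ops` are modified.
pub fn compact_chunk(chunk: &mut Chunk, key_idx: &[usize]) {
    let mut states: HashMap<Vec<Option<i64>>, KeyState> = HashMap::new();

    for i in 0..chunk.rows.len() {
        if !chunk.vis[i] {
            continue;
        }
        let key: Vec<Option<i64>> = key_idx.iter().map(|&c| chunk.rows[i][c]).collect();
        let state = states.entry(key).or_default();
        let op = chunk.ops[i];
        match op {
            Op::Insert | Op::UpdateInsert => state.last_insert = Some(i),
            Op::Delete | Op::UpdateDelete => match state.last_insert.take() {
                // An insert followed by its delete is an intermediate result.
                Some(j) => {
                    chunk.vis[j] = false;
                    chunk.vis[i] = false;
                }
                None => {
                    if state.first_delete.is_none() {
                        state.first_delete = Some(i);
                    }
                }
            },
        }
    }

    for state in states.values() {
        match (state.first_delete, state.last_insert) {
            (Some(d), Some(i)) => {
                if chunk.rows[d] == chunk.rows[i] {
                    // No-op update: old and new rows are identical.
                    chunk.vis[d] = false;
                    chunk.vis[i] = false;
                } else {
                    // The pair may no longer be adjacent, so downgrade
                    // U-/U+ to a plain Delete/Insert.
                    chunk.ops[d] = Op::Delete;
                    chunk.ops[i] = Op::Insert;
                }
            }
            (Some(d), None) => chunk.ops[d] = Op::Delete,
            (None, Some(i)) => chunk.ops[i] = Op::Insert,
            (None, None) => {}
        }
    }
}
```

Applied to the Project example, the pair (U-, (1, 1)), (U+, (1, 1)) keyed on `k` is hidden entirely. Applied to the outer join example, and assuming all four rows share the stream key, only (old) and (new) stay visible, as a Delete and an Insert.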
Tracking