CARTESIAN (left outer) (anti) semi join may cause OOM even if right table is not very large #6730
Labels
affects-5.0
affects-5.1
affects-5.2
affects-5.3
affects-5.4
This bug affects the 5.4.x(LTS) versions.
affects-6.0
affects-6.1
This bug affects the 6.1.x(LTS) versions.
affects-6.2
affects-6.3
affects-6.4
affects-6.5
This bug affects the 6.5.x(LTS) versions.
component/compute
severity/major
type/bug
The issue is confirmed as a bug.
Bug Report
For a CARTESIAN (left outer) (anti) semi join, each row in the left table's block must be combined with all of the rows in the right table's block in order to evaluate the `other_conditions` expression and/or the `other_eq_conditions_from_in` expression. After that, according to the result of these expressions, each row in the left table's block produces at most one row in the result block through the `filter` function, as below.
tiflash/dbms/src/Interpreters/Join.cpp
Line 1460 in 4206a52
tiflash/dbms/src/Interpreters/Join.cpp
Line 1545 in 4206a52
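The flow described above can be sketched roughly as follows. This is a simplified illustration, not the TiFlash implementation: `Row`, `Condition`, and `crossSemiJoin` are hypothetical names, rows are plain integers, and the predicate stands in for `other_conditions`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical simplification: a row is a single int and the join condition
// is an arbitrary predicate standing in for other_conditions.
using Row = int;
using Condition = std::function<bool(Row left, Row right)>;

// For each left row, pair it with every right row, evaluate the condition on
// each pair, then emit the left row at most once.
// anti == false: semi join (keep rows with a match).
// anti == true:  anti semi join (keep rows without a match).
std::vector<Row> crossSemiJoin(const std::vector<Row> & left,
                               const std::vector<Row> & right,
                               const Condition & cond,
                               bool anti)
{
    std::vector<Row> result;
    for (Row l : left)
    {
        // Intermediate data: one entry per (l, r) pair.
        std::vector<bool> matched;
        matched.reserve(right.size());
        for (Row r : right)
            matched.push_back(cond(l, r));
        assert(matched.size() == right.size());

        bool any_match = false;
        for (bool m : matched)
            any_match = any_match || m;

        // Each left row contributes one row or none.
        if (any_match != anti)
            result.push_back(l);
    }
    return result;
}
```

The point of the sketch is the shape of the work: the intermediate data has one entry per right row, while the output has at most one entry per left row.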
This is where the problem arises. The allocated memory of these columns may be much larger than the memory they actually use, because `filter` reserves space based on the size of the source column.
tiflash/dbms/src/Columns/IColumn.h
Lines 215 to 222 in 4206a52
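The gap between allocated and used memory can be reproduced with a plain `std::vector`. The sketch below is an assumption-laden stand-in for a column's `filter`: `filterWithSourceSizeHint` is a hypothetical name, and the real column classes are more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-in for a column filter: the result reserves as many slots
// as the source holds, no matter how many rows pass the mask.
std::vector<std::int64_t> filterWithSourceSizeHint(
    const std::vector<std::int64_t> & source,
    const std::vector<std::uint8_t> & mask)
{
    assert(source.size() == mask.size());
    std::vector<std::int64_t> result;
    result.reserve(source.size()); // capacity sized for the worst case
    for (std::size_t i = 0; i < source.size(); ++i)
        if (mask[i])
            result.push_back(source[i]);
    return result;
}
```

With a 50k-row source and a mask that keeps a single row, the result has `size() == 1` but a capacity of at least 50k elements, which is the over-allocation described above.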
For example, suppose the right table has 50k rows. Then `left_rows_per_iter` will be only 1 when `max_block_size_for_cross_join` is 64k.
tiflash/dbms/src/Interpreters/Join.cpp
Lines 1890 to 1898 in 4206a52
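A minimal sketch of how such a batch size could be derived, assuming it is the block size limit divided by the right table's row count with a floor of 1. The function name `leftRowsPerIter` is hypothetical and the actual code in `Join.cpp` may differ in its details.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Assumed rule: fit as many left rows per iteration as the block size limit
// allows once each left row is expanded to right_rows pairs, but never fewer
// than one left row.
std::size_t leftRowsPerIter(std::size_t max_block_size, std::size_t right_rows)
{
    assert(right_rows > 0);
    return std::max<std::size_t>(max_block_size / right_rows, 1);
}
```

Under this assumption, `leftRowsPerIter(65536, 50000)` is 1, matching the example: a single left row already expands to 50k pairs, so no second row fits within the limit.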
So in each iteration, the result block contains at most one row, while its allocated memory is approximately equal to the memory of 50k rows.
If this block of the left table has 64k rows, there are 64k result blocks and the total allocated memory is approximately equal to the memory of 50k * 64k = 3276800k rows, which is likely to make TiFlash OOM.
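The arithmetic can be checked with a small helper. This is only an estimate under the assumptions stated in this report (each result block reserves room for `left_rows_per_iter * right_rows` rows, and all result blocks are alive at once); `reservedRowSlots` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Estimates the row slots reserved across all result blocks for one left
// block. A partial last batch is counted as a full one, so the value is an
// upper bound.
std::uint64_t reservedRowSlots(std::uint64_t left_rows,
                               std::uint64_t right_rows,
                               std::uint64_t left_rows_per_iter)
{
    assert(left_rows_per_iter > 0);
    // Number of iterations, rounded up.
    const std::uint64_t iterations
        = (left_rows + left_rows_per_iter - 1) / left_rows_per_iter;
    return iterations * left_rows_per_iter * right_rows;
}
```

Taking 64k as 65536 and 50k as 50000, `reservedRowSlots(65536, 50000, 1)` gives 3276800000 row slots, i.e. the 3276800k rows quoted above, even though at most 65536 rows are actually produced.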