Avoid redundant calculation of hash data during join probe stage #8294

windtalker · 2023-11-02T00:52:25Z

Enhancement

In the join probe stage, each input block is wrapped in a ProbeProcessInfo. If the build side has many duplicated key entries, probing can expand the input block greatly, so in order to cap the number of rows in an output block, ProbeProcessInfo supports probing only part of the input block at a time. In other words, a single input block may be probed multiple times, with each probe processing only part of its rows.
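
To make the partial-probe behaviour concrete, here is a minimal sketch of how one input block can end up being probed in several rounds. It is not TiFlash code: `ProbeCursor`, `probeOnce`, `countProbeRounds`, and the `matches_per_row` input are hypothetical names used only for illustration, and the real ProbeProcessInfo tracks more state than a single row offset.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for ProbeProcessInfo:
// it only remembers how far into the input block probing has progressed.
struct ProbeCursor
{
    std::size_t total_rows = 0;
    std::size_t start_row = 0;

    bool done() const { return start_row >= total_rows; }
};

// One probe round. matches_per_row[i] is the (assumed, precomputed) number of
// build-side rows matching probe row i. The round stops before the output would
// exceed max_output_rows, but always consumes at least one row so it makes progress.
// Returns the number of output rows produced in this round.
std::size_t probeOnce(
    ProbeCursor & cursor,
    const std::vector<std::size_t> & matches_per_row,
    std::size_t max_output_rows)
{
    std::size_t produced = 0;
    while (cursor.start_row < cursor.total_rows)
    {
        const std::size_t matches = matches_per_row[cursor.start_row];
        if (produced > 0 && produced + matches > max_output_rows)
            break; // output block is full, resume from this row in the next round
        produced += matches;
        ++cursor.start_row;
    }
    return produced;
}

// Counts how many probe rounds a single input block needs.
std::size_t countProbeRounds(
    const std::vector<std::size_t> & matches_per_row,
    std::size_t max_output_rows)
{
    ProbeCursor cursor;
    cursor.total_rows = matches_per_row.size();
    std::size_t rounds = 0;
    while (!cursor.done())
    {
        probeOnce(cursor, matches_per_row, max_output_rows);
        ++rounds;
    }
    return rounds;
}
```

With four probe rows that each match three build rows and an output cap of six rows, this sketch needs two rounds for the one input block. Any work done per round on the whole block is therefore repeated twice.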

Each probe calls probeBlockImplTypeCase, and inside that function the hash data is calculated for the whole block:

tiflash/dbms/src/Interpreters/JoinPartition.cpp

Lines 1480 to 1494 in e597b78

    
    if (join_build_info.needVirtualDispatchForProbeBlock())
    {
        assert(!(join_build_info.restore_round > 0 && join_build_info.enable_fine_grained_shuffle));
        /// TODO: consider adding a virtual column in Sender side to avoid computing cost and potential inconsistency by heterogeneous envs(AMD64, ARM64)
        /// Note: 1. Not sure, if inconsistency will do happen in heterogeneous envs
        ///       2. Virtual column would take up a little more network bandwidth, might lead to poor performance if network was bottleneck
        /// Currently, the computation cost is tolerable, since it's a very simple crc32 hash algorithm, and heterogeneous envs support is not considered
        computeDispatchHash(
            rows,
            key_columns,
            collators,
            sort_key_containers,
            join_build_info.restore_round,
            build_hash);
    }

If a block is probed multiple times, the hash data for the whole block is recalculated on every probe, even though the result is the same each time. This repeated work is redundant.
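
One way to avoid the repeated work is to compute the hash data once per input block and keep it alongside the per-block probe state, so later probes of the same block reuse it. The sketch below illustrates that idea only. It is an assumption about the general approach, not the change made in TiFlash. `ProbeState`, `computeHashOnce`, and `probeSlice` are hypothetical names, and `std::hash` stands in for the crc32-based computeDispatchHash.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-block probe state. The cached hash lives next to the
// probe offset, so it has the same lifetime as the input block.
struct ProbeState
{
    std::vector<std::string> keys;                          // join keys of the input block
    std::size_t start_row = 0;                              // first row not yet probed
    std::optional<std::vector<std::uint32_t>> dispatch_hash; // empty until first probe
    std::size_t hash_computations = 0;                      // instrumentation for the example
};

// Computes the per-row hash for the whole block only if it is not cached yet.
// std::hash is a placeholder here, not the hash function TiFlash uses.
void computeHashOnce(ProbeState & state)
{
    if (state.dispatch_hash.has_value())
        return; // already computed by an earlier probe of this block

    std::vector<std::uint32_t> hashes;
    hashes.reserve(state.keys.size());
    for (const auto & key : state.keys)
        hashes.push_back(static_cast<std::uint32_t>(std::hash<std::string>{}(key)));

    state.dispatch_hash = std::move(hashes);
    ++state.hash_computations;
}

// One probe round over at most rows_per_round rows of the block.
// The actual hash table lookup is omitted; only the caching pattern is shown.
void probeSlice(ProbeState & state, std::size_t rows_per_round)
{
    computeHashOnce(state);
    state.start_row = std::min(state.keys.size(), state.start_row + rows_per_round);
}
```

Calling `probeSlice` repeatedly until `start_row` reaches the block size leaves `hash_computations` at 1, however many rounds the block needs. The trade-off is that the cached hash stays in memory for as long as the block is being probed.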


windtalker added the type/enhancement The issue or PR belongs to an enhancement. label Nov 2, 2023

windtalker mentioned this issue Nov 2, 2023

Reduce redundant calculation in join probe, optimize mergeNullAndFilterResult #8297

Merged

12 tasks

ti-chi-bot bot closed this as completed in #8297 Nov 2, 2023

ti-chi-bot bot pushed a commit that referenced this issue Nov 2, 2023

Reduce redundant calculation in join probe, optimize `mergeNullAndFil…

8685769

…terResult` (#8297) close #8294

windtalker added a commit to windtalker/tiflash that referenced this issue Nov 23, 2023

Reduce redundant calculation in join probe, optimize `mergeNullAndFil…

18b9994

…terResult` (pingcap#8297) close pingcap#8294
