Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make sels int64 #17755

Merged
merged 5 commits into from
Aug 2, 2024
Merged

Make sels int64 #17755

merged 5 commits into from
Aug 2, 2024

Conversation

fengttt
Copy link
Contributor

@fengttt fengttt commented Jul 27, 2024

User description

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Feature
  • Test and CI
  • Code Refactoring

Which issue(s) this PR fixes:

issue #17754

What this PR does / why we need it:

Unfuck our code.


PR Type

Enhancement, Tests


Description

  • Updated various vector search functions to use int64 instead of int32.
  • Modified test cases to expect []int64 instead of []int32.
  • Refactored vector union functions to support both int64 and int32.
  • Updated shuffle handling functions to use int64.
  • Changed sels field in multiple structs and functions from []int32 to []int64.

Changes walkthrough 📝

Relevant files
Enhancement
15 files
search.go
Update vector search functions to use int64                           

pkg/container/vector/search.go

  • Changed return type of various search functions from []int32 to
    []int64.
  • Updated internal logic to use int64 instead of int32.
  • +94/-94 
    vector.go
    Refactor vector union functions to support int64 and int32

    pkg/container/vector/vector.go

  • Changed Union function to accept []int64 instead of []int32.
  • Added UnionInt32 function to handle []int32.
  • Introduced generic unionT function to handle both int32 and int64.
  • +8/-1     
    utils.go
    Modify ConstructRowidColumnToWithSels to use int64             

    pkg/objectio/utils.go

  • Updated ConstructRowidColumnToWithSels function to accept []int64
    instead of []int32.
  • +1/-1     
    join.go
    Update AntiJoin probe function to use int64                           

    pkg/sql/colexec/anti/join.go

    • Changed eligible slice from []int32 to []int64.
    +3/-3     
    external.go
    Modify external batch processing to use int64                       

    pkg/sql/colexec/external/external.go

    • Changed sels slice from []int32 to []int64.
    +3/-3     
    join.go
    Update SemiJoin probe function to use int64                           

    pkg/sql/colexec/semi/join.go

    • Changed eligible slice from []int32 to []int64.
    +2/-2     
    shuffle.go
    Refactor shuffle handling to use int64                                     

    pkg/sql/colexec/shuffle/shuffle.go

  • Changed sels slices from []int32 to []int64.
  • Updated shuffle handling functions to use int64.
  • +62/-62 
    types.go
    Update container struct to use int64 for sels                       

    pkg/sql/colexec/shuffle/types.go

  • Changed sels field in container struct from [][]int32 to [][]int64.
  • +1/-1     
    func_unary.go
    Modify unary function to use int64 for sels                           

    pkg/sql/plan/function/func_unary.go

    • Changed sels slice from []int32 to []int64.
    +2/-2     
    pk_filter.go.go
    Update PK filter search functions to use int64                     

    pkg/vm/engine/disttae/pk_filter.go.go

    • Changed search functions to return []int64 instead of []int32.
    +3/-3     
    txn_table_sharding_handle.go
    Remove redundant error handling in txn_table_sharding_handle

    pkg/vm/engine/disttae/txn_table_sharding_handle.go

    • Removed redundant error handling code.
    +1/-3     
    util.go
    Modify LinearSearchOffsetByValFactory to use int64             

    pkg/vm/engine/disttae/util.go

  • Changed LinearSearchOffsetByValFactory to return []int64 instead of
    []int32.
  • +31/-31 
    read.go
    Update blockio read functions to use int64                             

    pkg/vm/engine/tae/blockio/read.go

  • Changed ReadFilterSearchFuncType to return []int64 instead of []int32.
  • Updated various functions to use []int64 for sels.
  • +6/-6     
    vector.go
    Modify vector container to use int64 for sels                       

    pkg/vm/engine/tae/containers/vector.go

    • Changed sels slice from []int32 to []int64.
    +2/-2     
    base.go
    Update table base to use int64 for sels                                   

    pkg/vm/engine/tae/tables/base.go

    • Changed sels slice from []int32 to []int64.
    +3/-3     
    Tests
    1 files
    utils_test.go
    Modify test cases for int64 vector search functions           

    pkg/container/vector/utils_test.go

    • Updated test cases to expect []int64 instead of []int32.
    +3/-3     
    Miscellaneous
    1 files
    txn_table_sharding.go
    Minor formatting and comment addition                                       

    pkg/vm/engine/disttae/txn_table_sharding.go

    • Minor formatting changes and added a comment.
    +2/-0     

    💡 PR-Agent usage:
    Comment /help on the PR to get a list of all available PR-Agent tools and their descriptions

    @matrix-meow matrix-meow added the size/M Denotes a PR that changes [100,499] lines label Jul 27, 2024
    @mergify mergify bot added the kind/refactor Code refactor label Jul 27, 2024
    Copy link

    PR-Agent was enabled for this repository. To continue using it, please link your git user with your CodiumAI identity here.

    PR Reviewer Guide 🔍

    ⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Key issues to review

    Possible Bug
    There is a commented line indicating a bug without further explanation or resolution. This needs investigation to ensure it does not affect functionality.

    Error Handling
    Error handling was removed in the function HandleShardingReadApproxObjectsNum, which might lead to unhandled errors that could affect stability.

    Copy link

    PR-Agent was enabled for this repository. To continue using it, please link your git user with your CodiumAI identity here.

    PR Code Suggestions ✨

    CategorySuggestion                                                                                                                                    Score
    Possible bug
    Add nil checks for vectors to prevent runtime panics

    To avoid potential panics from index out of range or nil pointer dereference, add a
    check to ensure vec and pk are not nil before proceeding with operations in
    LinearSearchOffsetByValFactory.

    pkg/vm/engine/disttae/util.go [844]

     func LinearSearchOffsetByValFactory(pk *vector.Vector) func(*vector.Vector) []int64 {
    +    if pk == nil {
    +        return nil
    +    }
    +    return func(vec *vector.Vector) []int64 {
    +        if vec == nil {
    +            return nil
    +        }
    +        var sels []int64
     
    • Apply this suggestion
    Suggestion importance[1-10]: 10

    Why: Adding nil checks for vectors is crucial to prevent potential runtime panics due to nil pointer dereferences, which enhances the robustness of the function.

    10
    Add a check for zero value of AliveRegCnt to prevent division by zero

    Consider checking if shuffle.AliveRegCnt is zero before initializing
    shuffle.ctr.sels to avoid potential division by zero in the capacity calculation for
    slices. This will prevent runtime panic in cases where AliveRegCnt might be zero.

    pkg/sql/colexec/shuffle/shuffle.go [161-163]

    -shuffle.ctr.sels = make([][]int64, shuffle.AliveRegCnt)
    -for i := 0; i < int(shuffle.AliveRegCnt); i++ {
    -    shuffle.ctr.sels[i] = make([]int64, 0, colexec.DefaultBatchSize/shuffle.AliveRegCnt*2)
    +if shuffle.AliveRegCnt > 0 {
    +    shuffle.ctr.sels = make([][]int64, shuffle.AliveRegCnt)
    +    for i := 0; i < int(shuffle.AliveRegCnt); i++ {
    +        shuffle.ctr.sels[i] = make([]int64, 0, colexec.DefaultBatchSize/shuffle.AliveRegCnt*2)
    +    }
    +} else {
    +    // Handle the case where AliveRegCnt is zero or set a default value
     }
     
    • Apply this suggestion
    Suggestion importance[1-10]: 9

    Why: This suggestion addresses a potential runtime panic by ensuring that AliveRegCnt is not zero before performing division, which is a critical bug fix.

    9
    Add nil check for groupByVec to prevent nil pointer dereference

    Consider handling the case where groupByVec is nil in the
    getShuffledSelsByHashWithNull function to prevent potential nil pointer dereference.

    pkg/sql/colexec/shuffle/shuffle.go [212]

     groupByVec := bat.Vecs[ap.ShuffleColIdx]
    +if groupByVec == nil {
    +    return nil // or handle error appropriately
    +}
     
    • Apply this suggestion
    Suggestion importance[1-10]: 8

    Why: This suggestion prevents a potential nil pointer dereference, which is a significant bug fix that enhances the robustness of the code.

    8
    Error handling
    Enhance error handling to prevent runtime panics

    Consider handling the case where MustFixedColT might return an error, to
    prevent potential runtime panics.

    pkg/container/vector/search.go [30]

    -rows := MustFixedCol[T](vector)
    +rows, err := FixedCol[T](vector)
    +if err != nil {
    +  return nil, err
    +}
     
    • Apply this suggestion
    Suggestion importance[1-10]: 9

    Why: Adding error handling for MustFixedCol[T](vector) can prevent potential runtime panics, which is crucial for robust and reliable code. This is an important improvement for error handling.

    9
    Maintainability
    Handle unexpected types gracefully in switch cases

    To ensure that the sels slice captures all indices without missing any due to
    skipped conditions, add a default case in the switch statement to handle unexpected
    types, possibly logging an error or handling the unknown type appropriately.

    pkg/vm/engine/disttae/util.go [993]

     switch vec.GetType().Oid {
    +default:
    +    log.Printf("Unhandled type: %v", vec.GetType().Oid)
     
    • Apply this suggestion
    Suggestion importance[1-10]: 9

    Why: Adding a default case in the switch statement to handle unexpected types ensures that all cases are covered, improving the maintainability and robustness of the code.

    9
    Best practice
    Add error handling for the union operation to ensure robustness

    Add error handling after the Union function call to handle potential errors that
    might occur during the union operation.

    pkg/vm/engine/tae/containers/vector.go [442-443]

     err = vec.wrapped.Union(src, sels, vec.mpool)
    -return
    +if err != nil {
    +    return err
    +}
    +return nil
     
    • Apply this suggestion
    Suggestion importance[1-10]: 9

    Why: Adding error handling after the Union function call is a best practice that enhances the robustness and reliability of the code. This is an important improvement.

    9
    Performance
    Optimize loop performance by caching the length of slices

    Use a more efficient loop condition by caching the length of rows outside the loop
    to avoid multiple length computations.

    pkg/container/vector/search.go [35]

    -for x := range rows {
    +n := len(rows)
    +for x := 0; x < n; x++ {
     
    • Apply this suggestion
    Suggestion importance[1-10]: 8

    Why: Caching the length of the slice outside the loop avoids multiple length computations, which can slightly improve performance. This is a good practice for optimizing loop performance.

    8
    Improve performance by preallocating the sels slice

    Consider preallocating the sels slice to improve performance. Since the maximum
    index is known based on the length of vs, you can initialize sels with a capacity to
    avoid multiple reallocations during the append operations.

    pkg/vm/engine/disttae/util.go [992]

    -var sels []int64
    +sels := make([]int64, 0, len(vs))
     
    • Apply this suggestion
    Suggestion importance[1-10]: 8

    Why: Preallocating the sels slice can improve performance by reducing the number of reallocations during append operations. This is a beneficial optimization given the context.

    8
    Improve memory allocation by preallocating slices

    Consider using a preallocated slice with a capacity hint for sels to improve memory
    allocation performance, especially if the expected number of elements is known or
    can be estimated.

    pkg/container/vector/search.go [29]

    -var sels []int64
    +sels := make([]int64, 0, estimatedSize)
     
    • Apply this suggestion
    Suggestion importance[1-10]: 7

    Why: Preallocating slices can improve memory allocation performance, especially if the expected number of elements is known or can be estimated. This is a minor optimization but can be beneficial in performance-critical code.

    7
    Reduce redundant method calls by storing results in a variable

    Instead of repeatedly calling GetType().Oid inside the loop, consider storing the
    result in a variable outside the loop to reduce redundant method calls, enhancing
    the function's efficiency.

    pkg/vm/engine/disttae/util.go [993]

    -switch vec.GetType().Oid {
    +vecType := vec.GetType().Oid
    +switch vecType {
     
    • Apply this suggestion
    Suggestion importance[1-10]: 7

    Why: Storing the result of GetType().Oid in a variable outside the loop reduces redundant method calls, improving the function's efficiency. This is a minor but useful optimization.

    7
    Improve memory efficiency by preallocating slice capacity

    Consider preallocating the sels slice with a capacity if the size is known ahead of
    time to improve memory allocation efficiency.

    pkg/vm/engine/tae/containers/vector.go [438]

    -sels := make([]int64, srcLen)
    +sels := make([]int64, 0, srcLen)
     
    • Apply this suggestion
    Suggestion importance[1-10]: 7

    Why: Preallocating the slice capacity can improve memory allocation efficiency, which is a good practice for performance optimization. However, the improvement is minor.

    7
    Store repeated type conversions outside the loop to optimize performance

    Optimize the repeated calls to uint64(v) by storing the result in a variable outside
    the loop, reducing the number of conversions needed.

    pkg/sql/colexec/shuffle/shuffle.go [306]

     for row, v := range groupByCol {
    -    regIndex := plan2.SimpleInt64HashToRange(uint64(v), lenRegs)
    +    vUint64 := uint64(v)
    +    regIndex := plan2.SimpleInt64HashToRange(vUint64, lenRegs)
     }
     
    • Apply this suggestion
    Suggestion importance[1-10]: 6

    Why: This suggestion optimizes performance by reducing the number of type conversions within a loop, which is a minor but useful improvement.

    6
    Reduce type conversion overhead by computing conversions once per loop iteration

    Instead of using int64(x) repeatedly inside the loop, compute it once per iteration
    to reduce type conversion overhead.

    pkg/container/vector/search.go [37]

    -sels = append(sels, int64(x))
    +ix := int64(x)
    +sels = append(sels, ix)
     
    • Apply this suggestion
    Suggestion importance[1-10]: 6

    Why: Computing the type conversion once per iteration reduces overhead, but the performance gain is likely minimal. This is a minor optimization.

    6
    Enhancement
    Modify getSels() to reset slices without changing their capacity

    To ensure that the getSels() function does not inadvertently modify the length of
    the slices in shuffle.ctr.sels, consider using a different approach to reset slices
    without changing their capacity.

    pkg/sql/colexec/shuffle/shuffle.go [170-171]

     for i := range shuffle.ctr.sels {
    -    shuffle.ctr.sels[i] = shuffle.ctr.sels[i][:0]
    +    shuffle.ctr.sels[i] = make([]int64, 0, cap(shuffle.ctr.sels[i]))
     }
     
    • Apply this suggestion
    Suggestion importance[1-10]: 7

    Why: This enhancement improves the efficiency of resetting slices by preserving their capacity, which can be beneficial for performance and memory usage.

    7
    Simplify loop iteration by using range over slices

    Use a direct assignment in the loop to simplify the code and potentially enhance
    readability.

    pkg/vm/engine/tae/containers/vector.go [439-441]

    -for j := 0; j < srcLen; j++ {
    +for j := range sels {
         sels[j] = int64(j) + int64(srcOff)
     }
     
    • Apply this suggestion
    Suggestion importance[1-10]: 6

    Why: Using range in the loop can enhance readability and simplify the code. However, the improvement is relatively minor and does not significantly impact functionality.

    6
    Possible issue
    Check the necessity of type conversion to avoid potential integer overflow

    Ensure that the conversion from int to int64 is necessary, considering the potential
    for integer overflow if srcOff is large.

    pkg/vm/engine/tae/containers/vector.go [440]

    -sels[j] = int64(j) + int64(srcOff)
    +sels[j] = int64(j + srcOff)
     
    • Apply this suggestion
    Suggestion importance[1-10]: 3

    Why: The suggestion to avoid type conversion is not necessarily better, as the original code explicitly converts both values to int64, which can prevent overflow issues. The suggested change might introduce subtle bugs.

    3

    @XuPeng-SH XuPeng-SH mentioned this pull request Jul 27, 2024
    7 tasks
    @fengttt fengttt marked this pull request as draft July 27, 2024 17:33
    @fengttt fengttt marked this pull request as ready for review July 27, 2024 23:56
    Copy link

    PR-Agent was enabled for this repository. To continue using it, please link your git user with your CodiumAI identity here.

    PR Reviewer Guide 🔍

    ⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
    🧪 PR contains tests
    🔒 No security concerns identified
    ⚡ Key issues to review

    Performance Concern
    The conversion from int to int64 in the append operation inside loops may introduce unnecessary overhead. Consider optimizing the loop to handle int64 values directly if possible.

    Redundant Code
    The function OrderedSearchOffsetsByLess and similar functions have redundant checks and conversions. Simplify the logic to improve readability and performance.

    Type Consistency
    The variable eligible is changed from []int32 to []int64. Ensure that all related operations and dependent code correctly handle the new type without loss of information or overflow.

    Commented Code
    There is a commented line of code with no explanation. If this line is not needed, it should be removed; otherwise, clarify its purpose with a comment.

    Copy link

    PR-Agent was enabled for this repository. To continue using it, please link your git user with your CodiumAI identity here.

    PR Code Suggestions ✨

    CategorySuggestion                                                                                                                                    Score
    Possible bug
    Add a check for zero before creating slices based on shuffle.AliveRegCnt

    Consider checking if shuffle.AliveRegCnt is zero before making slices to avoid
    potential division by zero or creating empty slices unnecessarily.

    pkg/sql/colexec/shuffle/shuffle.go [161-163]

    -shuffle.ctr.sels = make([][]int64, shuffle.AliveRegCnt)
    -shuffle.ctr.sels[i] = make([]int64, 0, colexec.DefaultBatchSize/shuffle.AliveRegCnt*2)
    +if shuffle.AliveRegCnt > 0 {
    +    shuffle.ctr.sels = make([][]int64, shuffle.AliveRegCnt)
    +    for i := 0; i < int(shuffle.AliveRegCnt); i++ {
    +        shuffle.ctr.sels[i] = make([]int64, 0, colexec.DefaultBatchSize/shuffle.AliveRegCnt*2)
    +    }
    +}
     
    • Apply this suggestion
    Suggestion importance[1-10]: 9

    Why: This suggestion addresses a potential bug by preventing division by zero and unnecessary slice creation, which is crucial for avoiding runtime errors.

    9
    Add validation for parameters used in memory allocation to prevent runtime errors

    Validate the AliveRegCnt and other parameters used in slice allocations to prevent
    runtime panics due to invalid memory allocation requests.

    pkg/sql/colexec/shuffle/shuffle.go [161-163]

    -shuffle.ctr.sels = make([][]int64, shuffle.AliveRegCnt)
    -shuffle.ctr.sels[i] = make([]int64, 0, colexec.DefaultBatchSize/shuffle.AliveRegCnt*2)
    +if shuffle.AliveRegCnt > 0 && colexec.DefaultBatchSize > shuffle.AliveRegCnt {
    +    shuffle.ctr.sels = make([][]int64, shuffle.AliveRegCnt)
    +    for i := 0; i < int(shuffle.AliveRegCnt); i++ {
    +        shuffle.ctr.sels[i] = make([]int64, 0, colexec.DefaultBatchSize/shuffle.AliveRegCnt*2)
    +    }
    +} else {
    +    // Handle error or invalid parameter scenario
    +}
     
    • Apply this suggestion
    Suggestion importance[1-10]: 9

    Why: This suggestion adds necessary validation to prevent runtime panics due to invalid memory allocation requests, addressing a critical issue.

    9
    Concurrency
    Add thread safety to modifications of shuffle.ctr.sels

    To ensure thread safety, consider using synchronization mechanisms like mutexes when
    modifying shuffle.ctr.sels if Shuffle methods might be called concurrently.

    pkg/sql/colexec/shuffle/shuffle.go [171]

    +shuffle.mux.Lock()
     shuffle.ctr.sels[i] = shuffle.ctr.sels[i][:0]
    +shuffle.mux.Unlock()
     
    • Apply this suggestion
    Suggestion importance[1-10]: 8

    Why: Ensuring thread safety is important for concurrent execution. This suggestion correctly adds mutex locks to prevent race conditions.

    8
    Performance
    Improve memory usage by preallocating slices based on expected size

    Use a preallocated slice for sels to improve memory allocation performance,
    especially when the expected number of elements is known or can be estimated.

    pkg/container/vector/search.go [29-32]

    -var sels []int64
     rows := MustFixedCol[T](vector)
    +sels := make([]int64, 0, len(rows))
     if len(rows) == 0 {
         return sels
     }
     
    • Apply this suggestion
    Suggestion importance[1-10]: 8

    Why: Preallocating the slice can significantly improve memory allocation performance, especially for large datasets, making this a valuable optimization.

    8
    Improve performance by pre-allocating slices

    Consider pre-allocating the sels slice to improve performance by reducing the number
    of memory allocations during the append operations. This is particularly beneficial
    when the expected number of elements is large or can be estimated from the input
    vector's length.

    pkg/vm/engine/disttae/util.go [992]

    -var sels []int64
    +sels := make([]int64, 0, len(vec))
     
    • Apply this suggestion
    Suggestion importance[1-10]: 8

    Why: Pre-allocating the sels slice can significantly improve performance by reducing the number of memory allocations during the append operations, especially when the size of the input vector can be estimated.

    8
    Improve memory allocation by preallocating the eligible slice

    Consider preallocating the eligible slice to improve memory allocation performance.
    Since you know the maximum possible size it can grow to (count), initializing it
    with this capacity can prevent multiple reallocations as the slice grows.

    pkg/sql/colexec/semi/join.go [170]

    -eligible := make([]int64, 0)
    +eligible := make([]int64, 0, count)
     
    • Apply this suggestion
    Suggestion importance[1-10]: 8

    Why: Preallocating the eligible slice with a capacity of count can improve memory allocation performance by preventing multiple reallocations as the slice grows. This is a valid and beneficial optimization.

    8
    Optimize type conversion by reducing its redundancy within loops

    Replace the repeated int64(x) type conversion with a single conversion at the start
    of the loop to enhance performance and readability.

    pkg/container/vector/search.go [35-42]

     for x := range rows {
    +    ix := int64(x)
         if closed && rows[x] <= ub {
    -        sels = append(sels, int64(x))
    +        sels = append(sels, ix)
         } else if !closed && rows[x] < ub {
    -        sels = append(sels, int64(x))
    +        sels = append(sels, ix)
         } else if quick {
             break
         }
     }
     
    • Apply this suggestion
    Suggestion importance[1-10]: 7

    Why: The suggestion reduces redundancy and enhances performance by converting the type once at the start of the loop. However, the performance gain might be minimal in most cases.

    7
    Reduce type conversion overhead inside loops

    Instead of repeatedly calling int64(i) inside the loop, convert i to int64 once
    before the loop starts to reduce the overhead of type conversion in each iteration.

    pkg/vm/engine/disttae/util.go [996-998]

     for i, v := range vs {
    +    idx := int64(i)
         if mp[v] {
    -        sels = append(sels, int64(i))
    +        sels = append(sels, idx)
         }
     }
     
    • Apply this suggestion
    Suggestion importance[1-10]: 7

    Why: Converting i to int64 once before the loop starts can reduce the overhead of type conversion in each iteration, improving performance slightly.

    7
    Use copy to reset slices to zero for potential efficiency gains

    Instead of manually setting each index to zero, consider using a built-in function
    like copy to reset all elements in shuffle.ctr.sels to zero, which can be more
    efficient.

    pkg/sql/colexec/shuffle/shuffle.go [171]

    -shuffle.ctr.sels[i] = shuffle.ctr.sels[i][:0]
    +for i := range shuffle.ctr.sels {
    +    copy(shuffle.ctr.sels[i], make([]int64, len(shuffle.ctr.sels[i])))
    +}
     
    • Apply this suggestion
    Suggestion importance[1-10]: 6

    Why: This suggestion improves performance by using a more efficient method to reset slices, though it is a minor optimization.

    6
    Maintainability
    Simplify the code by removing unnecessary type conversion

    Ensure that the conversion to int64 in the append operation is necessary. If i+k is
    already an int64, you can omit the explicit conversion to simplify the code.

    pkg/sql/colexec/semi/join.go [235]

    -eligible = append(eligible, int64(i+k))
    +eligible = append(eligible, i+k)
     
    • Apply this suggestion
    Suggestion importance[1-10]: 7

    Why: The suggestion to remove the explicit conversion to int64 is valid if i+k is already an int64. This can simplify the code and improve readability. However, it is essential to ensure that i+k is indeed an int64 before making this change.

    7
    Enhance code clarity and maintainability by using a switch statement

    Refactor the loop to use a switch statement for clarity and maintainability when
    checking conditions for appending to sels.

    pkg/container/vector/search.go [35-42]

     for x := range rows {
    -    if closed && rows[x] <= ub {
    +    switch {
    +    case quick:
    +        break
    +    case closed && rows[x] <= ub:
             sels = append(sels, int64(x))
    -    } else if !closed && rows[x] < ub {
    +    case !closed && rows[x] < ub:
             sels = append(sels, int64(x))
    -    } else if quick {
    -        break
         }
     }
     
    • Apply this suggestion
    Suggestion importance[1-10]: 6

    Why: Using a switch statement can improve code readability and maintainability, but it does not provide a significant functional improvement over the existing if-else structure.

    6
    Refactor repeated code into a function to improve maintainability

    Refactor the repeated code pattern for appending to sels into a separate function to
    improve code maintainability and reduce code duplication.

    pkg/vm/engine/disttae/util.go [998]

    -if mp[v] {
    -    sels = append(sels, int64(i))
    +func appendIfPresent(mp map[any]bool, v any, i int, sels []int64) []int64 {
    +    if mp[v] {
    +        return append(sels, int64(i))
    +    }
    +    return sels
     }
    +// Usage:
    +sels = appendIfPresent(mp, v, i, sels)
     
    • Apply this suggestion
    Suggestion importance[1-10]: 6

    Why: Refactoring the repeated code pattern for appending to sels into a separate function can improve code maintainability and reduce code duplication, although it adds a bit of complexity.

    6

    Copy link
    Contributor

    @badboynt1 badboynt1 left a comment

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    i think sels in shuffle.go should not change to int64, which will cause more memory allocation

    @fengttt fengttt requested a review from badboynt1 July 31, 2024 22:52
    @fengttt
    Copy link
    Contributor Author

    fengttt commented Jul 31, 2024

    i think sels in shuffle.go should not change to int64, which will cause more memory allocation

    this sels size should not depends on the amount of data pass through. for shuffle, it is not a significant overhead. This is a non issue.

    @fengttt fengttt merged commit 292468e into matrixorigin:main Aug 2, 2024
    12 of 16 checks passed
    @fengttt fengttt deleted the fengttt-sels branch September 6, 2024 06:03
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    Enhancement kind/refactor Code refactor Review effort [1-5]: 4 size/M Denotes a PR that changes [100,499] lines Tests
    Projects
    None yet
    Development

    Successfully merging this pull request may close these issues.

    9 participants