[Tech Request]: use stats of external scan to set auto incr cache #15933

Closed
2 tasks done
ouyuanning opened this issue May 9, 2024 · 2 comments
Labels
kind/tech-request New feature or request priority/p0 Critical feature that should be implemented in this version

@ouyuanning
Contributor

Is there an existing issue for the same tech request?

  • I have checked the existing issues.

Does this tech request not affect user experience?

  • This tech request doesn't affect user experience.

What would you like to be added ?

Use the external scan's statistics to set the auto incr cache size, improving performance.

Why is this needed ?

Sizing the auto incr cache from the external scan's statistics improves performance.

Additional information

No response

@ouyuanning ouyuanning added kind/tech-request New feature or request priority/p0 Critical feature that should be implemented in this version labels May 9, 2024
@ouyuanning ouyuanning added this to the 1.2.1 milestone May 9, 2024
@jensenojs
Contributor

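To use statistics to change how the auto incr cache is requested, my understanding is that the interface needs to be adjusted.

First, let's walk through the current call chain by which preinsert requests cache from incrservice: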

// pkg/sql/colexec/preinsert/preinsert.go L119
func genAutoIncrCol(bat *batch.Batch, proc *proc, arg *Argument) error {
	// Call the incrservice; only the third argument, `bat`, matters here.
	lastInsertValue, err := proc.IncrService.InsertValues(
		proc.Ctx,
		arg.TableDef.TblId,
		bat)
        ...
}

// pkg/incrservice/column_cache.go L503
// Only the rows parameter matters here. It comes from `bat.RowCount()` and is set in `tableCache.InsertAutoValues` (omitted here).
func insertAutoValues[T constraints.Integer](
        ...
	rows int,
        ...) {

        ...
	col.preAllocate(ctx, tableID, rows, txnOp)
	err := col.applyAutoValues(
		ctx,
		tableID,
		rows,
        ...
}

// pkg/incrservice/column_cache.go L342
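// Here count is the rows value passed down from above. The function checks the remaining cache capacity; if it is at least the number of rows requested, the request is ignored.
// Otherwise it allocates. If the requested `rows` is less than `col.cfg.CountPerAllocate`, it is raised to `col.cfg.CountPerAllocate`.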
func (col *columnCache) preAllocate(
        ...
	count int,
	...) {
	col.Lock()
	defer col.Unlock()

	if col.ranges.left() >= count {
		return
	}
        ...
}
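
The decision described in the comments above can be sketched as a small standalone function. This is only an illustration of the logic as described, not the incrservice code: `allocSize` and its parameters are hypothetical names, and `countPerAllocate` stands in for `col.cfg.CountPerAllocate`.

```go
package main

import "fmt"

// allocSize is a hypothetical helper (not part of incrservice) that mirrors
// the described behavior of preAllocate:
//   - if the remaining cached range already covers count, request nothing;
//   - otherwise request count, raised to countPerAllocate when it is smaller.
func allocSize(remaining, count, countPerAllocate int) int {
	if remaining >= count {
		return 0 // enough cached values, the request is ignored
	}
	if count < countPerAllocate {
		return countPerAllocate
	}
	return count
}

func main() {
	fmt.Println(allocSize(100, 50, 10000))   // cache already covers the batch
	fmt.Println(allocSize(10, 50, 10000))    // small batch, raised to the configured minimum
	fmt.Println(allocSize(10, 50000, 10000)) // large batch, requested as-is
}
```

With a per-batch `count`, a large load triggers one allocation check per batch, which is the cost this request aims to reduce.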

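After this quick pass over the call chain, it is clear that the current logic requests incrservice cache in units of bat.RowCount. To implement this optimization, we need to obtain the ExternalScan statistic OutCnt at compile time, and then use that statistic instead of bat.RowCount when requesting cache.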

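In addition, we need to consider that preinsert itself may run in parallel. Suppose there are ten million rows and ten preinsert operators; each preinsert operator could then request cache capacity for one million rows.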
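
A minimal sketch of that split, under stated assumptions: `perOperatorRows` is a hypothetical helper, the estimate is taken as a `float64` (an assumption about how OutCnt is represented), the division rounds up, and the function falls back to the batch row count when no usable estimate is available. None of this is the actual implementation.

```go
package main

import (
	"fmt"
	"math"
)

// perOperatorRows is a hypothetical helper: it divides the estimated total
// row count (outCnt, e.g. from the scan's statistics) evenly across the
// parallel preinsert operators, rounding up. When there is no usable
// estimate, or the share is smaller than the current batch, it falls back to
// the batch row count, which matches the existing per-batch behavior.
func perOperatorRows(outCnt float64, parallelism, batchRows int) int {
	if outCnt <= 0 || parallelism <= 0 {
		return batchRows
	}
	total := int(math.Ceil(outCnt))
	per := (total + parallelism - 1) / parallelism // ceiling division
	if per < batchRows {
		return batchRows
	}
	return per
}

func main() {
	// Ten million estimated rows, ten preinsert operators, 8192-row batches.
	fmt.Println(perOperatorRows(10000000, 10, 8192))
	// No statistics available: keep requesting per batch.
	fmt.Println(perOperatorRows(0, 10, 8192))
}
```

Rounding up means the operators together request slightly more than the estimate rather than slightly less, so an accurate estimate should not force a second allocation near the end of the load.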

@jensenojs
Contributor

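After discussion, the child node's statistics have been added to preinsert.Argument on this branch. Next, Xu will be asked to implement this requirement by adding a new interface to incrservice.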
