
[Bug]: DN crash by panic: type mismatch: []types.Varlena INT during stability test on distributed mode #17937

Closed
1 task done
aressu1985 opened this issue Aug 7, 2024 · 14 comments

@aressu1985
Contributor

Is there an existing issue for the same bug?

  • I have checked the existing issues.

Branch Name

main

Commit ID

c1c365c

Other Environment Information

- Hardware parameters:
3*CN: 16C 64G
1*DN: 16C 64G
3*LOG: 4C 16G
3*PROXY: 3C 7G
- OS type:
- Others:

Actual Behavior

During a stability test in distributed mode, the DN crashed with a panic:
{"level":"INFO","time":"2024/08/06 17:45:22.239064 +0000","name":"tn-service.logtail-server","caller":"service/session.go:187","msg":"send response by segment","service":"00000000-0000-0000-0000-100000000000","uuid":"00000000-0000-0000-0000-100000000000","server-id":"019128b5-842f-7ebf-9d3f-9f3d0f50c765","chunk-number":3,"chunk-limit":16351,"message-size":36909}
panic: type mismatch: []types.Varlena INT

goroutine 1485 [running]:
github.com/matrixorigin/matrixone/pkg/container/vector.ToSlice[...](0xc09dcac780, 0xc00d1682b8)
/go/src/github.com/matrixorigin/matrixone/pkg/container/vector/vector.go:88 +0x305
github.com/matrixorigin/matrixone/pkg/container/vector.ToFixedCol[...](0xc09dcac780, 0xc00d1682b8)
/go/src/github.com/matrixorigin/matrixone/pkg/container/vector/tools.go:33 +0x86
github.com/matrixorigin/matrixone/pkg/container/vector.MustFixedCol...
/go/src/github.com/matrixorigin/matrixone/pkg/container/vector/tools.go:42 +0x45
github.com/matrixorigin/matrixone/pkg/container/vector.GetUnionAllFunction.func23(0xc011266600, 0xc09dcac780)
/go/src/github.com/matrixorigin/matrixone/pkg/container/vector/vector.go:1597 +0xeb
github.com/matrixorigin/matrixone/pkg/vm/engine/tae/containers.(*vectorWrapper).extendWithOffset(0xc03796d760, 0xc09dcac780, 0x0, 0x1)
/go/src/github.com/matrixorigin/matrixone/pkg/vm/engine/tae/containers/vector.go:434 +0x159
github.com/matrixorigin/matrixone/pkg/vm/engine/tae/containers.(*vectorWrapper).ExtendWithOffset(0xc03796d760, {0x74e1878, 0xc03796d660}, 0x0, 0x1)
/go/src/github.com/matrixorigin/matrixone/pkg/vm/engine/tae/containers/vector.go:413 +0x5e
github.com/matrixorigin/matrixone/pkg/vm/engine/tae/txn/txnimpl.(*anode).Append(0xc04cfff800, 0xc067ca1680, 0x0)
/go/src/github.com/matrixorigin/matrixone/pkg/vm/engine/tae/txn/txnimpl/anode.go:123 +0x458
github.com/matrixorigin/matrixone/pkg/vm/engine/tae/txn/txnimpl.(*tableSpace).Append(0xc053bd5400, 0xc067ca1680)
/go/src/github.com/matrixorigin/matrixone/pkg/vm/engine/tae/txn/txnimpl/table_space.go:294 +0x107
github.com/matrixorigin/matrixone/pkg/vm/engine/tae/txn/txnimpl.(*txnTable).Append(0xc0cdf70360, {0x74435b8, 0xc081699860}, 0xc067ca1680)
/go/src/github.com/matrixorigin/matrixone/pkg/vm/engine/tae/txn/txnimpl/table.go:611 +0x43c
github.com/matrixorigin/matrixone/pkg/vm/engine/tae/txn/txnimpl.(*txnDB).Append(0xc0fee50c00, {0x74435b8, 0xc081699860}, 0x42916, 0xc067ca1680)
/go/src/github.com/matrixorigin/matrixone/pkg/vm/engine/tae/txn/txnimpl/txndb.go:114 +0x14a
github.com/matrixorigin/matrixone/pkg/vm/engine/tae/txn/txnimpl.(*txnStore).Append(0xc0d99131e0, {0x74435b8, 0xc081699860}, 0x42913, 0x42916, 0xc067ca1680)
/go/src/github.com/matrixorigin/matrixone/pkg/vm/engine/tae/txn/txnimpl/store.go:297 +0x10b
github.com/matrixorigin/matrixone/pkg/vm/engine/tae/txn/txnimpl.(*txnRelation).Append(0xc062e6e4e0, {0x74435b8, 0xc081699860}, 0xc067ca1680)
/go/src/github.com/matrixorigin/matrixone/pkg/vm/engine/tae/txn/txnimpl/relation.go:160 +0x172
github.com/matrixorigin/matrixone/pkg/vm/engine/tae/rpc.AppendDataToTable({0x74435b8, 0xc081699860}, {0x74d8d40, 0xc062e6e4e0}, 0xc056fc6f50)
/go/src/github.com/matrixorigin/matrixone/pkg/vm/engine/tae/rpc/adaptors.go:83 +0xf8
github.com/matrixorigin/matrixone/pkg/vm/engine/tae/rpc.(*Handle).HandleWrite(0xc0e7b73700, {0x74435b8, 0xc081699860}, {0x74f1820, 0xc062e6e330}, 0xc056e225b0)
/go/src/github.com/matrixorigin/matrixone/pkg/vm/engine/tae/rpc/handle.go:732 +0x1d6a
github.com/matrixorigin/matrixone/pkg/vm/engine/tae/rpc.(*Handle).handleRequests(0xc0e7b73700, {0x74435b8, 0xc081699860}, {0x74f1820, 0xc062e6e330}, 0xc05460aa80)
/go/src/github.com/matrixorigin/matrixone/pkg/vm/engine/tae/rpc/handle.go:214 +0x485
github.com/matrixorigin/matrixone/pkg/vm/engine/tae/rpc.(*Handle).HandleCommit(_, {_, _}, {{0xc05c926060, 0x10, 0x10}, 0x0, {0x17e934bdd46ff692, 0x1, 0x0, ...}, ...})
/go/src/github.com/matrixorigin/matrixone/pkg/vm/engine/tae/rpc/handle.go:323 +0x550
github.com/matrixorigin/matrixone/pkg/txn/storage/tae.(*taeStorage).Commit(_, {_, _}, {{0xc05c926060, 0x10, 0x10}, 0x0, {0x17e934bdd46ff692, 0x1, 0x0, ...}, ...})
/go/src/github.com/matrixorigin/matrixone/pkg/txn/storage/tae/storage.go:87 +0xd5
github.com/matrixorigin/matrixone/pkg/txn/service.(*service).Commit(0xc01495e900, {0x74435b8, 0xc081699860}, 0xc06bef6000, 0xc01e610c00)
/go/src/github.com/matrixorigin/matrixone/pkg/txn/service/service_cn_handler.go:263 +0xe35
github.com/matrixorigin/matrixone/pkg/tnservice.(*store).handleCommit(0xc007e4d1e0, {0x74435b8, 0xc081699860}, 0xc06bef6000, 0xc01e610c00)
/go/src/github.com/matrixorigin/matrixone/pkg/tnservice/store_rpc_handler.go:124 +0x358
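
Reading the panic message, the append path seems to have asked for a `[]types.Varlena` view of a vector whose declared type is INT. The sketch below is a minimal, self-contained illustration of that failure mode, not MatrixOne's actual code: `TypeID`, `Column`, `MustSlice`, and `TrySlice` are hypothetical names. It shows a typed-slice accessor that panics on a mismatch, next to a variant that returns an error so the caller can reject the write instead of crashing the process.

```go
package main

import "fmt"

// TypeID is a hypothetical tag for a column's declared element type.
type TypeID int

const (
	TInt TypeID = iota
	TVarlena
)

func (t TypeID) String() string {
	if t == TInt {
		return "INT"
	}
	return "VARLENA"
}

// Column is a hypothetical stand-in for a typed column vector.
// In this sketch Data holds []int32 for TInt and []string for TVarlena.
type Column struct {
	Typ  TypeID
	Data any
}

// MustSlice views the column as []T and panics when T does not match the
// stored data, which mirrors the kind of failure seen in the trace.
func MustSlice[T any](c *Column) []T {
	s, err := TrySlice[T](c)
	if err != nil {
		panic(err.Error())
	}
	return s
}

// TrySlice is the non-panicking variant: it reports the mismatch as an error.
func TrySlice[T any](c *Column) ([]T, error) {
	s, ok := c.Data.([]T)
	if !ok {
		var want []T
		return nil, fmt.Errorf("type mismatch: %T %v", want, c.Typ)
	}
	return s, nil
}

func main() {
	col := &Column{Typ: TInt, Data: []int32{1, 2, 3}}
	// Asking for a string view of an INT column is rejected with an error.
	if _, err := TrySlice[string](col); err != nil {
		fmt.Println("rejected:", err)
	}
	// Asking for the matching element type succeeds.
	fmt.Println(MustSlice[int32](col))
}
```

Whether an error return is appropriate on this path depends on how the transaction layer handles a failed append; the sketch only contrasts the two behaviours.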

mo-log:
dn-panic.tar.gz

mo log link:
https://shanghai.idc.matrixorigin.cn:30001/explore?panes=%7B%22GYP%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-c1c365c-202408062226%5C%22,%20matrixorigin_io_component%3D%5C%22DNSet%5C%22%7D%20%7C%3D%20%60panic%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%221722964344099%22,%22to%22:%221722966951750%22%7D%7D%7D&schemaVersion=1&orgId=1

Expected Behavior

No response

Steps to Reproduce

1. Run a mo cluster with the configuration described in this issue.
2. Run tpch 10G loop test processes in one independent tenant.
3. Run long-running tpcc test processes with 10 warehouses and 10 terminals in one independent tenant, in prepare mode.
4. Run long-running sysbench mixed-case (insert/delete/update/select) test processes with 75 terminals in one independent tenant, in non-prepare mode.
5. Run another long-running sysbench mixed-case (insert/delete/update/select) test process with 75 terminals in one independent tenant, in non-prepare mode.

Additional information

No response

@aressu1985 aressu1985 added this to the 1.3.0 milestone Aug 7, 2024
@XuPeng-SH XuPeng-SH assigned LeftHandCold and unassigned XuPeng-SH Aug 7, 2024
@XuPeng-SH
Contributor

@LeftHandCold please help to investigate this issue

@LeftHandCold
Contributor

LeftHandCold commented Aug 8, 2024

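Starting multiple CNs and changing the mpool upper limit so that a CN hits an mpool OOM reproduces this issue. After investigating, I found that by the time the CN performs the insert, the schema of the data batch has already been modified, so the schema does not match on the DN side and the DN panics.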
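
The comment above attributes the panic to a batch whose schema no longer matches the table by the time it reaches the DN. As a hedged illustration only, the following sketch shows one way a receiver could compare a batch against the expected schema before appending. `ColType`, `Batch`, `ErrSchemaMismatch`, and `ValidateBatch` are hypothetical names and do not correspond to MatrixOne APIs.

```go
package main

import (
	"errors"
	"fmt"
)

// ColType is a hypothetical column type tag.
type ColType string

// Batch is a hypothetical, simplified batch: one type tag per column.
type Batch struct {
	ColTypes []ColType
}

// ErrSchemaMismatch marks a batch that does not fit the target schema.
var ErrSchemaMismatch = errors.New("batch schema mismatch")

// ValidateBatch compares a batch against the table schema column by column
// and returns a wrapped ErrSchemaMismatch describing the first difference.
func ValidateBatch(schema []ColType, b *Batch) error {
	if len(b.ColTypes) != len(schema) {
		return fmt.Errorf("%w: got %d columns, want %d",
			ErrSchemaMismatch, len(b.ColTypes), len(schema))
	}
	for i, want := range schema {
		if got := b.ColTypes[i]; got != want {
			return fmt.Errorf("%w: column %d is %s, want %s",
				ErrSchemaMismatch, i, got, want)
		}
	}
	return nil
}

func main() {
	schema := []ColType{"INT", "VARCHAR"}
	good := &Batch{ColTypes: []ColType{"INT", "VARCHAR"}}
	bad := &Batch{ColTypes: []ColType{"INT", "INT"}}

	fmt.Println("good batch:", ValidateBatch(schema, good))
	fmt.Println("bad batch:", ValidateBatch(schema, bad))
}
```

A check like this would only turn the crash into a rejected write. It does not explain why the batch changed on the CN, which is what the rest of the thread investigates.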

@LeftHandCold
Contributor

image image

@LeftHandCold
Contributor

@ouyuanning please help investigate this.

@LeftHandCold
Contributor

The debug code is as follows:
image

@ouyuanning
Contributor

I haven't had time to look at this yet.

@ouyuanning
Contributor

This looks similar to the data race caused by putBatch being called multiple times. I'll submit a PR for that tomorrow first.
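
To make the suspected hazard concrete, here is a minimal sketch of how returning the same batch to a pool twice can hand one object to two later users, so that one user's changes show up in the other's batch. It assumes a simple free-list pool; `Batch`, `Pool`, `Guarded`, and `aliased` are hypothetical names, and nothing here is taken from the real putBatch implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// Batch is a hypothetical, simplified batch.
type Batch struct {
	ColTypes []string
	pooled   bool // true while the batch sits in the pool's free list
}

// Pool is a hypothetical free list of batches.
type Pool struct {
	mu      sync.Mutex
	free    []*Batch
	Guarded bool // when true, a second Put of the same batch is rejected
}

// Put returns the batch to the pool. It reports false if the put was rejected.
func (p *Pool) Put(b *Batch) bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.Guarded && b.pooled {
		return false
	}
	b.pooled = true
	p.free = append(p.free, b)
	return true
}

// Get hands out a pooled batch, or a fresh one if the pool is empty.
func (p *Pool) Get() *Batch {
	p.mu.Lock()
	defer p.mu.Unlock()
	if n := len(p.free); n > 0 {
		b := p.free[n-1]
		p.free = p.free[:n-1]
		b.pooled = false
		return b
	}
	return &Batch{}
}

// aliased puts one batch twice and reports whether two subsequent Gets
// return the same pointer, i.e. whether two users would share one batch.
func aliased(guarded bool) bool {
	p := &Pool{Guarded: guarded}
	b := &Batch{ColTypes: []string{"INT"}}
	p.Put(b)
	p.Put(b) // the double put under discussion
	return p.Get() == p.Get()
}

func main() {
	fmt.Println("unguarded pool aliases:", aliased(false))
	fmt.Println("guarded pool aliases:", aliased(true))
}
```

In a concurrent setting the two holders of the aliased batch would typically be different goroutines, which is the situation the Go race detector (`go test -race`) is designed to flag.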

@ouyuanning
Contributor

The PR I submitted is probably unrelated to this; this one seems to be a DN issue. 锦赛, please help reproduce it.

@ouyuanning
Contributor

You could check whether the batch and vector pointers are the same ones.
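
One way to carry out the pointer check suggested above is sketched here, under the assumption that a batch is a list of vector pointers. `Vector`, `Batch`, and `SharedVectors` are hypothetical names used for illustration only. The helper reports which columns of one batch point at a vector that another batch also references.

```go
package main

import "fmt"

// Vector and Batch are hypothetical, simplified stand-ins.
type Vector struct{ Data []int64 }
type Batch struct{ Vecs []*Vector }

// SharedVectors returns the column indexes of a whose *Vector also appears
// anywhere in b. A non-empty result means the two batches alias storage.
func SharedVectors(a, b *Batch) []int {
	seen := make(map[*Vector]struct{}, len(b.Vecs))
	for _, v := range b.Vecs {
		if v != nil {
			seen[v] = struct{}{}
		}
	}
	var shared []int
	for i, v := range a.Vecs {
		if _, ok := seen[v]; ok {
			shared = append(shared, i)
		}
	}
	return shared
}

func main() {
	v0, v1, v2 := &Vector{}, &Vector{}, &Vector{}
	a := &Batch{Vecs: []*Vector{v0, v1}}
	b := &Batch{Vecs: []*Vector{v2, v1}} // column 1 shares v1 with a

	fmt.Println("same batch pointer:", a == b)
	fmt.Println("shared vector columns:", SharedVectors(a, b))
}
```

Logging the result at the point where a batch is handed to the write path would show whether two supposedly independent batches reference the same vectors.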

@jensenojs
Contributor

working on it

@jensenojs
Contributor

image

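There is no need to change mpool; running tpch-10g directly on main also triggers it. However, each reproduction surfaces a different problem, so this needs more investigation.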

@ouyuanning
Contributor

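> The PR I submitted is probably unrelated to this; this one seems to be a DN issue. 锦赛, please help reproduce it.

If a data race on the CN changes the values in this batch and the CN then passes a wrong batch to the DN, this kind of problem can still occur.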

@jensenojs
Contributor

To be verified; I'll ask 动哥 for advice.

@aressu1985
Contributor Author

fixed

6 participants