[Bug]: Milvus crashes with segment fault when using skip index #35882

Closed
1 task done
chyezh opened this issue Aug 31, 2024 · 6 comments
Labels: kind/bug, triage/accepted

Comments

@chyezh
Contributor

chyezh commented Aug 31, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: v2.4.9
- Deployment mode(standalone or cluster):
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

2024-08-30 22:39:19.324	
2024-08-30 22:39:19.324	SIGNAL CATCH BY NON-GO SIGNAL HANDLER
2024-08-30 22:39:19.324	SIGNO: 11; SIGNAME: Segmentation fault; SI_CODE: 1; SI_ADDR: 0x7f5e7b6e5d4c
2024-08-30 22:39:19.324	BACKTRACE:
2024-08-30 22:39:20.057	(null)
2024-08-30 22:39:20.057		(null):0 pc=0x7f5088977840
2024-08-30 22:39:20.091	(null)
2024-08-30 22:39:20.091		(null):0 pc=0x7f5f56851840
2024-08-30 22:39:20.140	_ZNSt11char_traitsIcE7compareEPKcS2_m
2024-08-30 22:39:20.140		/usr/include/c++/12/bits/char_traits.h:389 pc=0x7f508ace27da
2024-08-30 22:39:20.140	_ZNKSt17basic_string_viewIcSt11char_traitsIcEE7compareES2_
2024-08-30 22:39:20.140		/usr/include/c++/12/string_view:320 pc=0x7f508ace27da
2024-08-30 22:39:20.186	_ZStgtIcSt11char_traitsIcEEbNSt15__type_identityISt17basic_string_viewIT_T0_EE4typeES6_
2024-08-30 22:39:20.186		/usr/include/c++/12/string_view:628 pc=0x7f508b21c29f
2024-08-30 22:39:20.186	_ZNK6milvus9SkipIndex18MinMaxBinaryFilterINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEENSt9enable_ifIXsrNS0_13IsAllowedTypeIT_EE5valueEbE4typeERKNS_17FieldChunkMetricsERKSA_SI_bb
2024-08-30 22:39:20.186		/root/milvus/internal/core/src/index/SkipIndex.h:161 pc=0x7f508b21c29f
2024-08-30 22:39:20.186	_ZNK6milvus9SkipIndex18CanSkipBinaryRangeINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEbN6fluent9NamedTypeIlNS_4impl10FieldIdTagEJNS8_10ComparableENS8_8HashableEEEElRKT_SH_bb
2024-08-30 22:39:20.186		/root/milvus/internal/core/src/index/SkipIndex.h:61 pc=0x7f508b21c29f
2024-08-30 22:39:20.186	_ZN6milvus4exec17PhyTermFilterExpr14CanSkipSegmentINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEbv
2024-08-30 22:39:20.186		/root/milvus/internal/core/src/exec/expression/TermExpr.cpp:140 pc=0x7f508b21c29f
2024-08-30 22:39:20.186	_ZN6milvus4exec17PhyTermFilterExpr17InitPkCacheOffsetEv
2024-08-30 22:39:20.186		/root/milvus/internal/core/src/exec/expression/TermExpr.cpp:164 pc=0x7f508b21326f
2024-08-30 22:39:20.186	_ZN6milvus4exec17PhyTermFilterExpr14ExecPkTermImplEv
2024-08-30 22:39:20.186		/root/milvus/internal/core/src/exec/expression/TermExpr.cpp:196 pc=0x7f508b2142b7
2024-08-30 22:39:20.186	_ZN6milvus4exec17PhyTermFilterExpr4EvalERNS0_7EvalCtxERSt10shared_ptrINS_10BaseVectorEE
2024-08-30 22:39:20.186		/root/milvus/internal/core/src/exec/expression/TermExpr.cpp:25 pc=0x7f508b214763
2024-08-30 22:39:20.199	_ZNSt11char_traitsIcE7compareEPKcS2_m
2024-08-30 22:39:20.199		/usr/include/c++/12/bits/char_traits.h:389 pc=0x7f5f58bbc7da
2024-08-30 22:39:20.199	_ZNKSt17basic_string_viewIcSt11char_traitsIcEE7compareES2_
2024-08-30 22:39:20.199		/usr/include/c++/12/string_view:320 pc=0x7f5f58bbc7da
2024-08-30 22:39:20.206	_ZN6milvus4exec7ExprSet4EvalEiibRNS0_7EvalCtxERSt6vectorISt10shared_ptrINS_10BaseVectorEESaIS7_EE
2024-08-30 22:39:20.206		/root/milvus/internal/core/src/exec/expression/Expr.cpp:42 pc=0x7f508b1d6bb7
2024-08-30 22:39:20.223	_ZN6milvus4exec10FilterBits9GetOutputEv
2024-08-30 22:39:20.223		/root/milvus/internal/core/src/exec/operator/FilterBits.cpp:67 pc=0x7f508b284d4c
2024-08-30 22:39:20.244	_ZN6milvus4exec6Driver11RunInternalERSt10shared_ptrIS1_ERS2_INS0_13BlockingStateEERS2_INS_9RowVectorEE
2024-08-30 22:39:20.244		/root/milvus/internal/core/src/exec/Driver.cpp:249 pc=0x7f508b0ac533
2024-08-30 22:39:20.244	_ZN6milvus4exec6Driver4NextERSt10shared_ptrINS0_13BlockingStateEE
2024-08-30 22:39:20.244		/root/milvus/internal/core/src/exec/Driver.cpp:137 pc=0x7f508b0aced6
2024-08-30 22:39:20.263	_ZN6milvus4exec4Task4NextEPN5folly10SemiFutureINS2_4UnitEEE
2024-08-30 22:39:20.263		/root/milvus/internal/core/src/exec/Task.cpp:199 pc=0x7f508b0b3c67
2024-08-30 22:39:20.265	_ZStgtIcSt11char_traitsIcEEbNSt15__type_identityISt17basic_string_viewIT_T0_EE4typeES6_
2024-08-30 22:39:20.265		/usr/include/c++/12/string_view:628 pc=0x7f5f590f629f
2024-08-30 22:39:20.265	_ZNK6milvus9SkipIndex18MinMaxBinaryFilterINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEENSt9enable_ifIXsrNS0_13IsAllowedTypeIT_EE5valueEbE4typeERKNS_17FieldChunkMetricsERKSA_SI_bb
2024-08-30 22:39:20.265		/root/milvus/internal/core/src/index/SkipIndex.h:161 pc=0x7f5f590f629f
2024-08-30 22:39:20.265	_ZNK6milvus9SkipIndex18CanSkipBinaryRangeINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEbN6fluent9NamedTypeIlNS_4impl10FieldIdTagEJNS8_10ComparableENS8_8HashableEEEElRKT_SH_bb
2024-08-30 22:39:20.265		/root/milvus/internal/core/src/index/SkipIndex.h:61 pc=0x7f5f590f629f
2024-08-30 22:39:20.265	_ZN6milvus4exec17PhyTermFilterExpr14CanSkipSegmentINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEbv
2024-08-30 22:39:20.265		/root/milvus/internal/core/src/exec/expression/TermExpr.cpp:140 pc=0x7f5f590f629f
2024-08-30 22:39:20.265	_ZN6milvus4exec17PhyTermFilterExpr17InitPkCacheOffsetEv
2024-08-30 22:39:20.265		/root/milvus/internal/core/src/exec/expression/TermExpr.cpp:164 pc=0x7f5f590ed26f
2024-08-30 22:39:20.265	_ZN6milvus4exec17PhyTermFilterExpr14ExecPkTermImplEv
2024-08-30 22:39:20.265		/root/milvus/internal/core/src/exec/expression/TermExpr.cpp:196 pc=0x7f5f590ee2b7
2024-08-30 22:39:20.265	_ZN6milvus4exec17PhyTermFilterExpr4EvalERNS0_7EvalCtxERSt10shared_ptrINS_10BaseVectorEE
2024-08-30 22:39:20.265		/root/milvus/internal/core/src/exec/expression/TermExpr.cpp:25 pc=0x7f5f590ee763
2024-08-30 22:39:20.286	_ZN6milvus5query19ExecPlanNodeVisitor23ExecuteExprNodeInternalERKSt10shared_ptrINS_4plan8PlanNodeEEPKNS_7segcore24SegmentInternalInterfaceElRNS_6bitset6BitsetINSC_6detail33VectorizedElementWiseBitsetPolicyImNSE_17VectorizedDynamicEEEN5folly8fbvectorIhSaIhEEELb1EEERbRSt6vectorIlSaIlEE
2024-08-30 22:39:20.286		/root/milvus/internal/core/src/query/visitors/ExecPlanNodeVisitor.cpp:98 pc=0x7f508af076d4
2024-08-30 22:39:20.286	_ZN6milvus5query19ExecPlanNodeVisitor5visitERNS0_16RetrievePlanNodeE
2024-08-30 22:39:20.286		/root/milvus/internal/core/src/query/visitors/ExecPlanNodeVisitor.cpp:283 pc=0x7f508af0952c
2024-08-30 22:39:20.300	_ZN6milvus4exec7ExprSet4EvalEiibRNS0_7EvalCtxERSt6vectorISt10shared_ptrINS_10BaseVectorEESaIS7_EE
2024-08-30 22:39:20.300		/root/milvus/internal/core/src/exec/expression/Expr.cpp:42 pc=0x7f5f590b0bb7
2024-08-30 22:39:20.306	_ZN6milvus5query19ExecPlanNodeVisitor19get_retrieve_resultERNS0_8PlanNodeE
2024-08-30 22:39:20.306		/root/milvus/internal/core/src/query/generated/ExecPlanNodeVisitor.h:73 pc=0x7f508afa32e6
2024-08-30 22:39:20.306	_ZNK6milvus7segcore24SegmentInternalInterface8RetrieveEPNS_6tracer12TraceContextEPKNS_5query12RetrievePlanEmlb
2024-08-30 22:39:20.306		/root/milvus/internal/core/src/segcore/SegmentInterface.cpp:103 pc=0x7f508afa32e6
2024-08-30 22:39:20.325	_ZN6milvus4exec10FilterBits9GetOutputEv
2024-08-30 22:39:20.325		/root/milvus/internal/core/src/exec/operator/FilterBits.cpp:67 pc=0x7f5f5915ed4c
2024-08-30 22:39:20.361	_ZN6milvus4exec6Driver11RunInternalERSt10shared_ptrIS1_ERS2_INS0_13BlockingStateEERS2_INS_9RowVectorEE
2024-08-30 22:39:20.361		/root/milvus/internal/core/src/exec/Driver.cpp:249 pc=0x7f5f58f86533
2024-08-30 22:39:20.361	_ZN6milvus4exec6Driver4NextERSt10shared_ptrINS0_13BlockingStateEE
2024-08-30 22:39:20.361		/root/milvus/internal/core/src/exec/Driver.cpp:137 pc=0x7f5f58f86ed6
2024-08-30 22:39:20.382	_ZN6milvus4exec4Task4NextEPN5folly10SemiFutureINS2_4UnitEEE
2024-08-30 22:39:20.382		/root/milvus/internal/core/src/exec/Task.cpp:199 pc=0x7f5f58f8dc67
2024-08-30 22:39:20.404	_ZN6milvus5query19ExecPlanNodeVisitor23ExecuteExprNodeInternalERKSt10shared_ptrINS_4plan8PlanNodeEEPKNS_7segcore24SegmentInternalInterfaceElRNS_6bitset6BitsetINSC_6detail33VectorizedElementWiseBitsetPolicyImNSE_17VectorizedDynamicEEEN5folly8fbvectorIhSaIhEEELb1EEERbRSt6vectorIlSaIlEE
2024-08-30 22:39:20.404		/root/milvus/internal/core/src/query/visitors/ExecPlanNodeVisitor.cpp:98 pc=0x7f5f58de16d4
2024-08-30 22:39:20.404	_ZN6milvus5query19ExecPlanNodeVisitor5visitERNS0_16RetrievePlanNodeE
2024-08-30 22:39:20.404		/root/milvus/internal/core/src/query/visitors/ExecPlanNodeVisitor.cpp:283 pc=0x7f5f58de352c
2024-08-30 22:39:20.408	operator()
2024-08-30 22:39:20.408		/root/milvus/internal/core/src/segcore/segment_c.cpp:174 pc=0x7f508b06dec2
2024-08-30 22:39:20.408	operator()
2024-08-30 22:39:20.408		/root/milvus/internal/core/src/futures/Future.h:181 pc=0x7f508b06dec2
2024-08-30 22:39:20.408	makeTryWithNoUnwrap<const milvus::futures::Future<CProto>::asyncProduce<AsyncRetrieve(CTraceContext, CSegmentInterface, CRetrievePlan, uint64_t, int64_t, bool)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncRetrieve(CTraceContext, CSegmentInterface, CRetrievePlan, uint64_t, int64_t, bool)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda()> >
2024-08-30 22:39:20.408		/root/.conan/data/folly/2023.10.30.08/milvus/dev/package/71e52ec7e6bdcb39e8f12e598f0e25527e54965c/include/folly/Try-inl.h:254 pc=0x7f508b06dec2
2024-08-30 22:39:20.408	makeTryWith<const milvus::futures::Future<CProto>::asyncProduce<AsyncRetrieve(CTraceContext, CSegmentInterface, CRetrievePlan, uint64_t, int64_t, bool)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncRetrieve(CTraceContext, CSegmentInterface, CRetrievePlan, uint64_t, int64_t, bool)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda()> >
2024-08-30 22:39:20.408		/root/.conan/data/folly/2023.10.30.08/milvus/dev/package/71e52ec7e6bdcb39e8f12e598f0e25527e54965c/include/folly/Try-inl.h:276 pc=0x7f508b06dec2
2024-08-30 22:39:20.408	setWith<const milvus::futures::Future<CProto>::asyncProduce<AsyncRetrieve(CTraceContext, CSegmentInterface, CRetrievePlan, uint64_t, int64_t, bool)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncRetrieve(CTraceContext, CSegmentInterface, CRetrievePlan, uint64_t, int64_t, bool)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda()> >
2024-08-30 22:39:20.408		/root/.conan/data/folly/2023.10.30.08/milvus/dev/package/71e52ec7e6bdcb39e8f12e598f0e25527e54965c/include/folly/futures/SharedPromise-inl.h:81 pc=0x7f508b06dec2
2024-08-30 22:39:20.408	operator()<folly::Try<folly::Unit> >
2024-08-30 22:39:20.408		/root/milvus/internal/core/src/futures/Future.h:187 pc=0x7f508b06dec2
2024-08-30 22:39:20.408	operator()
2024-08-30 22:39:20.408		/root/.conan/data/folly/2023.10.30.08/milvus/dev/package/71e52ec7e6bdcb39e8f12e598f0e25527e54965c/include/folly/futures/Future-inl.h:941 pc=0x7f508b06dec2
2024-08-30 22:39:20.408	invoke<folly::Executor::KeepAlive<folly::Executor>, folly::Try<folly::Unit> >
2024-08-30 22:39:20.408		/root/.conan/data/folly/2023.10.30.08/milvus/dev/package/71e52ec7e6bdcb39e8f12e598f0e25527e54965c/include/folly/futures/Future-inl.h:139 pc=0x7f508b06dec2
2024-08-30 22:39:20.408	invoke<folly::futures::detail::tryExecutorCallableResult<folly::Unit, folly::Future<folly::Unit>::thenTry<milvus::futures::Future<CProto>::asyncProduce<AsyncRetrieve(CTraceContext, CSegmentInterface, CRetrievePlan, uint64_t, int64_t, bool)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncRetrieve(CTraceContext, CSegmentInterface, CRetrievePlan, uint64_t, int64_t, bool)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&>(milvus::futures::Future<CProto>::asyncProduce<AsyncRetrieve(CTraceContext, CSegmentInterface, CRetrievePlan, uint64_t, int64_t, bool)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncRetrieve(CTraceContext, CSegmentInterface, CRetrievePlan, uint64_t, int64_t, bool)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&) &&::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<folly::Unit>&&)>, void>, folly::futures::detail::CoreCallbackState<folly::Unit, folly::Future<folly::Unit>::thenTry<milvus::futures::Future<CProto>::asyncProduce<AsyncRetrieve(CTraceContext, CSegmentInterface, CRetrievePlan, uint64_t, int64_t, bool)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncRetrieve(CTraceContext, CSegmentInterface, CRetrievePlan, uint64_t, int64_t, bool)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&>(milvus::futures::Future<CProto>::asyncProduce<AsyncRetrieve(CTraceContext, CSegmentInterface, CRetrievePlan, uint64_t, int64_t, bool)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncRetrieve(CTraceContext, CSegmentInterface, CRetrievePlan, uint64_t, int64_t, bool)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&) &&::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<folly::Unit>&&)> >, folly::Unit>

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

No response

Anything else?

No response

@chyezh chyezh added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Aug 31, 2024
@chyezh
Contributor Author

chyezh commented Aug 31, 2024

/assign

@chyezh
Contributor Author

chyezh commented Aug 31, 2024

@yanliang567
Contributor

@chyezh what is 'skip index'?
/unassign

@yanliang567 yanliang567 added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Aug 31, 2024
@yanliang567 yanliang567 added this to the 2.4.11 milestone Aug 31, 2024
@chyezh
Contributor Author

chyezh commented Aug 31, 2024

@chyezh what is 'skip index'? /unassign

It's used to filter out segments that can never match an incoming expression.
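
As a rough illustration of the idea (not Milvus's actual code), a min/max skip check for a range predicate could look like the sketch below. `ChunkMetrics` and `CanSkipRange` are hypothetical names chosen for this example; the real logic lives in `SkipIndex.h` according to the backtrace above, and its exact signature is not reproduced here.

```cpp
#include <string>

// Hypothetical per-chunk statistics: the smallest and largest value in the chunk.
struct ChunkMetrics {
    std::string min;
    std::string max;
};

// Returns true when no value in [min, max] can fall inside the query range
// [lower, upper], so the chunk can be skipped without scanning it.
bool CanSkipRange(const ChunkMetrics& m,
                  const std::string& lower,
                  const std::string& upper) {
    return upper < m.min || lower > m.max;
}
```

The check only ever compares the query bounds against the stored min and max, which is why those two stored values must stay valid for as long as the skip index is in use.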

@chyezh
Contributor Author

chyezh commented Sep 2, 2024

  1. loadFieldData builds a new SkipIndex for the sealed segment's field and loads the field data.
  2. The SkipIndex holds two string_views that mark the range of the field's values.
  3. LoadIndex releases the field data if the index's HasRawData is true.
  4. A later access through the SkipIndex then triggers the segmentation fault.
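
The lifetime problem in these steps can be sketched in isolation (C++17). In the hypothetical `ViewMetrics` below, the bounds are `std::string_view`s into a buffer owned elsewhere, so they dangle once that buffer is released; `OwnedMetrics` copies the bounds and stays valid. All names here are illustrative assumptions, and owning copies are only one possible remedy; this thread does not show what the actual fix does.

```cpp
#include <memory>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for loaded field data.
using FieldData = std::vector<std::string>;

// Non-owning bounds: valid only while the FieldData they point into is alive.
struct ViewMetrics {
    std::string_view min;
    std::string_view max;
};

// Owning bounds: copies of the strings, independent of FieldData's lifetime.
struct OwnedMetrics {
    std::string min;
    std::string max;
};

// Builds non-owning bounds from sorted, non-empty field data.
ViewMetrics BuildViews(const FieldData& data) {
    return ViewMetrics{data.front(), data.back()};
}

// Builds owning bounds from sorted, non-empty field data.
OwnedMetrics BuildOwned(const FieldData& data) {
    return OwnedMetrics{data.front(), data.back()};
}

// Simulates the load/release sequence and returns the bounds that survive it.
OwnedMetrics LoadThenRelease() {
    auto data = std::make_unique<FieldData>(
        FieldData{"apple", "mango", "zebra"});
    ViewMetrics views = BuildViews(*data);   // points into *data
    OwnedMetrics owned = BuildOwned(*data);  // independent copies
    data.reset();                            // field data released
    // Reading views.min or views.max here would access released memory
    // (undefined behavior), so the sketch deliberately does not touch them.
    (void)views;
    return owned;
}
```

Calling `LoadThenRelease()` returns bounds that remain safe to compare after the release, whereas any comparison through `views` at that point would be the kind of invalid read shown in the backtrace.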

sre-ci-robot pushed a commit that referenced this issue Sep 3, 2024
@chyezh
Contributor Author

chyezh commented Sep 3, 2024

Verified the issue with ASan using the following operations:

  • Use malloc in place of mmap.
  • Create a collection with a string primary key (trie index).
  • Call the query RPC with an `in` expression.
  • Insert 5000 columns and flush.

Before the fix:

==3048154==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6300000b82eb at pc 0x7f085fc9e4b5 bp 0x7f0784f57420 sp 0x7f0784f56bc8
READ of size 9 at 0x6300000b82eb thread T232
    #0 0x7f085fc9e4b4 in MemcmpInterceptorCommon(void*, int (*)(void const*, void const*, unsigned long), void const*, void const*, unsigned long) ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:861
    #1 0x7f085fc9ebc6 in __interceptor_memcmp ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:892
    #2 0x7f085fc9ebc6 in __interceptor_memcmp ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:887
    #3 0x7f0859460b16 in std::char_traits<char>::compare(char const*, char const*, unsigned long) /usr/include/c++/11/bits/char_traits.h:389
    #4 0x7f08594775b8 in std::basic_string_view<char, std::char_traits<char> >::compare(std::basic_string_view<char, std::char_traits<char> >) const /usr/include/c++/11/string_view:315
    #5 0x7f085a5665aa in bool std::operator><char, std::char_traits<char> >(std::__type_identity<std::basic_string_view<char, std::char_traits<char> > >::type, std::basic_string_view<char, std::char_traits<char> >) /usr/include/c++/11/string_view:623
    #6 0x7f085a4904fb in std::enable_if<milvus::SkipIndex::IsAllowedType<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >::value, bool>::type milvus::SkipIndex::MinMaxBinaryFilter<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(milvus::FieldChunkMetrics const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, bool) const /home/chyezh/repository/chyezh/milvus/internal/core/src/index/SkipIndex.h:161

...

SUMMARY: AddressSanitizer: heap-buffer-overflow ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:861 in MemcmpInterceptorCommon(void*, int (*)(void const*, void const*, unsigned long), void const*, void const*, unsigned long)
Shadow bytes around the buggy address:
  0x0c608000f000: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c608000f010: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c608000f020: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c608000f030: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c608000f040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x0c608000f050: fa fa fa fa fa fa fa fa fa fa fa fa fa[fa]fa fa
  0x0c608000f060: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa

...

After the fix, no memory violation is reported.

@chyezh chyezh closed this as completed Sep 3, 2024
sre-ci-robot pushed a commit that referenced this issue Sep 3, 2024
jaime0815 pushed a commit that referenced this issue Sep 4, 2024