maxRtt=4720 maxBdp=236000
Running Simulation.
The final active chunks per dimension 1 after allocating to queues is: 1
ring of node 0, id: 0 dimension: local total nodes in ring: 144 index in ring: 0 offset: 1total nodes in ring: 144
ring of node 0, id: 0 dimension: local total nodes in ring: 144 index in ring: 0 offset: 1total nodes in ring: 144
ring of node 0, id: 0 dimension: local total nodes in ring: 144 index in ring: 0 offset: 1total nodes in ring: 144
ring of node 0, id: 0 dimension: local total nodes in ring: 144 index in ring: 0 offset: 1total nodes in ring: 144
total nodes: 144
Success in opening workload file
model_parallel_NPU_group: is: 8
checkpoints layers are:
layers initiating fwd_in_bckwd are:
ring of node 0, id: 0 dimension: local total nodes in ring: 8 index in ring: 0 offset: 1total nodes in ring: 8
ring of node 0, id: 0 dimension: local total nodes in ring: 18 index in ring: 0 offset: 8total nodes in ring: 18
ring of node 0, id: 0 dimension: local total nodes in ring: 8 index in ring: 0 offset: 1total nodes in ring: 8
ring of node 0, id: 0 dimension: local total nodes in ring: 18 index in ring: 0 offset: 8total nodes in ring: 18
ring of node 0, id: 0 dimension: local total nodes in ring: 8 index in ring: 0 offset: 1total nodes in ring: 8
ring of node 0, id: 0 dimension: local total nodes in ring: 18 index in ring: 0 offset: 8total nodes in ring: 18
ring of node 0, id: 0 dimension: local total nodes in ring: 8 index in ring: 0 offset: 1total nodes in ring: 8
ring of node 0, id: 0 dimension: local total nodes in ring: 18 index in ring: 0 offset: 8total nodes in ring: 18
id: embedding_layer , depen: -1 , wg_comp_time: 1
type: HYBRID_TRANSFORMER_FWD_IN_BCKWD ,num passes: 1 ,lines: 1 compute scale: 1 ,comm scale: 1
stat path: ./ncclFlowModel_ ,total rows: 1 ,stat row: 0
CSV path and filename: ./ncclFlowModel_detailed_144.csv
CSV path and filename: ./ncclFlowModel_EndToEnd_144.csv
=================================================================
==9941==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000fd2f74 at pc 0x7f475725362f bp 0x7fff94b9a270 sp 0x7fff94b9a260
READ of size 4 at 0x602000fd2f74 thread T0
#0 0x7f475725362e in MockNccl::MockNcclGroup::InterDouBinTreeShift(MockNccl::MockNcclGroup::DoubleBinaryTreeNode*, std::vector<int, std::allocator<int> >) /root/SimAI/astra-sim-alibabacloud/extern/network_backend/ns3-interface/simulation/src/applications/astra-sim/system/MockNcclGroup.cc:2038
#1 0x7f475725200b in MockNccl::MockNcclGroup::genInterDouBinTree(MockNccl::GroupInfo) /root/SimAI/astra-sim-alibabacloud/extern/network_backend/ns3-interface/simulation/src/applications/astra-sim/system/MockNcclGroup.cc:2000
#2 0x7f475724e5e3 in MockNccl::MockNcclGroup::gettreechannels(int, MockNccl::GroupType) /root/SimAI/astra-sim-alibabacloud/extern/network_backend/ns3-interface/simulation/src/applications/astra-sim/system/MockNcclGroup.cc:1893
#3 0x7f47571cf384 in MockNccl::MockNcclComm::MockNcclComm(int, MockNccl::GroupType, MockNccl::MockNcclGroup*) /root/SimAI/astra-sim-alibabacloud/extern/network_backend/ns3-interface/simulation/src/applications/astra-sim/system/MockNcclChannel.cc:22
#4 0x7f475738f260 in AstraSim::Sys::mock_nccl_comms_init() /root/SimAI/astra-sim-alibabacloud/extern/network_backend/ns3-interface/simulation/src/applications/astra-sim/system/Sys.cc:1411
#5 0x7f4757363d59 in AstraSim::Sys::Sys(AstraSim::AstraNetworkAPI*, AstraSim::AstraMemoryAPI*, int, int, int, std::vector<int, std::allocator<int> >, std::vector<int, std::allocator<int> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, float, float, float, int, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, bool, GPUType, std::vector<int, std::allocator<int> >, std::vector<int, std::allocator<int> >, int) /root/SimAI/astra-sim-alibabacloud/extern/network_backend/ns3-interface/simulation/src/applications/astra-sim/system/Sys.cc:297
#6 0x5562830980ce in main /root/SimAI/astra-sim-alibabacloud/extern/network_backend/ns3-interface/simulation/scratch/AstraSimNetwork.cc:311
#7 0x7f473bda2d8f (/lib/x86_64-linux-gnu/libc.so.6+0x29d8f)
#8 0x7f473bda2e3f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x29e3f)
#9 0x556283050384 in _start (/root/SimAI/astra-sim-alibabacloud/extern/network_backend/ns3-interface/simulation/build/scratch/ns3.36.1-AstraSimNetwork-debug+0x1d3384)
0x602000fd2f74 is located 0 bytes to the right of 4-byte region [0x602000fd2f70,0x602000fd2f74)
allocated by thread T0 here:
#0 0x7f47694b51e7 in operator new(unsigned long) ../../../../src/libsanitizer/asan/asan_new_delete.cpp:99
#1 0x55628316e51c in __gnu_cxx::new_allocator<int>::allocate(unsigned long, void const*) /usr/include/c++/11/ext/new_allocator.h:127
#2 0x556283156623 in std::allocator_traits<std::allocator<int> >::allocate(std::allocator<int>&, unsigned long) /usr/include/c++/11/bits/alloc_traits.h:464
#3 0x556283125b33 in std::_Vector_base<int, std::allocator<int> >::_M_allocate(unsigned long) /usr/include/c++/11/bits/stl_vector.h:346
#4 0x5562830fc49b in std::_Vector_base<int, std::allocator<int> >::_M_create_storage(unsigned long) /usr/include/c++/11/bits/stl_vector.h:361
#5 0x5562830d302a in std::_Vector_base<int, std::allocator<int> >::_Vector_base(unsigned long, std::allocator<int> const&) /usr/include/c++/11/bits/stl_vector.h:305
#6 0x5562830affda in std::vector<int, std::allocator<int> >::vector(std::vector<int, std::allocator<int> > const&) /usr/include/c++/11/bits/stl_vector.h:555
#7 0x7f4757251f96 in MockNccl::MockNcclGroup::genInterDouBinTree(MockNccl::GroupInfo) /root/SimAI/astra-sim-alibabacloud/extern/network_backend/ns3-interface/simulation/src/applications/astra-sim/system/MockNcclGroup.cc:2000
#8 0x7f475724e5e3 in MockNccl::MockNcclGroup::gettreechannels(int, MockNccl::GroupType) /root/SimAI/astra-sim-alibabacloud/extern/network_backend/ns3-interface/simulation/src/applications/astra-sim/system/MockNcclGroup.cc:1893
#9 0x7f47571cf384 in MockNccl::MockNcclComm::MockNcclComm(int, MockNccl::GroupType, MockNccl::MockNcclGroup*) /root/SimAI/astra-sim-alibabacloud/extern/network_backend/ns3-interface/simulation/src/applications/astra-sim/system/MockNcclChannel.cc:22
#10 0x7f475738f260 in AstraSim::Sys::mock_nccl_comms_init() /root/SimAI/astra-sim-alibabacloud/extern/network_backend/ns3-interface/simulation/src/applications/astra-sim/system/Sys.cc:1411
#11 0x7f4757363d59 in AstraSim::Sys::Sys(AstraSim::AstraNetworkAPI*, AstraSim::AstraMemoryAPI*, int, int, int, std::vector<int, std::allocator<int> >, std::vector<int, std::allocator<int> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, float, float, float, int, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, bool, GPUType, std::vector<int, std::allocator<int> >, std::vector<int, std::allocator<int> >, int) /root/SimAI/astra-sim-alibabacloud/extern/network_backend/ns3-interface/simulation/src/applications/astra-sim/system/Sys.cc:297
#12 0x5562830980ce in main /root/SimAI/astra-sim-alibabacloud/extern/network_backend/ns3-interface/simulation/scratch/AstraSimNetwork.cc:311
#13 0x7f473bda2d8f (/lib/x86_64-linux-gnu/libc.so.6+0x29d8f)
SUMMARY: AddressSanitizer: heap-buffer-overflow /root/SimAI/astra-sim-alibabacloud/extern/network_backend/ns3-interface/simulation/src/applications/astra-sim/system/MockNcclGroup.cc:2038 in MockNccl::MockNcclGroup::InterDouBinTreeShift(MockNccl::MockNcclGroup::DoubleBinaryTreeNode*, std::vector<int, std::allocator<int> >)
Shadow bytes around the buggy address:
0x0c04801f2590: fa fa fd fa fa fa fd fd fa fa fd fa fa fa fd fa
0x0c04801f25a0: fa fa fd fd fa fa fd fa fa fa fd fa fa fa fd fd
0x0c04801f25b0: fa fa fd fa fa fa fd fa fa fa fd fd fa fa fd fa
0x0c04801f25c0: fa fa fd fa fa fa fd fd fa fa fd fa fa fa fd fa
0x0c04801f25d0: fa fa fd fd fa fa fd fa fa fa fd fa fa fa fd fd
=>0x0c04801f25e0: fa fa 04 fa fa fa 04 fa fa fa 00 fa fa fa[04]fa
0x0c04801f25f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c04801f2600: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c04801f2610: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c04801f2620: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c04801f2630: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
Shadow gap: cc
==9941==ABORTING
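Reading the report above: the faulting address 0x602000fd2f74 sits exactly at the end of a 4-byte region that was allocated by the std::vector<int> copy constructor in genInterDouBinTree, and the access is a 4-byte READ. That is consistent with a one-element vector being indexed at position 1 inside InterDouBinTreeShift. The sketch below is only an illustration of that pattern and of a guarded access. The helper names next_or and index_at_offset are hypothetical and are not taken from MockNcclGroup.cc, and the code assumes a 4-byte int as on the platform in the log.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from MockNcclGroup.cc): return the element that
// follows position idx, or `fallback` when no such element exists.
int next_or(const std::vector<int>& ranks, std::size_t idx, int fallback) {
    if (ranks.size() < 2 || idx >= ranks.size() - 1) {
        return fallback;  // ranks[idx + 1] would lie outside the buffer
    }
    return ranks[idx + 1];
}

// Index of the int located `byte_offset` bytes from the start of the storage.
std::size_t index_at_offset(std::size_t byte_offset) {
    assert(byte_offset % sizeof(int) == 0);  // whole-element offsets only
    return byte_offset / sizeof(int);
}
```

With a one-element vector, next_or(ranks, 0, -1) returns the fallback instead of reading past the allocation, and index_at_offset applied to the distance between the faulting address and the region start gives the out-of-range index.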
Reproduce
Build with NS3_SANITIZE enabled:
https://github.com/aliyun/ns-3-alibabacloud/blob/master/simulation/CMakeLists.txt#L61
Logs
Potential Fix
https://github.com/aliyun/SimAI/blob/master/astra-sim-alibabacloud/astra-sim/system/MockNcclGroup.cc#L2038
Change to