Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

syzkaller: memory leak in subflow_create_ctx #356

Closed
cpaasch opened this issue Feb 14, 2023 · 2 comments
Closed

syzkaller: memory leak in subflow_create_ctx #356

cpaasch opened this issue Feb 14, 2023 · 2 comments

Comments

@cpaasch
Copy link
Member

cpaasch commented Feb 14, 2023

Head: 89ce2ca (02/28)
3038497 (02/27)
0150d5b

Trace:

BUG: memory leak
unreferenced object 0xffff88812df5a700 (size 256):
  comm "syz-executor.3", pid 2701, jiffies 4297168679 (age 63.966s)
  hex dump (first 32 bytes):
    00 01 00 00 00 00 ad de 22 01 00 00 00 00 ad de  ........".......
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<00000000109c3810>] kmalloc_trace+0x21/0x90 mm/slab_common.c:1062
    [<00000000a01124a9>] kmalloc include/linux/slab.h:580 [inline]
    [<00000000a01124a9>] kzalloc include/linux/slab.h:720 [inline]
    [<00000000a01124a9>] subflow_create_ctx+0x5b/0xf0 net/mptcp/subflow.c:1752
    [<00000000a180cdc9>] subflow_ulp_init+0x3a/0x140 net/mptcp/subflow.c:1832
    [<00000000cf1a8bf5>] __tcp_set_ulp net/ipv4/tcp_ulp.c:146 [inline]
    [<00000000cf1a8bf5>] tcp_set_ulp+0x142/0x230 net/ipv4/tcp_ulp.c:167
    [<00000000382f383a>] mptcp_subflow_create_socket+0x171/0x300 net/mptcp/subflow.c:1715
    [<000000002fe382d9>] __mptcp_socket_create net/mptcp/protocol.c:89 [inline]
    [<000000002fe382d9>] __mptcp_nmpc_socket.part.0+0x83/0x1e0 net/mptcp/protocol.c:122
    [<00000000fa3a0e61>] __mptcp_nmpc_socket net/mptcp/protocol.c:115 [inline]
    [<00000000fa3a0e61>] mptcp_bind+0x55/0xe0 net/mptcp/protocol.c:3714
    [<00000000e3b192be>] __sys_bind+0x89/0x100 net/socket.c:1801
    [<00000000f1ffd90b>] __do_sys_bind net/socket.c:1812 [inline]
    [<00000000f1ffd90b>] __se_sys_bind net/socket.c:1810 [inline]
    [<00000000f1ffd90b>] __x64_sys_bind+0x18/0x20 net/socket.c:1810
    [<000000000703658a>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
    [<000000000703658a>] do_syscall_64+0x37/0x90 arch/x86/entry/common.c:80
    [<00000000d12f04a9>] entry_SYSCALL_64_after_hwframe+0x72/0xdc

No reproducers yet.

Kconfig: Kconfig_k8_kmemleak.txt

intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this issue Mar 1, 2023
When subflow_syn_recv_sock() loses the inet hash insert race, the
newly created children will be released by inet_csk_complete_hashdance().

In that scenario, without any further hint, the ulp release callback
will keep the ulp context alive, expecting that the msk socket will
later free it.

Anyway the dying child is not linked to any msk socket, and the context
will be leaked, as reported by Christoph.

Address the issue explicitly releasing the context in the critical
scenario.

Reported-by: Christoph Paasch <[email protected]>
Closes: multipath-tcp/mptcp_net-next#356
Fixes: cec37a6 ("mptcp: Handle MP_CAPABLE options for outgoing connections")
Signed-off-by: Paolo Abeni <[email protected]>
matttbe pushed a commit that referenced this issue Mar 6, 2023
When subflow_syn_recv_sock() loses the inet hash insert race, the
newly created children will be released by inet_csk_complete_hashdance().

In that scenario, without any further hint, the ulp release callback
will keep the ulp context alive, expecting that the msk socket will
later free it.

Anyway the dying child is not linked to any msk socket, and the context
will be leaked, as reported by Christoph.

Address the issue explicitly releasing the context in the critical
scenario.

Fixes: cec37a6 ("mptcp: Handle MP_CAPABLE options for outgoing connections")
Reported-by: Christoph Paasch <[email protected]>
Closes: #356
Signed-off-by: Paolo Abeni <[email protected]>
Reviewed-by: Matthieu Baerts <[email protected]>
@matttbe matttbe closed this as completed in 7161321 Mar 6, 2023
matttbe pushed a commit that referenced this issue Mar 6, 2023
When subflow_syn_recv_sock() loses the inet hash insert race, the
newly created children will be released by inet_csk_complete_hashdance().

In that scenario, without any further hint, the ulp release callback
will keep the ulp context alive, expecting that the msk socket will
later free it.

Anyway the dying child is not linked to any msk socket, and the context
will be leaked, as reported by Christoph.

Address the issue explicitly releasing the context in the critical
scenario.

Fixes: cec37a6 ("mptcp: Handle MP_CAPABLE options for outgoing connections")
Reported-by: Christoph Paasch <[email protected]>
Closes: #356
Signed-off-by: Paolo Abeni <[email protected]>
Reviewed-by: Matthieu Baerts <[email protected]>
matttbe pushed a commit that referenced this issue Mar 7, 2023
When subflow_syn_recv_sock() loses the inet hash insert race, the
newly created children will be released by inet_csk_complete_hashdance().

In that scenario, without any further hint, the ulp release callback
will keep the ulp context alive, expecting that the msk socket will
later free it.

Anyway the dying child is not linked to any msk socket, and the context
will be leaked, as reported by Christoph.

Address the issue explicitly releasing the context in the critical
scenario.

Fixes: cec37a6 ("mptcp: Handle MP_CAPABLE options for outgoing connections")
Reported-by: Christoph Paasch <[email protected]>
Closes: #356
Signed-off-by: Paolo Abeni <[email protected]>
Reviewed-by: Matthieu Baerts <[email protected]>
matttbe pushed a commit that referenced this issue Mar 7, 2023
When subflow_syn_recv_sock() loses the inet hash insert race, the
newly created children will be released by inet_csk_complete_hashdance().

In that scenario, without any further hint, the ulp release callback
will keep the ulp context alive, expecting that the msk socket will
later free it.

Anyway the dying child is not linked to any msk socket, and the context
will be leaked, as reported by Christoph.

Address the issue explicitly releasing the context in the critical
scenario.

Fixes: cec37a6 ("mptcp: Handle MP_CAPABLE options for outgoing connections")
Reported-by: Christoph Paasch <[email protected]>
Closes: #356
Signed-off-by: Paolo Abeni <[email protected]>
Reviewed-by: Matthieu Baerts <[email protected]>
@matttbe
Copy link
Member

matttbe commented Mar 7, 2023

@cpaasch this issue has been fixed we think but hard to confirm without reproducer.

matttbe pushed a commit that referenced this issue Mar 7, 2023
When subflow_syn_recv_sock() loses the inet hash insert race, the
newly created children will be released by inet_csk_complete_hashdance().

In that scenario, without any further hint, the ulp release callback
will keep the ulp context alive, expecting that the msk socket will
later free it.

Anyway the dying child is not linked to any msk socket, and the context
will be leaked, as reported by Christoph.

Address the issue explicitly releasing the context in the critical
scenario.

Fixes: cec37a6 ("mptcp: Handle MP_CAPABLE options for outgoing connections")
Reported-by: Christoph Paasch <[email protected]>
Closes: #356
Signed-off-by: Paolo Abeni <[email protected]>
Reviewed-by: Matthieu Baerts <[email protected]>
jenkins-tessares pushed a commit that referenced this issue Mar 8, 2023
When subflow_syn_recv_sock() loses the inet hash insert race, the
newly created children will be released by inet_csk_complete_hashdance().

In that scenario, without any further hint, the ulp release callback
will keep the ulp context alive, expecting that the msk socket will
later free it.

Anyway the dying child is not linked to any msk socket, and the context
will be leaked, as reported by Christoph.

Address the issue explicitly releasing the context in the critical
scenario.

Fixes: cec37a6 ("mptcp: Handle MP_CAPABLE options for outgoing connections")
Reported-by: Christoph Paasch <[email protected]>
Closes: #356
Signed-off-by: Paolo Abeni <[email protected]>
Reviewed-by: Matthieu Baerts <[email protected]>
jenkins-tessares pushed a commit that referenced this issue Mar 8, 2023
When subflow_syn_recv_sock() loses the inet hash insert race, the
newly created children will be released by inet_csk_complete_hashdance().

In that scenario, without any further hint, the ulp release callback
will keep the ulp context alive, expecting that the msk socket will
later free it.

Anyway the dying child is not linked to any msk socket, and the context
will be leaked, as reported by Christoph.

Address the issue explicitly releasing the context in the critical
scenario.

Fixes: cec37a6 ("mptcp: Handle MP_CAPABLE options for outgoing connections")
Reported-by: Christoph Paasch <[email protected]>
Closes: #356
Signed-off-by: Paolo Abeni <[email protected]>
Reviewed-by: Matthieu Baerts <[email protected]>
jenkins-tessares pushed a commit that referenced this issue Mar 9, 2023
When subflow_syn_recv_sock() loses the inet hash insert race, the
newly created children will be released by inet_csk_complete_hashdance().

In that scenario, without any further hint, the ulp release callback
will keep the ulp context alive, expecting that the msk socket will
later free it.

Anyway the dying child is not linked to any msk socket, and the context
will be leaked, as reported by Christoph.

Address the issue explicitly releasing the context in the critical
scenario.

Fixes: cec37a6 ("mptcp: Handle MP_CAPABLE options for outgoing connections")
Reported-by: Christoph Paasch <[email protected]>
Closes: #356
Signed-off-by: Paolo Abeni <[email protected]>
Reviewed-by: Matthieu Baerts <[email protected]>
jenkins-tessares pushed a commit that referenced this issue Mar 9, 2023
When subflow_syn_recv_sock() loses the inet hash insert race, the
newly created children will be released by inet_csk_complete_hashdance().

In that scenario, without any further hint, the ulp release callback
will keep the ulp context alive, expecting that the msk socket will
later free it.

Anyway the dying child is not linked to any msk socket, and the context
will be leaked, as reported by Christoph.

Address the issue explicitly releasing the context in the critical
scenario.

Fixes: cec37a6 ("mptcp: Handle MP_CAPABLE options for outgoing connections")
Reported-by: Christoph Paasch <[email protected]>
Closes: #356
Signed-off-by: Paolo Abeni <[email protected]>
Reviewed-by: Matthieu Baerts <[email protected]>
matttbe added a commit that referenced this issue Mar 9, 2023
This reverts commit ccd4c33.

According to Paolo, this one is causing issue #356 and indirectly #370.

Closes: #356
Signed-off-by: Matthieu Baerts <[email protected]>
@matttbe
Copy link
Member

matttbe commented Mar 9, 2023

FYI, from IRC, Paolo pointed out that this issue was caused by "mptcp: drop legacy code", requiring "mptcp: fix possible memory leak on syn_recv race" to fix this issue but then causing #370.

It has now been reverted:

  • 4ef602b: Revert "mptcp: drop legacy code"

matttbe pushed a commit that referenced this issue Dec 1, 2023
With latest upstream llvm18, the following test cases failed:

  $ ./test_progs -j
  #13/2    bpf_cookie/multi_kprobe_link_api:FAIL
  #13/3    bpf_cookie/multi_kprobe_attach_api:FAIL
  #13      bpf_cookie:FAIL
  #77      fentry_fexit:FAIL
  #78/1    fentry_test/fentry:FAIL
  #78      fentry_test:FAIL
  #82/1    fexit_test/fexit:FAIL
  #82      fexit_test:FAIL
  #112/1   kprobe_multi_test/skel_api:FAIL
  #112/2   kprobe_multi_test/link_api_addrs:FAIL
  [...]
  #112     kprobe_multi_test:FAIL
  #356/17  test_global_funcs/global_func17:FAIL
  #356     test_global_funcs:FAIL

Further analysis shows llvm upstream patch [1] is responsible for the above
failures. For example, for function bpf_fentry_test7() in net/bpf/test_run.c,
without [1], the asm code is:

  0000000000000400 <bpf_fentry_test7>:
     400: f3 0f 1e fa                   endbr64
     404: e8 00 00 00 00                callq   0x409 <bpf_fentry_test7+0x9>
     409: 48 89 f8                      movq    %rdi, %rax
     40c: c3                            retq
     40d: 0f 1f 00                      nopl    (%rax)

... and with [1], the asm code is:

  0000000000005d20 <bpf_fentry_test7.specialized.1>:
    5d20: e8 00 00 00 00                callq   0x5d25 <bpf_fentry_test7.specialized.1+0x5>
    5d25: c3                            retq

... and <bpf_fentry_test7.specialized.1> is called instead of <bpf_fentry_test7>
and this caused test failures for #13/#77 etc. except #356.

For test case #356/17, with [1] (progs/test_global_func17.c)), the main prog
looks like:

  0000000000000000 <global_func17>:
       0:       b4 00 00 00 2a 00 00 00 w0 = 0x2a
       1:       95 00 00 00 00 00 00 00 exit

... which passed verification while the test itself expects a verification
failure.

Let us add 'barrier_var' style asm code in both places to prevent function
specialization which caused selftests failure.

  [1] llvm/llvm-project#72903

Signed-off-by: Yonghong Song <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
matttbe pushed a commit that referenced this issue Nov 8, 2024
KASAN reports that the GPU metrics table allocated in
vangogh_tables_init() is not large enough for the memset done in
smu_cmn_init_soft_gpu_metrics(). Condensed report follows:

[   33.861314] BUG: KASAN: slab-out-of-bounds in smu_cmn_init_soft_gpu_metrics+0x73/0x200 [amdgpu]
[   33.861799] Write of size 168 at addr ffff888129f59500 by task mangoapp/1067
...
[   33.861808] CPU: 6 UID: 1000 PID: 1067 Comm: mangoapp Tainted: G        W          6.12.0-rc4 #356 1a56f59a8b5182eeaf67eb7cb8b13594dd23b544
[   33.861816] Tainted: [W]=WARN
[   33.861818] Hardware name: Valve Galileo/Galileo, BIOS F7G0107 12/01/2023
[   33.861822] Call Trace:
[   33.861826]  <TASK>
[   33.861829]  dump_stack_lvl+0x66/0x90
[   33.861838]  print_report+0xce/0x620
[   33.861853]  kasan_report+0xda/0x110
[   33.862794]  kasan_check_range+0xfd/0x1a0
[   33.862799]  __asan_memset+0x23/0x40
[   33.862803]  smu_cmn_init_soft_gpu_metrics+0x73/0x200 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[   33.863306]  vangogh_get_gpu_metrics_v2_4+0x123/0xad0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[   33.864257]  vangogh_common_get_gpu_metrics+0xb0c/0xbc0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[   33.865682]  amdgpu_dpm_get_gpu_metrics+0xcc/0x110 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[   33.866160]  amdgpu_get_gpu_metrics+0x154/0x2d0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[   33.867135]  dev_attr_show+0x43/0xc0
[   33.867147]  sysfs_kf_seq_show+0x1f1/0x3b0
[   33.867155]  seq_read_iter+0x3f8/0x1140
[   33.867173]  vfs_read+0x76c/0xc50
[   33.867198]  ksys_read+0xfb/0x1d0
[   33.867214]  do_syscall_64+0x90/0x160
...
[   33.867353] Allocated by task 378 on cpu 7 at 22.794876s:
[   33.867358]  kasan_save_stack+0x33/0x50
[   33.867364]  kasan_save_track+0x17/0x60
[   33.867367]  __kasan_kmalloc+0x87/0x90
[   33.867371]  vangogh_init_smc_tables+0x3f9/0x840 [amdgpu]
[   33.867835]  smu_sw_init+0xa32/0x1850 [amdgpu]
[   33.868299]  amdgpu_device_init+0x467b/0x8d90 [amdgpu]
[   33.868733]  amdgpu_driver_load_kms+0x19/0xf0 [amdgpu]
[   33.869167]  amdgpu_pci_probe+0x2d6/0xcd0 [amdgpu]
[   33.869608]  local_pci_probe+0xda/0x180
[   33.869614]  pci_device_probe+0x43f/0x6b0

Empirically we can confirm that the former allocates 152 bytes for the
table, while the latter memsets the 168 large block.

Root cause appears that when GPU metrics tables for v2_4 parts were added
it was not considered to enlarge the table to fit.

The fix in this patch is rather "brute force" and perhaps later should be
done in a smarter way, by extracting and consolidating the part version to
size logic to a common helper, instead of brute forcing the largest
possible allocation. Nevertheless, for now this works and fixes the out of
bounds write.

v2:
 * Drop impossible v3_0 case. (Mario)

Signed-off-by: Tvrtko Ursulin <[email protected]>
Fixes: 41cec40 ("drm/amd/pm: Vangogh: Add new gpu_metrics_v2_4 to acquire gpu_metrics")
Cc: Mario Limonciello <[email protected]>
Cc: Evan Quan <[email protected]>
Cc: Wenyou Yang <[email protected]>
Cc: Alex Deucher <[email protected]>
Reviewed-by: Mario Limonciello <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mario Limonciello <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
(cherry picked from commit 0880f58f9609f0200483a49429af0f050d281703)
Cc: [email protected] # v6.6+
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants