[syzkaller] WARNING in __mptcp_destroy_sock #135

cpaasch · 2021-01-14T19:04:55Z

------------[ cut here ]------------
WARNING: CPU: 1 PID: 22151 at net/mptcp/protocol.c:2545 __mptcp_destroy_sock+0x553/0x6c0 net/mptcp/protocol.c:2545
Modules linked in:
CPU: 1 PID: 22151 Comm: kworker/1:1 Not tainted 5.11.0-rc2 #59
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Workqueue: events mptcp_worker
RIP: 0010:__mptcp_destroy_sock+0x553/0x6c0 net/mptcp/protocol.c:2545
Code: 5f c3 e8 50 85 82 fe 48 89 ef e8 f8 b2 ab ff eb b7 e8 41 85 82 fe be 03 00 00 00 4c 89 e7 e8 b4 0a 08 ff eb a3 e8 2d 85 82 fe <0f> 0b e9 f0 fe ff ff e8 21 85 82 fe 0f 0b e9 28 ff ff ff e8 15 31
RSP: 0018:ffffc90000f0fc10 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000300 RCX: 0000000000000000
RDX: ffff88801ea0d880 RSI: ffffffff82b36943 RDI: 0000000000000003
RBP: ffff88803995c000 R08: 0000000000000000 R09: 0000000000000003
R10: ffffffff82b36832 R11: 0000000000000001 R12: ffffc90000f0fc48
R13: dffffc0000000000 R14: ffffc90000f0fc48 R15: 1ffff920001e1f85
FS:  0000000000000000(0000) GS:ffff88811b300000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f37e8ae0ef8 CR3: 00000001080bf004 CR4: 0000000000170ee0
Call Trace:
 mptcp_worker+0x7dd/0x1610 net/mptcp/protocol.c:2272
 process_one_work+0x896/0x1170 kernel/workqueue.c:2275
 worker_thread+0x605/0x1350 kernel/workqueue.c:2421
 kthread+0x344/0x410 kernel/kthread.c:292
 ret_from_fork+0x22/0x30 arch/x86/entry/entry_64.S:296

HEAD:
0ba8e381319a ("mptcp: fix locking in mptcp_disconnect()") (HEAD) (3 hours ago)
2231a14 ("DO-NOT-MERGE: mptcp: enabled by default") (tag: export/20210114T060000, mptcp_net-next/export) (13 hours ago)
bdb95de ("DO-NOT-MERGE: mptcp: add GitHub Actions") (13 hours ago)
37e13d6 ("DO-NOT-MERGE: mptcp: use kmalloc on kasan build") (13 hours ago)
89c6174 ("mptcp: schedule work for better snd subflow selection") (13 hours ago)
e61bdd4 ("mptcp: do not queue excessive data on subflows") (13 hours ago)
22a5014 ("mptcp: re-enable sndbuf autotune") (13 hours ago)
5ec4e3d ("mptcp: always graft subflow socket to parent") (13 hours ago)
c7a8b47 ("bpf:selftests: add bpf_mptcp_sock() verifier tests") (13 hours ago)
4aad7af ("bpf:selftests: add MPTCP test base") (13 hours ago)
71880f9 ("bpf: add 'bpf_mptcp_sock' structure and helper") (13 hours ago)
197e7ab ("bpf: expose is_mptcp flag to bpf_tcp_sock") (13 hours ago)
127854a ("linux: handle MPTCP consistently with TCP") (13 hours ago)
fe5b34d ("mptcp: better msk-level shutdown.") (13 hours ago)
cdca685 ("mptcp: more strict state checking for acks") (13 hours ago)
0ae5b43 ("tcp: assign skb hash after tcp_event_data_sent") (mptcp_net-next/net-next) (15 hours ago)

No reproducer

CONFIG-file:
CONFIG.txt

The text was updated successfully, but these errors were encountered:

pabeni · 2021-01-20T10:08:33Z

this looks like a dup of #136: same calltrace and warning triggered by the same [bad] status (leaked forward memory, likely due to leaked or corrupted rmem_released)

cpaasch · 2021-01-28T21:50:07Z

Yep, dupe.

Remove the spinlock around the tree traversal as we are calling possibly sleeping functions. We do not need a spinlock here as there will be no modifications to this tree at this point. This prevents warnings like this to occur in dmesg: [ 653.774996] BUG: sleeping function called from invalid context at kernel/loc\ king/mutex.c:280 [ 653.775088] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1827, nam\ e: umount [ 653.775152] preempt_count: 1, expected: 0 [ 653.775191] CPU: 0 PID: 1827 Comm: umount Tainted: G W OE 5.17.0\ -rc7-00006-g4eb628dd74df #135 [ 653.775195] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-\ 1.fc33 04/01/2014 [ 653.775197] Call Trace: [ 653.775199] <TASK> [ 653.775202] dump_stack_lvl+0x34/0x44 [ 653.775209] __might_resched.cold+0x13f/0x172 [ 653.775213] mutex_lock+0x75/0xf0 [ 653.775217] ? __mutex_lock_slowpath+0x10/0x10 [ 653.775220] ? _raw_write_lock_irq+0xd0/0xd0 [ 653.775224] ? dput+0x6b/0x360 [ 653.775228] cifs_kill_sb+0xff/0x1d0 [cifs] [ 653.775285] deactivate_locked_super+0x85/0x130 [ 653.775289] cleanup_mnt+0x32c/0x4d0 [ 653.775292] ? path_umount+0x228/0x380 [ 653.775296] task_work_run+0xd8/0x180 [ 653.775301] exit_to_user_mode_loop+0x152/0x160 [ 653.775306] exit_to_user_mode_prepare+0x89/0xd0 [ 653.775315] syscall_exit_to_user_mode+0x12/0x30 [ 653.775322] do_syscall_64+0x48/0x90 [ 653.775326] entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: 187af6e98b44e5d8f25e1d41a92db138eb54416f ("cifs: fix handlecache and multiuser") Reported-by: kernel test robot <[email protected]> Cc: [email protected] Signed-off-by: Ronnie Sahlberg <[email protected]> Signed-off-by: Steve French <[email protected]>

Currently tpm transactions are executed unconditionally in tpm_pm_suspend() function, which may lead to races with other tpm accessors in the system. Specifically, the hw_random tpm driver makes use of tpm_get_random(), and this function is called in a loop from a kthread, which means it's not frozen alongside userspace, and so can race with the work done during system suspend: tpm tpm0: tpm_transmit: tpm_recv: error -52 tpm tpm0: invalid TPM_STS.x 0xff, dumping stack for forensics CPU: 0 PID: 1 Comm: init Not tainted 6.1.0-rc5+ #135 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-20220807_005459-localhost 04/01/2014 Call Trace: tpm_tis_status.cold+0x19/0x20 tpm_transmit+0x13b/0x390 tpm_transmit_cmd+0x20/0x80 tpm1_pm_suspend+0xa6/0x110 tpm_pm_suspend+0x53/0x80 __pnp_bus_suspend+0x35/0xe0 __device_suspend+0x10f/0x350 Fix this by calling tpm_try_get_ops(), which itself is a wrapper around tpm_chip_start(), but takes the appropriate mutex. Signed-off-by: Jan Dabros <[email protected]> Reported-by: Vlastimil Babka <[email protected]> Tested-by: Jason A. Donenfeld <[email protected]> Tested-by: Vlastimil Babka <[email protected]> Link: https://lore.kernel.org/all/[email protected]/ Cc: [email protected] Fixes: e891db1 ("tpm: turn on TPM on suspend for TPM 1.x") [Jason: reworked commit message, added metadata] Signed-off-by: Jason A. Donenfeld <[email protected]> Reviewed-by: Jarkko Sakkinen <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>

LE Create CIS command shall not be sent before all CIS Established events from its previous invocation have been processed. Currently it is sent via hci_sync but that only waits for the first event, but there can be multiple. Make it wait for all events, and simplify the CIS creation as follows: Add new flag HCI_CONN_CREATE_CIS, which is set if Create CIS has been sent for the connection but it is not yet completed. Make BT_CONNECT state to mean the connection wants Create CIS. On events after which new Create CIS may need to be sent, send it if possible and some connections need it. These events are: hci_connect_cis, iso_connect_cfm, hci_cs_le_create_cis, hci_le_cis_estabilished_evt. The Create CIS status/completion events shall queue new Create CIS only if at least one of the connections transitions away from BT_CONNECT, so that we don't loop if controller is sending bogus events. This fixes sending multiple CIS Create for the same CIS in the "ISO AC 6(i) - Success" BlueZ test case: < HCI Command: LE Create Co.. (0x08|0x0064) plen 9 #129 [hci0] Number of CIS: 2 CIS Handle: 257 ACL Handle: 42 CIS Handle: 258 ACL Handle: 42 > HCI Event: Command Status (0x0f) plen 4 #130 [hci0] LE Create Connected Isochronous Stream (0x08|0x0064) ncmd 1 Status: Success (0x00) > HCI Event: LE Meta Event (0x3e) plen 29 #131 [hci0] LE Connected Isochronous Stream Established (0x19) Status: Success (0x00) Connection Handle: 257 ... < HCI Command: LE Setup Is.. (0x08|0x006e) plen 13 #132 [hci0] ... > HCI Event: Command Complete (0x0e) plen 6 #133 [hci0] LE Setup Isochronous Data Path (0x08|0x006e) ncmd 1 ... < HCI Command: LE Create Co.. (0x08|0x0064) plen 5 #134 [hci0] Number of CIS: 1 CIS Handle: 258 ACL Handle: 42 > HCI Event: Command Status (0x0f) plen 4 #135 [hci0] LE Create Connected Isochronous Stream (0x08|0x0064) ncmd 1 Status: ACL Connection Already Exists (0x0b) > HCI Event: LE Meta Event (0x3e) plen 29 #136 [hci0] LE Connected Isochronous Stream Established (0x19) Status: Success (0x00) Connection Handle: 258 ... Fixes: c09b80b ("Bluetooth: hci_conn: Fix not waiting for HCI_EVT_LE_CIS_ESTABLISHED") Signed-off-by: Pauli Virtanen <[email protected]> Signed-off-by: Luiz Augusto von Dentz <[email protected]>

The following BUG is reported when a ubiblock is removed: ================================================================== BUG: KASAN: slab-use-after-free in ubiblock_cleanup+0x88/0xa0 [ubi] Read of size 4 at addr ffff88810c8f3804 by task ubiblock/1716 CPU: 5 PID: 1716 Comm: ubiblock Not tainted 6.6.0-rc2+ #135 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x37/0x50 print_report+0xd0/0x620 kasan_report+0xb6/0xf0 ubiblock_cleanup+0x88/0xa0 [ubi] ubiblock_remove+0x121/0x190 [ubi] vol_cdev_ioctl+0x355/0x630 [ubi] __x64_sys_ioctl+0xc7/0x100 do_syscall_64+0x3f/0x90 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 RIP: 0033:0x7f08d7445577 Code: b3 66 90 48 8b 05 11 89 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e1 8 RSP: 002b:00007ffde05a3018 EFLAGS: 00000206 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00000000ffffffff RCX: 00007f08d7445577 RDX: 0000000000000000 RSI: 0000000000004f08 RDI: 0000000000000003 RBP: 0000000000816010 R08: 00000000008163a7 R09: 0000000000000000 R10: 0000000000000003 R11: 0000000000000206 R12: 0000000000000003 R13: 00007ffde05a3130 R14: 0000000000000000 R15: 0000000000000000 </TASK> Allocated by task 1715: kasan_save_stack+0x22/0x50 kasan_set_track+0x25/0x30 __kasan_kmalloc+0x7f/0x90 __alloc_disk_node+0x40/0x2b0 __blk_mq_alloc_disk+0x3e/0xb0 ubiblock_create+0x2ba/0x620 [ubi] vol_cdev_ioctl+0x581/0x630 [ubi] __x64_sys_ioctl+0xc7/0x100 do_syscall_64+0x3f/0x90 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 Freed by task 0: kasan_save_stack+0x22/0x50 kasan_set_track+0x25/0x30 kasan_save_free_info+0x2b/0x50 __kasan_slab_free+0x10e/0x190 __kmem_cache_free+0x96/0x220 bdev_free_inode+0xa4/0xf0 rcu_core+0x496/0xec0 __do_softirq+0xeb/0x384 The buggy address belongs to the object at ffff88810c8f3800 which belongs to the cache kmalloc-1k of size 1024 The buggy address is located 4 bytes inside of freed 1024-byte region [ffff88810c8f3800, ffff88810c8f3c00) The buggy address belongs to the physical page: page:00000000d03de848 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x10c8f0 head:00000000d03de848 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0 flags: 0x200000000000840(slab|head|node=0|zone=2) page_type: 0xffffffff() raw: 0200000000000840 ffff888100042dc0 ffffea0004244400 dead000000000002 raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88810c8f3700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88810c8f3780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff88810c8f3800: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff88810c8f3880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88810c8f3900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== Fix it by using a local variable to record the gendisk ID. Fixes: 77567b2 ("ubi: use blk_mq_alloc_disk and blk_cleanup_disk") Signed-off-by: ZhaoLong Wang <[email protected]> Reviewed-by: Zhihao Cheng <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>

syzkaller report: kernel BUG at net/core/skbuff.c:3452! invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.7.0-rc4-00009-gbee0e7762ad2-dirty #135 RIP: 0010:skb_copy_and_csum_bits (net/core/skbuff.c:3452) Call Trace: icmp_glue_bits (net/ipv4/icmp.c:357) __ip_append_data.isra.0 (net/ipv4/ip_output.c:1165) ip_append_data (net/ipv4/ip_output.c:1362 net/ipv4/ip_output.c:1341) icmp_push_reply (net/ipv4/icmp.c:370) __icmp_send (./include/net/route.h:252 net/ipv4/icmp.c:772) ip_fragment.constprop.0 (./include/linux/skbuff.h:1234 net/ipv4/ip_output.c:592 net/ipv4/ip_output.c:577) __ip_finish_output (net/ipv4/ip_output.c:311 net/ipv4/ip_output.c:295) ip_output (net/ipv4/ip_output.c:427) __ip_queue_xmit (net/ipv4/ip_output.c:535) __tcp_transmit_skb (net/ipv4/tcp_output.c:1462) __tcp_retransmit_skb (net/ipv4/tcp_output.c:3387) tcp_retransmit_skb (net/ipv4/tcp_output.c:3404) tcp_retransmit_timer (net/ipv4/tcp_timer.c:604) tcp_write_timer (./include/linux/spinlock.h:391 net/ipv4/tcp_timer.c:716) The panic issue was trigered by tcp simultaneous initiation. The initiation process is as follows: TCP A TCP B 1. CLOSED CLOSED 2. SYN-SENT --> <SEQ=100><CTL=SYN> ... 3. SYN-RECEIVED <-- <SEQ=300><CTL=SYN> <-- SYN-SENT 4. ... <SEQ=100><CTL=SYN> --> SYN-RECEIVED 5. SYN-RECEIVED --> <SEQ=100><ACK=301><CTL=SYN,ACK> ... // TCP B: not send challenge ack for ack limit or packet loss // TCP A: close tcp_close tcp_send_fin if (!tskb && tcp_under_memory_pressure(sk)) tskb = skb_rb_last(&sk->tcp_rtx_queue); //pick SYN_ACK packet TCP_SKB_CB(tskb)->tcp_flags |= TCPHDR_FIN; // set FIN flag 6. FIN_WAIT_1 --> <SEQ=100><ACK=301><END_SEQ=102><CTL=SYN,FIN,ACK> ... // TCP B: send challenge ack to SYN_FIN_ACK 7. ... <SEQ=301><ACK=101><CTL=ACK> <-- SYN-RECEIVED //challenge ack // TCP A: <SND.UNA=101> 8. FIN_WAIT_1 --> <SEQ=101><ACK=301><END_SEQ=102><CTL=SYN,FIN,ACK> ... // retransmit panic __tcp_retransmit_skb //skb->len=0 tcp_trim_head len = tp->snd_una - TCP_SKB_CB(skb)->seq // len=101-100 __pskb_trim_head skb->data_len -= len // skb->len=-1, wrap around ... ... ip_fragment icmp_glue_bits //BUG_ON If we use tcp_trim_head() to remove acked SYN from packet that contains data or other flags, skb->len will be incorrectly decremented. We can remove SYN flag that has been acked from rtx_queue earlier than tcp_trim_head(), which can fix the problem mentioned above. Fixes: 1da177e ("Linux-2.6.12-rc2") Co-developed-by: Eric Dumazet <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: Dong Chenchen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>

cpaasch added bug syzkaller labels Jan 14, 2021

cpaasch closed this as completed Jan 28, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[syzkaller] WARNING in __mptcp_destroy_sock #135

[syzkaller] WARNING in __mptcp_destroy_sock #135

cpaasch commented Jan 14, 2021

pabeni commented Jan 20, 2021

cpaasch commented Jan 28, 2021

[syzkaller] WARNING in __mptcp_destroy_sock #135

[syzkaller] WARNING in __mptcp_destroy_sock #135

Comments

cpaasch commented Jan 14, 2021

pabeni commented Jan 20, 2021

cpaasch commented Jan 28, 2021