mblk leak replace_headers #158

Closed
rzezeski opened this issue May 25, 2022 · 2 comments

@rzezeski (Contributor) commented:

Sometimes, when running iperf3 between two guests on a virtual sled (as set up by my locally modified opte-ddm-x2 testbed), I manage to hit a panic where we dereference a NULL pointer in Packet::emit_headers().

> ::status
debugging crash dump /var/crash/unknown/vmcore.3 (64-bit) from sled1
operating system: 5.11 helios-1.0.21050 (i86pc)
build version: remotes/ci-build/xde-0-g7aeff7448e

image uuid: 5cb917ff-9b86-e6dc-8f92-97ce99ed15eb
panic message: BAD TRAP: type=e (#pf Page fault) rp=fffffe00050c36d0 addr=20 occurred in module "xde" due to a NULL pointer dereference
dump content: kernel pages only

> $C      
fffffe00050c38b0 _ZN4opte6engine6packet42Packet$LT$opte..engine..packet..Parsed$GT$12emit_headers17h18a49ea075c6b454E+0x238()
fffffe00050c3b30 _ZN4opte6engine4port4Port7process17h1da75a5159c16502E+0x2d92()
fffffe00050c4300 xde_mc_tx+0x290()
fffffe00050c4330 mac_ring_tx+0x43(fffffe0396759030, 0, fffffe03eab361a0)
fffffe00050c4380 mac_provider_tx+0x85(fffffe0396759030, 0, fffffe03eab361a0, fffffe0395e3ca00)
fffffe00050c4440 mac_tx+0x295(fffffe0395e3ca00, fffffe03eab361a0, c072da1a, 1, 0)
fffffe00050c4490 str_mdata_fastpath_put+0x8e(fffffe038d793398, fffffe03eab361a0, c072da1a, 1)
fffffe00050c45a0 ip_xmit+0x843(fffffe03eab361a0, fffffe038682f1c0, 180036060, 5dc, c072da1a, 0, fffffe0300000000, fffffe0396ce3930)
fffffe00050c47f0 ire_send_wire_v4+0x345(fffffe0396d00df0, fffffe03eab361a0, fffffe0385c158b4, fffffe0396ce37c0, fffffe0384b9ea30)
fffffe00050c4870 conn_ip_output+0x1d4(fffffe03eab361a0, fffffe0396ce37c0)
fffffe00050c48a0 tcp_send_data+0x58(fffffe0396d0d300, fffffe03eab361a0)
fffffe00050c49b0 tcp_send+0x8d2(fffffe0396d0d300, 5a8, 34, 20, 0, fffffe00050c4a1c, fffffe00050c4a14, fffffe00050c4a18, fffffe00050c4a20, d5858687aa4)
fffffe00050c4a80 tcp_wput_data+0x68a(fffffe0396d0d300, fffffe03ea8ada20, 0)
fffffe00050c4b10 tcp_output+0xbf(fffffe0396d0d000, fffffe03ea8ada20, fffffe0384ba7dc0, 0)
fffffe00050c4ba0 squeue_enter+0x409(fffffe0384ba7dc0, fffffe03ea8ada20, fffffe03ea8ada20, 1, 0, 4, fffffe0300000007)
fffffe00050c4c20 tcp_sendmsg+0x16c(fffffe0396d0d000, fffffe03ea8ada20, fffffe00050c4d38, fffffe03933d21a0)
fffffe00050c4cb0 so_sendmsg+0x24a(fffffe0392fa2a88, fffffe00050c4d38, fffffe00050c4e78, fffffe03933d21a0)
fffffe00050c4d10 socket_sendmsg+0x62(fffffe0392fa2a88, fffffe00050c4d38, fffffe00050c4e78, fffffe03933d21a0)
fffffe00050c4db0 socket_vop_write+0x61(fffffe0392fa7a40, fffffe00050c4e78, 0, fffffe03933d21a0, 0)
fffffe00050c4e30 fop_write+0x60(fffffe0392fa7a40, fffffe00050c4e78, 0, fffffe03933d21a0, 0)
fffffe00050c4f00 write+0x2c6(4, fffffc7fef23f000, 20000)
fffffe00050c4f10 sys_syscall+0x17d()

Looking at the surrounding disassembly gives a bit more insight into what is happening.

> ::dis '_ZN4opte6engine6packet42Packet$LT$opte..engine..packet..Parsed$GT$12emit_headers17h18a49ea075c6b454E' ! grep -C 4 '+0x238'
_ZN4opte6engine6packet42Packet$LT$opte..engine..packet..Parsed$GT$12emit_headers17h18a49ea075c6b454E+0x22a:     addq   %rax,%rdi
_ZN4opte6engine6packet42Packet$LT$opte..engine..packet..Parsed$GT$12emit_headers17h18a49ea075c6b454E+0x22d:     addq   $0xe,%rdi
_ZN4opte6engine6packet42Packet$LT$opte..engine..packet..Parsed$GT$12emit_headers17h18a49ea075c6b454E+0x231:     xorl   %esi,%esi
_ZN4opte6engine6packet42Packet$LT$opte..engine..packet..Parsed$GT$12emit_headers17h18a49ea075c6b454E+0x233:     call   +0x36256c8       <allocb>
_ZN4opte6engine6packet42Packet$LT$opte..engine..packet..Parsed$GT$12emit_headers17h18a49ea075c6b454E+0x238:     movq   0x20(%rax),%rcx
_ZN4opte6engine6packet42Packet$LT$opte..engine..packet..Parsed$GT$12emit_headers17h18a49ea075c6b454E+0x23c:     movq   0x28(%rax),%rdx
_ZN4opte6engine6packet42Packet$LT$opte..engine..packet..Parsed$GT$12emit_headers17h18a49ea075c6b454E+0x240:     subq   0x18(%rax),%rcx
_ZN4opte6engine6packet42Packet$LT$opte..engine..packet..Parsed$GT$12emit_headers17h18a49ea075c6b454E+0x244:     movq   0x10(%rdx),%rsi
_ZN4opte6engine6packet42Packet$LT$opte..engine..packet..Parsed$GT$12emit_headers17h18a49ea075c6b454E+0x248:     subq   0x8(%rdx),%rsi

> ::print -ta mblk_t
0 mblk_t {
    0 struct msgb *b_next 
    8 struct msgb *b_prev 
    10 struct msgb *b_cont 
    18 unsigned char *b_rptr 
    20 unsigned char *b_wptr 
    28 struct datab *b_datap 
    30 unsigned char b_band 
    31 unsigned char b_tag 
    32 unsigned short b_flag 
    38 queue_t *b_queue 
}

Remember, a function returns its value via rax, so the faulting load at +0x238 means we are dereferencing b_wptr on the returned mblk_t; but allocb() returned NULL, which it is well within its rights to do. Note the trap address (addr=20) matches the 0x20 offset of b_wptr shown above. This corresponds to the following OPTE code, which writes the new outer + inner headers to a new mblk and then links it to the front of the existing packet.

        let mut hdr_seg = unsafe { PacketSeg::wrap(allocb(total_sz)) };

        // Emit each raw header in turn to new segment.
        if self.state.headers.outer.is_some() {
            self.state.headers.outer.as_ref().unwrap().write(&mut hdr_seg)?;
        }

As you can see, there is no check for a NULL return.
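For illustration, a minimal sketch of the missing check, assuming OPTE's allocb binding as used above and some allocation-failure error variant (the variant name here is hypothetical, not OPTE's actual type):

    // allocb(9F) may return NULL under memory pressure, so check the
    // pointer before handing it to PacketSeg::wrap().
    let mp = unsafe { allocb(total_sz) };
    if mp.is_null() {
        // Hypothetical error variant; the real OPTE error type may differ.
        return Err(WriteError::Allocb);
    }
    let mut hdr_seg = unsafe { PacketSeg::wrap(mp) };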

At some point, the idea is to have a way to negotiate a "prefix margin" with the client, allowing OPTE to write the new headers into the existing mblk_t and avoid the additional allocation entirely. However, a good interim step is to have OPTE perform up-front allocation of mblks for the purpose of prefixing headers. OPTE will need this capability regardless: even in a future where OPTE can negotiate a prefix margin with the client, the client may decide it doesn't want to provide said margin, or may not be able to provide the amount of margin required.
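To sketch what consuming such a margin might look like (field names follow the mblk_t layout printed above plus dblk_t's db_base; the raw-pointer accesses here are an assumption, not OPTE's actual API):

    // If the client left `total_sz` bytes of headroom between the start
    // of the data buffer (db_base) and the read pointer (b_rptr), the
    // new headers can be written in place by walking b_rptr backwards.
    let headroom = unsafe {
        (*mp).b_rptr.offset_from((*(*mp).b_datap).db_base) as usize
    };
    if headroom >= total_sz {
        // Enough margin: prepend in place, no new allocation needed.
        unsafe { (*mp).b_rptr = (*mp).b_rptr.sub(total_sz) };
    } else {
        // Not enough margin: fall back to allocating a prefix mblk.
    }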

@rzezeski rzezeski self-assigned this May 25, 2022
@rzezeski (Contributor, Author) commented:

After spending some time working on a solution that does up-front buffer allocation via desballoc(9F), I realized it's a bit more of a pain in Rust than in C because of the way desballoc(9F) works; namely, things get wacky because you need to pass a heap-allocated frtn_t that also references its containing structure.
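To make the pain concrete, here is a rough sketch of the shape involved (frtn_t mirrors the DDI definition; the wrapper type is hypothetical):

    // frtn_t per the DDI: a free callback plus an opaque argument that
    // is handed back to the callback when the dblk is freed.
    #[repr(C)]
    struct frtn_t {
        free_func: unsafe extern "C" fn(*mut core::ffi::c_void),
        free_arg: *mut core::ffi::c_void,
    }

    // The structure that owns the buffer must also own the frtn_t, and
    // free_arg must point back at that same structure so the callback
    // can reclaim it. That self-reference means the wrapper has to be
    // heap-allocated, with its address fixed before desballoc() ever
    // sees the frtn_t, which is awkward to express safely in Rust.
    struct BufWrapper {
        buf: Vec<u8>,
        frtn: frtn_t, // free_arg points at the containing BufWrapper
    }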

While working on this I realized the real issue at the moment is a simple memory leak in Packet<Parsed>::replace_headers(): it can orphan an mblk segment without freeing it. Rather than spend more time on buffer caches, I think it's more productive to just fix the leak and move on to other things. Besides, for our immediate use case in Oxide VPC we probably want viona allocating prefix margin on our behalf, since it already has a buffer cache fit for purpose.
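The shape of the fix is small; a hedged sketch (the segment variable and accessor names here are hypothetical, the actual change is in the commit referenced below):

    // In replace_headers(): the old header segment was unlinked from
    // the packet and then dropped without being freed. Return it to
    // the STREAMS allocator with freeb(9F) instead.
    if let Some(orphan) = old_hdr_seg {
        unsafe { freeb(orphan.unwrap_mblk()) };
    }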

Anyway, here is the mdb ::findleaks report for posterity:

> ::findleaks                                                                                   
CACHE             LEAKED           BUFCTL CALLER                                                
fffffe0381e28848       1 fffffe03870652a0 AcpiOsAllocate+0x1c                                   
fffffe0381e37588       2 fffffe03af0b55f0 allocb+0x50                                           
fffffe0381e37588       2 fffffe03b981d518 allocb+0x50              
fffffe0381e37588       4 fffffe03b981d0e0 allocb+0x50                                           
fffffe0381e37b08  171109 fffffe03b9b442b0 allocb+0x50                                           
fffffe0381e37b08       3 fffffe03c1baa898 allocb+0x50                                           
fffffe0381e37b08       1 fffffe03b9b437b8 allocb+0x50                                                                                                                                           
fffffe0381e37b08       1 fffffe03b9b43890 allocb+0x50                                                                                                                                           
fffffe0381e37b08  100526 fffffe03b9b38210 allocb+0x50                                           
fffffe0381e37b08   74436 fffffe03b9b43a40 allocb+0x50                                           
fffffe0381e37588       1 fffffe03a7ea7700 allocb+0x50                                           
fffffe0381e37b08       3 fffffe03b9b44028 allocb+0x50                                           
fffffe0381e37b08   23510 fffffe03b9b46110 allocb+0x50                                           
fffffe0381e37848       1 fffffe03973a9720 allocb+0x50                                           
fffffe0381e37588       1 fffffe03af0b5440 allocb+0x50                                                                                                                                           
fffffe0381e37b08       1 fffffe03b9b39d10 allocb+0x50                                           
fffffe0381e37b08       6 fffffe03b9b43458 allocb+0x50                                           
fffffe0381e37b08       1 fffffe038d042b18 allocb+0x50
fffffe0381e37b08       1 fffffe03c1a24b48 allocb+0x50
fffffe0381e37b08       1 fffffe038d041450 allocb+0x50
fffffe0381e37b08    2539 fffffe03b9b382e8 allocb+0x50
fffffe0381e37588       2 fffffe03a9990388 allocb+0x50
fffffe0381e372c8       2 fffffe03ac048548 dblk_constructor+0x46
fffffe0381e372c8  100463 fffffe03b06e1600 dblk_constructor+0x46
fffffe0381e372c8       1 fffffe039322d2d8 dblk_constructor+0x46
fffffe0381e372c8       1 fffffe03b981b3e8 dblk_constructor+0x46
fffffe0381e372c8       2 fffffe03b9275de0 dblk_constructor+0x46
fffffe0381e372c8  118000 fffffe03b9274208 dblk_constructor+0x46
fffffe0381e372c8       2 fffffe03b9b377f0 dblk_constructor+0x46
fffffe0381e372c8      24 fffffe03ae0f54c0 dblk_constructor+0x46
fffffe0381e372c8       4 fffffe03b981f528 dblk_constructor+0x46
fffffe0381e372c8  153648 fffffe03b9b36998 dblk_constructor+0x46
fffffe0381e372c8       1 fffffe03ac047978 dblk_constructor+0x46
fffffe0381e372c8       2 fffffe03b981fa38 dblk_constructor+0x46
fffffe0381e372c8       1 fffffe03b9275720 dblk_constructor+0x46

> fffffe03b9b442b0$<bufctl_audit
            ADDR          BUFADDR        TIMESTAMP           THREAD
                            CACHE          LASTLOG         CONTENTS
fffffe03b9b442b0 fffffe03b9b5c9c0       284110d931 fffffe039b2b8060
                 fffffe0381e37b08 fffffe0382ec77c0                0
                 kmem_cache_alloc_debug+0x2fc
                 kmem_cache_alloc+0x18d
                 allocb+0x50
                 _ZN4opte6engine6packet42Packet$LT$opte..engine..packet..Parsed$GT$12emit_headers17h18a49ea075c6b454E+0x238
                 _ZN4opte6engine4port4Port7process17h1da75a5159c16502E+0x2d92
                 xde_mc_tx+0x290
                 mac_ring_tx+0x43
                 mac_provider_tx+0x85
                 mac_tx+0x295
                 str_mdata_fastpath_put+0x8e
                 ip_xmit+0x843
                 ire_send_wire_v4+0x345
                 conn_ip_output+0x1d4
                 tcp_send_data+0x58
                 tcp_send+0x8d2

@rzezeski rzezeski changed the title Panic during Packet::emit_headers() mblk leak replace_headers May 28, 2022
rzezeski added a commit that referenced this issue May 28, 2022
@rzezeski (Contributor, Author) commented:

Tested by running iperf3 between guests and verifying that memory usage stayed consistent across two ::memstat snapshots.

> ::memstat
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                     102486               400   20%
Boot pages                    139                 0    0%
ZFS File Data               68522               267   13%
Anon                        43468               169    8%
Exec and libs                2405                 9    0%
Page cache                  10943                42    2%
Free (cachelist)            11424                44    2%
Free (freelist)            280979              1097   54%

Total                      520366              2032
Physical                   520365              2032

> ::memstat
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                     102517               400   20%
Boot pages                    139                 0    0%
ZFS File Data               68522               267   13%
Anon                        43472               169    8%
Exec and libs                2405                 9    0%
Page cache                  10956                42    2%
Free (cachelist)            11424                44    2%
Free (freelist)            280931              1097   54%

Total                      520366              2032
Physical                   520365              2032

Addressed in 1245055.
