Latest XDP fixes #27

Open
wants to merge 5 commits into base: idpf-libie-new
Conversation


@michalQb michalQb commented Sep 6, 2024

No description provided.

Commit 35d653a ("idpf: implement XDP_SETUP_PROG in ndo_bpf for
splitq") uses soft_reset to perform a full vport reconfiguration after
changes in the XDP setup.
Unfortunately, the soft_reset may fail after the XDP program has been
attached to the vport. This can happen when the HW limits the resources
that can be allocated to satisfy the XDP requirements.
In such a case, before returning an error from XDP_SETUP_PROG, we have
to fully restore the previous vport state, including removing the XDP
program.

In order to remove the already loaded XDP program in case of a reset
error, re-implement the error handling path and move some calls into
the XDP callback.

Fixes: 35d653a ("idpf: implement XDP_SETUP_PROG in ndo_bpf for splitq")
Signed-off-by: Michal Kubiak <[email protected]>
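The rollback flow described above can be sketched as a small, self-contained simulation. All structure and function names here (`vport`, `soft_reset()`, `xdp_setup_prog()`) are simplified stand-ins for illustration, not the actual idpf driver code:

```c
#include <stddef.h>

/* Hypothetical, simplified model of a vport's XDP state. */
struct vport {
	void *xdp_prog;     /* currently attached XDP program, or NULL */
	int hw_resources;   /* resources the HW can still provide */
};

/* Pretend soft reset: with XDP attached, more resources are needed,
 * so the reset can fail on a resource-constrained device. */
static int soft_reset(struct vport *v)
{
	int needed = v->xdp_prog ? 2 : 1; /* XDP requires extra queues */

	return v->hw_resources >= needed ? 0 : -1; /* -ENOMEM in real code */
}

/* Attach the new program, try the reset, and on failure restore the
 * previous vport state, including removing the just-attached program. */
static int xdp_setup_prog(struct vport *v, void *prog)
{
	void *old_prog = v->xdp_prog;
	int err;

	v->xdp_prog = prog;             /* attach the new program */
	err = soft_reset(v);
	if (err) {
		v->xdp_prog = old_prog; /* roll back: remove new program */
		soft_reset(v);          /* reconfigure with the old state */
		return err;
	}
	return 0;
}
```

The key point is that the error path undoes the attach before propagating the error, so a failed XDP_SETUP_PROG leaves the vport exactly as it was.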
Commit f00334f ("libeth: support native XDP and register memory model")
removed the call to "page_pool_destroy()" from "libeth_rx_fq_destroy()"
and replaced it with a call to "xdp_unreg_page_pool()", on the
assumption that "page_pool_destroy()" would be called while the memory
model is unregistered.
Unfortunately, although "page_pool_destroy()" is called from
"xdp_unreg_mem_model()", it does not release all resources. The
page_pool resources can be released completely only once the user
reference counter drops to zero.

In the libeth scenario, the "user_cnt" page_pool reference counter is
set to 1 during page_pool creation. Then, it is incremented again to
2 while the memory model is being registered.
Therefore, calling "xdp_unreg_mem_model()" decrements the reference
counter only by one, and some page_pool resources are never released.

The page_pool API should be called in a symmetric way, so:
 - each explicit call of "page_pool_create()" should be followed by
   "page_pool_destroy()",
 - each call of "xdp_reg_page_pool()" should be followed by
   "xdp_unreg_page_pool()".

Fix the issue by restoring the call to "page_pool_destroy()" to let the
page_pool decrement its reference counter back to zero.

Fixes: f00334f ("libeth: support native XDP and register memory model")
Signed-off-by: Michal Kubiak <[email protected]>
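The reference-counting imbalance can be illustrated with a toy model of "user_cnt". The functions below borrow the kernel API names but have deliberately simplified signatures; this is a standalone illustration of the symmetry rule, not the real page_pool implementation:

```c
/* Toy model of the page_pool user reference counter (user_cnt). */
struct page_pool {
	int user_cnt;
	int released;	/* 1 once all resources are actually freed */
};

static void page_pool_create(struct page_pool *pp)
{
	pp->user_cnt = 1;	/* creation takes the first reference */
	pp->released = 0;
}

static void xdp_reg_page_pool(struct page_pool *pp)
{
	pp->user_cnt++;		/* memory-model registration takes another */
}

static void page_pool_put(struct page_pool *pp)
{
	if (--pp->user_cnt == 0)
		pp->released = 1; /* resources freed only at zero */
}

static void xdp_unreg_page_pool(struct page_pool *pp)
{
	page_pool_put(pp);	/* drops only the registration reference */
}

static void page_pool_destroy(struct page_pool *pp)
{
	page_pool_put(pp);	/* drops the creation reference */
}
```

With only the unregister call, user_cnt goes 2 -> 1 and the pool leaks; the symmetric destroy call brings it to 0 and the resources are freed.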
Block the soft reset and vport reconfiguration when called from the
"idpf_remove()" context.
The XDP_SETUP_PROG command is normally issued while the netdev is being
unregistered. In such a case, the idpf driver should unload the XDP
program and return success. Otherwise, a kernel warning will be shown
in dmesg.

Fixes: 79d940b ("idpf: implement XDP_SETUP_PROG in ndo_bpf for splitq")
Signed-off-by: Michal Kubiak <[email protected]>
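A sketch of the guard described above: when the setup callback runs during netdev unregistration, just drop the program and report success instead of triggering a full vport reconfiguration. All names here (`netdev`, `xdp_setup()`, `do_soft_reset()`) are illustrative, not the actual idpf code:

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for the driver's per-netdev state. */
struct netdev {
	bool unregistering;	/* set while the netdev is torn down */
	void *xdp_prog;
};

/* Placeholder for the expensive vport reconfiguration. */
static int do_soft_reset(struct netdev *nd)
{
	(void)nd;
	return 0;
}

static int xdp_setup(struct netdev *nd, void *prog)
{
	nd->xdp_prog = prog;	/* unload/replace the program either way */
	if (nd->unregistering)
		return 0;	/* no reconfiguration while tearing down */
	return do_soft_reset(nd);
}
```

Skipping the reset here avoids reconfiguring hardware that is already being released, which is what triggered the kernel warning.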
@michalQb michalQb changed the title Fix for reset + fix for PP release Latest XDP fixes Sep 18, 2024
Fixes: 98e4247 ("idpf: prepare structures to support xdp")
Fixes: f12c11e ("idpf: add vc functions to manage selected queues")
Signed-off-by: Michal Kubiak <[email protected]>
alobakin pushed a commit that referenced this pull request Oct 15, 2024
Wesley reported an issue:

==================================================================
EXT4-fs (dm-5): resizing filesystem from 7168 to 786432 blocks
------------[ cut here ]------------
kernel BUG at fs/ext4/resize.c:324!
CPU: 9 UID: 0 PID: 3576 Comm: resize2fs Not tainted 6.11.0+ #27
RIP: 0010:ext4_resize_fs+0x1212/0x12d0
Call Trace:
 __ext4_ioctl+0x4e0/0x1800
 ext4_ioctl+0x12/0x20
 __x64_sys_ioctl+0x99/0xd0
 x64_sys_call+0x1206/0x20d0
 do_syscall_64+0x72/0x110
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
==================================================================

While reviewing the patch, Honza found that when adjusting resize_bg in
alloc_flex_gd(), it was possible for flex_gd->resize_bg to be bigger than
flexbg_size.

The reproduction of the problem requires the following:

 o_group = flexbg_size * 2 * n;
 o_size = (o_group + 1) * group_size;
 n_group: [o_group + flexbg_size, o_group + flexbg_size * 2)
 n_size = (n_group + 1) * group_size;

Take n=0,flexbg_size=16 as an example:

              last:15
|o---------------|--------------n-|
o_group:0    resize to      n_group:30

The corresponding reproducer is:

img=test.img
rm -f $img
truncate -s 600M $img
mkfs.ext4 -F $img -b 1024 -G 16 8M
dev=`losetup -f --show $img`
mkdir -p /tmp/test
mount $dev /tmp/test
resize2fs $dev 248M

Delete the problematic plus 1 to fix the issue, and add a WARN_ON_ONCE()
to prevent the issue from happening again.

[ Note: another reproducer which this commit fixes is:

  img=test.img
  rm -f $img
  truncate -s 25MiB $img
  mkfs.ext4 -b 4096 -E nodiscard,lazy_itable_init=0,lazy_journal_init=0 $img
  truncate -s 3GiB $img
  dev=`losetup -f --show $img`
  mkdir -p /tmp/test
  mount $dev /tmp/test
  resize2fs $dev 3G
  umount $dev
  losetup -d $dev

  -- TYT ]
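A rough illustration of the fix described above: cap the computed group count at flexbg_size instead of allowing the off-by-one overshoot. The helper name, parameters, and exact arithmetic are hypothetical simplifications; the real change lives in alloc_flex_gd() in fs/ext4/resize.c:

```c
/* Hypothetical sketch: the number of groups handled per flex group
 * must never exceed flexbg_size. The old code's stray "+ 1" could
 * push it past that bound for the o_group/n_group layout shown above. */
static unsigned int clamp_resize_bg(unsigned int n_group,
				    unsigned int o_group,
				    unsigned int flexbg_size)
{
	unsigned int resize_bg = n_group - o_group;	/* no "+ 1" */

	if (resize_bg > flexbg_size)	/* WARN_ON_ONCE() in the kernel */
		resize_bg = flexbg_size;
	return resize_bg;
}
```

For the n=0, flexbg_size=16 example (o_group=0, n_group=30), the clamped value stays at 16 rather than exceeding the flex group size.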

Reported-by: Wesley Hershberger <[email protected]>
Closes: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2081231
Reported-by: Stéphane Graber <[email protected]>
Closes: https://lore.kernel.org/all/[email protected]/
Tested-by: Alexander Mikhalitsyn <[email protected]>
Tested-by: Eric Sandeen <[email protected]>
Fixes: 665d3e0 ("ext4: reduce unnecessary memory allocation in alloc_flex_gd()")
Cc: [email protected]
Signed-off-by: Baokun Li <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>