-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rosemary-t-oss #36487
Open
zainarbani
wants to merge
10,000
commits into
MiCode:rosemary-r-oss
Choose a base branch
from
zainarbani:rosemary-r-oss
base: rosemary-r-oss
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
rosemary-t-oss #36487
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
[ Upstream commit 17555297dbd5bccc93a01516117547e26a61caf1 ] Driver is leaking OF node reference from of_parse_phandle_with_fixed_args() in probe(). Signed-off-by: Krzysztof Kozlowski <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 5680cf8d34e1552df987e2f4bb1bff0b2a8c8b11 ] Driver is leaking OF node reference from of_parse_phandle_with_fixed_args() in hns_mac_get_info(). Signed-off-by: Krzysztof Kozlowski <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit e62beddc45f487b9969821fad3a0913d9bc18a2f ] Driver is leaking OF node reference from of_parse_phandle_with_fixed_args() in probe(). Signed-off-by: Krzysztof Kozlowski <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 5accb265f7a1b23e52b0ec42313d1e12895552f4 ] ACPICA commit 2802af722bbde7bf1a7ac68df68e179e2555d361 If acpi_ps_get_next_namepath() fails, the previously allocated union acpi_parse_object needs to be freed before returning the status code. The issue was first being reported on the Linux ACPI mailing list: Link: https://lore.kernel.org/linux-acpi/[email protected]/T/ Link: acpica/acpica@2802af72 Signed-off-by: Armin Wolf <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit e6169a8ffee8a012badd8c703716e761ce851b15 ] ACPICA commit 1280045754264841b119a5ede96cd005bc09b5a7 If acpi_ps_get_next_field() fails, the previously created field list needs to be properly disposed before returning the status code. Link: acpica/acpica@12800457 Signed-off-by: Armin Wolf <[email protected]> [ rjw: Rename local variable to avoid compiler confusion ] Signed-off-by: Rafael J. Wysocki <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit dc171114926ec390ab90f46534545420ec03e458 ] It is not particularly useful to release locks (the EC mutex and the ACPI global lock, if present) and re-acquire them immediately thereafter during EC address space accesses in acpi_ec_space_handler(). First, releasing them for a while before grabbing them again does not really help anyone because there may not be enough time for another thread to acquire them. Second, if another thread successfully acquires them and carries out a new EC write or read in the middle if an operation region access in progress, it may confuse the EC firmware, especially after the burst mode has been enabled. Finally, manipulating the locks after writing or reading every single byte of data is overhead that it is better to avoid. Accordingly, modify the code to carry out EC address space accesses entirely without releasing the locks. Signed-off-by: Rafael J. Wysocki <[email protected]> Reviewed-by: Hans de Goede <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Sasha Levin <[email protected]>
…t_to_package() [ Upstream commit a5242874488eba2b9062985bf13743c029821330 ] ACPICA commit 4d4547cf13cca820ff7e0f859ba83e1a610b9fd0 ACPI_ALLOCATE_ZEROED() may fail, elements might be NULL and will cause NULL pointer dereference later. Link: acpica/acpica@4d4547cf Signed-off-by: Pei Xiao <[email protected]> Link: https://patch.msgid.link/[email protected] [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 6555a2a9212be6983d2319d65276484f7c5f431a ] Smatch reports that copying media_name and if_name to name_parts may overwrite the destination. .../bearer.c:166 bearer_name_validate() error: strcpy() 'media_name' too large for 'name_parts->media_name' (32 vs 16) .../bearer.c:167 bearer_name_validate() error: strcpy() 'if_name' too large for 'name_parts->if_name' (1010102 vs 16) This does seem to be the case so guard against this possibility by using strscpy() and failing if truncation occurs. Introduced by commit b97bf3f ("[TIPC] Initial merge") Compile tested only. Reviewed-by: Jakub Kicinski <[email protected]> Signed-off-by: Simon Horman <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 91d516d4de48532d967a77967834e00c8c53dfe6 ] Increase size of queue_name buffer from 30 to 31 to accommodate the largest string written to it. This avoids truncation in the possibly unlikely case where the string is name is the maximum size. Flagged by gcc-14: .../mvpp2_main.c: In function 'mvpp2_probe': .../mvpp2_main.c:7636:32: warning: 'snprintf' output may be truncated before the last format character [-Wformat-truncation=] 7636 | "stats-wq-%s%s", netdev_name(priv->port_list[0]->dev), | ^ .../mvpp2_main.c:7635:9: note: 'snprintf' output between 10 and 31 bytes into a destination of size 30 7635 | snprintf(priv->queue_name, sizeof(priv->queue_name), | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 7636 | "stats-wq-%s%s", netdev_name(priv->port_list[0]->dev), | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 7637 | priv->port_count > 1 ? "+" : ""); | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Introduced by commit 118d629 ("net: mvpp2: add ethtool GOP statistics"). I am not flagging this as a bug as I am not aware that it is one. Compile tested only. Signed-off-by: Simon Horman <[email protected]> Reviewed-by: Marcin Wojtas <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit e3af3d3c5b26c33a7950e34e137584f6056c4319 ] dev->ip_ptr could be NULL if we set an invalid MTU. Even then, if we issue ioctl(SIOCSIFADDR) for a new IPv4 address, devinet_ioctl() allocates struct in_ifaddr and fails later in inet_set_ifa() because in_dev is NULL. Let's move the check earlier. Signed-off-by: Kuniyuki Iwashima <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 8fed54758cd248cd311a2b5c1e180abef1866237 ] The NETLINK_FIB_LOOKUP netlink family can be used to perform a FIB lookup according to user provided parameters and communicate the result back to user space. However, unlike other users of the FIB lookup API, the upper DSCP bits and the ECN bits of the DS field are not masked, which can result in the wrong result being returned. Solve this by masking the upper DSCP bits and the ECN bits using IPTOS_RT_MASK. The structure that communicates the request and the response is not exported to user space, so it is unlikely that this netlink family is actually in use [1]. [1] https://lore.kernel.org/netdev/ZpqpB8vJU%2FQ6LSqa@debian/ Signed-off-by: Ido Schimmel <[email protected]> Reviewed-by: Guillaume Nault <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
…ocess [ Upstream commit 0d9e5df4a257afc3a471a82961ace9a22b88295a ] We found that one close-wait socket was reset by the other side due to a new connection reusing the same port which is beyond our expectation, so we have to investigate the underlying reason. The following experiment is conducted in the test environment. We limit the port range from 40000 to 40010 and delay the time to close() after receiving a fin from the active close side, which can help us easily reproduce like what happened in production. Here are three connections captured by tcpdump: 127.0.0.1.40002 > 127.0.0.1.9999: Flags [S], seq 2965525191 127.0.0.1.9999 > 127.0.0.1.40002: Flags [S.], seq 2769915070 127.0.0.1.40002 > 127.0.0.1.9999: Flags [.], ack 1 127.0.0.1.40002 > 127.0.0.1.9999: Flags [F.], seq 1, ack 1 // a few seconds later, within 60 seconds 127.0.0.1.40002 > 127.0.0.1.9999: Flags [S], seq 2965590730 127.0.0.1.9999 > 127.0.0.1.40002: Flags [.], ack 2 127.0.0.1.40002 > 127.0.0.1.9999: Flags [R], seq 2965525193 // later, very quickly 127.0.0.1.40002 > 127.0.0.1.9999: Flags [S], seq 2965590730 127.0.0.1.9999 > 127.0.0.1.40002: Flags [S.], seq 3120990805 127.0.0.1.40002 > 127.0.0.1.9999: Flags [.], ack 1 As we can see, the first flow is reset because: 1) client starts a new connection, I mean, the second one 2) client tries to find a suitable port which is a timewait socket (its state is timewait, substate is fin_wait2) 3) client occupies that timewait port to send a SYN 4) server finds a corresponding close-wait socket in ehash table, then replies with a challenge ack 5) client sends an RST to terminate this old close-wait socket. I don't think the port selection algo can choose a FIN_WAIT2 socket when we turn on tcp_tw_reuse because on the server side there remain unread data. In some cases, if one side haven't call close() yet, we should not consider it as expendable and treat it at will. Even though, sometimes, the server isn't able to call close() as soon as possible like what we expect, it can not be terminated easily, especially due to a second unrelated connection happening. After this patch, we can see the expected failure if we start a connection when all the ports are occupied in fin_wait2 state: "Ncat: Cannot assign requested address." Reported-by: Jade Dong <[email protected]> Signed-off-by: Jason Xing <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit a0a2459b79414584af6c46dd8c6f866d8f1aa421 ] ACPICA commit 6c551e2c9487067d4b085333e7fe97e965a11625 Link: acpica/acpica@6c551e2c Signed-off-by: Aleksandrs Vinarskis <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
…md_802_11_scan_ext() [ Upstream commit 498365e52bebcbc36a93279fe7e9d6aec8479cee ] Replace one-element array with a flexible-array member in `struct host_cmd_ds_802_11_scan_ext`. With this, fix the following warning: elo 16 17:51:58 surfacebook kernel: ------------[ cut here ]------------ elo 16 17:51:58 surfacebook kernel: memcpy: detected field-spanning write (size 243) of single field "ext_scan->tlv_buffer" at drivers/net/wireless/marvell/mwifiex/scan.c:2239 (size 1) elo 16 17:51:58 surfacebook kernel: WARNING: CPU: 0 PID: 498 at drivers/net/wireless/marvell/mwifiex/scan.c:2239 mwifiex_cmd_802_11_scan_ext+0x83/0x90 [mwifiex] Reported-by: Andy Shevchenko <[email protected]> Closes: https://lore.kernel.org/linux-hardening/[email protected]/ Signed-off-by: Gustavo A. R. Silva <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Acked-by: Brian Norris <[email protected]> Signed-off-by: Kalle Valo <[email protected]> Link: https://patch.msgid.link/ZsZa5xRcsLq9D+RX@elsanto Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 7f8af7bac5380f2d95a63a6f19964e22437166e1 ] These really can be handled gracefully without killing the machine. Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Reviewed-by: Oleg Nesterov <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 7b986c7430a6bb68d523dac7bfc74cbd5b44ef96 ] ASIHPI driver stores some values in the static array upon a response from the driver, and its index depends on the firmware. We shouldn't trust it blindly. This patch adds a sanity check of the array index to fit in the array size. Link: https://patch.msgid.link/[email protected] Signed-off-by: Takashi Iwai <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit c01f3815453e2d5f699ccd8c8c1f93a5b8669e59 ] The current MIDI input flush on HDSP and HDSPM drivers relies on the hardware reporting the right value. If the hardware doesn't give the proper value but returns -1, it may be stuck at an infinite loop. Add a counter and break if the loop is unexpectedly too long. Link: https://patch.msgid.link/[email protected] Signed-off-by: Takashi Iwai <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 4a6921095eb04a900e0000da83d9475eb958e61e ] In the pxafb_probe function, it calls the pxafb_init_fbinfo function, after which &fbi->task is associated with pxafb_task. Moreover, within this pxafb_init_fbinfo function, the pxafb_blank function within the &pxafb_ops struct is capable of scheduling work. If we remove the module which will call pxafb_remove to make cleanup, it will call unregister_framebuffer function which can call do_unregister_framebuffer to free fbi->fb through put_fb_info(fb_info), while the work mentioned above will be used. The sequence of operations that may lead to a UAF bug is as follows: CPU0 CPU1 | pxafb_task pxafb_remove | unregister_framebuffer(info) | do_unregister_framebuffer(fb_info) | put_fb_info(fb_info) | // free fbi->fb | set_ctrlr_state(fbi, state) | __pxafb_lcd_power(fbi, 0) | fbi->lcd_power(on, &fbi->fb.var) | //use fbi->fb Fix it by ensuring that the work is canceled before proceeding with the cleanup in pxafb_remove. Note that only root user can remove the driver at runtime. Signed-off-by: Kaixin Wang <[email protected]> Signed-off-by: Helge Deller <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit cf8c39b00e982fa506b16f9d76657838c09150cb ] There may be other backup reset methods available, do not halt here so that other reset methods can be tried. Signed-off-by: Andrew Davis <[email protected]> Reviewed-by: Dhruva Gole <[email protected]> Acked-by: Florian Fainelli <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sebastian Reichel <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 93b0f9e11ce511353c65b7f924cf5f95bd9c3aba ] Rename the array sil_blacklist to sil_quirks as this name is more neutral and is also consistent with how this driver define quirks with the SIL_QUIRK_XXX flags. Signed-off-by: Damien Le Moal <[email protected]> Reviewed-by: Niklas Cassel <[email protected]> Reviewed-by: Igor Pylypiv <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit b0b2fc815e514221f01384f39fbfbff65d897e1c ] Fix issue with UBSAN throwing shift-out-of-bounds warning. Reported-by: [email protected] Signed-off-by: Remington Brasga <[email protected]> Signed-off-by: Dave Kleikamp <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit d6c1b3599b2feb5c7291f5ac3a36e5fa7cedb234 ] [syzbot reported] ================================================================== BUG: KASAN: slab-use-after-free in __mutex_lock_common kernel/locking/mutex.c:587 [inline] BUG: KASAN: slab-use-after-free in __mutex_lock+0xfe/0xd70 kernel/locking/mutex.c:752 Read of size 8 at addr ffff8880229254b0 by task syz-executor357/5216 CPU: 0 UID: 0 PID: 5216 Comm: syz-executor357 Not tainted 6.11.0-rc3-syzkaller-00156-gd7a5aa4b3c00 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/27/2024 Call Trace: <TASK> __dump_stack lib/dump_stack.c:93 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:119 print_address_description mm/kasan/report.c:377 [inline] print_report+0x169/0x550 mm/kasan/report.c:488 kasan_report+0x143/0x180 mm/kasan/report.c:601 __mutex_lock_common kernel/locking/mutex.c:587 [inline] __mutex_lock+0xfe/0xd70 kernel/locking/mutex.c:752 dbFreeBits+0x7ea/0xd90 fs/jfs/jfs_dmap.c:2390 dbFreeDmap fs/jfs/jfs_dmap.c:2089 [inline] dbFree+0x35b/0x680 fs/jfs/jfs_dmap.c:409 dbDiscardAG+0x8a9/0xa20 fs/jfs/jfs_dmap.c:1650 jfs_ioc_trim+0x433/0x670 fs/jfs/jfs_discard.c:100 jfs_ioctl+0x2d0/0x3e0 fs/jfs/ioctl.c:131 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:907 [inline] __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:893 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 Freed by task 5218: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x3f/0x80 mm/kasan/common.c:68 kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:579 poison_slab_object+0xe0/0x150 mm/kasan/common.c:240 __kasan_slab_free+0x37/0x60 mm/kasan/common.c:256 kasan_slab_free include/linux/kasan.h:184 [inline] slab_free_hook mm/slub.c:2252 [inline] slab_free mm/slub.c:4473 [inline] kfree+0x149/0x360 mm/slub.c:4594 dbUnmount+0x11d/0x190 fs/jfs/jfs_dmap.c:278 jfs_mount_rw+0x4ac/0x6a0 fs/jfs/jfs_mount.c:247 jfs_remount+0x3d1/0x6b0 fs/jfs/super.c:454 reconfigure_super+0x445/0x880 fs/super.c:1083 vfs_cmd_reconfigure fs/fsopen.c:263 [inline] vfs_fsconfig_locked fs/fsopen.c:292 [inline] __do_sys_fsconfig fs/fsopen.c:473 [inline] __se_sys_fsconfig+0xb6e/0xf80 fs/fsopen.c:345 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f [Analysis] There are two paths (dbUnmount and jfs_ioc_trim) that generate race condition when accessing bmap, which leads to the occurrence of uaf. Use the lock s_umount to synchronize them, in order to avoid uaf caused by race condition. Reported-and-tested-by: [email protected] Signed-off-by: Edward Adam Davis <[email protected]> Signed-off-by: Dave Kleikamp <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit d64ff0d2306713ff084d4b09f84ed1a8c75ecc32 ] syzbot report a out of bounds in dbSplit, it because dmt_leafidx greater than num leaves per dmap tree, add a checking for dmt_leafidx in dbFindLeaf. Shaggy: Modified sanity check to apply to control pages as well as leaf pages. Reported-and-tested-by: [email protected] Closes: https://syzkaller.appspot.com/bug?extid=dca05492eff41f604890 Signed-off-by: Edward Adam Davis <[email protected]> Signed-off-by: Dave Kleikamp <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 2b59ffad47db1c46af25ccad157bb3b25147c35c ] syzbot reports that lzo1x_1_do_compress is using uninit-value: ===================================================== BUG: KMSAN: uninit-value in lzo1x_1_do_compress+0x19f9/0x2510 lib/lzo/lzo1x_compress.c:178 ... Uninit was stored to memory at: ea_put fs/jfs/xattr.c:639 [inline] ... Local variable ea_buf created at: __jfs_setxattr+0x5d/0x1ae0 fs/jfs/xattr.c:662 __jfs_xattr_set+0xe6/0x1f0 fs/jfs/xattr.c:934 ===================================================== The reason is ea_buf->new_ea is not initialized properly. Fix this by using memset to empty its content at the beginning in ea_get(). Reported-by: [email protected] Closes: https://syzkaller.appspot.com/bug?extid=02341e0daa42a15ce130 Signed-off-by: Zhao Mengmeng <[email protected]> Signed-off-by: Dave Kleikamp <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 35ff747c86767937ee1e0ca987545b7eed7a0810 ] [WHAT & HOW] amdgpu_dm can pass a null stream to dc_is_stream_unchanged. It is necessary to check for null before dereferencing them. This fixes 1 FORWARD_NULL issue reported by Coverity. Reviewed-by: Rodrigo Siqueira <[email protected]> Signed-off-by: Jerry Zuo <[email protected]> Signed-off-by: Alex Hung <[email protected]> Tested-by: Daniel Wheeler <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
…ranslation [ Upstream commit b7e99058eb2e86aabd7a10761e76cae33d22b49f ] Fixes index out of bounds issue in `cm_helper_translate_curve_to_degamma_hw_format` function. The issue could occur when the index 'i' exceeds the number of transfer function points (TRANSFER_FUNC_POINTS). The fix adds a check to ensure 'i' is within bounds before accessing the transfer function points. If 'i' is out of bounds the function returns false to indicate an error. Reported by smatch: drivers/gpu/drm/amd/amdgpu/../display/dc/dcn10/dcn10_cm_common.c:594 cm_helper_translate_curve_to_degamma_hw_format() error: buffer overflow 'output_tf->tf_pts.red' 1025 <= s32max drivers/gpu/drm/amd/amdgpu/../display/dc/dcn10/dcn10_cm_common.c:595 cm_helper_translate_curve_to_degamma_hw_format() error: buffer overflow 'output_tf->tf_pts.green' 1025 <= s32max drivers/gpu/drm/amd/amdgpu/../display/dc/dcn10/dcn10_cm_common.c:596 cm_helper_translate_curve_to_degamma_hw_format() error: buffer overflow 'output_tf->tf_pts.blue' 1025 <= s32max Cc: Tom Chung <[email protected]> Cc: Rodrigo Siqueira <[email protected]> Cc: Roman Li <[email protected]> Cc: Alex Hung <[email protected]> Cc: Aurabindo Pillai <[email protected]> Cc: Harry Wentland <[email protected]> Cc: Hamza Mahfooz <[email protected]> Signed-off-by: Srinivasan Shanmugam <[email protected]> Reviewed-by: Tom Chung <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 53369581dc0c68a5700ed51e1660f44c4b2bb524 ] We want to determine the size of the devcoredump before writing it out. To that end, we will run the devcoredump printer with NULL data to get the size, alloc data based on the generated offset, then run the devcorecump again with a valid data pointer to print. This necessitates not writing data to the data pointer on the initial pass, when it is NULL. v5: - Better commit message (Jonathan) - Add kerenl doc with examples (Jani) Cc: Maarten Lankhorst <[email protected]> Acked-by: Maarten Lankhorst <[email protected]> Signed-off-by: Matthew Brost <[email protected]> Reviewed-by: Jonathan Cavitt <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 6e5860b0ad4934baee8c7a202c02033b2631bb44 ] struct aac_srb_unit contains struct aac_srb, which contains struct sgmap, which ends in a (currently) "fake" (1-element) flexible array. Converting this to a flexible array is needed so that runtime bounds checking won't think the array is fixed size (i.e. under CONFIG_FORTIFY_SOURCE=y and/or CONFIG_UBSAN_BOUNDS=y), as other parts of aacraid use struct sgmap as a flexible array. It is not legal to have a flexible array in the middle of a structure, so it either needs to be split up or rearranged so that it is at the end of the structure. Luckily, struct aac_srb_unit, which is exclusively consumed/updated by aac_send_safw_bmic_cmd(), does not depend on member ordering. The values set in the on-stack struct aac_srb_unit instance "srbu" by the only two callers, aac_issue_safw_bmic_identify() and aac_get_safw_ciss_luns(), do not contain anything in srbu.srb.sgmap.sg, and they both implicitly initialize srbu.srb.sgmap.count to 0 during memset(). For example: memset(&srbu, 0, sizeof(struct aac_srb_unit)); srbcmd = &srbu.srb; srbcmd->flags = cpu_to_le32(SRB_DataIn); srbcmd->cdb[0] = CISS_REPORT_PHYSICAL_LUNS; srbcmd->cdb[1] = 2; /* extended reporting */ srbcmd->cdb[8] = (u8)(datasize >> 8); srbcmd->cdb[9] = (u8)(datasize); rcode = aac_send_safw_bmic_cmd(dev, &srbu, phys_luns, datasize); During aac_send_safw_bmic_cmd(), a separate srb is mapped into DMA, and has srbu.srb copied into it: srb = fib_data(fibptr); memcpy(srb, &srbu->srb, sizeof(struct aac_srb)); Only then is srb.sgmap.count written and srb->sg populated: srb->count = cpu_to_le32(xfer_len); sg64 = (struct sgmap64 *)&srb->sg; sg64->count = cpu_to_le32(1); sg64->sg[0].addr[1] = cpu_to_le32(upper_32_bits(addr)); sg64->sg[0].addr[0] = cpu_to_le32(lower_32_bits(addr)); sg64->sg[0].count = cpu_to_le32(xfer_len); But this is happening in the DMA memory, not in srbu.srb. An attempt to copy the changes back to srbu does happen: /* * Copy the updated data for other dumping or other usage if * needed */ memcpy(&srbu->srb, srb, sizeof(struct aac_srb)); But this was never correct: the sg64 (3 u32s) overlap of srb.sg (2 u32s) always meant that srbu.srb would have held truncated information and any attempt to walk srbu.srb.sg.sg based on the value of srbu.srb.sg.count would result in attempting to parse past the end of srbu.srb.sg.sg[0] into srbu.srb_reply. After getting a reply from hardware, the reply is copied into srbu.srb_reply: srb_reply = (struct aac_srb_reply *)fib_data(fibptr); memcpy(&srbu->srb_reply, srb_reply, sizeof(struct aac_srb_reply)); This has always been fixed-size, so there's no issue here. It is worth noting that the two callers _never check_ srbu contents -- neither srbu.srb nor srbu.srb_reply is examined. (They depend on the mapped xfer_buf instead.) Therefore, the ordering of members in struct aac_srb_unit does not matter, and the flexible array member can moved to the end. (Additionally, the two memcpy()s that update srbu could be entirely removed as they are never consumed, but I left that as-is.) Signed-off-by: Kees Cook <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin K. Petersen <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit c6dbab46324b1742b50dc2fb5c1fee2c28129439 ] With -Werror: In function ‘r100_cp_init_microcode’, inlined from ‘r100_cp_init’ at drivers/gpu/drm/radeon/r100.c:1136:7: include/linux/printk.h:465:44: error: ‘%s’ directive argument is null [-Werror=format-overflow=] 465 | #define printk(fmt, ...) printk_index_wrap(_printk, fmt, ##__VA_ARGS__) | ^ include/linux/printk.h:437:17: note: in definition of macro ‘printk_index_wrap’ 437 | _p_func(_fmt, ##__VA_ARGS__); \ | ^~~~~~~ include/linux/printk.h:508:9: note: in expansion of macro ‘printk’ 508 | printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__) | ^~~~~~ drivers/gpu/drm/radeon/r100.c:1062:17: note: in expansion of macro ‘pr_err’ 1062 | pr_err("radeon_cp: Failed to load firmware \"%s\"\n", fw_name); | ^~~~~~ Fix this by converting the if/else if/... construct into a proper switch() statement with a default to handle the error case. As a bonus, the generated code is ca. 100 bytes smaller (with gcc 11.4.0 targeting arm32). Signed-off-by: Geert Uytterhoeven <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 39ab331ab5d377a18fbf5a0e0b228205edfcc7f4 ] Replace two open-coded calculations of the buffer size by invocations of sizeof() on the buffer itself, to make sure the code will always use the actual buffer size. Signed-off-by: Geert Uytterhoeven <[email protected]> Link: https://lore.kernel.org/r/817c0b9626fd30790fc488c472a3398324cfcc0c.1724156125.git.geert+renesas@glider.be Signed-off-by: Rob Herring (Arm) <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
zone_watermark_fast was introduced by commit 48ee5f3 ("mm, page_alloc: shortcut watermark checks for order-0 pages"). The commit simply checks if free pages is bigger than watermark without additional calculation such like reducing watermark. It considered free cma pages but it did not consider highatomic reserved. This may incur exhaustion of free pages except high order atomic free pages. Assume that reserved_highatomic pageblock is bigger than watermark min, and there are only few free pages except high order atomic free. Because zone_watermark_fast passes the allocation without considering high order atomic free, normal reclaimable allocation like GFP_HIGHUSER will consume all the free pages. Then finally order-0 atomic allocation may fail on allocation. This means watermark min is not protected against non-atomic allocation. The order-0 atomic allocation with ALLOC_HARDER unwantedly can be failed. Additionally the __GFP_MEMALLOC allocation with ALLOC_NO_WATERMARKS also can be failed. To avoid the problem, zone_watermark_fast should consider highatomic reserve. If the actual size of high atomic free is counted accurately like cma free, we may use it. On this patch just use nr_reserved_highatomic. Additionally introduce __zone_watermark_unusable_free to factor out common parts between zone_watermark_fast and __zone_watermark_ok. This is an example of ALLOC_HARDER allocation failure using v4.19 based kernel. Binder:9343_3: page allocation failure: order:0, mode:0x480020(GFP_ATOMIC), nodemask=(null) Call trace: [<ffffff8008f40f8c>] dump_stack+0xb8/0xf0 [<ffffff8008223320>] warn_alloc+0xd8/0x12c [<ffffff80082245e4>] __alloc_pages_nodemask+0x120c/0x1250 [<ffffff800827f6e8>] new_slab+0x128/0x604 [<ffffff800827b0cc>] ___slab_alloc+0x508/0x670 [<ffffff800827ba00>] __kmalloc+0x2f8/0x310 [<ffffff80084ac3e0>] context_struct_to_string+0x104/0x1cc [<ffffff80084ad8fc>] security_sid_to_context_core+0x74/0x144 [<ffffff80084ad880>] security_sid_to_context+0x10/0x18 [<ffffff800849bd80>] selinux_secid_to_secctx+0x20/0x28 [<ffffff800849109c>] security_secid_to_secctx+0x3c/0x70 [<ffffff8008bfe118>] binder_transaction+0xe68/0x454c Mem-Info: active_anon:102061 inactive_anon:81551 isolated_anon:0 active_file:59102 inactive_file:68924 isolated_file:64 unevictable:611 dirty:63 writeback:0 unstable:0 slab_reclaimable:13324 slab_unreclaimable:44354 mapped:83015 shmem:4858 pagetables:26316 bounce:0 free:2727 free_pcp:1035 free_cma:178 Node 0 active_anon:408244kB inactive_anon:326204kB active_file:236408kB inactive_file:275696kB unevictable:2444kB isolated(anon):0kB isolated(file):256kB mapped:332060kB dirty:252kB writeback:0kB shmem:19432kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no Normal free:10908kB min:6192kB low:44388kB high:47060kB active_anon:409160kB inactive_anon:325924kB active_file:235820kB inactive_file:276628kB unevictable:2444kB writepending:252kB present:3076096kB managed:2673676kB mlocked:2444kB kernel_stack:62512kB pagetables:105264kB bounce:0kB free_pcp:4140kB local_pcp:40kB free_cma:712kB lowmem_reserve[]: 0 0 Normal: 505*4kB (H) 357*8kB (H) 201*16kB (H) 65*32kB (H) 1*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 10236kB 138826 total pagecache pages 5460 pages in swap cache Swap cache stats: add 8273090, delete 8267506, find 1004381/4060142 This is an example of ALLOC_NO_WATERMARKS allocation failure using v4.14 based kernel. kswapd0: page allocation failure: order:0, mode:0x140000a(GFP_NOIO|__GFP_HIGHMEM|__GFP_MOVABLE), nodemask=(null) kswapd0 cpuset=/ mems_allowed=0 CPU: 4 PID: 1221 Comm: kswapd0 Not tainted 4.14.113-18770262-userdebug MiCode#1 Call trace: [<0000000000000000>] dump_backtrace+0x0/0x248 [<0000000000000000>] show_stack+0x18/0x20 [<0000000000000000>] __dump_stack+0x20/0x28 [<0000000000000000>] dump_stack+0x68/0x90 [<0000000000000000>] warn_alloc+0x104/0x198 [<0000000000000000>] __alloc_pages_nodemask+0xdc0/0xdf0 [<0000000000000000>] zs_malloc+0x148/0x3d0 [<0000000000000000>] zram_bvec_rw+0x410/0x798 [<0000000000000000>] zram_rw_page+0x88/0xdc [<0000000000000000>] bdev_write_page+0x70/0xbc [<0000000000000000>] __swap_writepage+0x58/0x37c [<0000000000000000>] swap_writepage+0x40/0x4c [<0000000000000000>] shrink_page_list+0xc30/0xf48 [<0000000000000000>] shrink_inactive_list+0x2b0/0x61c [<0000000000000000>] shrink_node_memcg+0x23c/0x618 [<0000000000000000>] shrink_node+0x1c8/0x304 [<0000000000000000>] kswapd+0x680/0x7c4 [<0000000000000000>] kthread+0x110/0x120 [<0000000000000000>] ret_from_fork+0x10/0x18 Mem-Info: active_anon:111826 inactive_anon:65557 isolated_anon:0\x0a active_file:44260 inactive_file:83422 isolated_file:0\x0a unevictable:4158 dirty:117 writeback:0 unstable:0\x0a slab_reclaimable:13943 slab_unreclaimable:43315\x0a mapped:102511 shmem:3299 pagetables:19566 bounce:0\x0a free:3510 free_pcp:553 free_cma:0 Node 0 active_anon:447304kB inactive_anon:262228kB active_file:177040kB inactive_file:333688kB unevictable:16632kB isolated(anon):0kB isolated(file):0kB mapped:410044kB d irty:468kB writeback:0kB shmem:13196kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no Normal free:14040kB min:7440kB low:94500kB high:98136kB reserved_highatomic:32768KB active_anon:447336kB inactive_anon:261668kB active_file:177572kB inactive_file:333768k B unevictable:16632kB writepending:480kB present:4081664kB managed:3637088kB mlocked:16632kB kernel_stack:47072kB pagetables:78264kB bounce:0kB free_pcp:2280kB local_pcp:720kB free_cma:0kB [ 4738.329607] lowmem_reserve[]: 0 0 Normal: 860*4kB (H) 453*8kB (H) 180*16kB (H) 26*32kB (H) 34*64kB (H) 6*128kB (H) 2*256kB (H) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 14232kB This is trace log which shows GFP_HIGHUSER consumes free pages right before ALLOC_NO_WATERMARKS. <...>-22275 [006] .... 889.213383: mm_page_alloc: page=00000000d2be5665 pfn=970744 order=0 migratetype=0 nr_free=3650 gfp_flags=GFP_HIGHUSER|__GFP_ZERO <...>-22275 [006] .... 889.213385: mm_page_alloc: page=000000004b2335c2 pfn=970745 order=0 migratetype=0 nr_free=3650 gfp_flags=GFP_HIGHUSER|__GFP_ZERO <...>-22275 [006] .... 889.213387: mm_page_alloc: page=00000000017272e1 pfn=970278 order=0 migratetype=0 nr_free=3650 gfp_flags=GFP_HIGHUSER|__GFP_ZERO <...>-22275 [006] .... 889.213389: mm_page_alloc: page=00000000c4be79fb pfn=970279 order=0 migratetype=0 nr_free=3650 gfp_flags=GFP_HIGHUSER|__GFP_ZERO <...>-22275 [006] .... 889.213391: mm_page_alloc: page=00000000f8a51d4f pfn=970260 order=0 migratetype=0 nr_free=3650 gfp_flags=GFP_HIGHUSER|__GFP_ZERO <...>-22275 [006] .... 889.213393: mm_page_alloc: page=000000006ba8f5ac pfn=970261 order=0 migratetype=0 nr_free=3650 gfp_flags=GFP_HIGHUSER|__GFP_ZERO <...>-22275 [006] .... 889.213395: mm_page_alloc: page=00000000819f1cd3 pfn=970196 order=0 migratetype=0 nr_free=3650 gfp_flags=GFP_HIGHUSER|__GFP_ZERO <...>-22275 [006] .... 889.213396: mm_page_alloc: page=00000000f6b72a64 pfn=970197 order=0 migratetype=0 nr_free=3650 gfp_flags=GFP_HIGHUSER|__GFP_ZERO kswapd0-1207 [005] ...1 889.213398: mm_page_alloc: page= (null) pfn=0 order=0 migratetype=1 nr_free=3650 gfp_flags=GFP_NOWAIT|__GFP_HIGHMEM|__GFP_NOWARN|__GFP_MOVABLE [[email protected]: remove redundant code for high-order] Link: http://lkml.kernel.org/r/[email protected] Reported-by: Yong-Taek Lee <[email protected]> Suggested-by: Minchan Kim <[email protected]> Signed-off-by: Jaewon Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Baoquan He <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Acked-by: Mel Gorman <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Yong-Taek Lee <[email protected]> Cc: Michal Hocko <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Juhyung Park <[email protected]> Change-Id: I23e8131f50f5029920f9f27ec0bea3cbf87b8a40
Speculative page fault checks pmd to be valid before starting to handle the page fault and pte_alloc() should do nothing if pmd stays valid. If pmd gets changed during speculative page fault, we will detect the change later and retry with mmap_lock. Therefore pte_alloc() can be safely skipped and this prevents the racy pmd_lock() call which can access pmd->ptl after pmd was cleared. Bug: 257443051 Change-Id: Iec57df5530dba6e0e0bdf9f7500f910851c3d3fd Signed-off-by: Suren Baghdasaryan <[email protected]> Git-commit: 1169f70 Git-repo: https://android.googlesource.com/kernel/common/ Signed-off-by: Srinivasarao Pathipati <[email protected]>
…age() do_swap_page() uses migration_entry_wait() which operates on page tables without protection. Disable speculative page fault handling. Bug: 257443051 Change-Id: I677eb1ee85707dce533d5d811dcde5f5dabcfdf3 Signed-off-by: Suren Baghdasaryan <[email protected]> Git-commit: 4b38875 Git-repo: https://android.googlesource.com/kernel/common/ Signed-off-by: Srinivasarao Pathipati <[email protected]>
Checks of pmd during speculative page fault handling are racy because pmd is unprotected and might be modified or cleared. This might cause use-after-free reads from speculative path, therefore prevent such checks. At the beginning of speculation pmd is checked to be valid and if it's changed before page fault is handled, the change will be detected and page fault will be retried under mmap_lock protection. Bug: 257443051 Change-Id: I0cbd3b0b44e8296cf0d6cb298fae48c696580068 Signed-off-by: Suren Baghdasaryan <[email protected]> Git-commit: 2bb39b9 Git-repo: https://android.googlesource.com/kernel/common/ [[email protected]: resolve merge conflicts] Signed-off-by: Srinivasarao Pathipati <[email protected]>
…lt() Extend filemap_fault() to handle speculative faults. In the speculative case, we will only be fishing existing pages out of the page cache. The logic we use mirrors what is done in the non-speculative case, assuming that pages are found in the page cache, are up to date and not already locked, and that readahead is not necessary at this time. In all other cases, the fault is aborted to be handled non-speculatively. Signed-off-by: Michel Lespinasse <[email protected]> Link: https://lore.kernel.org/all/[email protected]/ Conflicts: mm/filemap.c 1. Added back file_ra_state variable used by SPF path. 2. Updated comment for filemap_fault to reflect SPF locking rules. Bug: 161210518 Signed-off-by: Suren Baghdasaryan <[email protected]> Change-Id: I82eba7fcfc81876245c2e65bc5ae3d33ddfcc368 Git-commit: 59d4d12 Git-repo: https://android.googlesource.com/kernel/common/ [[email protected]: resolve trivial merge conflicts] Signed-off-by: Srinivasarao Pathipati <[email protected]>
…mping commit 59ea6d0 upstream. When fixing the race conditions between the coredump and the mmap_sem holders outside the context of the process, we focused on mmget_not_zero()/get_task_mm() callers in 04f5866 ("coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping"), but those aren't the only cases where the mmap_sem can be taken outside of the context of the process as Michal Hocko noticed while backporting that commit to older -stable kernels. If mmgrab() is called in the context of the process, but then the mm_count reference is transferred outside the context of the process, that can also be a problem if the mmap_sem has to be taken for writing through that mm_count reference. khugepaged registration calls mmgrab() in the context of the process, but the mmap_sem for writing is taken later in the context of the khugepaged kernel thread. collapse_huge_page() after taking the mmap_sem for writing doesn't modify any vma, so it's not obvious that it could cause a problem to the coredump, but it happens to modify the pmd in a way that breaks an invariant that pmd_trans_huge_lock() relies upon. collapse_huge_page() needs the mmap_sem for writing just to block concurrent page faults that call pmd_trans_huge_lock(). Specifically the invariant that "!pmd_trans_huge()" cannot become a "pmd_trans_huge()" doesn't hold while collapse_huge_page() runs. The coredump will call __get_user_pages() without mmap_sem for reading, which eventually can invoke a lockless page fault which will need a functional pmd_trans_huge_lock(). So collapse_huge_page() needs to use mmget_still_valid() to check it's not running concurrently with the coredump... as long as the coredump can invoke page faults without holding the mmap_sem for reading. This has "Fixes: khugepaged" to facilitate backporting, but in my view it's more a bug in the coredump code that will eventually have to be rewritten to stop invoking page faults without the mmap_sem for reading. So the long term plan is still to drop all mmget_still_valid(). Link: http://lkml.kernel.org/r/[email protected] Fixes: ba76149 ("thp: khugepaged") Signed-off-by: Andrea Arcangeli <[email protected]> Reported-by: Michal Hocko <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Jann Horn <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Peter Xu <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Speculative page fault handler needs to detect concurrent pmd changes and relies on vma seqcount for that. pmdp_collapse_flush(), set_huge_pmd() and collapse_and_free_pmd() can modify a pmd. vm_write_{begin|end} are needed in the paths which can call these functions for page fault handler to detect pmd changes. Bug: 257443051 Change-Id: Ieb784b5f44901b66a594f61b9e7c91190ff97f80 Signed-off-by: Suren Baghdasaryan <[email protected]> Git-commit: 5ed391b Git-repo: https://android.googlesource.com/kernel/common/ [[email protected]: resolve trivial merge conflicts] Signed-off-by: Srinivasarao Pathipati <[email protected]>
…ly owned In a number of cases vm_write_{begin|end} is called while mmap_lock is not owned exclusively. This is unnecessary and can affect correctness of the sequence counting protecting speculative page fault handlers. Remove extra calls. Bug: 257443051 Change-Id: I1278638a0794448e22fbdab5601212b3b2eaebdc Signed-off-by: Suren Baghdasaryan <[email protected]> Git-commit: bfdcf47 Git-repo: https://android.googlesource.com/kernel/common/ [[email protected]: resolve trivial merge conflicts] Signed-off-by: Srinivasarao Pathipati <[email protected]>
Invalid condition was introduced when porting the original SPF patch which would affect NUMA mode. Fixes: 736ae8b ("FROMLIST: mm: adding speculative page fault failure trace events") Bug: 257443051 Change-Id: Ib20c625615b279dc467588933a1f598dc179861b Signed-off-by: Suren Baghdasaryan <[email protected]> Git-commit: 1900436 Git-repo: https://android.googlesource.com/kernel/common/ [[email protected]: resolve trivial merge conflicts] Signed-off-by: Srinivasarao Pathipati <[email protected]>
SPF attempts page faults without taking the mmap lock, but takes the PTL. If there is a concurrent fast mremap (at PMD/PUD level), this can lead to a UAF as fast mremap will only take the PTL locks at the PMD/PUD level. SPF cannot take the PTL locks at the larger subtree granularity since this introduces much contention in the page fault paths. To address the race: 1) Only try fast mremaps if there are no users of the VMA. Android is concerned with this optimization in the context of GC stop-the-world pause. So there are no other threads active and this should almost always succeed. 2) Speculative faults detect ongoing fast mremaps and fallback to conventional fault handling (taking mmap read lock). Bug: 263177905 Change-Id: I23917e493ddc8576de19883cac053dfde9982b7f Signed-off-by: Kalesh Singh <[email protected]> Git-commit: 529351c Git-repo: https://android.googlesource.com/kernel/common/ [[email protected]: resolve merge conflicts. Not applying mremap changes due to absence of applicable configs for race.] Signed-off-by: Srinivasarao Pathipati <[email protected]>
find_get_page() returns a page with increased refcount, assuming a page exists at the given index. Ensure this refcount is dropped on error. Bug: 271079833 Fixes: 59d4d12 ("BACKPORT: FROMLIST: mm: implement speculative handling in filemap_fault()") Change-Id: Idc7b9e3f11f32a02bed4c6f4e11cec9200a5c790 Signed-off-by: Patrick Daly <[email protected]> (cherry picked from commit 6232eec) Signed-off-by: Zhenhua Huang <[email protected]> Git-commit: 1d05213 Git-repo: https://android.googlesource.com/kernel/common/ Signed-off-by: Srinivasarao Pathipati <[email protected]>
There was a report that a task is waiting at the throttle_direct_reclaim. The pgscan_direct_throttle in vmstat was increasing. This is a bug where zone_watermark_fast returns true even when the free is very low. The commit f27ce0e ("page_alloc: consider highatomic reserve in watermark fast") changed the watermark fast to consider highatomic reserve. But it did not handle a negative value case which can be happened when reserved_highatomic pageblock is bigger than the actual free. If watermark is considered as ok for the negative value, allocating contexts for order-0 will consume all free pages without direct reclaim, and finally free page may become depleted except highatomic free. Then allocating contexts may fall into throttle_direct_reclaim. This symptom may easily happen in a system where wmark min is low and other reclaimers like kswapd does not make free pages quickly. Handle the negative case by using MIN. Link: https://lkml.kernel.org/r/[email protected] Fixes: 936ae5b ("page_alloc: consider highatomic reserve in watermark fast") Signed-off-by: Jaewon Kim <[email protected]> Reported-by: GyeongHwan Hong <[email protected]> Acked-by: Mel Gorman <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Baoquan He <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Yong-Taek Lee <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Juhyung Park <[email protected]> Change-Id: I5ac94eb7703248dafc4c9c43dad3d9010c51b0bf
…ck in exit_mmap oom-reaper and process_mrelease system call should protect against races with exit_mmap which can destroy page tables while they walk the VMA tree. oom-reaper protects from that race by setting MMF_OOM_VICTIM and by relying on exit_mmap to set MMF_OOM_SKIP before taking and releasing mmap_write_lock. process_mrelease has to elevate mm->mm_users to prevent such race. Both oom-reaper and process_mrelease hold mmap_read_lock when walking the VMA tree. The locking rules and mechanisms could be simpler if exit_mmap takes mmap_write_lock while executing destructive operations such as free_pgtables. Change exit_mmap to hold the mmap_write_lock when calling free_pgtables. Operations like unmap_vmas() and unlock_range() are not destructive and could run under mmap_read_lock but for simplicity we take one mmap_write_lock during almost the entire operation. Note also that because oom-reaper checks VM_LOCKED flag, unlock_range() should not be allowed to race with it. In most cases this lock should be uncontended. Previously, Kirill reported ~4% regression caused by a similar change [1]. We reran the same test and although the individual results are quite noisy, the percentiles show lower regression with 1.6% being the worst case [2]. The change allows oom-reaper and process_mrelease to execute safely under mmap_read_lock without worries that exit_mmap might destroy page tables from under them. [1] https://lore.kernel.org/all/[email protected]/ [2] https://lore.kernel.org/all/CAJuCfpGC9-c9P40x7oy=jy5SphMcd0o0G_6U1-+JAziGKG6dGA@mail.gmail.com/ Signed-off-by: Suren Baghdasaryan <[email protected]> Link: https://lore.kernel.org/all/[email protected]/ Bug: 130172058 Bug: 189803002 Change-Id: Ic87272d09a0b68a1b0e968e8f1a1510fd6fc776a Git-commit: 28358eb Git-repo: https://android.googlesource.com/kernel/common/ [[email protected]: Resolved cherry-pick conflict in mm/mmap.c due to mmap lock was implemented differently in older kernel, and Although process_mrelease is not applicable in older kernel, but this patch is required to take exclusive lock in exit_mmap path so that SPF knows an isolated vma was freed from this path] Signed-off-by: Gaurav Kohli <[email protected]> Signed-off-by: Srinivasarao Pathipati <[email protected]>
Add a count of the number of RCU users (currently 1) of the task struct so that we can later add the scheduler case and get rid of the very subtle task_rcu_dereference(), and just use rcu_dereference(). As suggested by Oleg have the count overlap rcu_head so that no additional space in task_struct is required. Change-Id: Ib1f00439f5e119cce4af2bf712df5a60b47fa81f Inspired-by: Linus Torvalds <[email protected]> Inspired-by: Oleg Nesterov <[email protected]> Signed-off-by: Eric W. Biederman <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Chris Metcalf <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Kirill Tkhai <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Russell King - ARM Linux admin <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]> Git-commit: 3fbd7ee Git-repo: https://android.googlesource.com/kernel/common/ [[email protected]: resolved trivial merge conflicts] Signed-off-by: Srinivasarao Pathipati <[email protected]>
…r leaving the runqueue In the ordinary case today the RCU grace period for a task_struct is triggered when another process wait's for it's zombine and causes the kernel to call release_task(). As the waiting task has to receive a signal and then act upon it before this happens, typically this will occur after the original task as been removed from the runqueue. Unfortunaty in some cases such as self reaping tasks it can be shown that release_task() will be called starting the grace period for task_struct long before the task leaves the runqueue. Therefore use put_task_struct_rcu_user() in finish_task_switch() to guarantee that the there is a RCU lifetime after the task leaves the runqueue. Besides the change in the start of the RCU grace period for the task_struct this change may cause perf_event_delayed_put and trace_sched_process_free. The function perf_event_delayed_put boils down to just a WARN_ON for cases that I assume never show happen. So I don't see any problem with delaying it. The function trace_sched_process_free is a trace point and thus visible to user space. Occassionally userspace has the strangest dependencies so this has a miniscule chance of causing a regression. This change only changes the timing of when the tracepoint is called. The change in timing arguably gives userspace a more accurate picture of what is going on. So I don't expect there to be a regression. In the case where a task self reaps we are pretty much guaranteed that the RCU grace period is delayed. So we should get quite a bit of coverage in of this worst case for the change in a normal threaded workload. So I expect any issues to turn up quickly or not at all. I have lightly tested this change and everything appears to work fine. Change-Id: I8b3d0891c3a68e549c10e7e10bfc814f82138a5d Inspired-by: Linus Torvalds <[email protected]> Inspired-by: Oleg Nesterov <[email protected]> Signed-off-by: Eric W. Biederman <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Chris Metcalf <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Kirill Tkhai <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Russell King - ARM Linux admin <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]> Git-commit: 0ff7b2c Git-repo: https://android.googlesource.com/kernel/common/ Signed-off-by: Srinivasarao Pathipati <[email protected]>
The current code sets per-cpu variable sd_asym_cpucapacity while building sched domains even when there are no asymmetric CPUs. This is done to make sure that EAS remains enabled on a b.L system after hotplugging out all big/LITTLE CPUs. However it is causing the below warning during CPU hotplug. [13988.932604] pc : static_key_slow_dec_cpuslocked+0xe8/0x150 [13988.932608] lr : static_key_slow_dec_cpuslocked+0xe8/0x150 [13988.932610] sp : ffffffc010333c00 [13988.932612] x29: ffffffc010333c00 x28: ffffff8138d88088 [13988.932615] x27: 0000000000000000 x26: 0000000000000081 [13988.932618] x25: ffffff80917efc80 x24: ffffffc010333c60 [13988.932621] x23: ffffffd32bf09c58 x22: 0000000000000000 [13988.932623] x21: 0000000000000000 x20: ffffff80917efc80 [13988.932626] x19: ffffffd32bf0a3e0 x18: ffffff8138039c38 [13988.932628] x17: ffffffd32bf2b000 x16: 0000000000000050 [13988.932631] x15: 0000000000000050 x14: 0000000000040000 [13988.932633] x13: 0000000000000178 x12: 0000000000000001 [13988.932636] x11: 16a9ca5426841300 x10: 16a9ca5426841300 [13988.932639] x9 : 16a9ca5426841300 x8 : 16a9ca5426841300 [13988.932641] x7 : 0000000000000000 x6 : ffffff813f4edadb [13988.932643] x5 : 0000000000000000 x4 : 0000000000000004 [13988.932646] x3 : ffffffc010333880 x2 : ffffffd32a683a2c [13988.932648] x1 : ffffffd329355498 x0 : 000000000000001b [13988.932651] Call trace: [13988.932656] static_key_slow_dec_cpuslocked+0xe8/0x150 [13988.932660] partition_sched_domains_locked+0x1f8/0x80c [13988.932666] sched_cpu_deactivate+0x9c/0x13c [13988.932670] cpuhp_invoke_callback+0x6ac/0xa8c [13988.932675] cpuhp_thread_fun+0x158/0x1ac [13988.932678] smpboot_thread_fn+0x244/0x3e4 [13988.932681] kthread+0x168/0x178 [13988.932685] ret_from_fork+0x10/0x18 The mismatch between increment/decrement of sched_asym_cpucapacity static key is resulting in the above warning. It is due to the fact that the increment happens only when the system really has asymmetric capacity CPUs. This check is done by going through each CPU capacity. So when system becomes SMP during hotplug, the increment never happens. However the decrement of this static key is done when any of the currently online CPU has per-cpu variable sd_asym_cpucapacity value as non-NULL. Since we always set this variable, we run into this issue. Our goal was to enable EAS on SMP. To achieve that enable EAS and build perf domains (required for compute energy) irrespective of per-cpu variable sd_asym_cpucapacity value. In this way we no longer have to enable sched_asym_cpucapacity feature on SMP to enable EAS. Change-Id: Id46f2b80350b742c75195ad6939b814d4695eb07 Signed-off-by: Pavankumar Kondeti <[email protected]>
SCHED_FIFO (or any static priority scheduler) is a broken scheduler model; it is fundamentally incapable of resource management, the one thing an OS is actually supposed to do. It is impossible to compose static priority workloads. One cannot take two well designed and functional static priority workloads and mash them together and still expect them to work. Therefore it doesn't make sense to expose the priority field; the kernel is fundamentally incapable of setting a sensible value, it needs systems knowledge that it doesn't have. Take away sched_setschedule() / sched_setattr() from modules and replace them with: - sched_set_fifo(p); create a FIFO task (at prio 50) - sched_set_fifo_low(p); create a task higher than NORMAL, which ends up being a FIFO task at prio 1. - sched_set_normal(p, nice); (re)set the task to normal This stops the proliferation of randomly chosen, and irrelevant, FIFO priorities that dont't really mean anything anyway. The system administrator/integrator, whoever has insight into the actual system design and requirements (userspace) can set-up appropriate priorities if and when needed. Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Change-Id: I52ed4f1253e82ba3e8f40f3aa1aff62580163f25 Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Ingo Molnar <[email protected]> Tested-by: Paul E. McKenney <[email protected]>
- This is still widely used on init script, custom recovery or flashable script. - From: https://github.com/forforksake/android_kernel_samsung_a146p Signed-off-by: zainarbani <[email protected]>
Signed-off-by: zainarbani <[email protected]>
Signed-off-by: zainarbani <[email protected]>
- From fleur-s-oss Signed-off-by: zainarbani <[email protected]>
…version in MIUI idea taken from 'https://github.com/johnmart19/Moonbase_xiaomi_ginkgo' commit hash 'c8a07bd6ab289724b9f43d1cfd68e5eb072e8f14' Signed-off-by: Shaquib Imdad <[email protected]> Signed-off-by: Pranav Vashi <[email protected]>
Change-Id: Ic91e97f9c7ec073db3c4b8514dfbc95302de4ac8
This reduces the length of the git commit hash embedded in the kernel version fron 12 characters to 8 characters. Before: 4.19.217-perf-gd503a36d7582 After: 4.19.217-perf-g47057aaa Change-Id: I5580b224d5ebf476c41b8b3ddb64290838998b7e
I ofter push after the build is done and I hate seeing "-dirty" Change-Id: Icba706cbea52352b5d4efc196361303bcd58d894
- From sea-t-oss & ruby-s-oss Signed-off-by: zainarbani <[email protected]>
Not sure if interrupt was even used, lets just disabled it for now. refs: https://github.com/MotorolaMobilityLLC/kernel-mtk/blob/MMI-S3RWS32.138-29-5-11/arch/arm64/boot/dts/mediatek/mt6768-corfu-common-overlay.dtsi#L190 MotorolaMobilityLLC/motorola-kernel-modules@e2c498e Kernel log: 7,4797,1638088,-; (0)[1:swapper/0]generic pinconfig core: found bias-pull-up with value 11 7,4798,1638098,-; (0)[1:swapper/0]generic pinconfig core: found slew-rate with value 0 7,4799,1638111,-; (0)[1:swapper/0]pinctrl core: add 2 pinctrl maps 7,4800,1638118,-; (0)[1:swapper/0]generic pinconfig core: found bias-pull-up with value 11 7,4801,1638126,-; (0)[1:swapper/0]generic pinconfig core: found slew-rate with value 0 7,4802,1638137,-; (0)[1:swapper/0]pinctrl core: add 2 pinctrl maps 7,4803,1638160,-; (0)[1:swapper/0]mt6785-pinctrl pinctrl: found group selector 12 for GPIO12 SUBSYSTEM=platform DEVICE=+platform:pinctrl 7,4804,1638167,-; (0)[1:swapper/0]mt6785-pinctrl pinctrl: found group selector 12 for GPIO12 SUBSYSTEM=platform DEVICE=+platform:pinctrl 7,4805,1638174,-; (0)[1:swapper/0]mt6785-pinctrl pinctrl: found group selector 12 for GPIO12 SUBSYSTEM=platform DEVICE=+platform:pinctrl 7,4806,1638180,-; (0)[1:swapper/0]mt6785-pinctrl pinctrl: found group selector 12 for GPIO12 SUBSYSTEM=platform DEVICE=+platform:pinctrl 7,4807,1638189,-; (0)[1:swapper/0]mt6785-pinctrl pinctrl: request pin 12 (GPIO12) for 7-0065 SUBSYSTEM=platform DEVICE=+platform:pinctrl 3,4808,1638200,-; (0)[1:swapper/0]mt6785-pinctrl pinctrl: pin_config_group_set op failed for group 12 SUBSYSTEM=platform DEVICE=+platform:pinctrl 3,4809,1638209,-; (0)[1:swapper/0]bq2597x_charger 7-0065: Error applying setting, reverse things back SUBSYSTEM=i2c DEVICE=+i2c:7-0065 4,4810,1638228,-; (0)[1:swapper/0]bq2597x_charger: probe of 7-0065 failed with error -22 Signed-off-by: zainarbani <[email protected]>
Signed-off-by: zainarbani <[email protected]>
Signed-off-by: zainarbani <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.