Add support for AR5BBU22 [0489:e03c] #18

reejk · 2012-05-11T12:30:15Z

No description provided.

…S block during isolation for migration commit 0bf380b upstream. When isolating for migration, migration starts at the start of a zone which is not necessarily pageblock aligned. Further, it stops isolating when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally not aligned. This allows isolate_migratepages() to call pfn_to_page() on an invalid PFN which can result in a crash. This was originally reported against a 3.0-based kernel with the following trace in a crash dump. PID: 9902 TASK: d47aecd0 CPU: 0 COMMAND: "memcg_process_s" #0 [d72d3ad0] crash_kexec at c028cfdb #1 [d72d3b24] oops_end at c05c5322 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60 #3 [d72d3bec] bad_area at c0227fb6 #4 [d72d3c00] do_page_fault at c05c72ec #5 [d72d3c80] error_code (via page_fault) at c05c47a4 EAX: 00000000 EBX: 000c0000 ECX: 00000001 EDX: 00000807 EBP: 000c0000 DS: 007b ESI: 00000001 ES: 007b EDI: f3000a80 GS: 6f50 CS: 0060 EIP: c030b15a ERR: ffffffff EFLAGS: 00010002 #6 [d72d3cb4] isolate_migratepages at c030b15a #7 [d72d3d1] zone_watermark_ok at c02d26cb torvalds#8 [d72d3d2c] compact_zone at c030b8de torvalds#9 [d72d3d68] compact_zone_order at c030bba1 torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84 torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7 torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7 torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97 torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845 torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6 torvalds#17 [d72d3f30] do_page_fault at c05c70ed torvalds#18 [d72d3fb0] error_code (via page_fault) at c05c47a4 EAX: b71ff000 EBX: 00000001 ECX: 00001600 EDX: 00000431 DS: 007b ESI: 08048950 ES: 007b EDI: bfaa3788 SS: 007b ESP: bfaa36e0 EBP: bfaa3828 GS: 6f50 CS: 0073 EIP: 080487c8 ERR: ffffffff EFLAGS: 00010202 It was also reported by Herbert van den Bergh against 3.1-based kernel with the following snippet from the console log. BUG: unable to handle kernel paging request at 01c00008 IP: [<c0522399>] isolate_migratepages+0x119/0x390 *pdpt = 000000002f7ce001 *pde = 0000000000000000 It is expected that it also affects 3.2.x and current mainline. The problem is that pfn_valid is only called on the first PFN being checked and that PFN is not necessarily aligned. Lets say we have a case like this H = MAX_ORDER_NR_PAGES boundary | = pageblock boundary m = cc->migrate_pfn f = cc->free_pfn o = memory hole H------|------H------|----m-Hoooooo|ooooooH-f----|------H The migrate_pfn is just below a memory hole and the free scanner is beyond the hole. When isolate_migratepages started, it scans from migrate_pfn to migrate_pfn+pageblock_nr_pages which is now in a memory hole. It checks pfn_valid() on the first PFN but then scans into the hole where there are not necessarily valid struct pages. This patch ensures that isolate_migratepages calls pfn_valid when necessary. Reported-by: Herbert van den Bergh <[email protected]> Tested-by: Herbert van den Bergh <[email protected]> Signed-off-by: Mel Gorman <[email protected]> Acked-by: Michal Nazarewicz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

…S block during isolation for migration commit 0bf380b upstream. When isolating for migration, migration starts at the start of a zone which is not necessarily pageblock aligned. Further, it stops isolating when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally not aligned. This allows isolate_migratepages() to call pfn_to_page() on an invalid PFN which can result in a crash. This was originally reported against a 3.0-based kernel with the following trace in a crash dump. PID: 9902 TASK: d47aecd0 CPU: 0 COMMAND: "memcg_process_s" #0 [d72d3ad0] crash_kexec at c028cfdb #1 [d72d3b24] oops_end at c05c5322 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60 #3 [d72d3bec] bad_area at c0227fb6 #4 [d72d3c00] do_page_fault at c05c72ec #5 [d72d3c80] error_code (via page_fault) at c05c47a4 EAX: 00000000 EBX: 000c0000 ECX: 00000001 EDX: 00000807 EBP: 000c0000 DS: 007b ESI: 00000001 ES: 007b EDI: f3000a80 GS: 6f50 CS: 0060 EIP: c030b15a ERR: ffffffff EFLAGS: 00010002 #6 [d72d3cb4] isolate_migratepages at c030b15a #7 [d72d3d1] zone_watermark_ok at c02d26cb #8 [d72d3d2c] compact_zone at c030b8de #9 [d72d3d68] compact_zone_order at c030bba1 torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84 torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7 torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7 torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97 torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845 torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6 torvalds#17 [d72d3f30] do_page_fault at c05c70ed torvalds#18 [d72d3fb0] error_code (via page_fault) at c05c47a4 EAX: b71ff000 EBX: 00000001 ECX: 00001600 EDX: 00000431 DS: 007b ESI: 08048950 ES: 007b EDI: bfaa3788 SS: 007b ESP: bfaa36e0 EBP: bfaa3828 GS: 6f50 CS: 0073 EIP: 080487c8 ERR: ffffffff EFLAGS: 00010202 It was also reported by Herbert van den Bergh against 3.1-based kernel with the following snippet from the console log. BUG: unable to handle kernel paging request at 01c00008 IP: [<c0522399>] isolate_migratepages+0x119/0x390 *pdpt = 000000002f7ce001 *pde = 0000000000000000 It is expected that it also affects 3.2.x and current mainline. The problem is that pfn_valid is only called on the first PFN being checked and that PFN is not necessarily aligned. Lets say we have a case like this H = MAX_ORDER_NR_PAGES boundary | = pageblock boundary m = cc->migrate_pfn f = cc->free_pfn o = memory hole H------|------H------|----m-Hoooooo|ooooooH-f----|------H The migrate_pfn is just below a memory hole and the free scanner is beyond the hole. When isolate_migratepages started, it scans from migrate_pfn to migrate_pfn+pageblock_nr_pages which is now in a memory hole. It checks pfn_valid() on the first PFN but then scans into the hole where there are not necessarily valid struct pages. This patch ensures that isolate_migratepages calls pfn_valid when necessary. Reported-by: Herbert van den Bergh <[email protected]> Tested-by: Herbert van den Bergh <[email protected]> Signed-off-by: Mel Gorman <[email protected]> Acked-by: Michal Nazarewicz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

…S block during isolation for migration When isolating for migration, migration starts at the start of a zone which is not necessarily pageblock aligned. Further, it stops isolating when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally not aligned. This allows isolate_migratepages() to call pfn_to_page() on an invalid PFN which can result in a crash. This was originally reported against a 3.0-based kernel with the following trace in a crash dump. PID: 9902 TASK: d47aecd0 CPU: 0 COMMAND: "memcg_process_s" #0 [d72d3ad0] crash_kexec at c028cfdb #1 [d72d3b24] oops_end at c05c5322 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60 #3 [d72d3bec] bad_area at c0227fb6 #4 [d72d3c00] do_page_fault at c05c72ec #5 [d72d3c80] error_code (via page_fault) at c05c47a4 EAX: 00000000 EBX: 000c0000 ECX: 00000001 EDX: 00000807 EBP: 000c0000 DS: 007b ESI: 00000001 ES: 007b EDI: f3000a80 GS: 6f50 CS: 0060 EIP: c030b15a ERR: ffffffff EFLAGS: 00010002 #6 [d72d3cb4] isolate_migratepages at c030b15a #7 [d72d3d1] zone_watermark_ok at c02d26cb #8 [d72d3d2c] compact_zone at c030b8de #9 [d72d3d68] compact_zone_order at c030bba1 #10 [d72d3db4] try_to_compact_pages at c030bc84 #11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7 #12 [d72d3e08] __alloc_pages_slowpath at c02d66c7 #13 [d72d3e78] __alloc_pages_nodemask at c02d6a97 #14 [d72d3eb8] alloc_pages_vma at c030a845 #15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb #16 [d72d3f00] handle_mm_fault at c02f36c6 #17 [d72d3f30] do_page_fault at c05c70ed #18 [d72d3fb0] error_code (via page_fault) at c05c47a4 EAX: b71ff000 EBX: 00000001 ECX: 00001600 EDX: 00000431 DS: 007b ESI: 08048950 ES: 007b EDI: bfaa3788 SS: 007b ESP: bfaa36e0 EBP: bfaa3828 GS: 6f50 CS: 0073 EIP: 080487c8 ERR: ffffffff EFLAGS: 00010202 It was also reported by Herbert van den Bergh against 3.1-based kernel with the following snippet from the console log. BUG: unable to handle kernel paging request at 01c00008 IP: [<c0522399>] isolate_migratepages+0x119/0x390 *pdpt = 000000002f7ce001 *pde = 0000000000000000 It is expected that it also affects 3.2.x and current mainline. The problem is that pfn_valid is only called on the first PFN being checked and that PFN is not necessarily aligned. Lets say we have a case like this H = MAX_ORDER_NR_PAGES boundary | = pageblock boundary m = cc->migrate_pfn f = cc->free_pfn o = memory hole H------|------H------|----m-Hoooooo|ooooooH-f----|------H The migrate_pfn is just below a memory hole and the free scanner is beyond the hole. When isolate_migratepages started, it scans from migrate_pfn to migrate_pfn+pageblock_nr_pages which is now in a memory hole. It checks pfn_valid() on the first PFN but then scans into the hole where there are not necessarily valid struct pages. This patch ensures that isolate_migratepages calls pfn_valid when necessary. Reported-by: Herbert van den Bergh <[email protected]> Tested-by: Herbert van den Bergh <[email protected]> Signed-off-by: Mel Gorman <[email protected]> Acked-by: Michal Nazarewicz <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>

If the netdev is already in NETREG_UNREGISTERING/_UNREGISTERED state, do not update the real num tx queues. netdev_queue_update_kobjects() is already called via remove_queue_kobjects() at NETREG_UNREGISTERING time. So, when upper layer driver, e.g., FCoE protocol stack is monitoring the netdev event of NETDEV_UNREGISTER and calls back to LLD ndo_fcoe_disable() to remove extra queues allocated for FCoE, the associated txq sysfs kobjects are already removed, and trying to update the real num queues would cause something like below: ... PID: 25138 TASK: ffff88021e64c440 CPU: 3 COMMAND: "kworker/3:3" #0 [ffff88021f007760] machine_kexec at ffffffff810226d9 #1 [ffff88021f0077d0] crash_kexec at ffffffff81089d2d #2 [ffff88021f0078a0] oops_end at ffffffff813bca78 #3 [ffff88021f0078d0] no_context at ffffffff81029e72 #4 [ffff88021f007920] __bad_area_nosemaphore at ffffffff8102a155 #5 [ffff88021f0079f0] bad_area_nosemaphore at ffffffff8102a23e #6 [ffff88021f007a00] do_page_fault at ffffffff813bf32e #7 [ffff88021f007b10] page_fault at ffffffff813bc045 [exception RIP: sysfs_find_dirent+17] RIP: ffffffff81178611 RSP: ffff88021f007bc0 RFLAGS: 00010246 RAX: ffff88021e64c440 RBX: ffffffff8156cc63 RCX: 0000000000000004 RDX: ffffffff8156cc63 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff88021f007be0 R8: 0000000000000004 R9: 0000000000000008 R10: ffffffff816fed00 R11: 0000000000000004 R12: 0000000000000000 R13: ffffffff8156cc63 R14: 0000000000000000 R15: ffff8802222a0000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #8 [ffff88021f007be8] sysfs_get_dirent at ffffffff81178c07 #9 [ffff88021f007c18] sysfs_remove_group at ffffffff8117ac27 #10 [ffff88021f007c48] netdev_queue_update_kobjects at ffffffff813178f9 #11 [ffff88021f007c88] netif_set_real_num_tx_queues at ffffffff81303e38 #12 [ffff88021f007cc8] ixgbe_set_num_queues at ffffffffa0249763 [ixgbe] #13 [ffff88021f007cf8] ixgbe_init_interrupt_scheme at ffffffffa024ea89 [ixgbe] #14 [ffff88021f007d48] ixgbe_fcoe_disable at ffffffffa0267113 [ixgbe] #15 [ffff88021f007d68] vlan_dev_fcoe_disable at ffffffffa014fef5 [8021q] #16 [ffff88021f007d78] fcoe_interface_cleanup at ffffffffa02b7dfd [fcoe] #17 [ffff88021f007df8] fcoe_destroy_work at ffffffffa02b7f08 [fcoe] #18 [ffff88021f007e18] process_one_work at ffffffff8105d7ca #19 [ffff88021f007e68] worker_thread at ffffffff81060513 #20 [ffff88021f007ee8] kthread at ffffffff810648b6 #21 [ffff88021f007f48] kernel_thread_helper at ffffffff813c40f4 Signed-off-by: Yi Zou <[email protected]> Tested-by: Ross Brattain <[email protected]> Tested-by: Stephen Ko <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>

Printing the "start_ip" for every secondary cpu is very noisy on a large system - and doesn't add any value. Drop this message. Console log before: Booting Node 0, Processors #1 smpboot cpu 1: start_ip = 96000 #2 smpboot cpu 2: start_ip = 96000 #3 smpboot cpu 3: start_ip = 96000 #4 smpboot cpu 4: start_ip = 96000 ... #31 smpboot cpu 31: start_ip = 96000 Brought up 32 CPUs Console log after: Booting Node 0, Processors #1 #2 #3 #4 #5 #6 #7 Ok. Booting Node 1, Processors #8 #9 #10 #11 #12 #13 #14 #15 Ok. Booting Node 0, Processors #16 #17 #18 #19 #20 #21 #22 #23 Ok. Booting Node 1, Processors #24 #25 #26 #27 #28 #29 #30 #31 Brought up 32 CPUs Acked-by: Borislav Petkov <[email protected]> Signed-off-by: Tony Luck <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: H. Peter Anvin <[email protected]>

This patch simplifies the buffer allocation functions for MCI and removes unneeded memset calls. Also, a couple of unused variables are removed and a memory leak in DMA allocation is fixed. [ 1263.788267] WARNING: at /home/sujith/dev/wireless-testing/lib/dma-debug.c:875 check_unmap+0x173/0x7e0() [ 1263.788273] ath9k 0000:06:00.0: DMA-API: device driver frees DMA memory with different size [device address=0x0000000071908000] [map size=512 bytes] [unmap size=256 bytes] [ 1263.788345] Pid: 774, comm: rmmod Tainted: G W O 3.3.0-rc3-wl #18 [ 1263.788348] Call Trace: [ 1263.788355] [<ffffffff8105110f>] warn_slowpath_common+0x7f/0xc0 [ 1263.788359] [<ffffffff81051206>] warn_slowpath_fmt+0x46/0x50 [ 1263.788363] [<ffffffff8125a713>] check_unmap+0x173/0x7e0 [ 1263.788368] [<ffffffff8123fc22>] ? prio_tree_left+0x32/0xc0 [ 1263.788373] [<ffffffff8125aded>] debug_dma_free_coherent+0x6d/0x80 [ 1263.788381] [<ffffffffa0701c3c>] ath_mci_cleanup+0x7c/0x110 [ath9k] [ 1263.788387] [<ffffffffa06f4033>] ath9k_deinit_softc+0x113/0x120 [ath9k] [ 1263.788392] [<ffffffffa06f55cc>] ath9k_deinit_device+0x5c/0x70 [ath9k] [ 1263.788397] [<ffffffffa0704934>] ath_pci_remove+0x54/0xa0 [ath9k] [ 1263.788401] [<ffffffff81267d06>] pci_device_remove+0x46/0x110 [ 1263.788406] [<ffffffff813102bc>] __device_release_driver+0x7c/0xe0 [ 1263.788410] [<ffffffff81310a00>] driver_detach+0xd0/0xe0 [ 1263.788414] [<ffffffff81310118>] bus_remove_driver+0x88/0xe0 [ 1263.788418] [<ffffffff813111c2>] driver_unregister+0x62/0xa0 [ 1263.788421] [<ffffffff812680c4>] pci_unregister_driver+0x44/0xc0 [ 1263.788427] [<ffffffffa0705015>] ath_pci_exit+0x15/0x20 [ath9k] [ 1263.788432] [<ffffffffa070a92d>] ath9k_exit+0x15/0x31 [ath9k] [ 1263.788436] [<ffffffff810b971c>] sys_delete_module+0x18c/0x270 [ 1263.788441] [<ffffffff81436edd>] ? retint_swapgs+0x13/0x1b [ 1263.788446] [<ffffffff812483be>] ? trace_hardirqs_on_thunk+0x3a/0x3f [ 1263.788450] [<ffffffff814378e9>] system_call_fastpath+0x16/0x1b [ 1263.788453] ---[ end trace 3ab4d030ffde40d4 ]--- Signed-off-by: Sujith Manoharan <[email protected]> Signed-off-by: John W. Linville <[email protected]>

commit 87121ca upstream. Oprofile may crash in a KVM guest while unlaoding modules. This happens if oprofile_arch_init() fails and oprofile switches to the hr timer mode as a fallback. In this case oprofile_arch_exit() is called, but it never was initialized properly which causes the crash. This patch fixes this. oprofile: using timer interrupt. BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 IP: [<ffffffff8123c226>] unregister_syscore_ops+0x41/0x58 PGD 41da3f067 PUD 41d80e067 PMD 0 Oops: 0002 [#1] PREEMPT SMP CPU 5 Modules linked in: oprofile(-) Pid: 2382, comm: modprobe Not tainted 3.1.0-rc7-00018-g709a39d torvalds#18 Advanced Micro Device Anaheim/Anaheim RIP: 0010:[<ffffffff8123c226>] [<ffffffff8123c226>] unregister_syscore_ops+0x41/0x58 RSP: 0018:ffff88041de1de98 EFLAGS: 00010296 RAX: 0000000000000000 RBX: ffffffffa00060e0 RCX: dead000000200200 RDX: 0000000000000000 RSI: dead000000100100 RDI: ffffffff8178c620 RBP: ffff88041de1dea8 R08: 0000000000000001 R09: 0000000000000082 R10: 0000000000000000 R11: ffff88041de1dde8 R12: 0000000000000080 R13: fffffffffffffff5 R14: 0000000000000001 R15: 0000000000610210 FS: 00007f9ae5bef700(0000) GS:ffff88042fd40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000008 CR3: 000000041ca44000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process modprobe (pid: 2382, threadinfo ffff88041de1c000, task ffff88042db6d040) Stack: ffff88041de1deb8 ffffffffa0006770 ffff88041de1deb8 ffffffffa000251e ffff88041de1dec8 ffffffffa00022c2 ffff88041de1ded8 ffffffffa0004993 ffff88041de1df78 ffffffff81073115 656c69666f72706f 0000000000610200 Call Trace: [<ffffffffa000251e>] op_nmi_exit+0x15/0x17 [oprofile] [<ffffffffa00022c2>] oprofile_arch_exit+0xe/0x10 [oprofile] [<ffffffffa0004993>] oprofile_exit+0x13/0x15 [oprofile] [<ffffffff81073115>] sys_delete_module+0x1c3/0x22f [<ffffffff811bf09e>] ? trace_hardirqs_on_thunk+0x3a/0x3f [<ffffffff8148070b>] system_call_fastpath+0x16/0x1b Code: 20 c6 78 81 e8 c5 cc 23 00 48 8b 13 48 8b 43 08 48 be 00 01 10 00 00 00 ad de 48 b9 00 02 20 00 00 00 ad de 48 c7 c7 20 c6 78 81 89 42 08 48 89 10 48 89 33 48 89 4b 08 e8 a6 c0 23 00 5a 5b RIP [<ffffffff8123c226>] unregister_syscore_ops+0x41/0x58 RSP <ffff88041de1de98> CR2: 0000000000000008 ---[ end trace 06d4e95b6aa3b437 ]--- Signed-off-by: Robert Richter <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

…S block during isolation for migration commit 0bf380b upstream. When isolating for migration, migration starts at the start of a zone which is not necessarily pageblock aligned. Further, it stops isolating when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally not aligned. This allows isolate_migratepages() to call pfn_to_page() on an invalid PFN which can result in a crash. This was originally reported against a 3.0-based kernel with the following trace in a crash dump. PID: 9902 TASK: d47aecd0 CPU: 0 COMMAND: "memcg_process_s" #0 [d72d3ad0] crash_kexec at c028cfdb #1 [d72d3b24] oops_end at c05c5322 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60 #3 [d72d3bec] bad_area at c0227fb6 #4 [d72d3c00] do_page_fault at c05c72ec #5 [d72d3c80] error_code (via page_fault) at c05c47a4 EAX: 00000000 EBX: 000c0000 ECX: 00000001 EDX: 00000807 EBP: 000c0000 DS: 007b ESI: 00000001 ES: 007b EDI: f3000a80 GS: 6f50 CS: 0060 EIP: c030b15a ERR: ffffffff EFLAGS: 00010002 #6 [d72d3cb4] isolate_migratepages at c030b15a #7 [d72d3d1] zone_watermark_ok at c02d26cb #8 [d72d3d2c] compact_zone at c030b8de #9 [d72d3d68] compact_zone_order at c030bba1 torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84 torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7 torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7 torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97 torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845 torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6 torvalds#17 [d72d3f30] do_page_fault at c05c70ed torvalds#18 [d72d3fb0] error_code (via page_fault) at c05c47a4 EAX: b71ff000 EBX: 00000001 ECX: 00001600 EDX: 00000431 DS: 007b ESI: 08048950 ES: 007b EDI: bfaa3788 SS: 007b ESP: bfaa36e0 EBP: bfaa3828 GS: 6f50 CS: 0073 EIP: 080487c8 ERR: ffffffff EFLAGS: 00010202 It was also reported by Herbert van den Bergh against 3.1-based kernel with the following snippet from the console log. BUG: unable to handle kernel paging request at 01c00008 IP: [<c0522399>] isolate_migratepages+0x119/0x390 *pdpt = 000000002f7ce001 *pde = 0000000000000000 It is expected that it also affects 3.2.x and current mainline. The problem is that pfn_valid is only called on the first PFN being checked and that PFN is not necessarily aligned. Lets say we have a case like this H = MAX_ORDER_NR_PAGES boundary | = pageblock boundary m = cc->migrate_pfn f = cc->free_pfn o = memory hole H------|------H------|----m-Hoooooo|ooooooH-f----|------H The migrate_pfn is just below a memory hole and the free scanner is beyond the hole. When isolate_migratepages started, it scans from migrate_pfn to migrate_pfn+pageblock_nr_pages which is now in a memory hole. It checks pfn_valid() on the first PFN but then scans into the hole where there are not necessarily valid struct pages. This patch ensures that isolate_migratepages calls pfn_valid when necessary. Reported-by: Herbert van den Bergh <[email protected]> Tested-by: Herbert van den Bergh <[email protected]> Signed-off-by: Mel Gorman <[email protected]> Acked-by: Michal Nazarewicz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit 87121ca upstream. Oprofile may crash in a KVM guest while unlaoding modules. This happens if oprofile_arch_init() fails and oprofile switches to the hr timer mode as a fallback. In this case oprofile_arch_exit() is called, but it never was initialized properly which causes the crash. This patch fixes this. oprofile: using timer interrupt. BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 IP: [<ffffffff8123c226>] unregister_syscore_ops+0x41/0x58 PGD 41da3f067 PUD 41d80e067 PMD 0 Oops: 0002 [#1] PREEMPT SMP CPU 5 Modules linked in: oprofile(-) Pid: 2382, comm: modprobe Not tainted 3.1.0-rc7-00018-g709a39d torvalds#18 Advanced Micro Device Anaheim/Anaheim RIP: 0010:[<ffffffff8123c226>] [<ffffffff8123c226>] unregister_syscore_ops+0x41/0x58 RSP: 0018:ffff88041de1de98 EFLAGS: 00010296 RAX: 0000000000000000 RBX: ffffffffa00060e0 RCX: dead000000200200 RDX: 0000000000000000 RSI: dead000000100100 RDI: ffffffff8178c620 RBP: ffff88041de1dea8 R08: 0000000000000001 R09: 0000000000000082 R10: 0000000000000000 R11: ffff88041de1dde8 R12: 0000000000000080 R13: fffffffffffffff5 R14: 0000000000000001 R15: 0000000000610210 FS: 00007f9ae5bef700(0000) GS:ffff88042fd40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000008 CR3: 000000041ca44000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process modprobe (pid: 2382, threadinfo ffff88041de1c000, task ffff88042db6d040) Stack: ffff88041de1deb8 ffffffffa0006770 ffff88041de1deb8 ffffffffa000251e ffff88041de1dec8 ffffffffa00022c2 ffff88041de1ded8 ffffffffa0004993 ffff88041de1df78 ffffffff81073115 656c69666f72706f 0000000000610200 Call Trace: [<ffffffffa000251e>] op_nmi_exit+0x15/0x17 [oprofile] [<ffffffffa00022c2>] oprofile_arch_exit+0xe/0x10 [oprofile] [<ffffffffa0004993>] oprofile_exit+0x13/0x15 [oprofile] [<ffffffff81073115>] sys_delete_module+0x1c3/0x22f [<ffffffff811bf09e>] ? trace_hardirqs_on_thunk+0x3a/0x3f [<ffffffff8148070b>] system_call_fastpath+0x16/0x1b Code: 20 c6 78 81 e8 c5 cc 23 00 48 8b 13 48 8b 43 08 48 be 00 01 10 00 00 00 ad de 48 b9 00 02 20 00 00 00 ad de 48 c7 c7 20 c6 78 81 89 42 08 48 89 10 48 89 33 48 89 4b 08 e8 a6 c0 23 00 5a 5b RIP [<ffffffff8123c226>] unregister_syscore_ops+0x41/0x58 RSP <ffff88041de1de98> CR2: 0000000000000008 ---[ end trace 06d4e95b6aa3b437 ]--- Signed-off-by: Robert Richter <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

…S block during isolation for migration commit 0bf380b upstream. When isolating for migration, migration starts at the start of a zone which is not necessarily pageblock aligned. Further, it stops isolating when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally not aligned. This allows isolate_migratepages() to call pfn_to_page() on an invalid PFN which can result in a crash. This was originally reported against a 3.0-based kernel with the following trace in a crash dump. PID: 9902 TASK: d47aecd0 CPU: 0 COMMAND: "memcg_process_s" #0 [d72d3ad0] crash_kexec at c028cfdb #1 [d72d3b24] oops_end at c05c5322 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60 #3 [d72d3bec] bad_area at c0227fb6 #4 [d72d3c00] do_page_fault at c05c72ec #5 [d72d3c80] error_code (via page_fault) at c05c47a4 EAX: 00000000 EBX: 000c0000 ECX: 00000001 EDX: 00000807 EBP: 000c0000 DS: 007b ESI: 00000001 ES: 007b EDI: f3000a80 GS: 6f50 CS: 0060 EIP: c030b15a ERR: ffffffff EFLAGS: 00010002 torvalds#6 [d72d3cb4] isolate_migratepages at c030b15a torvalds#7 [d72d3d1] zone_watermark_ok at c02d26cb torvalds#8 [d72d3d2c] compact_zone at c030b8de torvalds#9 [d72d3d68] compact_zone_order at c030bba1 torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84 torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7 torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7 torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97 torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845 torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6 torvalds#17 [d72d3f30] do_page_fault at c05c70ed torvalds#18 [d72d3fb0] error_code (via page_fault) at c05c47a4 EAX: b71ff000 EBX: 00000001 ECX: 00001600 EDX: 00000431 DS: 007b ESI: 08048950 ES: 007b EDI: bfaa3788 SS: 007b ESP: bfaa36e0 EBP: bfaa3828 GS: 6f50 CS: 0073 EIP: 080487c8 ERR: ffffffff EFLAGS: 00010202 It was also reported by Herbert van den Bergh against 3.1-based kernel with the following snippet from the console log. BUG: unable to handle kernel paging request at 01c00008 IP: [<c0522399>] isolate_migratepages+0x119/0x390 *pdpt = 000000002f7ce001 *pde = 0000000000000000 It is expected that it also affects 3.2.x and current mainline. The problem is that pfn_valid is only called on the first PFN being checked and that PFN is not necessarily aligned. Lets say we have a case like this H = MAX_ORDER_NR_PAGES boundary | = pageblock boundary m = cc->migrate_pfn f = cc->free_pfn o = memory hole H------|------H------|----m-Hoooooo|ooooooH-f----|------H The migrate_pfn is just below a memory hole and the free scanner is beyond the hole. When isolate_migratepages started, it scans from migrate_pfn to migrate_pfn+pageblock_nr_pages which is now in a memory hole. It checks pfn_valid() on the first PFN but then scans into the hole where there are not necessarily valid struct pages. This patch ensures that isolate_migratepages calls pfn_valid when necessary. Reported-by: Herbert van den Bergh <[email protected]> Tested-by: Herbert van den Bergh <[email protected]> Signed-off-by: Mel Gorman <[email protected]> Acked-by: Michal Nazarewicz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

…S block during isolation for migration commit 0bf380b upstream. When isolating for migration, migration starts at the start of a zone which is not necessarily pageblock aligned. Further, it stops isolating when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally not aligned. This allows isolate_migratepages() to call pfn_to_page() on an invalid PFN which can result in a crash. This was originally reported against a 3.0-based kernel with the following trace in a crash dump. PID: 9902 TASK: d47aecd0 CPU: 0 COMMAND: "memcg_process_s" #0 [d72d3ad0] crash_kexec at c028cfdb #1 [d72d3b24] oops_end at c05c5322 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60 #3 [d72d3bec] bad_area at c0227fb6 #4 [d72d3c00] do_page_fault at c05c72ec #5 [d72d3c80] error_code (via page_fault) at c05c47a4 EAX: 00000000 EBX: 000c0000 ECX: 00000001 EDX: 00000807 EBP: 000c0000 DS: 007b ESI: 00000001 ES: 007b EDI: f3000a80 GS: 6f50 CS: 0060 EIP: c030b15a ERR: ffffffff EFLAGS: 00010002 #6 [d72d3cb4] isolate_migratepages at c030b15a #7 [d72d3d1] zone_watermark_ok at c02d26cb #8 [d72d3d2c] compact_zone at c030b8de #9 [d72d3d68] compact_zone_order at c030bba1 torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84 torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7 torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7 torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97 torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845 torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6 torvalds#17 [d72d3f30] do_page_fault at c05c70ed torvalds#18 [d72d3fb0] error_code (via page_fault) at c05c47a4 EAX: b71ff000 EBX: 00000001 ECX: 00001600 EDX: 00000431 DS: 007b ESI: 08048950 ES: 007b EDI: bfaa3788 SS: 007b ESP: bfaa36e0 EBP: bfaa3828 GS: 6f50 CS: 0073 EIP: 080487c8 ERR: ffffffff EFLAGS: 00010202 It was also reported by Herbert van den Bergh against 3.1-based kernel with the following snippet from the console log. BUG: unable to handle kernel paging request at 01c00008 IP: [<c0522399>] isolate_migratepages+0x119/0x390 *pdpt = 000000002f7ce001 *pde = 0000000000000000 It is expected that it also affects 3.2.x and current mainline. The problem is that pfn_valid is only called on the first PFN being checked and that PFN is not necessarily aligned. Lets say we have a case like this H = MAX_ORDER_NR_PAGES boundary | = pageblock boundary m = cc->migrate_pfn f = cc->free_pfn o = memory hole H------|------H------|----m-Hoooooo|ooooooH-f----|------H The migrate_pfn is just below a memory hole and the free scanner is beyond the hole. When isolate_migratepages started, it scans from migrate_pfn to migrate_pfn+pageblock_nr_pages which is now in a memory hole. It checks pfn_valid() on the first PFN but then scans into the hole where there are not necessarily valid struct pages. This patch ensures that isolate_migratepages calls pfn_valid when necessary. Reported-by: Herbert van den Bergh <[email protected]> Tested-by: Herbert van den Bergh <[email protected]> Signed-off-by: Mel Gorman <[email protected]> Acked-by: Michal Nazarewicz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

Oprofile may crash in a KVM guest while unlaoding modules. This happens if oprofile_arch_init() fails and oprofile switches to the hr timer mode as a fallback. In this case oprofile_arch_exit() is called, but it never was initialized properly which causes the crash. This patch fixes this. oprofile: using timer interrupt. BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 IP: [<ffffffff8123c226>] unregister_syscore_ops+0x41/0x58 PGD 41da3f067 PUD 41d80e067 PMD 0 Oops: 0002 [#1] PREEMPT SMP CPU 5 Modules linked in: oprofile(-) Pid: 2382, comm: modprobe Not tainted 3.1.0-rc7-00018-g709a39d torvalds#18 Advanced Micro Device Anaheim/Anaheim RIP: 0010:[<ffffffff8123c226>] [<ffffffff8123c226>] unregister_syscore_ops+0x41/0x58 RSP: 0018:ffff88041de1de98 EFLAGS: 00010296 RAX: 0000000000000000 RBX: ffffffffa00060e0 RCX: dead000000200200 RDX: 0000000000000000 RSI: dead000000100100 RDI: ffffffff8178c620 RBP: ffff88041de1dea8 R08: 0000000000000001 R09: 0000000000000082 R10: 0000000000000000 R11: ffff88041de1dde8 R12: 0000000000000080 R13: fffffffffffffff5 R14: 0000000000000001 R15: 0000000000610210 FS: 00007f9ae5bef700(0000) GS:ffff88042fd40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000008 CR3: 000000041ca44000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process modprobe (pid: 2382, threadinfo ffff88041de1c000, task ffff88042db6d040) Stack: ffff88041de1deb8 ffffffffa0006770 ffff88041de1deb8 ffffffffa000251e ffff88041de1dec8 ffffffffa00022c2 ffff88041de1ded8 ffffffffa0004993 ffff88041de1df78 ffffffff81073115 656c69666f72706f 0000000000610200 Call Trace: [<ffffffffa000251e>] op_nmi_exit+0x15/0x17 [oprofile] [<ffffffffa00022c2>] oprofile_arch_exit+0xe/0x10 [oprofile] [<ffffffffa0004993>] oprofile_exit+0x13/0x15 [oprofile] [<ffffffff81073115>] sys_delete_module+0x1c3/0x22f [<ffffffff811bf09e>] ? trace_hardirqs_on_thunk+0x3a/0x3f [<ffffffff8148070b>] system_call_fastpath+0x16/0x1b Code: 20 c6 78 81 e8 c5 cc 23 00 48 8b 13 48 8b 43 08 48 be 00 01 10 00 00 00 ad de 48 b9 00 02 20 00 00 00 ad de 48 c7 c7 20 c6 78 81 89 42 08 48 89 10 48 89 33 48 89 4b 08 e8 a6 c0 23 00 5a 5b RIP [<ffffffff8123c226>] unregister_syscore_ops+0x41/0x58 RSP <ffff88041de1de98> CR2: 0000000000000008 ---[ end trace 06d4e95b6aa3b437 ]--- CC: [email protected] # 2.6.37+ Signed-off-by: Robert Richter <[email protected]>

…S block during isolation for migration commit 0bf380b upstream. When isolating for migration, migration starts at the start of a zone which is not necessarily pageblock aligned. Further, it stops isolating when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally not aligned. This allows isolate_migratepages() to call pfn_to_page() on an invalid PFN which can result in a crash. This was originally reported against a 3.0-based kernel with the following trace in a crash dump. PID: 9902 TASK: d47aecd0 CPU: 0 COMMAND: "memcg_process_s" #0 [d72d3ad0] crash_kexec at c028cfdb #1 [d72d3b24] oops_end at c05c5322 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60 #3 [d72d3bec] bad_area at c0227fb6 #4 [d72d3c00] do_page_fault at c05c72ec #5 [d72d3c80] error_code (via page_fault) at c05c47a4 EAX: 00000000 EBX: 000c0000 ECX: 00000001 EDX: 00000807 EBP: 000c0000 DS: 007b ESI: 00000001 ES: 007b EDI: f3000a80 GS: 6f50 CS: 0060 EIP: c030b15a ERR: ffffffff EFLAGS: 00010002 #6 [d72d3cb4] isolate_migratepages at c030b15a #7 [d72d3d1] zone_watermark_ok at c02d26cb #8 [d72d3d2c] compact_zone at c030b8de #9 [d72d3d68] compact_zone_order at c030bba1 torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84 torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7 torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7 torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97 torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845 torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6 torvalds#17 [d72d3f30] do_page_fault at c05c70ed torvalds#18 [d72d3fb0] error_code (via page_fault) at c05c47a4 EAX: b71ff000 EBX: 00000001 ECX: 00001600 EDX: 00000431 DS: 007b ESI: 08048950 ES: 007b EDI: bfaa3788 SS: 007b ESP: bfaa36e0 EBP: bfaa3828 GS: 6f50 CS: 0073 EIP: 080487c8 ERR: ffffffff EFLAGS: 00010202 It was also reported by Herbert van den Bergh against 3.1-based kernel with the following snippet from the console log. BUG: unable to handle kernel paging request at 01c00008 IP: [<c0522399>] isolate_migratepages+0x119/0x390 *pdpt = 000000002f7ce001 *pde = 0000000000000000 It is expected that it also affects 3.2.x and current mainline. The problem is that pfn_valid is only called on the first PFN being checked and that PFN is not necessarily aligned. Lets say we have a case like this H = MAX_ORDER_NR_PAGES boundary | = pageblock boundary m = cc->migrate_pfn f = cc->free_pfn o = memory hole H------|------H------|----m-Hoooooo|ooooooH-f----|------H The migrate_pfn is just below a memory hole and the free scanner is beyond the hole. When isolate_migratepages started, it scans from migrate_pfn to migrate_pfn+pageblock_nr_pages which is now in a memory hole. It checks pfn_valid() on the first PFN but then scans into the hole where there are not necessarily valid struct pages. This patch ensures that isolate_migratepages calls pfn_valid when necessary. Reported-by: Herbert van den Bergh <[email protected]> Tested-by: Herbert van den Bergh <[email protected]> Signed-off-by: Mel Gorman <[email protected]> Acked-by: Michal Nazarewicz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

…S block during isolation for migration commit 0bf380b upstream. When isolating for migration, migration starts at the start of a zone which is not necessarily pageblock aligned. Further, it stops isolating when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally not aligned. This allows isolate_migratepages() to call pfn_to_page() on an invalid PFN which can result in a crash. This was originally reported against a 3.0-based kernel with the following trace in a crash dump. PID: 9902 TASK: d47aecd0 CPU: 0 COMMAND: "memcg_process_s" #0 [d72d3ad0] crash_kexec at c028cfdb #1 [d72d3b24] oops_end at c05c5322 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60 #3 [d72d3bec] bad_area at c0227fb6 #4 [d72d3c00] do_page_fault at c05c72ec #5 [d72d3c80] error_code (via page_fault) at c05c47a4 EAX: 00000000 EBX: 000c0000 ECX: 00000001 EDX: 00000807 EBP: 000c0000 DS: 007b ESI: 00000001 ES: 007b EDI: f3000a80 GS: 6f50 CS: 0060 EIP: c030b15a ERR: ffffffff EFLAGS: 00010002 torvalds#6 [d72d3cb4] isolate_migratepages at c030b15a torvalds#7 [d72d3d1] zone_watermark_ok at c02d26cb torvalds#8 [d72d3d2c] compact_zone at c030b8de torvalds#9 [d72d3d68] compact_zone_order at c030bba1 torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84 torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7 torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7 torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97 torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845 torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6 torvalds#17 [d72d3f30] do_page_fault at c05c70ed torvalds#18 [d72d3fb0] error_code (via page_fault) at c05c47a4 EAX: b71ff000 EBX: 00000001 ECX: 00001600 EDX: 00000431 DS: 007b ESI: 08048950 ES: 007b EDI: bfaa3788 SS: 007b ESP: bfaa36e0 EBP: bfaa3828 GS: 6f50 CS: 0073 EIP: 080487c8 ERR: ffffffff EFLAGS: 00010202 It was also reported by Herbert van den Bergh against 3.1-based kernel with the following snippet from the console log. BUG: unable to handle kernel paging request at 01c00008 IP: [<c0522399>] isolate_migratepages+0x119/0x390 *pdpt = 000000002f7ce001 *pde = 0000000000000000 It is expected that it also affects 3.2.x and current mainline. The problem is that pfn_valid is only called on the first PFN being checked and that PFN is not necessarily aligned. Lets say we have a case like this H = MAX_ORDER_NR_PAGES boundary | = pageblock boundary m = cc->migrate_pfn f = cc->free_pfn o = memory hole H------|------H------|----m-Hoooooo|ooooooH-f----|------H The migrate_pfn is just below a memory hole and the free scanner is beyond the hole. When isolate_migratepages started, it scans from migrate_pfn to migrate_pfn+pageblock_nr_pages which is now in a memory hole. It checks pfn_valid() on the first PFN but then scans into the hole where there are not necessarily valid struct pages. This patch ensures that isolate_migratepages calls pfn_valid when necessary. Reported-by: Herbert van den Bergh <[email protected]> Tested-by: Herbert van den Bergh <[email protected]> Signed-off-by: Mel Gorman <[email protected]> Acked-by: Michal Nazarewicz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

…S block during isolation for migration commit 0bf380b upstream. When isolating for migration, migration starts at the start of a zone which is not necessarily pageblock aligned. Further, it stops isolating when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally not aligned. This allows isolate_migratepages() to call pfn_to_page() on an invalid PFN which can result in a crash. This was originally reported against a 3.0-based kernel with the following trace in a crash dump. PID: 9902 TASK: d47aecd0 CPU: 0 COMMAND: "memcg_process_s" #0 [d72d3ad0] crash_kexec at c028cfdb #1 [d72d3b24] oops_end at c05c5322 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60 #3 [d72d3bec] bad_area at c0227fb6 #4 [d72d3c00] do_page_fault at c05c72ec #5 [d72d3c80] error_code (via page_fault) at c05c47a4 EAX: 00000000 EBX: 000c0000 ECX: 00000001 EDX: 00000807 EBP: 000c0000 DS: 007b ESI: 00000001 ES: 007b EDI: f3000a80 GS: 6f50 CS: 0060 EIP: c030b15a ERR: ffffffff EFLAGS: 00010002 #6 [d72d3cb4] isolate_migratepages at c030b15a #7 [d72d3d1] zone_watermark_ok at c02d26cb #8 [d72d3d2c] compact_zone at c030b8de #9 [d72d3d68] compact_zone_order at c030bba1 torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84 torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7 torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7 torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97 torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845 torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6 torvalds#17 [d72d3f30] do_page_fault at c05c70ed torvalds#18 [d72d3fb0] error_code (via page_fault) at c05c47a4 EAX: b71ff000 EBX: 00000001 ECX: 00001600 EDX: 00000431 DS: 007b ESI: 08048950 ES: 007b EDI: bfaa3788 SS: 007b ESP: bfaa36e0 EBP: bfaa3828 GS: 6f50 CS: 0073 EIP: 080487c8 ERR: ffffffff EFLAGS: 00010202 It was also reported by Herbert van den Bergh against 3.1-based kernel with the following snippet from the console log. BUG: unable to handle kernel paging request at 01c00008 IP: [<c0522399>] isolate_migratepages+0x119/0x390 *pdpt = 000000002f7ce001 *pde = 0000000000000000 It is expected that it also affects 3.2.x and current mainline. The problem is that pfn_valid is only called on the first PFN being checked and that PFN is not necessarily aligned. Lets say we have a case like this H = MAX_ORDER_NR_PAGES boundary | = pageblock boundary m = cc->migrate_pfn f = cc->free_pfn o = memory hole H------|------H------|----m-Hoooooo|ooooooH-f----|------H The migrate_pfn is just below a memory hole and the free scanner is beyond the hole. When isolate_migratepages started, it scans from migrate_pfn to migrate_pfn+pageblock_nr_pages which is now in a memory hole. It checks pfn_valid() on the first PFN but then scans into the hole where there are not necessarily valid struct pages. This patch ensures that isolate_migratepages calls pfn_valid when necessary. Reported-by: Herbert van den Bergh <[email protected]> Tested-by: Herbert van den Bergh <[email protected]> Signed-off-by: Mel Gorman <[email protected]> Acked-by: Michal Nazarewicz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

The warning below triggers on AMD MCM packages because physical package IDs on the cores of a _physical_ socket are the same. I.e., this field says which CPUs belong to the same physical package. However, the same two CPUs belong to two different internal, i.e. "logical" nodes in the same physical socket which is reflected in the CPU-to-node map on x86 with NUMA. Which makes this check wrong on the above topologies so circumvent it. [ 0.444413] Booting Node 0, Processors #1 #2 #3 #4 #5 Ok. [ 0.461388] ------------[ cut here ]------------ [ 0.465997] WARNING: at arch/x86/kernel/smpboot.c:310 topology_sane.clone.1+0x6e/0x81() [ 0.473960] Hardware name: Dinar [ 0.477170] sched: CPU #6's mc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency. [ 0.486860] Booting Node 1, Processors #6 [ 0.491104] Modules linked in: [ 0.494141] Pid: 0, comm: swapper/6 Not tainted 3.4.0+ #1 [ 0.499510] Call Trace: [ 0.501946] [<ffffffff8144bf92>] ? topology_sane.clone.1+0x6e/0x81 [ 0.508185] [<ffffffff8102f1fc>] warn_slowpath_common+0x85/0x9d [ 0.514163] [<ffffffff8102f2b7>] warn_slowpath_fmt+0x46/0x48 [ 0.519881] [<ffffffff8144bf92>] topology_sane.clone.1+0x6e/0x81 [ 0.525943] [<ffffffff8144c234>] set_cpu_sibling_map+0x251/0x371 [ 0.532004] [<ffffffff8144c4ee>] start_secondary+0x19a/0x218 [ 0.537729] ---[ end trace 4eaa2a86a8e2da22 ]--- [ 0.628197] #7 #8 #9 #10 #11 Ok. [ 0.807108] Booting Node 3, Processors #12 #13 #14 #15 #16 #17 Ok. [ 0.897587] Booting Node 2, Processors #18 #19 #20 #21 #22 #23 Ok. [ 0.917443] Brought up 24 CPUs We ran a topology sanity check test we have here on it and it all looks ok... hopefully :). Signed-off-by: Borislav Petkov <[email protected]> Cc: Andreas Herrmann <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>

…S block during isolation for migration commit 0bf380b upstream. When isolating for migration, migration starts at the start of a zone which is not necessarily pageblock aligned. Further, it stops isolating when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally not aligned. This allows isolate_migratepages() to call pfn_to_page() on an invalid PFN which can result in a crash. This was originally reported against a 3.0-based kernel with the following trace in a crash dump. PID: 9902 TASK: d47aecd0 CPU: 0 COMMAND: "memcg_process_s" #0 [d72d3ad0] crash_kexec at c028cfdb #1 [d72d3b24] oops_end at c05c5322 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60 #3 [d72d3bec] bad_area at c0227fb6 #4 [d72d3c00] do_page_fault at c05c72ec #5 [d72d3c80] error_code (via page_fault) at c05c47a4 EAX: 00000000 EBX: 000c0000 ECX: 00000001 EDX: 00000807 EBP: 000c0000 DS: 007b ESI: 00000001 ES: 007b EDI: f3000a80 GS: 6f50 CS: 0060 EIP: c030b15a ERR: ffffffff EFLAGS: 00010002 #6 [d72d3cb4] isolate_migratepages at c030b15a #7 [d72d3d1] zone_watermark_ok at c02d26cb #8 [d72d3d2c] compact_zone at c030b8de #9 [d72d3d68] compact_zone_order at c030bba1 torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84 torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7 torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7 torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97 torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845 torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6 torvalds#17 [d72d3f30] do_page_fault at c05c70ed torvalds#18 [d72d3fb0] error_code (via page_fault) at c05c47a4 EAX: b71ff000 EBX: 00000001 ECX: 00001600 EDX: 00000431 DS: 007b ESI: 08048950 ES: 007b EDI: bfaa3788 SS: 007b ESP: bfaa36e0 EBP: bfaa3828 GS: 6f50 CS: 0073 EIP: 080487c8 ERR: ffffffff EFLAGS: 00010202 It was also reported by Herbert van den Bergh against 3.1-based kernel with the following snippet from the console log. BUG: unable to handle kernel paging request at 01c00008 IP: [<c0522399>] isolate_migratepages+0x119/0x390 *pdpt = 000000002f7ce001 *pde = 0000000000000000 It is expected that it also affects 3.2.x and current mainline. The problem is that pfn_valid is only called on the first PFN being checked and that PFN is not necessarily aligned. Lets say we have a case like this H = MAX_ORDER_NR_PAGES boundary | = pageblock boundary m = cc->migrate_pfn f = cc->free_pfn o = memory hole H------|------H------|----m-Hoooooo|ooooooH-f----|------H The migrate_pfn is just below a memory hole and the free scanner is beyond the hole. When isolate_migratepages started, it scans from migrate_pfn to migrate_pfn+pageblock_nr_pages which is now in a memory hole. It checks pfn_valid() on the first PFN but then scans into the hole where there are not necessarily valid struct pages. This patch ensures that isolate_migratepages calls pfn_valid when necessary. Reported-by: Herbert van den Bergh <[email protected]> Tested-by: Herbert van den Bergh <[email protected]> Signed-off-by: Mel Gorman <[email protected]> Acked-by: Michal Nazarewicz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit 6b16351acbd415e66ba16bf7d473ece1574cf0bc Author: Linus Torvalds <[email protected]> Date: Sun Jun 24 12:53:04 2012 -0700 Linux 3.5-rc4 commit 02b7d83436ae4b1d86a5df03e72c1c69af7e239d Author: Anatol Pomozov <[email protected]> Date: Sat Jun 23 15:54:34 2012 -0700 Fix typo in printed messages Coult -> Could Signed-off-by: Linus Torvalds <[email protected]> commit 104452f052dfcf62dbf0c4110c9234a3285f59bf Merge: 08d49c4 081f323 Author: Linus Torvalds <[email protected]> Date: Sun Jun 24 11:02:09 2012 -0700 Merge git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM fixes from Avi Kivity: "Fixing a scheduling-while-atomic bug in the ppc code, and a bug which allowed pci bridges to be assigned to guests." * git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: PPC: Book3S HV: Drop locks around call to kvmppc_pin_guest_page KVM: Fix PCI header check on device assignment commit 08d49c46cf61f707f3f44228b362947bb57343e7 Merge: a4a20fd 2e51fd3 Author: Linus Torvalds <[email protected]> Date: Sun Jun 24 11:00:07 2012 -0700 Merge tag 'rdma-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull InfiniBand/RDMA fixes from Roland Dreier: - Fixes to new ocrdma driver - Typo in test in CMA * tag 'rdma-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: RDMA/cma: QP type check on received REQs should be AND not OR RDMA/ocrdma: Fix off by one in ocrdma_query_gid() RDMA/ocrdma: Fixed RQ error CQE polling RDMA/ocrdma: Correct queue SGE calculation RDMA/ocrdma: Correct reported max queue sizes RDMA/ocrdma: Fixed GID table for vlan and events commit a4a20fd981b2e6419556ca474b3b9689d42e5233 Merge: 2ecedc4 0fa1f06 Author: Linus Torvalds <[email protected]> Date: Sun Jun 24 10:57:59 2012 -0700 Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC fixes from Olof Johansson: "Nothing very controversial in here. Most of the fixes are for OMAP this time around, with some orion/kirkwood and a tegra patch mixed in." * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: ARM: Orion: Fix Virtual/Physical mixup with watchdog ARM: Kirkwood: clk_register_gate_fn: add fn assignment ARM: Orion5x - Restore parts of io.h, with rework ARM: OMAP4: hwmod data: Force HDMI in no-idle while enabled ARM: OMAP2+: mux: fix sparse warning ARM: OMAP2+: CM: increase the module disable timeout ARM: OMAP4: clock data: add clockdomains for clocks used as main clocks ARM: OMAP4: hwmod data: fix 32k sync timer idle modes ARM: OMAP4+: hwmod: fix issue causing IPs not going back to Smart-Standby ARM: OMAP: Fix Beagleboard DVI reset gpio arm/dts: OMAP2: Fix interrupt controller binding ARM: OMAP2: Fix tusb6010 GPIO interrupt for n8x0 ARM: OMAP2+: Fix MUSB ifdefs for platform init code ARM: tegra: make tegra_cpu_reset_handler_enable() __init ARM: OMAP: PM: Lock clocks list while generating summary ARM: iconnect: Remove include of removed linux/spi/orion_spi.h commit 2ecedc478e7a20597d95f48144cbe5ee568c0f1b Merge: 002b758 59bbe27 Author: Linus Torvalds <[email protected]> Date: Sun Jun 24 10:57:15 2012 -0700 Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux Pull drm fixes from Dave Airlie: "Nothing major in here, one radeon SI fix for tiling, and one uninit var fix, two minor header file fixes." * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: drm: drop comment about this header being autogenerated. drm/edid: don't return stack garbage from supports_rb vga_switcheroo: Add include guard drm/radeon: SI tiling fixes for display commit 2e51fd3c13e330d9364a211ddcd3e38771eeb4b4 Merge: 4dd81e8 7b33dc2 Author: Roland Dreier <[email protected]> Date: Sun Jun 24 04:59:59 2012 -0700 Merge branches 'cma' and 'ocrdma' into for-linus commit 0fa1f0609a0c1fe8b2be3c0089a2cb48f7fda521 Author: Andrew Lunn <[email protected]> Date: Fri Jun 22 08:54:02 2012 +0200 ARM: Orion: Fix Virtual/Physical mixup with watchdog The orion watchdog is expecting to be passed the physcial address of the hardware, and will ioremap() it to give a virtual address it will use as the base address for the hardware. However, when creating the platform resource record, a virtual address was being used. Add the necassary #define's so we can pass the physical address as expected. Tested on Kirkwood and Orion5x. Cc: stable <[email protected]> Signed-off-by: Andrew Lunn <[email protected]> Signed-off-by: Olof Johansson <[email protected]> commit 5fb2ce119c113e5c987fa81ed89e73b2653e28e4 Author: Marc Kleine-Budde <[email protected]> Date: Fri Jun 22 08:54:01 2012 +0200 ARM: Kirkwood: clk_register_gate_fn: add fn assignment In commit: 98d9986 ARM: Kirkwood: Replace clock gating the kirkwood clock gating has been reworked. A custom variant of clock gating, that calls a custom function before gating the clock off, has been introduced. However in clk_register_gate_fn() this custom function "fn" is never assigned. This patch adds the missing fn assignment. Cc: stable <[email protected]> Signed-off-by: Marc Kleine-Budde <[email protected]> Tested-by: Andrew Lunn <[email protected]> Acked-by: Andrew Lunn <[email protected]> Signed-off-by: Olof Johansson <[email protected]> commit b5e12229a4850ae9b19ee5252508749da8844b3c Author: Andrew Lunn <[email protected]> Date: Fri Jun 22 20:57:57 2012 +0200 ARM: Orion5x - Restore parts of io.h, with rework Commit 4d5fc58dbe34b78157c05b319669bb3e064ba8bd (ARM: remove bunch of now unused mach/io.h files) removed the orion5x io.h. Unfortunately, this is still needed for the definition of IO_SPACE_LIMIT which overrides the default 64K. All Orion based systems have 1Mbyte of IO space per PCI[e] bus, and try to request_resource() this size. Orion5x has two such PCI buses. It is likely that the original, removed version, was broken. This version might be less broken. However, it has not been tested on hardware with a PCI card, let alone hardware with a PCI card with IO capabilities. Signed-off-by: Andrew Lunn <[email protected]> Acked-by: Rob Herring <[email protected]> Signed-off-by: Olof Johansson <[email protected]> commit a34a3b7264fdb40c8d4be8bebb38fd56dc48d162 Merge: e23d709 dc57aef Author: Olof Johansson <[email protected]> Date: Sat Jun 23 16:16:29 2012 -0700 Merge tag 'omap-fixes-a-for-3.5rc' of git://git.kernel.org/pub/scm/linux/kernel/git/pjw/omap-pending into fixes From Paul Walmsley (as per Tony Lindgren's request): "Some uncontroversial OMAP clock, hwmod, and compiler warning fixes for 3.5-rc" * tag 'omap-fixes-a-for-3.5rc' of git://git.kernel.org/pub/scm/linux/kernel/git/pjw/omap-pending: ARM: OMAP4: hwmod data: Force HDMI in no-idle while enabled ARM: OMAP2+: mux: fix sparse warning ARM: OMAP2+: CM: increase the module disable timeout ARM: OMAP4: clock data: add clockdomains for clocks used as main clocks ARM: OMAP4: hwmod data: fix 32k sync timer idle modes ARM: OMAP4+: hwmod: fix issue causing IPs not going back to Smart-Standby ARM: OMAP: PM: Lock clocks list while generating summary commit e23d7096f9633d37aa35dffab9b0bd594ed64533 Merge: 6355f25 aef2b89 Author: Olof Johansson <[email protected]> Date: Sat Jun 23 16:11:50 2012 -0700 Merge tag 'omap-fixes-for-v3.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into fixes From Tony Lindgren: "Here are a few fixes with the biggest one being fix for Beagle DVI reset. All of them are regression fixes, except for the missing omap2 interrupt controller binding that somehow got missed earlier." * tag 'omap-fixes-for-v3.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap: ARM: OMAP: Fix Beagleboard DVI reset gpio arm/dts: OMAP2: Fix interrupt controller binding ARM: OMAP2: Fix tusb6010 GPIO interrupt for n8x0 ARM: OMAP2+: Fix MUSB ifdefs for platform init code commit 002b758b6dc4d840e662f25625f696d7b43d48f4 Merge: 369c4f5 642c0db Author: Linus Torvalds <[email protected]> Date: Fri Jun 22 17:47:08 2012 -0700 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph fixes from Sage Weil: "There are a couple of fixes from Yan for bad pointer dereferences in the messenger code and when fiddling with page->private after page migration, a fix from Alex for a use-after-free in the osd client code, and a couple fixes for the message refcounting and shutdown ordering." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: libceph: flush msgr queue during mon_client shutdown rbd: Clear ceph_msg->bio_iter for retransmitted message libceph: use con get/put ops from osd_client libceph: osd_client: don't drop reply reference too early ceph: check PG_Private flag before accessing page->private commit 369c4f542fd5e197ace5f9fdd33c558fb2358480 Merge: a116371 f7bdf03 Author: Linus Torvalds <[email protected]> Date: Fri Jun 22 11:07:55 2012 -0700 Merge tag 'for-linus-Jun-21-2012' of git://oss.sgi.com/xfs/xfs Pull XFS fixes from Ben Myers: - Fix stale data exposure with unwritten extents - Fix a warning in xfs_alloc_vextent with ODEBUG - Fix overallocation and alignment of pages for xfs_bufs - Fix a cursor leak - Fix a log hang - Fix a crash related to xfs_sync_worker - Rename xfs log structure from struct log to struct xlog so we can use crash dumps effectively * tag 'for-linus-Jun-21-2012' of git://oss.sgi.com/xfs/xfs: xfs: rename log structure to xlog xfs: shutdown xfs_sync_worker before the log xfs: Fix overallocation in xfs_buf_allocate_memory() xfs: fix allocbt cursor leak in xfs_alloc_ag_vextent_near xfs: check for stale inode before acquiring iflock on push xfs: fix debug_object WARN at xfs_alloc_vextent() xfs: xfs_vm_writepage clear iomap_valid when !buffer_uptodate (REV2) commit a11637194adc8bf2c2022ac89314dbdd1fcd1778 Merge: 636040b 6921a57 Author: Linus Torvalds <[email protected]> Date: Fri Jun 22 10:58:57 2012 -0700 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar. * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: ftrace: Make all inline tags also include notrace perf: Use css_tryget() to avoid propping up css refcount perf tools: Fix synthesizing tracepoint names from the perf.data headers perf stat: Fix default output file perf tools: Fix endianity swapping for adds_features bitmask commit 59bbe27ba0f6bae1d85f1521e43181d98ee9c5ab Author: Dave Airlie <[email protected]> Date: Fri Jun 22 11:04:55 2012 +0100 drm: drop comment about this header being autogenerated. This comment is well out of date. Signed-off-by: Dave Airlie <[email protected]> commit dc57aef503859dbf724f6126c58b2e1672e215f3 Author: Ricardo Neri <[email protected]> Date: Thu Jun 21 10:08:53 2012 +0200 ARM: OMAP4: hwmod data: Force HDMI in no-idle while enabled As per the OMAP4 documentation, audio over HDMI must be transmitted in no-idle mode. This patch adds the HWMOD_SWSUP_SIDLE so that omap_hwmod uses no-idle/force-idle settings instead of smart-idle mode. This is required as the DSS interface clock is used as functional clock for the HDMI wrapper audio FIFO. If no-idle mode is not used, audio could be choppy, have bad quality or not be audible at all. Signed-off-by: Ricardo Neri <[email protected]> [[email protected]: Update the subject and align the .flags location with the script template] Signed-off-by: Benoit Cousson <[email protected]> Signed-off-by: Paul Walmsley <[email protected]> commit 65e25976b75c4996c5650afadc4180db09ec7475 Author: Paul Walmsley <[email protected]> Date: Sun Jun 17 12:00:58 2012 -0600 ARM: OMAP2+: mux: fix sparse warning Commit bbd707acee279a61177a604822db92e8164d00db ("ARM: omap2: use machine specific hook for late init") resulted in the addition of this sparse warning: arch/arm/mach-omap2/mux.c:791:12: warning: symbol 'omap_mux_late_init' was not declared. Should it be static? Fix by including the header file containing the prototype. Signed-off-by: Paul Walmsley <[email protected]> Cc: Shawn Guo <[email protected]> Cc: Tony Lindgren <[email protected]> commit b8f15b7e1dd3638d35cfff85571df1d7ea96e35e Author: Paul Walmsley <[email protected]> Date: Sun Jun 17 11:57:53 2012 -0600 ARM: OMAP2+: CM: increase the module disable timeout Increase the timeout for disabling an IP block to five milliseconds. This is to handle the usb_host_fs idle latency, which takes almost four milliseconds after a host controller reset. This is the second of two patches needed to resolve the following boot warning: omap_hwmod: usb_host_fs: _wait_target_disable failed Thanks to Sergei Shtylyov <[email protected]> for finding an unrelated hunk in a previous version of this patch. Signed-off-by: Paul Walmsley <[email protected]> Cc: Sergei Shtylyov <[email protected]> Cc: Tero Kristo <[email protected]> commit 9a47d32d5c4b91a4ce4c459f3b7b0290185e7578 Author: Paul Walmsley <[email protected]> Date: Sun Jun 17 11:57:52 2012 -0600 ARM: OMAP4: clock data: add clockdomains for clocks used as main clocks Until the OMAP4 code is converted to disable the use of the clock framework-based clockdomain enable/disable sequence, any clock used as a hwmod main_clk must have a clockdomain associated with it. This patch populates some clock structure clockdomain names to resolve the following warnings during kernel init: omap_hwmod: dpll_mpu_m2_ck: missing clockdomain for dpll_mpu_m2_ck. omap_hwmod: trace_clk_div_ck: missing clockdomain for trace_clk_div_ck. omap_hwmod: l3_div_ck: missing clockdomain for l3_div_ck. omap_hwmod: ddrphy_ck: missing clockdomain for ddrphy_ck. Signed-off-by: Paul Walmsley <[email protected]> Cc: Rajendra Nayak <[email protected]> Cc: Benoît Cousson <[email protected]> commit 252a4c5443ec4d0bd310ae5b0d1b7f9c5b1e7f46 Author: Paul Walmsley <[email protected]> Date: Sun Jun 17 11:57:51 2012 -0600 ARM: OMAP4: hwmod data: fix 32k sync timer idle modes The 32k sync timer IP block target idle modes in the hwmod data are incorrect. The IP block does not support any smart-idle modes. Update the data to reflect the correct modes. This problem was initially identified and a diff fragment posted to the lists by Benoît Cousson <[email protected]>. A patch description bug in the first version was also identified by Benoît. Signed-off-by: Paul Walmsley <[email protected]> Cc: Benoît Cousson <[email protected]> Cc: Tero Kristo <[email protected]> commit 561038f0a8aa1de272a2ac5dad24cc8246d9f496 Author: Djamil Elaidi <[email protected]> Date: Sun Jun 17 11:57:51 2012 -0600 ARM: OMAP4+: hwmod: fix issue causing IPs not going back to Smart-Standby If an IP is configured in Smart-Standby-Wakeup, when disabling wakeup feature the IP will not go back to Smart-Standby, but will remain in Smart-Standby-Wakeup. Signed-off-by: Djamil Elaidi <[email protected]> Signed-off-by: Paul Walmsley <[email protected]> commit 636040b4eddf6152b5d0b2d574663809f898953b Merge: 8874e81 b102743 Author: Linus Torvalds <[email protected]> Date: Thu Jun 21 16:05:43 2012 -0700 Merge tag 'nfs-for-3.5-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client bugfixes from Trond Myklebust: - Fix a write hang due to an uninitalised variable when !defined(CONFIG_NFS_V4) - Address upcall races in the legacy NFSv4 idmapper - Remove an O_DIRECT refcounting issue - Fix a pNFS refcounting bug when the file layout metadata server is also acting as a data server - Fix a pNFS module loading race. * tag 'nfs-for-3.5-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFS: Force the legacy idmapper to be single threaded NFS: Initialise commit_info.rpc_out when !defined(CONFIG_NFS_V4) NFS: Fix a refcounting issue in O_DIRECT NFSv4.1: Fix a race in set_pnfs_layoutdriver NFSv4.1: Fix umount when filelayout DS is also the MDS commit 8874e812feb4926f4a51a82c4fca75c7daa05fc5 Merge: 7b83778 cb77fcd Author: Linus Torvalds <[email protected]> Date: Thu Jun 21 13:41:07 2012 -0700 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "This is a small pull with btrfs fixes. The biggest of the bunch is another fix for the new backref walking code. We're still hammering out one btrfs dio vs buffered reads problem, but that one will have to wait for the next rc." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: delay iput with async extents Btrfs: add a missing spin_lock Btrfs: don't assume to be on the correct extent in add_all_parents Btrfs: introduce btrfs_next_old_item commit 7b8377862bd816a5e8ceb5c713d88bf82555c8d4 Merge: 7940b2a 2355375 Author: Linus Torvalds <[email protected]> Date: Thu Jun 21 13:40:40 2012 -0700 Merge tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull hwmon fixes from Guenter Roeck: "Two minor fixes in emc2103 and applesmc drivers." * tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: hwmon: (emc2103) Fix use of an uninitilized variable in error case hwmon: (applesmc) Limit key length in warning messages commit f7bdf03a99efc083608cd9c0c3e03abff311c79e Author: Mark Tinguely <[email protected]> Date: Thu Jun 14 09:22:15 2012 -0500 xfs: rename log structure to xlog Rename the XFS log structure to xlog to help crash distinquish it from the other logs in Linux. Signed-off-by: Mark Tinguely <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Ben Myers <[email protected]> commit 8866fc6fa55e31b2bce931b7963ff16641b39dc7 Author: Ben Myers <[email protected]> Date: Fri May 25 15:45:36 2012 -0500 xfs: shutdown xfs_sync_worker before the log Revert commit 1307bbd, which uses the s_umount semaphore to provide exclusion between xfs_sync_worker and unmount, in favor of shutting down the sync worker before freeing the log in xfs_log_unmount. This is a cleaner way of resolving the race between xfs_sync_worker and unmount than using s_umount. Signed-off-by: Ben Myers <[email protected]> Reviewed-by: Mark Tinguely <[email protected]> Reviewed-by: Dave Chinner <[email protected]> commit 59c84ed0ddc11f1823b4a33ace4fbcc948261bb2 Author: Jan Kara <[email protected]> Date: Wed Jun 6 00:32:26 2012 +0200 xfs: Fix overallocation in xfs_buf_allocate_memory() Commit de1cbee which removed b_file_offset in favor of b_bn introduced a bug causing xfs_buf_allocate_memory() to overestimate the number of necessary pages. The problem is that xfs_buf_alloc() sets b_bn to -1 and thus effectively every buffer is straddling a page boundary which causes xfs_buf_allocate_memory() to allocate two pages and use vmalloc() for access which is unnecessary. Dave says xfs_buf_alloc() doesn't need to set b_bn to -1 anymore since the buffer is inserted into the cache only after being fully initialized now. So just make xfs_buf_alloc() fill in proper block number from the beginning. CC: David Chinner <[email protected]> Signed-off-by: Jan Kara <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Signed-off-by: Ben Myers <[email protected]> commit 76d095388b040229ea1aad7dea45be0cfa20f589 Author: Dave Chinner <[email protected]> Date: Tue Jun 12 14:20:26 2012 +1000 xfs: fix allocbt cursor leak in xfs_alloc_ag_vextent_near When we fail to find an matching extent near the requested extent specification during a left-right distance search in xfs_alloc_ag_vextent_near, we fail to free the original cursor that we used to look up the XFS_BTNUM_CNT tree and hence leak it. Reported-by: Chris J Arges <[email protected]> Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Ben Myers <[email protected]> commit 9a3a5dab63461b84213052888bf38a962b22d035 Author: Brian Foster <[email protected]> Date: Mon Jun 11 10:39:43 2012 -0400 xfs: check for stale inode before acquiring iflock on push An inode in the AIL can be flush locked and marked stale if a cluster free transaction occurs at the right time. The inode item is then marked as flushing, which causes xfsaild to spin and leaves the filesystem stalled. This is reproduced by running xfstests 273 in a loop for an extended period of time. Check for stale inodes before the flush lock. This marks the inode as pinned, leads to a log flush and allows the filesystem to proceed. Signed-off-by: Brian Foster <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Mark Tinguely <[email protected]> Signed-off-by: Ben Myers <[email protected]> commit cb77fcd88569cd2b7b25ecd4086ea932a53be9b3 Author: Josef Bacik <[email protected]> Date: Fri Jun 15 12:19:48 2012 -0600 Btrfs: delay iput with async extents There is some concern that these iput()'s could be the final iputs and could induce lockups on people waiting on writeback. This would happen in the rare case that we don't create ordered extents because of an error, but it is theoretically possible and we already have a mechanism to deal with this so just make them delayed iputs to negate any worry. Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]> commit e18fca734278784bd6591de63ca148cc27344ca9 Author: Josef Bacik <[email protected]> Date: Mon Jun 18 07:23:18 2012 -0600 Btrfs: add a missing spin_lock When fixing up the locking in the delayed ref destruction work I accidently broke the locking myself ;(. Add back a spin_lock that should be there and we are now all set. Thanks, Btrfs: add a missing spin_lock When fixing up the locking in the delayed ref destruction work I accidently broke the locking myself ;(. Add back a spin_lock that should be there and we are now all set. Thanks, Reported-by: Dan Carpenter <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]> commit 69bca40d41c613927b150c5392505f1894fe3010 Author: Alexander Block <[email protected]> Date: Tue Jun 19 07:42:26 2012 -0600 Btrfs: don't assume to be on the correct extent in add_all_parents add_all_parents did assume that path is already at a correct extent data item, which may not be true in case of data extents that were partly rewritten and splitted. We need to check if we're on a matching extent for every item and only for the ones after the first. The loop is changed to do this now. This patch also fixes a bug introduced with commit 3b127fd8 "Btrfs: remove obsolete btrfs_next_leaf call from __resolve_indirect_ref". The removal of next_leaf did sometimes result in slot==nritems when the above described case happens, and thus resulting in invalid values (e.g. wanted_obejctid) in add_all_parents (leading to missed backrefs or even crashes). Signed-off-by: Alexander Block <[email protected]> Signed-off-by: Jan Schmidt <[email protected]> Signed-off-by: Chris Mason <[email protected]> commit 1c8f52a5e9539600543347bcdefafa1854e07986 Author: Alexander Block <[email protected]> Date: Tue Jun 19 07:42:25 2012 -0600 Btrfs: introduce btrfs_next_old_item We introduce btrfs_next_old_item that uses btrfs_next_old_leaf instead of btrfs_next_leaf. btrfs_next_item is also changed to simply call btrfs_next_old_item with time_seq being 0. Signed-off-by: Alexander Block <[email protected]> Signed-off-by: Chris Mason <[email protected]> commit b196a4980ff7bb54db478e2a408dc8b12be15304 Author: Daniel Vetter <[email protected]> Date: Tue Jun 19 11:33:06 2012 +0200 drm/edid: don't return stack garbage from supports_rb We need to initialize this to false, because the is_rb callback only ever sets it to true. Noticed while reading through the code. Cc: [email protected] Signed-Off-by: Daniel Vetter <[email protected]> Reviewed-by: Adam Jackson <[email protected]> Signed-off-by: Dave Airlie <[email protected]> commit d3decf3a0c1d28c80c76be170373f4c7a7217f09 Author: Ozan Çağlayan <[email protected]> Date: Thu Jun 14 15:02:35 2012 +0300 vga_switcheroo: Add include guard Guard vga_switcheroo.h against multiple inclusion. Signed-off-by: Ozan Çağlayan <[email protected]> Signed-off-by: Dave Airlie <[email protected]> commit 7940b2adb474608318e51294d4e9fa0eee1bef61 Merge: 2ce5682 fdec53d Author: Linus Torvalds <[email protected]> Date: Wed Jun 20 22:12:52 2012 -0700 Merge branch 'fixes' of git://git.infradead.org/users/vkoul/slave-dma Pull slave-dmaengine fixes from Vinod Koul: "A few fixes in pl330 and imx-sdma drivers." * 'fixes' of git://git.infradead.org/users/vkoul/slave-dma: DMA: PL330: Fix racy mutex unlock DMA: PL330: Add missing static storage class specifier dma: imx-sdma: buf_tail should be initialize in prepare function dmaengine: pl330: dont complete descriptor for cyclic dma commit 2ce5682947872061148b0e5ed2212e03d0d8bc8b Merge: c4c0e9e 8e3bbf4 Author: Linus Torvalds <[email protected]> Date: Wed Jun 20 22:11:04 2012 -0700 Merge branch 'for-3.5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull two cgroup fixes from Tejun Heo: "This containes two patches fixing a refcnt race bug during css_put(). Decrementing and checking the value weren't atomic and two tasks could think that they both pushed the counter to zero." * 'for-3.5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroups: Account for CSS_DEACT_BIAS in __css_put cgroup: make sure that decisions in __css_put are atomic commit c4c0e9e544a0eb640798cc66e68f394fa4a561bf Author: David Rientjes <[email protected]> Date: Wed Jun 20 18:00:12 2012 -0700 mm, mempolicy: fix mbind() to do synchronous migration If the range passed to mbind() is not allocated on nodes set in the nodemask, it migrates the pages to respect the constraint. The final formal of migrate_pages() is a mode of type enum migrate_mode, not a boolean. do_mbind() is currently passing "true" which is the equivalent of MIGRATE_SYNC_LIGHT. This should instead be MIGRATE_SYNC for synchronous page migration. Signed-off-by: David Rientjes <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> commit 0e3534c0417fef6e6535db8867a04798f2024e3f Author: Randy Dunlap <[email protected]> Date: Wed Jun 20 18:30:30 2012 -0700 media: pms.c needs linux/slab.h drivers/media/video/pms.c uses kzalloc() and kfree() so it should include <linux/slab.h> to fix build errors and a warning. drivers/media/video/pms.c:1047:2: error: implicit declaration of function 'kzalloc' drivers/media/video/pms.c:1047:6: warning: assignment makes pointer from integer without a cast drivers/media/video/pms.c:1116:2: error: implicit declaration of function 'kfree' Found in mmotm but applies to mainline. Signed-off-by: Randy Dunlap <[email protected]> Cc: Hans Verkuil <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> commit bc259adc9b76f625fff0423df3ffb80a03802927 Merge: fe80352 3026b0e Author: Linus Torvalds <[email protected]> Date: Wed Jun 20 15:15:03 2012 -0700 Merge tag 'staging-3.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging Pull staging tree fixes from Greg Kroah-Hartman: "Here are a number of small fixes for the drivers/staging tree, as well as iio and pstore drivers (which came from the staging tree in the 3.5-rc1 merge). All of these are tiny, but resolve issues that people have been reporting. There's also a documentation update to reflect what the iio drivers really are doing, which is good to get straightened out. Signed-off-by: Greg Kroah-Hartman <[email protected]>" * tag 'staging-3.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: staging: r8712u: Add new USB IDs staging: gdm72xx: Release netlink socket properly iio: drop wrong reference from Kconfig pstore/inode: Make pstore_fill_super() static pstore/ram: Should zap persistent zone on unlink pstore/ram_core: Factor persistent_ram_zap() out of post_init() pstore/ram_core: Do not reset restored zone's position and size pstore/ram: Should update old dmesg buffer before reading staging:iio:ad7298: Fix linker error due to missing IIO kfifo buffer Revert "staging: usbip: bugfix for stack corruption on 64-bit architectures" staging: usbip: bugfix for stack corruption on 64-bit architectures staging/comedi: fix build for USB not enabled staging: omapdrm: fix crash when freeing bad fb staging:iio:ad7606: Re-add missing scale attribute iio: Fix potential use after free staging:iio: remove num_interrupt_lines from documentation iio: documentation: Add out_altvoltage and friends commit fe80352460971de12519bf46ed5ec4235350bcd7 Merge: f8fc0c9 96c9f05 Author: Linus Torvalds <[email protected]> Date: Wed Jun 20 15:14:28 2012 -0700 Merge tag 'driver-core-3.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core and printk fixes from Greg Kroah-Hartman: "Here are some fixes for 3.5-rc4 that resolve the kmsg problems that people have reported showing up after the printk and kmsg changes went into 3.5-rc1. There are also a smattering of other tiny fixes for the extcon and hyper-v drivers that people have reported. Signed-off-by: Greg Kroah-Hartman <[email protected]>" * tag 'driver-core-3.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: extcon: max8997: Add missing kfree for info->edev in max8997_muic_remove() extcon: Set platform drvdata in gpio_extcon_probe() and fix irq leak extcon: Fix wrong index in max8997_extcon_cable[] kmsg - kmsg_dump() fix CONFIG_PRINTK=n compilation printk: return -EINVAL if the message len is bigger than the buf size printk: use mutex lock to stop syslog_seq from going wild kmsg - kmsg_dump() use iterator to receive log buffer content vme: change maintainer e-mail address Extcon: Don't try to create duplicate link names driver core: fixup reversed deferred probe order printk: Fix alignment of buf causing crash on ARM EABI Tools: hv: verify origin of netlink connector message commit f8fc0c9a5f7f4f5a3d2e7dd58147e30053cc5dd8 Merge: a1821f7 49fbd3f Author: Linus Torvalds <[email protected]> Date: Wed Jun 20 15:13:56 2012 -0700 Merge tag 'char-misc-3.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull misc tree updates from Greg Kroah-Hartman: "Here are some drivers/misc bugfixes (really just drivers/misc/mei/ fixes) for a few problems that have been reported. Signed-off-by: Greg Kroah-Hartman <[email protected]>" * tag 'char-misc-3.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: misc: mei: set WDIOF_ALARMONLY on mei watchdog misc: mei: Disable MSI when IRQ registration fails misc: mei: fix stalled read misc: mei: unregister misc device in pci_remove function misc: mei: set IRQF_ONESHOT for msi request_threaded_irq commit a1821f774d4600727edf71005f259a9fdb73981e Merge: a2a2609 78d80c5 Author: Linus Torvalds <[email protected]> Date: Wed Jun 20 15:13:13 2012 -0700 Merge tag 'tty-3.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull serial driver fixes from Greg Kroah-Hartman: "Here are 3 patches resolving a boot regression (the mop500 fix), a build warning fix, and a kernel-doc fix. All tiny, but should go into the final 3.5 release. Signed-off-by: Greg Kroah-Hartman <[email protected]>" * tag 'tty-3.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: serial/amba-pl011: move custom pin control to driver serial: fix serial_txx9.c build warning/typo serial: fix kernel-doc warnings in 8250.c commit a2a2609c97c1e21996b9d87d10d2c9ff07277524 Merge: a4d7a12 48c3b58 Author: Linus Torvalds <[email protected]> Date: Wed Jun 20 14:41:57 2012 -0700 Merge branch 'akpm' (Andrew's patch-bomb) * emailed from Andrew Morton <[email protected]>: (21 patches) mm/memblock: fix overlapping allocation when doubling reserved array c/r: prctl: Move PR_GET_TID_ADDRESS to a proper place pidns: find_new_reaper() can no longer switch to init_pid_ns.child_reaper pidns: guarantee that the pidns init will be the last pidns process reaped fault-inject: avoid call to random32() if fault injection is disabled Viresh has moved get_maintainer: Fix --help warning mm/memory.c: fix kernel-doc warnings mm: fix kernel-doc warnings mm: correctly synchronize rss-counters at exit/exec mm, thp: print useful information when mmap_sem is unlocked in zap_pmd_range h8300: use the declarations provided by <asm/sections.h> h8300: fix use of extinct _sbss and _ebss xtensa: use the declarations provided by <asm/sections.h> xtensa: use "test -e" instead of bashism "test -a" xtensa: replace xtensa-specific _f{data,text} by _s{data,text} memcg: fix use_hierarchy css_is_ancestor oops regression mm, oom: fix and cleanup oom score calculations nilfs2: ensure proper cache clearing for gc-inodes thp: avoid atomic64_read in pmd_read_atomic for 32bit PAE ... commit 48c3b583bbddad2220ca4c22319ca5d1f78b2090 Author: Greg Pearson <[email protected]> Date: Wed Jun 20 12:53:05 2012 -0700 mm/memblock: fix overlapping allocation when doubling reserved array __alloc_memory_core_early() asks memblock for a range of memory then try to reserve it. If the reserved region array lacks space for the new range, memblock_double_array() is called to allocate more space for the array. If memblock is used to allocate memory for the new array it can end up using a range that overlaps with the range originally allocated in __alloc_memory_core_early(), leading to possible data corruption. With this patch memblock_double_array() now calls memblock_find_in_range() with a narrowed candidate range (in cases where the reserved.regions array is being doubled) so any memory allocated will not overlap with the original range that was being reserved. The range is narrowed by passing in the starting address and size of the previously allocated range. Then the range above the ending address is searched and if a candidate is not found, the range below the starting address is searched. Signed-off-by: Greg Pearson <[email protected]> Signed-off-by: Yinghai Lu <[email protected]> Acked-by: Tejun Heo <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> commit 5702c5eeab959e86ee2d9b4fe7f2d87e65b25d46 Author: Cyrill Gorcunov <[email protected]> Date: Wed Jun 20 12:53:04 2012 -0700 c/r: prctl: Move PR_GET_TID_ADDRESS to a proper place During merging of PR_GET_TID_ADDRESS patch the code has been misplaced (it happened to appear under PR_MCE_KILL) in result noone can use this option. Fix it by moving code snippet to a proper place. Signed-off-by: Cyrill Gorcunov <[email protected]> Acked-by: Kees Cook <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Pavel Emelyanov <[email protected]> Cc: Andrey Vagin <[email protected]> Cc: Serge Hallyn <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> commit 50d75f8daead8a1f850c40a3b6c6575ab19b48cf Author: Oleg Nesterov <[email protected]> Date: Wed Jun 20 12:53:04 2012 -0700 pidns: find_new_reaper() can no longer switch to init_pid_ns.child_reaper find_new_reaper() changes pid_ns->child_reaper, see add0d4df ("pid_ns: zap_pid_ns_processes: fix the ->child_reaper changing"). The original reason has gone away after the previous patch, ->children list must be empty after zap_pid_ns_processes(). However now we can not switch to init_pid_ns.child_reaper. __unhash_process() relies on the "->child_reaper == parent" check, but this check does not work if the last exiting task is also the child reaper. As Eric sugested, we can change __unhash_process() to use the parent's pid_ns and remove this code. Also, with this change we can move detach_pid(PIDTYPE_PID) back, where it was before the previous fix. Signed-off-by: Oleg Nesterov <[email protected]> Acked-by: "Eric W. Biederman" <[email protected]> Cc: Louis Rilling <[email protected]> Cc: Mike Galbraith <[email protected]> Acked-by: Pavel Emelyanov <[email protected]> Tested-by: Andrew Wagin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> commit 6347e90091041e34bea625370794c92f4ce71228 Author: Eric W. Biederman <[email protected]> Date: Wed Jun 20 12:53:03 2012 -0700 pidns: guarantee that the pidns init will be the last pidns process reaped Today we have a twofold bug. Sometimes release_task on pid == 1 in a pid namespace can run before other processes in a pid namespace have had release task called. With the result that pid_ns_release_proc can be called before the last proc_flus_task() is done using upid->ns->proc_mnt, resulting in the use of a stale pointer. This same set of circumstances can lead to waitpid(...) returning for a processes started with clone(CLONE_NEWPID) before the every process in the pid namespace has actually exited. To fix this modify zap_pid_ns_processess wait until all other processes in the pid namespace have exited, even EXIT_DEAD zombies. The delay_group_leader and related tests ensure that the thread gruop leader will be the last thread of a process group to be reaped, or to become EXIT_DEAD and self reap. With the change to zap_pid_ns_processes we get the guarantee that pid == 1 in a pid namespace will be the last task that release_task is called on. With pid == 1 being the last task to pass through release_task pid_ns_release_proc can no longer be called too early nor can wait return before all of the EXIT_DEAD tasks in a pid namespace have exited. Signed-off-by: Eric W. Biederman <[email protected]> Signed-off-by: Oleg Nesterov <[email protected]> Cc: Louis Rilling <[email protected]> Cc: Mike Galbraith <[email protected]> Acked-by: Pavel Emelyanov <[email protected]> Tested-by: Andrew Wagin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> commit f39cdaebb89dc3e6dd4f3e75b6d4e87ef12190af Author: Anton Blanchard <[email protected]> Date: Wed Jun 20 12:53:03 2012 -0700 fault-inject: avoid call to random32() if fault injection is disabled After enabling CONFIG_FAILSLAB I noticed random32 in profiles even if slub fault injection wasn't enabled at runtime. should_fail forces a comparison against random32() even if probability is 0: if (attr->probability <= random32() % 100) return false; Add a check up front for probability == 0 and avoid all of the more complicated checks. Signed-off-by: Anton Blanchard <[email protected]> Acked-by: Akinobu Mita <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> commit 10d8935f46e5028847b179757ecbf9238b13d129 Author: Viresh Kumar <[email protected]> Date: Wed Jun 20 12:53:02 2012 -0700 Viresh has moved [email protected] email-id doesn't exist anymore as I have left the company. Replace ST's id with [email protected]. It also updates .mailmap file to fix address for 'git shortlog' Signed-off-by: Viresh Kumar <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> commit 7dea26813507bfa3d261a81f70494336c3b28293 Author: Joe Perches <[email protected]> Date: Wed Jun 20 12:53:02 2012 -0700 get_maintainer: Fix --help warning Using --help emits a concatenation error. Fix it. Signed-off-by: Joe Perches <[email protected]> Reported-by: Paul Bolle <[email protected]> Tested-by: Paul Bolle <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> commit eb4546bbbdb160aff084d50511165f385756af18 Author: Randy Dunlap <[email protected]> Date: Wed Jun 20 12:53:02 2012 -0700 mm/memory.c: fix kernel-doc warnings Fix kernel-doc warnings in mm/memory.c: Warning(mm/memory.c:1377): No description found for parameter 'start' Warning(mm/memory.c:1377): Excess function parameter 'address' description in 'zap_page_range' Signed-off-by: Randy Dunlap <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> commit dad7557eb705688040aac134efa5418b66d5ed92 Author: Wanpeng Li <[email protected]> Date: Wed Jun 20 12:53:01 2012 -0700 mm: fix kernel-doc warnings Fix kernel-doc warnings such as Warning(../mm/page_cgroup.c:432): No description found for parameter 'id' Warning(../mm/page_cgroup.c:432): Excess function parameter 'mem' description in 'swap_cgroup_record' Signed-off-by: Wanpeng Li <[email protected]> Cc: Randy Dunlap <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> commit 4fe7efdbdfb1c7e7a7f31decfd831c0f31d37091 Author: Konstantin Khlebnikov <[email protected]> Date: Wed Jun 20 12:53:01 2012 -0700 mm: correctly synchronize rss-counters at exit/exec do_exit() and exec_mmap() call sync_mm_rss() before mm_release() does put_user(clear_child_tid) which can update task->rss_stat and thus make mm->rss_stat inconsistent. This triggers the "BUG:" printk in check_mm(). Let's fix this bug in the safest way, and optimize/cleanup this later. Reported-by: Markus Trippelsdorf <[email protected]> Signed-off-by: Konstantin Khlebnikov <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> commit e0897d75f0b22e8c3a7287a48548c5686ef73447 Author: David Rientjes <[email protected]> Date: Wed Jun 20 12:53:00 2012 -0700 mm, thp: print useful information when mmap_sem is unlocked in zap_pmd_range Andrea asked for addr, end, vma->vm_start, and vma->vm_end to be emitted when !rwsem_is_locked(&tlb->mm->mmap_sem). Otherwise, debugging the underlying issue is more difficult. Suggested-by: Andrea Arcangeli <[email protected]> Signed-off-by: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> commit 436814e61f5c526ed123853a9bf63fb2ff4ff94b Author: Geert Uytterhoeven <[email protected]> Date: Wed Jun 20 12:53:00 2012 -0700 h8300: use the declarations provided by <asm/sections.h> Cleanups: - Include <asm/sections.h>, - Remove the (different) extern declarations, - Remove the no longer needed address-of ('&') operators, - Remove the superfluous casts, use proper printk formatting instead. Signed-off-by: Geert Uytterhoeven <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> commit ffb20313c01addc04212577af5f9b1399156a5bd Author: Geert Uytterhoeven <[email protected]> Date: Wed Jun 20 12:53:00 2012 -0700 h8300: fix use of extinct _sbss and _ebss Nowadays it should use __bss_start and __bss_stop Signed-off-by: Geert Uytterhoeven <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> commit f022d0fa183c4def30ddd74f2baebd72c4254fcf Author: Geert Uytterhoeven <[email protected]> Date: Wed Jun 20 12:52:59 2012 -0700 xtensa: use the declarations provided by <asm/sections.h> Cleanups: - Include <asm/sections.h>, - Remove the (different) extern declarations, - Remove the no longer needed address-of ('&') operators, - Use %p to format pointer differences. Signed-off-by: Geert Uytterhoeven <[email protected]> Cc: Chris Zankel <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> commit 0eff08b5d1bf4b30f1a549a58c85806748256b18 Author: Geert Uytterhoeven <[email protected]> Date: Wed Jun 20 12:52:59 2012 -0700 xtensa: use "test -e" instead of bashism "test -a" On Ubuntu, /bin/sh is a symlink to dash, which does not support "test -a". This causes messages like test: 1: -a: unexpected operator test: 1: -a: unexpected operator and link failures like (.init.text+0x132): undefined reference to `platform_init' due to the appropriate platform code not being compiled. Signed-off-by: Geert Uytterhoeven <[email protected]> Cc: Chris Zankel <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> commit 5e7b6ed8e9bf3c8e3bb579fd0aec64f6526f8c81 Author: Geert Uytterhoeven <[email protected]> Date: Wed Jun 20 12:52:58 2012 -0700 xtensa: replace xtensa-specific _f{data,text} by _s{data,text} commit a2d063ac216c161 ("extable, core_kernel_data(): Make sure all archs define _sdata") missed xtensa. Xtensa does have a start of data marker, but calls it _fdata, causing kernel/built-in.o:(.text+0x964): undefined reference to `_sdata' _stext was already defined, but it was duplicated by _fdata. Signed-off-by: Geert Uytterhoeven <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Chris Zankel <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> commit 3a981f482cc29f7d0aeab509e51ea15519a6e961 Author: Hugh Dickins <[email protected]> Date: Wed Jun 20 12:52:58 2012 -0700 memcg: fix use_hierarchy css_is_ancestor oops regression If use_hierarchy is set, reclaim testing soon oopses in css_is_ancestor() called from __mem_cgroup_same_or_subtree() called from page_referenced(): when processes are exiting, it's easy for mm_match_cgroup() to pass along a NULL memcg coming from a NULL mm->owner. Check for that in __mem_cgroup_same_or_subtree(). Return true or false? False because we cannot know if it was in the hierarchy, but also false because it's better not to count a reference from an exiting process. Signed-off-by: Hugh Dickins <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Konstantin Khlebnikov <[email protected]> Acked-by: KAMEZAWA Hiroyuki <[email protected]> Acked-by: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> commit 61eafb00d55dfbccdfce543c6b60e369ff4f8f18 Author: David Rientjes <[email protected]> Date: Wed Jun 20 12:52:58 2012 -0700 mm, oom: fix and cleanup oom score calculations The divide in p->signal->oom_score_adj * totalpages / 1000 within oom_badness() was causing an overflow of the signed long data type. This adds both the root bias and p->signal->oom_score_adj before doing the normalization which fixes the issue and also cleans up the calculation. Tested-by: Dave Jones <[email protected]> Signed-off-by: David Rientjes <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> commit fbb24a3a915f105016f1c828476be11aceac8504 Author: Ryusuke Konishi <[email protected]> Date: Wed Jun 20 12:52:57 2012 -0700 nilfs2: ensure proper cache clearing for gc-inodes A gc-inode is a pseudo inode used to buffer the blocks to be moved by garbage collection. Block caches of gc-inodes must be cleared every time a garbage collection function (nilfs_clean_segments) completes. Otherwise, stale blocks buffered in the caches may be wrongly reused in successive calls of the GC function. For user files, this is not a problem because their gc-inodes are distinguished by a checkpoint number as well as an inode number. They never buffer different blocks if either an inode number, a checkpoint number, or a block offset differs. However, gc-inodes of sufile, cpfile and DAT file can store different data for the same block offset. Thus, the nilfs_clean_segments function can move incorrect block for these meta-data files if an old block is cached. I found this is really causing meta-data corruption in nilfs. This fixes the issue by ensuring cache clear of gc-inodes and resolves reported GC problems including checkpoint file corruption, b-tree corruption, and the following warning during GC. nilfs_palloc_freev: entry number 307234 already freed. ... Signed-off-by: Ryusuke Konishi <[email protected]> Tested-by: Ryusuke Konishi <[email protected]> Cc: <[email protected]> [2.6.37+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> commit e4eed03fd06578571c01d4f1478c874bb432c815 Author: Andrea Arcangeli <[email protected]> Date: Wed Jun 20 12:52:57 2012 -0700 thp: avoid atomic64_read in pmd_read_atomic for 32bit PAE In the x86 32bit PAE CONFIG_TRANSPARENT_HUGEPAGE=y case while holding the mmap_sem for reading, cmpxchg8b cannot be used to read pmd contents under Xen. So instead of dealing only with "consistent" pmdvals in pmd_none_or_trans_huge_or_clear_bad() (which would be conceptually simpler) we let pmd_none_or_trans_huge_or_clear_bad() deal with pmdvals where the low 32bit and high 32bit could be inconsistent (to avoid having to use cmpxchg8b). The only guarantee we get from pmd_read_atomic is that if the low part of the pmd was found null, the high part will be null too (so the pmd will be considered unstable). And if the low part of the pmd is found "stable" later, then it means the whole pmd was read atomically (because after a pmd is stable, neither MADV_DONTNEED nor page faults can alter it anymore, and we read the high part after the low part). In the 32bit PAE x86 case, it is enough to read the low part of the pmdval atomically to declare the pmd as "stable" and that's true for THP and no THP, furthermore in the THP case we also have a barrier() that will prevent any inconsistent pmdvals to be cached by a later re-read of the *pmd. Signed-off-by: Andrea Arcangeli <[email protected]> Cc: Jonathan Nieder <[email protected]> Cc: Ulrich Obergfell <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Larry Woodman <[email protected]> Cc: Petr Matousek <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Jan Beulich <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Tested-by: Andrew Jones <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> commit abca7c4965845924f65d40e0aa1092bdd895e314 Author: Pravin B Shelar <[email protected]> Date: Wed Jun 20 12:52:56 2012 -0700 mm: fix slab->page _count corruption when using slub On arches that do not support this_cpu_cmpxchg_double() slab_lock is used to do atomic cmpxchg() on double word which contains page->_count. The page count can be changed from get_page() or put_page() without taking slab_lock. That corrupts page counter. Fix it by moving page->_count out of cmpxchg_double data. So that slub does no change it while updating slub meta-data in struct page. [[email protected]: use standard comment layout, tweak comment text] Reported-by: Amey Bhide <[email protected]> Signed-off-by: Pravin B Shelar <[email protected]> Acked-by: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> commit 3b876c8f2a361ceeed3fed894980c69066f903a0 Author: Jeff Liu <[email protected]> Date: Thu Jun 7 15:44:32 2012 +0800 xfs: fix debug_object WARN at xfs_alloc_vextent() Fengguang reports: [ 780.529603] XFS (vdd): Ending clean mount [ 781.454590] ODEBUG: object is on stack, but not annotated [ 781.455433] ------------[ cut here ]------------ [ 781.455433] WARNING: at /c/kernel-tests/sound/lib/debugobjects.c:301 __debug_object_init+0x173/0x1f1() [ 781.455433] Hardware name: Bochs [ 781.455433] Modules linked in: [ 781.455433] Pid: 26910, comm: kworker/0:2 Not tainted 3.4.0+ #51 [ 781.455433] Call Trace: [ 781.455433] [<ffffffff8106bc84>] warn_slowpath_common+0x83/0x9b [ 781.455433] [<ffffffff8106bcb6>] warn_slowpath_null+0x1a/0x1c [ 781.455433] [<ffffffff814919a5>] __debug_object_init+0x173/0x1f1 [ 781.455433] [<ffffffff81491c65>] debug_object_init+0x14/0x16 [ 781.455433] [<ffffffff8108842a>] __init_work+0x20/0x22 [ 781.455433] [<ffffffff8134ea56>] xfs_alloc_vextent+0x6c/0xd5 Use INIT_WORK_ONSTACK in xfs_alloc_vextent instead of INIT_WORK. Reported-by: Wu Fengguang <[email protected]> Signed-off-by: Jie Liu <[email protected]> Signed-off-by: Ben Myers <[email protected]> commit 66f9311381b4772003d595fb6c518f1647450db0 Author: Alain Renaud <[email protected]> Date: Fri Jun 8 15:34:46 2012 -0400 xfs: xfs_vm_writepage clear iomap_valid when !buffer_uptodate (REV2) On filesytems with a block size smaller than PAGE_SIZE we currently have a problem with unwritten extents. If a we have multi-block page for which an unwritten extent has been allocated, and only some of the buffers have been written to, and they are not contiguous, we can expose stale data from disk in the blocks between the writes after extent conversion. Example of a page with unwritten and real data. buffer content 0 empty b_state = 0 1 DATA b_state = 0x1023 Uptodate,Dirty,Mapped,Unwritten 2 DATA b_state = 0x1023 Uptodate,Dirty,Mapped,Unwritten 3 empty b_state = 0 4 empty b_state = 0 5 DATA b_state = 0x1023 Uptodate,Dirty,Mapped,Unwritten 6 DATA b_state = 0x1023 Uptodate,Dirty,Mapped,Unwritten 7 empty b_state = 0 Buffers 1, 2, 5, and 6 have been written to, leaving 0, 3, 4, and 7 empty. Currently buffers 1, 2, 5, and 6 are added to a single ioend, and when IO has completed, extent conversion creates a real extent from block 1 through block 6, leaving 0 and 7 unwritten. However buffers 3 and 4 were not written to disk, so stale data is exposed from those blocks on a subsequent read. Fix this by setting iomap_valid = 0 when we find a buffer that is not Uptodate. This ensures that buffers 5 and 6 are not added to the same ioend as buffers 1 and 2. Later these blocks will be converted into two separate real extents, leaving the blocks in between unwritten. Signed-off-by: Alain Renaud <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Signed-off-by: Ben Myers <[email protected]> commit b7019b2f31fe7bec9f6f5dc1bf54cb0e0d73e047 Author: Alex Deucher <[email protected]> Date: Thu Jun 14 15:58:25 2012 -0400 drm/radeon: SI tiling fixes for display - Use the correct union for getting the tiling info - Properly init the PIPE_CONFIG field for SI Signed-off-by: Alex Deucher <[email protected]> Reviewed-by: Michel Dänzer <[email protected]> Signed-off-by: Dave Airlie <[email protected]> commit b1027439dff844675f6c0df97a1b1d190791a699 Author: Bryan Schumaker <[email protected]> Date: Wed Jun 20 14:35:28 2012 -0400 NFS: Force the legacy idmapper to be single threaded It was initially coded under the assumption that there would only be one request at a time, so use a lock to enforce this requirement.. Signed-off-by: Bryan Schumaker <[email protected]> CC: [email protected] [3.4+] Signed-off-by: Trond Myklebust <[email protected]> commit a4d7a122385e27bdd91101635c704327d7c0d87f Merge: 61fcbc8 10aa5a3 Author: Linus Torvalds <[email protected]> Date: Wed Jun 20 09:42:09 2012 -0700 Merge branch 'fixes' of git://git.linaro.org/people/rmk/linux-arm Pull ARM fixes from Russell King: "This includes three MMCI changes - one to fix up the wrong version of the DT support patch which was merged, and two to make deferred probing work. It also includes a fix to the OMAP SPI driver which is causing a boot time warning. The remainder are very minor ARM fixes." * 'fixes' of git://git.linaro.org/people/rmk/linux-arm: SPI: fix over-eager devm_xxx() conversion ARM: 7427/1: mmc: mmci: Defer probe() in case of yet uninitialized GPIOs ARM: 7426/1: mmc: mmci: Remove wrong error handling of gpio 0 ARM: 7425/1: extable: ensure fixup entries are 4-byte aligned ARM: 7421/1: bpf_jit: BPF_S_ANC_ALU_XOR_X support ARM: 7423/1: kprobes: run t32_simulate_ldr_literal() without insn slot ARM: 7422/1: mmc: mmci: Allocate platform memory during Device Tree boot commit 61fcbc8dfe40d6fb5e59ab31dfcef67d6019f1a5 Merge: f40759e daf7317 Author: Linus Torvalds <[email protected]> Date: Wed Jun 20 09:41:21 2012 -0700 Merge tag 'pinctrl-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull two pinctrl fixes from Linus Walleij: - Fixed a 2-line compile error for MXS - A pure documentation fix for Nomadik * tag 'pinctrl-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: pinctrl/nomadik: document Alt-C glitch pinctrl: mxs: Use kfree to fix build error commit aef2b89662b8a7506846d0dc0df672d196ddf8d0 Author: Russ Dill <[email protected]> Date: Wed May 9 15:15:03 2012 -0700 ARM: OMAP: Fix Beagleboard DVI reset gpio Commit e813a55eb9c9bc6c8039fb16332cf43402125b30 ("OMAP: board-files: remove custom PD GPIO handling for DVI output") moved TFP410 chip's powerdown-gpio handling from the board files to the tfp410 driver. One gpio_request_one(powerdown-gpio, ...) was mistakenly left unremoved in the Beagle board file. This causes the tfp410 driver to fail to request the gpio on Beagle, causing the driver to fail and thus the DVI output doesn't work. This patch removes several boot errors from board-omap3beagle.c: - gpio_request: gpio--22 (DVI reset) status -22 - Unable to get DVI reset GPIO There is a combination of leftover code and revision confusion. Additionally, xM support is currently a hack. For original Beagleboard this removes the double initialization of GPIO 170, properly configures it as an output, and wraps the initialization in an if block so that xM does not attempt to request it. For Beagleboard xM it removes reference to GPIO 129 which was part of rev A1 and A2 designs, but never functioned. It then properly assigns beagle_dvi_device.reset_gpio in beagle_twl_gpio_setup and removes the hack of initializing it high. Additionally, it uses gpio_set_value_cansleep since this GPIO is connected through i2c. Unfortunately, there is no way to tell the difference between xM A2 and A3. However, GPIO 129 does not function on rev A1 and A2, and the TWL GPIO used on A3 and beyond is not used on rev A1 and A2, there are no problems created by this fix. Tested on Beagleboard-xM Rev C1 and Beagleboard Rev B4. Signed-off-by: Russ Dill <[email protected]> Acked-by: Tomi Valkeinen <[email protected]> Signed-off-by: Tony Lindgren <[email protected]> commit 95dca12d6bf2dd5e7720506b8f9786318899b8d6 Author: Jon Hunter <[email protected]> Date: Tue Jun 12 19:40:46 2012 -0500 arm/dts: OMAP2: Fix interrupt controller binding When booting with device-tree on an OMAP2420H4, the kernel is hanging when initialising the interrupts and following kernel dumps is seen ... [ 0.000000] ------------[ cut here ]------------ [ 0.000000] WARNING: at arch/arm/mach-omap2/irq.c:271 omap_intc_of_init+0x50/0xb4() [ 0.000000] unable to get intc registers [ 0.000000] Modules linked in: [ 0.000000] [<c001befc>] (unwind_backtrace+0x0/0xf4) from [<c0040c34>] (warn_slowpath_common+0x4c/0x64) [ 0.000000] [<c0040c34>] (warn_slowpath_common+0x4c/0x64) from [<c0040ce0>] (warn_slowpath_fmt+0x30/0x40) [ 0.000000] [<c0040ce0>] (warn_slowpath_fmt+0x30/0x40) from [<c066b8a4>] (omap_intc_of_init+0x50/0xb4) [ 0.000000] [<c066b8a4>] (omap_intc_of_init+0x50/0xb4) from [<c0688b70>] (of_irq_init+0x144/0x288) [ 0.000000] [<c0688b70>] (of_irq_init+0x144/0x288) from [<c0663294>] (init_IRQ+0x14/0x1c) [ 0.000000] [<c0663294>] (init_IRQ+0x14/0x1c) from [<c06607fc>] (start_kernel+0x198/0x304) [ 0.000000] [<c06607fc>] (start_kernel+0x198/0x304) from [<80008044>] (0x80008044) [ 0.000000] ---[ end trace 1b75b31a2719ed1c ]--- [ 0.000000] of_irq_init: children remain, but no parents The OMAP2 interrupt controller binding is missing the number of interrupts and interrupt controller register address. Adding these fixes the problem. Signed-off-by: Jon Hunter <[email protected]> Signed-off-by: Tony Lindgren <[email protected]> commit 3d09b33fecf204561e5c7126648ec05c756c631c Author: Tony Lindgren <[email protected]> Date: Wed Jun 20 07:18:…

…S block during isolation for migration commit 0bf380b upstream. When isolating for migration, migration starts at the start of a zone which is not necessarily pageblock aligned. Further, it stops isolating when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally not aligned. This allows isolate_migratepages() to call pfn_to_page() on an invalid PFN which can result in a crash. This was originally reported against a 3.0-based kernel with the following trace in a crash dump. PID: 9902 TASK: d47aecd0 CPU: 0 COMMAND: "memcg_process_s" #0 [d72d3ad0] crash_kexec at c028cfdb #1 [d72d3b24] oops_end at c05c5322 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60 #3 [d72d3bec] bad_area at c0227fb6 #4 [d72d3c00] do_page_fault at c05c72ec #5 [d72d3c80] error_code (via page_fault) at c05c47a4 EAX: 00000000 EBX: 000c0000 ECX: 00000001 EDX: 00000807 EBP: 000c0000 DS: 007b ESI: 00000001 ES: 007b EDI: f3000a80 GS: 6f50 CS: 0060 EIP: c030b15a ERR: ffffffff EFLAGS: 00010002 #6 [d72d3cb4] isolate_migratepages at c030b15a #7 [d72d3d1] zone_watermark_ok at c02d26cb #8 [d72d3d2c] compact_zone at c030b8de #9 [d72d3d68] compact_zone_order at c030bba1 torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84 torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7 torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7 torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97 torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845 torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6 torvalds#17 [d72d3f30] do_page_fault at c05c70ed torvalds#18 [d72d3fb0] error_code (via page_fault) at c05c47a4 EAX: b71ff000 EBX: 00000001 ECX: 00001600 EDX: 00000431 DS: 007b ESI: 08048950 ES: 007b EDI: bfaa3788 SS: 007b ESP: bfaa36e0 EBP: bfaa3828 GS: 6f50 CS: 0073 EIP: 080487c8 ERR: ffffffff EFLAGS: 00010202 It was also reported by Herbert van den Bergh against 3.1-based kernel with the following snippet from the console log. BUG: unable to handle kernel paging request at 01c00008 IP: [<c0522399>] isolate_migratepages+0x119/0x390 *pdpt = 000000002f7ce001 *pde = 0000000000000000 It is expected that it also affects 3.2.x and current mainline. The problem is that pfn_valid is only called on the first PFN being checked and that PFN is not necessarily aligned. Lets say we have a case like this H = MAX_ORDER_NR_PAGES boundary | = pageblock boundary m = cc->migrate_pfn f = cc->free_pfn o = memory hole H------|------H------|----m-Hoooooo|ooooooH-f----|------H The migrate_pfn is just below a memory hole and the free scanner is beyond the hole. When isolate_migratepages started, it scans from migrate_pfn to migrate_pfn+pageblock_nr_pages which is now in a memory hole. It checks pfn_valid() on the first PFN but then scans into the hole where there are not necessarily valid struct pages. This patch ensures that isolate_migratepages calls pfn_valid when necessary. Reported-by: Herbert van den Bergh <[email protected]> Tested-by: Herbert van den Bergh <[email protected]> Signed-off-by: Mel Gorman <[email protected]> Acked-by: Michal Nazarewicz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

…the Crashkernel Scenario [ Upstream commit 4058c39 ] The issue is that before entering the crash kernel, the DWC USB controller did not perform operations such as resetting the interrupt mask bits. After entering the crash kernel,before the USB interrupt handler registration was completed while loading the DWC USB driver,an GINTSTS_SOF interrupt was received.This triggered the misroute_irq process within the GIC handling framework,ultimately leading to the misrouting of the interrupt,causing it to be handled by the wrong interrupt handler and resulting in the issue. Summary:In a scenario where the kernel triggers a panic and enters the crash kernel,it is necessary to ensure that the interrupt mask bit is not enabled before the interrupt registration is complete. If an interrupt reaches the CPU at this moment,it will certainly not be handled correctly,especially in cases where this interrupt is reported frequently. Please refer to the Crashkernel dmesg information as follows (the message on line 3 was added before devm_request_irq is called by the dwc2_driver_probe function): [ 5.866837][ T1] dwc2 JMIC0010:01: supply vusb_d not found, using dummy regulator [ 5.874588][ T1] dwc2 JMIC0010:01: supply vusb_a not found, using dummy regulator [ 5.882335][ T1] dwc2 JMIC0010:01: before devm_request_irq irq: [71], gintmsk[0xf300080e], gintsts[0x04200009] [ 5.892686][ C0] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.10.0-jmnd1.2_RC torvalds#18 [ 5.900327][ C0] Hardware name: CMSS HyperCard4-25G/HyperCard4-25G, BIOS 1.6.4 Jul 8 2024 [ 5.908836][ C0] Call trace: [ 5.911965][ C0] dump_backtrace+0x0/0x1f0 [ 5.916308][ C0] show_stack+0x20/0x30 [ 5.920304][ C0] dump_stack+0xd8/0x140 [ 5.924387][ C0] pcie_xxx_handler+0x3c/0x1d8 [ 5.930121][ C0] __handle_irq_event_percpu+0x64/0x1e0 [ 5.935506][ C0] handle_irq_event+0x80/0x1d0 [ 5.940109][ C0] try_one_irq+0x138/0x174 [ 5.944365][ C0] misrouted_irq+0x134/0x140 [ 5.948795][ C0] note_interrupt+0x1d0/0x30c [ 5.953311][ C0] handle_irq_event+0x13c/0x1d0 [ 5.958001][ C0] handle_fasteoi_irq+0xd4/0x260 [ 5.962779][ C0] __handle_domain_irq+0x88/0xf0 [ 5.967555][ C0] gic_handle_irq+0x9c/0x2f0 [ 5.971985][ C0] el1_irq+0xb8/0x140 [ 5.975807][ C0] __setup_irq+0x3dc/0x7cc [ 5.980064][ C0] request_threaded_irq+0xf4/0x1b4 [ 5.985015][ C0] devm_request_threaded_irq+0x80/0x100 [ 5.990400][ C0] dwc2_driver_probe+0x1b8/0x6b0 [ 5.995178][ C0] platform_drv_probe+0x5c/0xb0 [ 5.999868][ C0] really_probe+0xf8/0x51c [ 6.004125][ C0] driver_probe_device+0xfc/0x170 [ 6.008989][ C0] device_driver_attach+0xc8/0xd0 [ 6.013853][ C0] __driver_attach+0xe8/0x1b0 [ 6.018369][ C0] bus_for_each_dev+0x7c/0xdc [ 6.022886][ C0] driver_attach+0x2c/0x3c [ 6.027143][ C0] bus_add_driver+0xdc/0x240 [ 6.031573][ C0] driver_register+0x80/0x13c [ 6.036090][ C0] __platform_driver_register+0x50/0x5c [ 6.041476][ C0] dwc2_platform_driver_init+0x24/0x30 [ 6.046774][ C0] do_one_initcall+0x50/0x25c [ 6.051291][ C0] do_initcall_level+0xe4/0xfc [ 6.055894][ C0] do_initcalls+0x80/0xa4 [ 6.060064][ C0] kernel_init_freeable+0x198/0x240 [ 6.065102][ C0] kernel_init+0x1c/0x12c Signed-off-by: Shawn Shao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Sasha Levin <[email protected]>

Uprobe needs to fetch args into a percpu buffer, and then copy to ring buffer to avoid non-atomic context problem. Sometimes user-space strings, arrays can be very large, but the size of percpu buffer is only page size. And store_trace_args() won't check whether these data exceeds a single page or not, caused out-of-bounds memory access. It could be reproduced by following steps: 1. build kernel with CONFIG_KASAN enabled 2. save follow program as test.c ``` \#include <stdio.h> \#include <stdlib.h> \#include <string.h> // If string length large than MAX_STRING_SIZE, the fetch_store_strlen() // will return 0, cause __get_data_size() return shorter size, and // store_trace_args() will not trigger out-of-bounds access. // So make string length less than 4096. \#define STRLEN 4093 void generate_string(char *str, int n) { int i; for (i = 0; i < n; ++i) { char c = i % 26 + 'a'; str[i] = c; } str[n-1] = '\0'; } void print_string(char *str) { printf("%s\n", str); } int main() { char tmp[STRLEN]; generate_string(tmp, STRLEN); print_string(tmp); return 0; } ``` 3. compile program `gcc -o test test.c` 4. get the offset of `print_string()` ``` objdump -t test | grep -w print_string 0000000000401199 g F .text 000000000000001b print_string ``` 5. configure uprobe with offset 0x1199 ``` off=0x1199 cd /sys/kernel/debug/tracing/ echo "p /root/test:${off} arg1=+0(%di):ustring arg2=\$comm arg3=+0(%di):ustring" > uprobe_events echo 1 > events/uprobes/enable echo 1 > tracing_on ``` 6. run `test`, and kasan will report error. ================================================================== BUG: KASAN: use-after-free in strncpy_from_user+0x1d6/0x1f0 Write of size 8 at addr ffff88812311c004 by task test/499CPU: 0 UID: 0 PID: 499 Comm: test Not tainted 6.12.0-rc3+ torvalds#18 Hardware name: Red Hat KVM, BIOS 1.16.0-4.al8 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x55/0x70 print_address_description.constprop.0+0x27/0x310 kasan_report+0x10f/0x120 ? strncpy_from_user+0x1d6/0x1f0 strncpy_from_user+0x1d6/0x1f0 ? rmqueue.constprop.0+0x70d/0x2ad0 process_fetch_insn+0xb26/0x1470 ? __pfx_process_fetch_insn+0x10/0x10 ? _raw_spin_lock+0x85/0xe0 ? __pfx__raw_spin_lock+0x10/0x10 ? __pte_offset_map+0x1f/0x2d0 ? unwind_next_frame+0xc5f/0x1f80 ? arch_stack_walk+0x68/0xf0 ? is_bpf_text_address+0x23/0x30 ? kernel_text_address.part.0+0xbb/0xd0 ? __kernel_text_address+0x66/0xb0 ? unwind_get_return_address+0x5e/0xa0 ? __pfx_stack_trace_consume_entry+0x10/0x10 ? arch_stack_walk+0xa2/0xf0 ? _raw_spin_lock_irqsave+0x8b/0xf0 ? __pfx__raw_spin_lock_irqsave+0x10/0x10 ? depot_alloc_stack+0x4c/0x1f0 ? _raw_spin_unlock_irqrestore+0xe/0x30 ? stack_depot_save_flags+0x35d/0x4f0 ? kasan_save_stack+0x34/0x50 ? kasan_save_stack+0x24/0x50 ? mutex_lock+0x91/0xe0 ? __pfx_mutex_lock+0x10/0x10 prepare_uprobe_buffer.part.0+0x2cd/0x500 uprobe_dispatcher+0x2c3/0x6a0 ? __pfx_uprobe_dispatcher+0x10/0x10 ? __kasan_slab_alloc+0x4d/0x90 handler_chain+0xdd/0x3e0 handle_swbp+0x26e/0x3d0 ? __pfx_handle_swbp+0x10/0x10 ? uprobe_pre_sstep_notifier+0x151/0x1b0 irqentry_exit_to_user_mode+0xe2/0x1b0 asm_exc_int3+0x39/0x40 RIP: 0033:0x401199 Code: 01 c2 0f b6 45 fb 88 02 83 45 fc 01 8b 45 fc 3b 45 e4 7c b7 8b 45 e4 48 98 48 8d 50 ff 48 8b 45 e8 48 01 d0 ce RSP: 002b:00007ffdf00576a8 EFLAGS: 00000206 RAX: 00007ffdf00576b0 RBX: 0000000000000000 RCX: 0000000000000ff2 RDX: 0000000000000ffc RSI: 0000000000000ffd RDI: 00007ffdf00576b0 RBP: 00007ffdf00586b0 R08: 00007feb2f9c0d20 R09: 00007feb2f9c0d20 R10: 0000000000000001 R11: 0000000000000202 R12: 0000000000401040 R13: 00007ffdf0058780 R14: 0000000000000000 R15: 0000000000000000 </TASK> This commit enforces the buffer's maxlen less than a page-size to avoid store_trace_args() out-of-memory access. Fixes: dcad1a2 ("tracing/uprobes: Fetch args before reserving a ring buffer") Signed-off-by: Qiao Ma <[email protected]>

…the Crashkernel Scenario [ Upstream commit 4058c39 ] The issue is that before entering the crash kernel, the DWC USB controller did not perform operations such as resetting the interrupt mask bits. After entering the crash kernel,before the USB interrupt handler registration was completed while loading the DWC USB driver,an GINTSTS_SOF interrupt was received.This triggered the misroute_irq process within the GIC handling framework,ultimately leading to the misrouting of the interrupt,causing it to be handled by the wrong interrupt handler and resulting in the issue. Summary:In a scenario where the kernel triggers a panic and enters the crash kernel,it is necessary to ensure that the interrupt mask bit is not enabled before the interrupt registration is complete. If an interrupt reaches the CPU at this moment,it will certainly not be handled correctly,especially in cases where this interrupt is reported frequently. Please refer to the Crashkernel dmesg information as follows (the message on line 3 was added before devm_request_irq is called by the dwc2_driver_probe function): [ 5.866837][ T1] dwc2 JMIC0010:01: supply vusb_d not found, using dummy regulator [ 5.874588][ T1] dwc2 JMIC0010:01: supply vusb_a not found, using dummy regulator [ 5.882335][ T1] dwc2 JMIC0010:01: before devm_request_irq irq: [71], gintmsk[0xf300080e], gintsts[0x04200009] [ 5.892686][ C0] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.10.0-jmnd1.2_RC torvalds#18 [ 5.900327][ C0] Hardware name: CMSS HyperCard4-25G/HyperCard4-25G, BIOS 1.6.4 Jul 8 2024 [ 5.908836][ C0] Call trace: [ 5.911965][ C0] dump_backtrace+0x0/0x1f0 [ 5.916308][ C0] show_stack+0x20/0x30 [ 5.920304][ C0] dump_stack+0xd8/0x140 [ 5.924387][ C0] pcie_xxx_handler+0x3c/0x1d8 [ 5.930121][ C0] __handle_irq_event_percpu+0x64/0x1e0 [ 5.935506][ C0] handle_irq_event+0x80/0x1d0 [ 5.940109][ C0] try_one_irq+0x138/0x174 [ 5.944365][ C0] misrouted_irq+0x134/0x140 [ 5.948795][ C0] note_interrupt+0x1d0/0x30c [ 5.953311][ C0] handle_irq_event+0x13c/0x1d0 [ 5.958001][ C0] handle_fasteoi_irq+0xd4/0x260 [ 5.962779][ C0] __handle_domain_irq+0x88/0xf0 [ 5.967555][ C0] gic_handle_irq+0x9c/0x2f0 [ 5.971985][ C0] el1_irq+0xb8/0x140 [ 5.975807][ C0] __setup_irq+0x3dc/0x7cc [ 5.980064][ C0] request_threaded_irq+0xf4/0x1b4 [ 5.985015][ C0] devm_request_threaded_irq+0x80/0x100 [ 5.990400][ C0] dwc2_driver_probe+0x1b8/0x6b0 [ 5.995178][ C0] platform_drv_probe+0x5c/0xb0 [ 5.999868][ C0] really_probe+0xf8/0x51c [ 6.004125][ C0] driver_probe_device+0xfc/0x170 [ 6.008989][ C0] device_driver_attach+0xc8/0xd0 [ 6.013853][ C0] __driver_attach+0xe8/0x1b0 [ 6.018369][ C0] bus_for_each_dev+0x7c/0xdc [ 6.022886][ C0] driver_attach+0x2c/0x3c [ 6.027143][ C0] bus_add_driver+0xdc/0x240 [ 6.031573][ C0] driver_register+0x80/0x13c [ 6.036090][ C0] __platform_driver_register+0x50/0x5c [ 6.041476][ C0] dwc2_platform_driver_init+0x24/0x30 [ 6.046774][ C0] do_one_initcall+0x50/0x25c [ 6.051291][ C0] do_initcall_level+0xe4/0xfc [ 6.055894][ C0] do_initcalls+0x80/0xa4 [ 6.060064][ C0] kernel_init_freeable+0x198/0x240 [ 6.065102][ C0] kernel_init+0x1c/0x12c Signed-off-by: Shawn Shao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Sasha Levin <[email protected]>

Uprobe needs to fetch args into a percpu buffer, and then copy to ring buffer to avoid non-atomic context problem. Sometimes user-space strings, arrays can be very large, but the size of percpu buffer is only page size. And store_trace_args() won't check whether these data exceeds a single page or not, caused out-of-bounds memory access. It could be reproduced by following steps: 1. build kernel with CONFIG_KASAN enabled 2. save follow program as test.c ``` \#include <stdio.h> \#include <stdlib.h> \#include <string.h> // If string length large than MAX_STRING_SIZE, the fetch_store_strlen() // will return 0, cause __get_data_size() return shorter size, and // store_trace_args() will not trigger out-of-bounds access. // So make string length less than 4096. \#define STRLEN 4093 void generate_string(char *str, int n) { int i; for (i = 0; i < n; ++i) { char c = i % 26 + 'a'; str[i] = c; } str[n-1] = '\0'; } void print_string(char *str) { printf("%s\n", str); } int main() { char tmp[STRLEN]; generate_string(tmp, STRLEN); print_string(tmp); return 0; } ``` 3. compile program `gcc -o test test.c` 4. get the offset of `print_string()` ``` objdump -t test | grep -w print_string 0000000000401199 g F .text 000000000000001b print_string ``` 5. configure uprobe with offset 0x1199 ``` off=0x1199 cd /sys/kernel/debug/tracing/ echo "p /root/test:${off} arg1=+0(%di):ustring arg2=\$comm arg3=+0(%di):ustring" > uprobe_events echo 1 > events/uprobes/enable echo 1 > tracing_on ``` 6. run `test`, and kasan will report error. ================================================================== BUG: KASAN: use-after-free in strncpy_from_user+0x1d6/0x1f0 Write of size 8 at addr ffff88812311c004 by task test/499CPU: 0 UID: 0 PID: 499 Comm: test Not tainted 6.12.0-rc3+ #18 Hardware name: Red Hat KVM, BIOS 1.16.0-4.al8 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x55/0x70 print_address_description.constprop.0+0x27/0x310 kasan_report+0x10f/0x120 ? strncpy_from_user+0x1d6/0x1f0 strncpy_from_user+0x1d6/0x1f0 ? rmqueue.constprop.0+0x70d/0x2ad0 process_fetch_insn+0xb26/0x1470 ? __pfx_process_fetch_insn+0x10/0x10 ? _raw_spin_lock+0x85/0xe0 ? __pfx__raw_spin_lock+0x10/0x10 ? __pte_offset_map+0x1f/0x2d0 ? unwind_next_frame+0xc5f/0x1f80 ? arch_stack_walk+0x68/0xf0 ? is_bpf_text_address+0x23/0x30 ? kernel_text_address.part.0+0xbb/0xd0 ? __kernel_text_address+0x66/0xb0 ? unwind_get_return_address+0x5e/0xa0 ? __pfx_stack_trace_consume_entry+0x10/0x10 ? arch_stack_walk+0xa2/0xf0 ? _raw_spin_lock_irqsave+0x8b/0xf0 ? __pfx__raw_spin_lock_irqsave+0x10/0x10 ? depot_alloc_stack+0x4c/0x1f0 ? _raw_spin_unlock_irqrestore+0xe/0x30 ? stack_depot_save_flags+0x35d/0x4f0 ? kasan_save_stack+0x34/0x50 ? kasan_save_stack+0x24/0x50 ? mutex_lock+0x91/0xe0 ? __pfx_mutex_lock+0x10/0x10 prepare_uprobe_buffer.part.0+0x2cd/0x500 uprobe_dispatcher+0x2c3/0x6a0 ? __pfx_uprobe_dispatcher+0x10/0x10 ? __kasan_slab_alloc+0x4d/0x90 handler_chain+0xdd/0x3e0 handle_swbp+0x26e/0x3d0 ? __pfx_handle_swbp+0x10/0x10 ? uprobe_pre_sstep_notifier+0x151/0x1b0 irqentry_exit_to_user_mode+0xe2/0x1b0 asm_exc_int3+0x39/0x40 RIP: 0033:0x401199 Code: 01 c2 0f b6 45 fb 88 02 83 45 fc 01 8b 45 fc 3b 45 e4 7c b7 8b 45 e4 48 98 48 8d 50 ff 48 8b 45 e8 48 01 d0 ce RSP: 002b:00007ffdf00576a8 EFLAGS: 00000206 RAX: 00007ffdf00576b0 RBX: 0000000000000000 RCX: 0000000000000ff2 RDX: 0000000000000ffc RSI: 0000000000000ffd RDI: 00007ffdf00576b0 RBP: 00007ffdf00586b0 R08: 00007feb2f9c0d20 R09: 00007feb2f9c0d20 R10: 0000000000000001 R11: 0000000000000202 R12: 0000000000401040 R13: 00007ffdf0058780 R14: 0000000000000000 R15: 0000000000000000 </TASK> This commit enforces the buffer's maxlen less than a page-size to avoid store_trace_args() out-of-memory access. Link: https://lore.kernel.org/all/[email protected]/ Fixes: dcad1a2 ("tracing/uprobes: Fetch args before reserving a ring buffer") Signed-off-by: Qiao Ma <[email protected]> Signed-off-by: Masami Hiramatsu (Google) <[email protected]>

When PG_hwpoison pages are freed they are treated differently in free_pages_prepare() and instead of being released they are isolated. Page allocation tag counters are decremented at this point since the page is considered not in use. Later on when such pages are released by unpoison_memory(), the allocation tag counters will be decremented again and the following warning gets reported: [ 113.930443][ T3282] ------------[ cut here ]------------ [ 113.931105][ T3282] alloc_tag was not set [ 113.931576][ T3282] WARNING: CPU: 2 PID: 3282 at ./include/linux/alloc_tag.h:130 pgalloc_tag_sub.part.66+0x154/0x164 [ 113.932866][ T3282] Modules linked in: hwpoison_inject fuse ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute ip6table_nat ip6table_man4 [ 113.941638][ T3282] CPU: 2 UID: 0 PID: 3282 Comm: madvise11 Kdump: loaded Tainted: G W 6.11.0-rc4-dirty torvalds#18 [ 113.943003][ T3282] Tainted: [W]=WARN [ 113.943453][ T3282] Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022 [ 113.944378][ T3282] pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 113.945319][ T3282] pc : pgalloc_tag_sub.part.66+0x154/0x164 [ 113.946016][ T3282] lr : pgalloc_tag_sub.part.66+0x154/0x164 [ 113.946706][ T3282] sp : ffff800087093a10 [ 113.947197][ T3282] x29: ffff800087093a10 x28: ffff0000d7a9d400 x27: ffff80008249f0a0 [ 113.948165][ T3282] x26: 0000000000000000 x25: ffff80008249f2b0 x24: 0000000000000000 [ 113.949134][ T3282] x23: 0000000000000001 x22: 0000000000000001 x21: 0000000000000000 [ 113.950597][ T3282] x20: ffff0000c08fcad8 x19: ffff80008251e000 x18: ffffffffffffffff [ 113.952207][ T3282] x17: 0000000000000000 x16: 0000000000000000 x15: ffff800081746210 [ 113.953161][ T3282] x14: 0000000000000000 x13: 205d323832335420 x12: 5b5d353031313339 [ 113.954120][ T3282] x11: ffff800087093500 x10: 000000000000005d x9 : 00000000ffffffd0 [ 113.955078][ T3282] x8 : 7f7f7f7f7f7f7f7f x7 : ffff80008236ba90 x6 : c0000000ffff7fff [ 113.956036][ T3282] x5 : ffff000b34bf4dc8 x4 : ffff8000820aba90 x3 : 0000000000000001 [ 113.956994][ T3282] x2 : ffff800ab320f000 x1 : 841d1e35ac932e00 x0 : 0000000000000000 [ 113.957962][ T3282] Call trace: [ 113.958350][ T3282] pgalloc_tag_sub.part.66+0x154/0x164 [ 113.959000][ T3282] pgalloc_tag_sub+0x14/0x1c [ 113.959539][ T3282] free_unref_page+0xf4/0x4b8 [ 113.960096][ T3282] __folio_put+0xd4/0x120 [ 113.960614][ T3282] folio_put+0x24/0x50 [ 113.961103][ T3282] unpoison_memory+0x4f0/0x5b0 [ 113.961678][ T3282] hwpoison_unpoison+0x30/0x48 [hwpoison_inject] [ 113.962436][ T3282] simple_attr_write_xsigned.isra.34+0xec/0x1cc [ 113.963183][ T3282] simple_attr_write+0x38/0x48 [ 113.963750][ T3282] debugfs_attr_write+0x54/0x80 [ 113.964330][ T3282] full_proxy_write+0x68/0x98 [ 113.964880][ T3282] vfs_write+0xdc/0x4d0 [ 113.965372][ T3282] ksys_write+0x78/0x100 [ 113.965875][ T3282] __arm64_sys_write+0x24/0x30 [ 113.966440][ T3282] invoke_syscall+0x7c/0x104 [ 113.966984][ T3282] el0_svc_common.constprop.1+0x88/0x104 [ 113.967652][ T3282] do_el0_svc+0x2c/0x38 [ 113.968893][ T3282] el0_svc+0x3c/0x1b8 [ 113.969379][ T3282] el0t_64_sync_handler+0x98/0xbc [ 113.969980][ T3282] el0t_64_sync+0x19c/0x1a0 [ 113.970511][ T3282] ---[ end trace 0000000000000000 ]--- To fix this, clear the page tag reference after the page got isolated and accounted for. Link: https://lkml.kernel.org/r/[email protected] Fixes: d224eb0 ("codetag: debug: mark codetags for reserved pages as empty") Signed-off-by: Hao Ge <[email protected]> Reviewed-by: Miaohe Lin <[email protected]> Acked-by: Suren Baghdasaryan <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Hao Ge <[email protected]> Cc: Kent Overstreet <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Pasha Tatashin <[email protected]> Cc: <[email protected]> [6.10+] Signed-off-by: Andrew Morton <[email protected]>

[ Upstream commit 373b933 ] Uprobe needs to fetch args into a percpu buffer, and then copy to ring buffer to avoid non-atomic context problem. Sometimes user-space strings, arrays can be very large, but the size of percpu buffer is only page size. And store_trace_args() won't check whether these data exceeds a single page or not, caused out-of-bounds memory access. It could be reproduced by following steps: 1. build kernel with CONFIG_KASAN enabled 2. save follow program as test.c ``` \#include <stdio.h> \#include <stdlib.h> \#include <string.h> // If string length large than MAX_STRING_SIZE, the fetch_store_strlen() // will return 0, cause __get_data_size() return shorter size, and // store_trace_args() will not trigger out-of-bounds access. // So make string length less than 4096. \#define STRLEN 4093 void generate_string(char *str, int n) { int i; for (i = 0; i < n; ++i) { char c = i % 26 + 'a'; str[i] = c; } str[n-1] = '\0'; } void print_string(char *str) { printf("%s\n", str); } int main() { char tmp[STRLEN]; generate_string(tmp, STRLEN); print_string(tmp); return 0; } ``` 3. compile program `gcc -o test test.c` 4. get the offset of `print_string()` ``` objdump -t test | grep -w print_string 0000000000401199 g F .text 000000000000001b print_string ``` 5. configure uprobe with offset 0x1199 ``` off=0x1199 cd /sys/kernel/debug/tracing/ echo "p /root/test:${off} arg1=+0(%di):ustring arg2=\$comm arg3=+0(%di):ustring" > uprobe_events echo 1 > events/uprobes/enable echo 1 > tracing_on ``` 6. run `test`, and kasan will report error. ================================================================== BUG: KASAN: use-after-free in strncpy_from_user+0x1d6/0x1f0 Write of size 8 at addr ffff88812311c004 by task test/499CPU: 0 UID: 0 PID: 499 Comm: test Not tainted 6.12.0-rc3+ torvalds#18 Hardware name: Red Hat KVM, BIOS 1.16.0-4.al8 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x55/0x70 print_address_description.constprop.0+0x27/0x310 kasan_report+0x10f/0x120 ? strncpy_from_user+0x1d6/0x1f0 strncpy_from_user+0x1d6/0x1f0 ? rmqueue.constprop.0+0x70d/0x2ad0 process_fetch_insn+0xb26/0x1470 ? __pfx_process_fetch_insn+0x10/0x10 ? _raw_spin_lock+0x85/0xe0 ? __pfx__raw_spin_lock+0x10/0x10 ? __pte_offset_map+0x1f/0x2d0 ? unwind_next_frame+0xc5f/0x1f80 ? arch_stack_walk+0x68/0xf0 ? is_bpf_text_address+0x23/0x30 ? kernel_text_address.part.0+0xbb/0xd0 ? __kernel_text_address+0x66/0xb0 ? unwind_get_return_address+0x5e/0xa0 ? __pfx_stack_trace_consume_entry+0x10/0x10 ? arch_stack_walk+0xa2/0xf0 ? _raw_spin_lock_irqsave+0x8b/0xf0 ? __pfx__raw_spin_lock_irqsave+0x10/0x10 ? depot_alloc_stack+0x4c/0x1f0 ? _raw_spin_unlock_irqrestore+0xe/0x30 ? stack_depot_save_flags+0x35d/0x4f0 ? kasan_save_stack+0x34/0x50 ? kasan_save_stack+0x24/0x50 ? mutex_lock+0x91/0xe0 ? __pfx_mutex_lock+0x10/0x10 prepare_uprobe_buffer.part.0+0x2cd/0x500 uprobe_dispatcher+0x2c3/0x6a0 ? __pfx_uprobe_dispatcher+0x10/0x10 ? __kasan_slab_alloc+0x4d/0x90 handler_chain+0xdd/0x3e0 handle_swbp+0x26e/0x3d0 ? __pfx_handle_swbp+0x10/0x10 ? uprobe_pre_sstep_notifier+0x151/0x1b0 irqentry_exit_to_user_mode+0xe2/0x1b0 asm_exc_int3+0x39/0x40 RIP: 0033:0x401199 Code: 01 c2 0f b6 45 fb 88 02 83 45 fc 01 8b 45 fc 3b 45 e4 7c b7 8b 45 e4 48 98 48 8d 50 ff 48 8b 45 e8 48 01 d0 ce RSP: 002b:00007ffdf00576a8 EFLAGS: 00000206 RAX: 00007ffdf00576b0 RBX: 0000000000000000 RCX: 0000000000000ff2 RDX: 0000000000000ffc RSI: 0000000000000ffd RDI: 00007ffdf00576b0 RBP: 00007ffdf00586b0 R08: 00007feb2f9c0d20 R09: 00007feb2f9c0d20 R10: 0000000000000001 R11: 0000000000000202 R12: 0000000000401040 R13: 00007ffdf0058780 R14: 0000000000000000 R15: 0000000000000000 </TASK> This commit enforces the buffer's maxlen less than a page-size to avoid store_trace_args() out-of-memory access. Link: https://lore.kernel.org/all/[email protected]/ Fixes: dcad1a2 ("tracing/uprobes: Fetch args before reserving a ring buffer") Signed-off-by: Qiao Ma <[email protected]> Signed-off-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Sasha Levin <[email protected]>

commit d0e806b upstream. During the migration of Soundwire runtime stream allocation from the Qualcomm Soundwire controller to SoC's soundcard drivers the sdm845 soundcard was forgotten. At this point any playback attempt or audio daemon startup, for instance on sdm845-db845c (Qualcomm RB3 board), will result in stream pointer NULL dereference: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020 Mem abort info: ESR = 0x0000000096000004 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x04: level 0 translation fault Data abort info: ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000 CM = 0, WnR = 0, TnD = 0, TagAccess = 0 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 user pgtable: 4k pages, 48-bit VAs, pgdp=0000000101ecf000 [0000000000000020] pgd=0000000000000000, p4d=0000000000000000 Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP Modules linked in: ... CPU: 5 UID: 0 PID: 1198 Comm: aplay Not tainted 6.12.0-rc2-qcomlt-arm64-00059-g9d78f315a362-dirty torvalds#18 Hardware name: Thundercomm Dragonboard 845c (DT) pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : sdw_stream_add_slave+0x44/0x380 [soundwire_bus] lr : sdw_stream_add_slave+0x44/0x380 [soundwire_bus] sp : ffff80008a2035c0 x29: ffff80008a2035c0 x28: ffff80008a203978 x27: 0000000000000000 x26: 00000000000000c0 x25: 0000000000000000 x24: ffff1676025f4800 x23: ffff167600ff1cb8 x22: ffff167600ff1c98 x21: 0000000000000003 x20: ffff167607316000 x19: ffff167604e64e80 x18: 0000000000000000 x17: 0000000000000000 x16: ffffcec265074160 x15: 0000000000000000 x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000 x8 : 0000000000000000 x7 : 0000000000000000 x6 : ffff167600ff1cec x5 : ffffcec22cfa2010 x4 : 0000000000000000 x3 : 0000000000000003 x2 : ffff167613f836c0 x1 : 0000000000000000 x0 : ffff16761feb60b8 Call trace: sdw_stream_add_slave+0x44/0x380 [soundwire_bus] wsa881x_hw_params+0x68/0x80 [snd_soc_wsa881x] snd_soc_dai_hw_params+0x3c/0xa4 __soc_pcm_hw_params+0x230/0x660 dpcm_be_dai_hw_params+0x1d0/0x3f8 dpcm_fe_dai_hw_params+0x98/0x268 snd_pcm_hw_params+0x124/0x460 snd_pcm_common_ioctl+0x998/0x16e8 snd_pcm_ioctl+0x34/0x58 __arm64_sys_ioctl+0xac/0xf8 invoke_syscall+0x48/0x104 el0_svc_common.constprop.0+0x40/0xe0 do_el0_svc+0x1c/0x28 el0_svc+0x34/0xe0 el0t_64_sync_handler+0x120/0x12c el0t_64_sync+0x190/0x194 Code: aa0403fb f9418400 9100e000 9400102f (f8420f22) ---[ end trace 0000000000000000 ]--- 0000000000006108 <sdw_stream_add_slave>: 6108: d503233f paciasp 610c: a9b97bfd stp x29, x30, [sp, #-112]! 6110: 910003fd mov x29, sp 6114: a90153f3 stp x19, x20, [sp, torvalds#16] 6118: a9025bf5 stp x21, x22, [sp, torvalds#32] 611c: aa0103f6 mov x22, x1 6120: 2a0303f5 mov w21, w3 6124: a90363f7 stp x23, x24, [sp, torvalds#48] 6128: aa0003f8 mov x24, x0 612c: aa0203f7 mov x23, x2 6130: a9046bf9 stp x25, x26, [sp, torvalds#64] 6134: aa0403f9 mov x25, x4 <-- x4 copied to x25 6138: a90573fb stp x27, x28, [sp, torvalds#80] 613c: aa0403fb mov x27, x4 6140: f9418400 ldr x0, [x0, torvalds#776] 6144: 9100e000 add x0, x0, #0x38 6148: 94000000 bl 0 <mutex_lock> 614c: f8420f22 ldr x2, [x25, torvalds#32]! <-- offset 0x44 ^^^ This is 0x6108 + offset 0x44 from the beginning of sdw_stream_add_slave() where data abort happens. wsa881x_hw_params() is called with stream = NULL and passes it further in register x4 (5th argument) to sdw_stream_add_slave() without any checks. Value from x4 is copied to x25 and finally it aborts on trying to load a value from address in x25 plus offset 32 (in dec) which corresponds to master_list member in struct sdw_stream_runtime: struct sdw_stream_runtime { const char * name; /* 0 8 */ struct sdw_stream_params params; /* 8 12 */ enum sdw_stream_state state; /* 20 4 */ enum sdw_stream_type type; /* 24 4 */ /* XXX 4 bytes hole, try to pack */ here-> struct list_head master_list; /* 32 16 */ int m_rt_count; /* 48 4 */ /* size: 56, cachelines: 1, members: 6 */ /* sum members: 48, holes: 1, sum holes: 4 */ /* padding: 4 */ /* last cacheline: 56 bytes */ Fix this by adding required calls to qcom_snd_sdw_startup() and sdw_release_stream() to startup and shutdown routines which restores the previous correct behaviour when ->set_stream() method is called to set a valid stream runtime pointer on playback startup. Reproduced and then fix was tested on db845c RB3 board. Reported-by: Dmitry Baryshkov <[email protected]> Cc: [email protected] Fixes: 15c7fab ("ASoC: qcom: Move Soundwire runtime stream alloc to soundcards") Cc: Srinivas Kandagatla <[email protected]> Cc: Dmitry Baryshkov <[email protected]> Cc: Krzysztof Kozlowski <[email protected]> Cc: Pierre-Louis Bossart <[email protected]> Signed-off-by: Alexey Klimov <[email protected]> Tested-by: Steev Klimaszewski <[email protected]> # Lenovo Yoga C630 Reviewed-by: Krzysztof Kozlowski <[email protected]> Reviewed-by: Srinivas Kandagatla <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Mark Brown <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

During the migration of Soundwire runtime stream allocation from the Qualcomm Soundwire controller to SoC's soundcard drivers the sdm845 soundcard was forgotten. At this point any playback attempt or audio daemon startup, for instance on sdm845-db845c (Qualcomm RB3 board), will result in stream pointer NULL dereference: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020 Mem abort info: ESR = 0x0000000096000004 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x04: level 0 translation fault Data abort info: ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000 CM = 0, WnR = 0, TnD = 0, TagAccess = 0 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 user pgtable: 4k pages, 48-bit VAs, pgdp=0000000101ecf000 [0000000000000020] pgd=0000000000000000, p4d=0000000000000000 Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP Modules linked in: ... CPU: 5 UID: 0 PID: 1198 Comm: aplay Not tainted 6.12.0-rc2-qcomlt-arm64-00059-g9d78f315a362-dirty torvalds#18 Hardware name: Thundercomm Dragonboard 845c (DT) pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : sdw_stream_add_slave+0x44/0x380 [soundwire_bus] lr : sdw_stream_add_slave+0x44/0x380 [soundwire_bus] sp : ffff80008a2035c0 x29: ffff80008a2035c0 x28: ffff80008a203978 x27: 0000000000000000 x26: 00000000000000c0 x25: 0000000000000000 x24: ffff1676025f4800 x23: ffff167600ff1cb8 x22: ffff167600ff1c98 x21: 0000000000000003 x20: ffff167607316000 x19: ffff167604e64e80 x18: 0000000000000000 x17: 0000000000000000 x16: ffffcec265074160 x15: 0000000000000000 x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000 x8 : 0000000000000000 x7 : 0000000000000000 x6 : ffff167600ff1cec x5 : ffffcec22cfa2010 x4 : 0000000000000000 x3 : 0000000000000003 x2 : ffff167613f836c0 x1 : 0000000000000000 x0 : ffff16761feb60b8 Call trace: sdw_stream_add_slave+0x44/0x380 [soundwire_bus] wsa881x_hw_params+0x68/0x80 [snd_soc_wsa881x] snd_soc_dai_hw_params+0x3c/0xa4 __soc_pcm_hw_params+0x230/0x660 dpcm_be_dai_hw_params+0x1d0/0x3f8 dpcm_fe_dai_hw_params+0x98/0x268 snd_pcm_hw_params+0x124/0x460 snd_pcm_common_ioctl+0x998/0x16e8 snd_pcm_ioctl+0x34/0x58 __arm64_sys_ioctl+0xac/0xf8 invoke_syscall+0x48/0x104 el0_svc_common.constprop.0+0x40/0xe0 do_el0_svc+0x1c/0x28 el0_svc+0x34/0xe0 el0t_64_sync_handler+0x120/0x12c el0t_64_sync+0x190/0x194 Code: aa0403fb f9418400 9100e000 9400102f (f8420f22) ---[ end trace 0000000000000000 ]--- 0000000000006108 <sdw_stream_add_slave>: 6108: d503233f paciasp 610c: a9b97bfd stp x29, x30, [sp, #-112]! 6110: 910003fd mov x29, sp 6114: a90153f3 stp x19, x20, [sp, torvalds#16] 6118: a9025bf5 stp x21, x22, [sp, torvalds#32] 611c: aa0103f6 mov x22, x1 6120: 2a0303f5 mov w21, w3 6124: a90363f7 stp x23, x24, [sp, torvalds#48] 6128: aa0003f8 mov x24, x0 612c: aa0203f7 mov x23, x2 6130: a9046bf9 stp x25, x26, [sp, torvalds#64] 6134: aa0403f9 mov x25, x4 <-- x4 copied to x25 6138: a90573fb stp x27, x28, [sp, torvalds#80] 613c: aa0403fb mov x27, x4 6140: f9418400 ldr x0, [x0, torvalds#776] 6144: 9100e000 add x0, x0, #0x38 6148: 94000000 bl 0 <mutex_lock> 614c: f8420f22 ldr x2, [x25, torvalds#32]! <-- offset 0x44 ^^^ This is 0x6108 + offset 0x44 from the beginning of sdw_stream_add_slave() where data abort happens. wsa881x_hw_params() is called with stream = NULL and passes it further in register x4 (5th argument) to sdw_stream_add_slave() without any checks. Value from x4 is copied to x25 and finally it aborts on trying to load a value from address in x25 plus offset 32 (in dec) which corresponds to master_list member in struct sdw_stream_runtime: struct sdw_stream_runtime { const char * name; /* 0 8 */ struct sdw_stream_params params; /* 8 12 */ enum sdw_stream_state state; /* 20 4 */ enum sdw_stream_type type; /* 24 4 */ /* XXX 4 bytes hole, try to pack */ here-> struct list_head master_list; /* 32 16 */ int m_rt_count; /* 48 4 */ /* size: 56, cachelines: 1, members: 6 */ /* sum members: 48, holes: 1, sum holes: 4 */ /* padding: 4 */ /* last cacheline: 56 bytes */ Fix this by adding required calls to qcom_snd_sdw_startup() and sdw_release_stream() to startup and shutdown routines which restores the previous correct behaviour when ->set_stream() method is called to set a valid stream runtime pointer on playback startup. Reproduced and then fix was tested on db845c RB3 board. Reported-by: Dmitry Baryshkov <[email protected]> Cc: [email protected] Fixes: 15c7fab ("ASoC: qcom: Move Soundwire runtime stream alloc to soundcards") Cc: Srinivas Kandagatla <[email protected]> Cc: Dmitry Baryshkov <[email protected]> Cc: Krzysztof Kozlowski <[email protected]> Cc: Pierre-Louis Bossart <[email protected]> Signed-off-by: Alexey Klimov <[email protected]> Tested-by: Steev Klimaszewski <[email protected]> # Lenovo Yoga C630 Reviewed-by: Krzysztof Kozlowski <[email protected]> Reviewed-by: Srinivas Kandagatla <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Mark Brown <[email protected]>