dmesg: detect new RIP pattern
This helps auto-bisect a number of boot errors.

For example,

[  956.671551] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
[  956.671557] IP: pgtable_trans_huge_withdraw+0x4c/0xc0
...
[  956.671650] RIP: pgtable_trans_huge_withdraw+0x4c/0xc0 RSP: ffffc90026b07c20

Auto-bisection failed for this case because the important
"RIP:pgtable_trans_huge_withdraw" feature was missing. The remaining features,
such as "dmesg.BUG:unable_to_handle_kernel", are far too common to be useful
for bisection.

wfg@inn /result/stress-ng/1s-memory-performance/lkp-bdw-ep6/debian-x86_64-2016-08-31.cgz/x86_64-rhel-7.2/gcc-6/bb176f67090ca54869fc1262c913aa69d2ede070/0% cat dmesg.json
{
  "dmesg.boot_failures": [
    1
  ],
  "dmesg.BUG:unable_to_handle_kernel": [
    1
  ],
  "dmesg.Oops:#[##]": [
    1
  ],
  "dmesg.Kernel_panic-not_syncing:Fatal_exception": [
    1
  ],
...

After the patch:

  $ /c/lkp-tests/stats/dmesg dmesg-lkp-bdw-ep6:20171029153441:x86_64-rhel-7.2:gcc-6:4.14.0-rc6:1
  boot_failures: 1

  # BUG: unable to handle kernel
  BUG:unable_to_handle_kernel: 1
  message:BUG:unable_to_handle_kernel: [  328.471917] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
  pattern:BUG:unable_to_handle_kernel: BUG: unable to handle kernel

  # Oops:
  Oops:#[##]: 1
  message:Oops:#[##]: [  328.471930] Oops: 0000 [#1] SMP
  pattern:Oops:#[##]: Oops:

+ # RIP: pgtable_trans_huge_withdraw+0x
+ RIP:pgtable_trans_huge_withdraw: 1
+ message:RIP:pgtable_trans_huge_withdraw: [  328.471980] RIP: 0010:pgtable_trans_huge_withdraw+0x4c/0xc0
+ pattern:RIP:pgtable_trans_huge_withdraw: RIP: pgtable_trans_huge_withdraw+0x

  # Kernel panic - not syncing: Fatal exception
  Kernel_panic-not_syncing:Fatal_exception: 1
  message:Kernel_panic-not_syncing:Fatal_exception: [  328.489702] Kernel panic - not syncing: Fatal exception
  pattern:Kernel_panic-not_syncing:Fatal_exception: Kernel panic - not syncing: Fatal exception

  timestamp:last: 328.496311
  timestamp:BUG:unable_to_handle_kernel: 328.471917
  timestamp:Oops:#[##]: 328.471930
  timestamp:RIP:pgtable_trans_huge_withdraw: 328.471980
  timestamp:Kernel_panic-not_syncing:Fatal_exception: 328.489702
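The stats output above is a flat list of "key: value" lines whose keys may themselves contain colons. A minimal sketch, assuming that each line splits at its first ": " and that the plain counter keys (those without a message:, pattern: or timestamp: prefix) are the ones that show up in dmesg.json with a "dmesg." prefix (inferred by comparing the two listings in this message, not taken from lkp-tests), illustrates how the new RIP entry becomes an extra bisectable feature. The helper names `parse_stats` and `bisect_features` are hypothetical.

```ruby
# Hypothetical helpers; not part of lkp-tests.

# Parse "key: value" lines as printed by stats/dmesg into an ordered Hash.
# Keys such as "Oops:#[##]" contain ':' but never ': ', so split at the first ': '.
def parse_stats(text)
  text.each_line.with_object({}) do |raw, stats|
    line = raw.strip.sub(/\A\+\s*/, '')            # tolerate "+" markers from the diff-style listing
    next if line.empty? || line.start_with?('#')   # skip blank lines and "# ..." comment headers
    key, value = line.split(': ', 2)
    stats[key] = value
  end
end

# Assumption: counter keys (no message:/pattern:/timestamp: prefix) are the ones
# that appear in dmesg.json, prefixed with "dmesg.".
def bisect_features(stats)
  stats.keys
       .reject { |key| key.match?(/\A(message|pattern|timestamp):/) }
       .map { |key| "dmesg.#{key}" }
end

sample = <<~STATS
  boot_failures: 1

  # BUG: unable to handle kernel
  BUG:unable_to_handle_kernel: 1
  pattern:BUG:unable_to_handle_kernel: BUG: unable to handle kernel

  + # RIP: pgtable_trans_huge_withdraw+0x
  + RIP:pgtable_trans_huge_withdraw: 1
  + pattern:RIP:pgtable_trans_huge_withdraw: RIP: pgtable_trans_huge_withdraw+0x

  timestamp:RIP:pgtable_trans_huge_withdraw: 328.471980
STATS

puts bisect_features(parse_stats(sample))
# prints:
# dmesg.boot_failures
# dmesg.BUG:unable_to_handle_kernel
# dmesg.RIP:pgtable_trans_huge_withdraw
```

Splitting on ": " rather than ":" is what keeps composite keys like "pattern:BUG:unable_to_handle_kernel" intact while still separating the free-text value.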

CC: "Kirill A. Shutemov" <[email protected]>
Signed-off-by: Fengguang Wu <[email protected]>
Signed-off-by: Philip Li <[email protected]>
Fengguang Wu authored and rli9 committed Oct 31, 2017
1 parent 74d3ba7 commit f640c97
Showing 2 changed files with 2 additions and 0 deletions.
etc/oops-pattern (1 addition, 0 deletions)
@@ -47,6 +47,7 @@ IP-Config: Auto-configuration of network failed
EIP is at [a-zA-Z0-9._]+\+0x.*/0x.*
EIP: [a-zA-Z0-9._]+\+0x[a-f0-9]+/0x[a-f0-9]+
RIP: [0-9a-f]{4}:\[.*\] [a-zA-Z0-9._]+\+0x.*/0x.*
+RIP: [0-9a-f]{4}:[a-zA-Z0-9._]+\+0x.*/0x.*
PANIC: early exception
PANIC: double fault,
Unknown interrupt or fault at:
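To see which log format each RIP line in etc/oops-pattern targets, the two regexes can be checked against both forms quoted in this commit. This sketch compiles each pattern with `Regexp.new`, on the assumption that the file holds plain Ruby-compatible regexes, one per line; it does not reproduce how lkp-tests actually loads the file.

```ruby
# The two RIP lines from etc/oops-pattern, copied verbatim. Single quotes keep
# the backslashes (\[ \] \+) exactly as they appear in the pattern file.
OLD_RIP = Regexp.new('RIP: [0-9a-f]{4}:\[.*\] [a-zA-Z0-9._]+\+0x.*/0x.*')
NEW_RIP = Regexp.new('RIP: [0-9a-f]{4}:[a-zA-Z0-9._]+\+0x.*/0x.*')

# Old-style line with bracketed raw addresses (from the comment in lib/dmesg.rb).
old_line = 'RIP: 0010:[<ffffffff91906d8d>] [<ffffffff91906d8d>] validate_chain+0xed/0xe80'
# New-style line from the "After patch" stats output.
new_line = '[  328.471980] RIP: 0010:pgtable_trans_huge_withdraw+0x4c/0xc0'

{ 'old' => old_line, 'new' => new_line }.each do |label, line|
  puts "#{label}: OLD_RIP=#{OLD_RIP.match?(line)} NEW_RIP=#{NEW_RIP.match?(line)}"
end
# prints:
# old: OLD_RIP=true NEW_RIP=false
# new: OLD_RIP=false NEW_RIP=true
```

The existing pattern requires a `[` right after the segment value, so it never fires on the newer format that prints the function name directly; the added line covers exactly that case.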
lib/dmesg.rb (1 addition, 0 deletions)
@@ -361,6 +361,7 @@ def analyze_error_id(line)

error_id.gsub!(/([a-z]:)[0-9]+\b/, '\1') # WARNING: at arch/x86/kernel/cpu/perf_event.c:1077 x86_pmu_start+0xaa/0x110()
error_id.gsub!(/#:\[<#>\]\[<#>\]/, '') # RIP: 0010:[<ffffffff91906d8d>] [<ffffffff91906d8d>] validate_chain+0xed/0xe80
+error_id.gsub!(/RIP:#:/, 'RIP:') # RIP: 0010:__might_sleep+0x72/0x80

[error_id, bug_to_bisect]
end
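The added `gsub!` collapses the normalized new-style line into a "RIP:<function>" error id. The sketch below is a hypothetical, heavily simplified stand-in for the earlier steps of `analyze_error_id` (timestamp stripping, offset removal, and replacing raw addresses and the 4-digit segment value with `#`); only the last two `gsub!` calls are copied from the hunk, and the method name `rip_error_id` is illustrative.

```ruby
# Hypothetical, simplified model of analyze_error_id; only the last two
# gsub! calls are taken from lib/dmesg.rb, the earlier steps are assumptions.
def rip_error_id(line)
  id = line.sub(/\A\[\s*\d+\.\d+\]\s*/, '')        # drop the "[  328.471980] " timestamp
  id = id.sub(/\+0x[0-9a-f]+\/0x[0-9a-f]+.*/, '')  # drop "+off/size" and anything after it
  id = id.gsub(/\[<[0-9a-f]+>\]/, '[<#>]')         # raw addresses -> [<#>]
  id = id.gsub(/\b[0-9a-f]{4}\b/, '#')             # 4-digit segment such as 0010 -> #
  id = id.gsub(/\s+/, '')                          # remove all whitespace

  # From the hunk above: old bracketed form, then the newly added plain form.
  id.gsub!(/#:\[<#>\]\[<#>\]/, '')
  id.gsub!(/RIP:#:/, 'RIP:')
  id
end

puts rip_error_id('[  328.471980] RIP: 0010:pgtable_trans_huge_withdraw+0x4c/0xc0')
# prints: RIP:pgtable_trans_huge_withdraw
puts rip_error_id('RIP: 0010:[<ffffffff91906d8d>] [<ffffffff91906d8d>] validate_chain+0xed/0xe80')
# prints: RIP:validate_chain
```

In this model, dropping the final `gsub!` leaves `RIP:#:pgtable_trans_huge_withdraw` for the first line, which shows why the extra substitution is needed. The `\b[0-9a-f]{4}\b` step is a deliberate simplification and would also rewrite a function whose name is exactly four hex letters.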