[202305] Fix the dps460 kernel driver for the 5.10 kernel #439

Merged


saiarcot895
Contributor

The dps460 driver carries its own copy of `struct pmbus_data` in its code
(which is bad for many reasons). Because the driver compiles against that
private copy, changes to the kernel's definition never caused compilation
errors, and the copy has not been updated since the 4.19 version of
`struct pmbus_data`. It now differs from the actual structure definition in
the 5.10 kernel, which eventually results in kernel panics.

As a quick fix, update the copy of `struct pmbus_data` in dps460 to match
the current version.
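
To illustrate why a stale private copy of a structure is dangerous, here is a minimal userspace sketch. The two structs are hypothetical stand-ins, not the real `struct pmbus_data` layouts, and the `lock` member is a plain integer standing in for a mutex. The sketch only shows that once a field is inserted, code built against the old layout looks for `lock` at the wrong offset and silently reads a different field. This would be consistent with a fault in the lock path like the one in the trace below.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout as a driver might have copied it long ago. */
struct demo_data_old {
    uint32_t flags;
    uint32_t lock;      /* stand-in for a mutex */
};

/* Hypothetical current layout: the core has since added a field. */
struct demo_data_new {
    uint32_t flags;
    uint32_t currpage;  /* newly added field */
    uint32_t lock;
};

/* Offset at which code built against the stale copy expects the lock. */
static size_t stale_lock_offset(void)
{
    return offsetof(struct demo_data_old, lock);
}

/* Offset at which the lock actually lives in the current layout. */
static size_t actual_lock_offset(void)
{
    return offsetof(struct demo_data_new, lock);
}

/*
 * Read the "lock" from a current-layout object using the stale offset.
 * memcpy is used so the sketch itself stays well-defined C.
 */
static uint32_t read_lock_with_stale_layout(const struct demo_data_new *obj)
{
    uint32_t value;

    memcpy(&value,
           (const unsigned char *)obj + offsetof(struct demo_data_old, lock),
           sizeof(value));
    return value;
}
```

Nothing here fails at compile time, which is the core of the problem: the mismatch only shows up at run time, when the wrong bytes are treated as a lock.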

Sample kernel panic:

```
[161211.582193] BUG: unable to handle page fault for address: 00000000000348e0
[161211.665585] #PF: supervisor write access in kernel mode
[161211.729234] #PF: error_code(0x0002) - not-present page
[161211.791844] PGD 1d42dd067 P4D 1d42dd067 PUD 1d42dc067 PMD 0
[161211.860693] Oops: 0002 [#1] SMP NOPTI
[161211.905628] CPU: 1 PID: 6768 Comm: python3 Kdump: loaded Tainted: G           OE     5.10.0-23-2-amd64 #1 Debian 5.10.179-3
[161212.039989] Hardware name: Dell Inc S6000-ACS/S6000 CPU, BIOS 4.6.5 10/12/2015
[161212.127574] RIP: 0010:native_queued_spin_lock_slowpath+0x19f/0x1e0
[161212.202652] Code: ff ff ff c6 47 01 00 e9 1d ff ff ff c1 ee 12 83 e0 03 83 ee 01 48 c1 e0 05 48 63 f6 48 05 c0 48 03 00 48 03 04 f5 00 59 b8 85 <48> 89 10 8b 42 08 85 c0 75 09 f3 90 8b 42 08 85 c0 74 f7 48 8b 32
[161212.428542] RSP: 0000:ffffb30241f4fd18 EFLAGS: 00010202
[161212.492189] RAX: 00000000000348e0 RBX: 0000000000000002 RCX: 0000000000080000
[161212.578714] RDX: ffff9a1ff7cb48c0 RSI: 000000000000250b RDI: ffff9a1ec0ae9508
[161212.665240] RBP: ffff9a1ec0ae9500 R08: 0000000000080000 R09: 0000000000000000
[161212.751769] R10: 0000000000000001 R11: 0000000000000000 R12: ffffb30241f4fd38
[161212.838302] R13: ffff9a1ec0ae9508 R14: ffff9a1f69d14380 R15: 0000000000000001
[161212.924879] FS:  00007f6e9e515740(0000) GS:ffff9a1ff7c80000(0000) knlGS:0000000000000000
[161213.022794] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[161213.092681] CR2: 00000000000348e0 CR3: 00000001d42f6000 CR4: 00000000000006e0
[161213.179210] Call Trace:
[161213.209586]  _raw_spin_lock+0x1e/0x30
[161213.254519]  __mutex_lock.constprop.0+0x190/0x460
[161213.311937]  ? memcg_slab_post_alloc_hook+0x188/0x230
[161213.373500]  pmbus_show_sensor+0x2a/0xa0 [pmbus_core]
[161213.435070]  dev_attr_show+0x16/0x40
[161213.478957]  sysfs_kf_seq_show+0x98/0xf0
[161213.527007]  seq_read_iter+0x11f/0x4b0
[161213.572976]  new_sync_read+0x116/0x1b0
[161213.618947]  vfs_read+0xf8/0x180
[161213.658667]  ksys_read+0x5f/0xe0
[161213.698395]  do_syscall_64+0x30/0x80
[161213.742291]  entry_SYSCALL_64_after_hwframe+0x61/0xc6
[161213.803845] RIP: 0033:0x7f6e9e89b04e
[161213.847724] Code: 0f 1f 40 00 48 8b 15 79 af 00 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb ba 0f 1f 00 64 8b 04 25 18 00 00 00 85 c0 75 14 0f 05 <48> 3d 00 f0 ff ff 77 5a c3 66 0f 1f 84 00 00 00 00 00 48 83 ec 28
[161214.073617] RSP: 002b:00007ffd40c9cf18 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[161214.165340] RAX: ffffffffffffffda RBX: 00000000025053c0 RCX: 00007f6e9e89b04e
[161214.251869] RDX: 0000000000001001 RSI: 00000000029d0840 RDI: 0000000000000009
[161214.338393] RBP: 00007f6e9e5156c0 R08: 0000000000000000 R09: 00007f6e9e6e6be0
[161214.424926] R10: 0000000000000004 R11: 0000000000000246 R12: 0000000000001001
[161214.511454] R13: 0000000000000009 R14: 00000000029d0840 R15: 00000000009184a0
```

Signed-off-by: Saikrishna Arcot <[email protected]>

saiarcot895 requested a review from a team as a code owner October 7, 2024 18:04

StormLiangMS left a comment

LGTM

StormLiangMS merged commit 72616d7 into sonic-net:202305 Oct 8, 2024
5 checks passed