drm/amd/amdgpu: fix flr_work corner case
[Why]
In an SR-IOV multi-VF environment, flr_work can be entered
even after the TDR thread has started recovery. This can
lead to a GMC TLB flush via SDMA during full access while SDMA
is not yet initialized.

[How]
1. flr_work should take the write lock; otherwise there may be hw access
during vf flr
2. (amdgpu_in_reset(adev) || !down_write_trylock(&adev->reset_sem))
is the correct criterion for flr_work to return directly.

Acked-by: Christian König <[email protected]>
Signed-off-by: Jingwen Chen <[email protected]>
Jingwen Chen authored and Asher Song committed Mar 26, 2024
1 parent a4669ca commit 6ac6a32
Showing 2 changed files with 4 additions and 2 deletions.
3 changes: 2 additions & 1 deletion drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
@@ -259,7 +259,8 @@ static void xgpu_ai_mailbox_flr_work(struct work_struct *work)
 	 * otherwise the mailbox msg will be ruined/reseted by
 	 * the VF FLR.
 	 */
-	if (atomic_cmpxchg(&adev->reset_domain->in_gpu_reset, 0, 1) != 0)
+	if (amdgpu_in_reset(adev) ||
+	    atomic_cmpxchg(&adev->reset_domain->in_gpu_reset, 0, 1) != 0)
 		return;
 
 	down_write(&adev->reset_domain->sem);
3 changes: 2 additions & 1 deletion drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
@@ -292,7 +292,8 @@ static void xgpu_nv_mailbox_flr_work(struct work_struct *work)
 	 * otherwise the mailbox msg will be ruined/reseted by
 	 * the VF FLR.
 	 */
-	if (atomic_cmpxchg(&adev->reset_domain->in_gpu_reset, 0, 1) != 0)
+	if (amdgpu_in_reset(adev) ||
+	    atomic_cmpxchg(&adev->reset_domain->in_gpu_reset, 0, 1) != 0)
 		return;
 
 	down_write(&adev->reset_domain->sem);
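
The early-return guard that both hunks add can be illustrated with a minimal userspace sketch in C11. This is not the driver code: `struct reset_domain`, `in_reset()`, `flr_try_enter()` and `flr_exit()` are hypothetical stand-ins, `in_reset()` is assumed to be a plain read of the flag (how `amdgpu_in_reset()` is implemented in this tree is not shown in the diff), and the `down_write()` on the reset semaphore that follows the guard is omitted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for adev->reset_domain; only the flag is modelled. */
struct reset_domain {
	atomic_int in_gpu_reset;
};

/* Assumed behaviour of amdgpu_in_reset(): report whether the flag is set. */
static bool in_reset(struct reset_domain *d)
{
	return atomic_load(&d->in_gpu_reset) != 0;
}

/*
 * Mirrors the patched condition: bail out if a reset is already in
 * progress, otherwise try to claim the flag with a 0 -> 1 compare-exchange.
 * Returns true only for the single caller that claimed the flag.
 */
static bool flr_try_enter(struct reset_domain *d)
{
	int expected = 0;

	if (in_reset(d))
		return false;
	return atomic_compare_exchange_strong(&d->in_gpu_reset, &expected, 1);
}

/* Release the flag; only the caller that claimed it may call this. */
static void flr_exit(struct reset_domain *d)
{
	int prev = atomic_exchange(&d->in_gpu_reset, 0);

	assert(prev == 1);
	(void)prev; /* keeps NDEBUG builds free of unused-variable warnings */
}
```

In this sketch a second caller arriving while the flag is held gets `false` and returns without touching hardware, which is the behaviour the commit message describes for flr_work when the TDR thread is already in recovery.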