Use pin_user_pages_fast instead of get_user_pages_fast in create_existing_sysmem(drivers/hv/dxgkrnl/dxgvmbus.c) #11095

langyuxf · 2024-01-30T09:10:55Z

Windows Version

10.0.22631.3085

WSL Version

2.0.9.0

Are you using WSL 1 or WSL 2?

WSL 2

Kernel Version

5.15.133.1-1

Distro Version

Ubuntu 22.04

Other Software

No response

Repro Steps

1. Use mmap to allocate PRIVATE + ANONYMOUS CPU memory.
2. Use dxgkio_create_allocation with that CPU memory to create a GPU allocation.
3. Fork a child process, then write to this memory from the CPU side in the parent process.

Expected Behavior

The physical pages backing the CPU VA in the parent process should remain mapped in the GPU page table.

Actual Behavior

Because the parent process writes to the memory after fork, copy-on-write is triggered and new physical pages are allocated for the parent process.
These new pages are not mapped in the GPU page table.
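
The copy-on-write split itself can be observed from user space without any GPU involved. The following is a minimal sketch, not the actual repro: the function name `child_view_after_parent_write` and the pipe-based handshake are illustrative assumptions, and it only shows that the parent and child stop sharing the page after the parent's write. It does not show which process keeps the original physical frame, nor what dxgkrnl maps.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fill a MAP_PRIVATE | MAP_ANONYMOUS page with `before`,
 * fork, let the parent overwrite byte 0 with `after`, then ask the child what
 * it sees at byte 0. Returns the child's view (0..255) or -1 on error.
 */
int child_view_after_parent_write(unsigned char before, unsigned char after)
{
	assert(before != after); /* identical values would prove nothing */

	long page = sysconf(_SC_PAGESIZE);
	if (page <= 0)
		return -1;
	size_t len = (size_t)page;

	unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
				  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;
	memset(buf, before, len); /* fault the page in before fork */

	int go[2], reply[2];
	if (pipe(go) != 0 || pipe(reply) != 0) {
		munmap(buf, len);
		return -1;
	}

	pid_t pid = fork();
	if (pid < 0) {
		munmap(buf, len);
		return -1;
	}
	if (pid == 0) {
		unsigned char token, seen;

		close(go[1]);
		close(reply[0]);
		/* Block until the parent has finished its write. */
		if (read(go[0], &token, 1) != 1)
			_exit(1);
		seen = buf[0];
		_exit(write(reply[1], &seen, 1) == 1 ? 0 : 1);
	}

	close(go[0]);
	close(reply[1]);
	buf[0] = after; /* parent write after fork: triggers copy-on-write */

	unsigned char token = 1, seen = 0;
	int ok = write(go[1], &token, 1) == 1 &&
		 read(reply[0], &seen, 1) == 1;

	close(go[1]);
	close(reply[0]);
	waitpid(pid, NULL, 0);
	munmap(buf, len);
	return ok ? (int)seen : -1;
}
```

Calling `child_view_after_parent_write(0xAA, 0x55)` from a small `main` returns the child's view of the first byte. The child still reads the pre-fork value, so the two processes no longer share one physical page for that address.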

Diagnostic Logs

No response

ghost · 2024-01-30T20:45:35Z

Word from the source is that this is unsupported by design (forking a process that uses a vGPU, that is).

langyuxf · 2024-01-31T00:33:53Z

Can you elaborate on this?
What is unsupported?

langyuxf · 2024-02-01T01:46:23Z

Hi @pmartincic,

I'm not asking for the forked child process to use the vGPU. I mean that the parent process should still be able to use the vGPU after it forks.
Currently the parent process can't use the vGPU after the fork, because the driver uses get_user_pages_*.

Regards,
Lang

ghost · 2024-02-01T01:53:51Z

Ahhh, I'm sorry I misread what you wrote. That's my fault. I'll start the conversation internally again.

langyuxf · 2024-02-01T06:51:58Z

Please see
https://github.com/torvalds/linux/blob/master/Documentation/core-api/pin_user_pages.rst#page-maybe-dma-pinned-the-whole-point-of-pinning
https://github.com/torvalds/linux/blob/master/Documentation/core-api/pin_user_pages.rst#another-way-of-thinking-about-foll-get-foll-pin-and-foll-longterm
https://github.com/torvalds/linux/blob/master/include/linux/mm.h#L1967

langyuxf · 2024-02-03T07:42:45Z

Hi @pmartincic,

Any updates?

Thanks,
Lang

ghost · 2024-02-06T19:35:19Z

Do you have a minimal repro app or code sample you can post? I don't normally work on these features, so this is a little out of my wheelhouse, and I want to perform a sanity check before I pass it along.

(Edit: they agree the parent process should still work)

langyuxf · 2024-02-06T23:50:59Z

It's hard to post all the sample code needed to reproduce the issue here.

What do you want to check?

langyuxf · 2024-02-08T01:35:31Z

Hi @pmartincic,

Have you guys seen these docs?

https://github.com/torvalds/linux/blob/master/Documentation/core-api/pin_user_pages.rst#page-maybe-dma-pinned-the-whole-point-of-pinning
https://github.com/torvalds/linux/blob/master/Documentation/core-api/pin_user_pages.rst#another-way-of-thinking-about-foll-get-foll-pin-and-foll-longterm
https://github.com/torvalds/linux/blob/master/include/linux/mm.h#L1967

In particular, see the following:

/*
 * This should most likely only be called during fork() to see whether we
 * should break the cow immediately for an anon page on the src mm.
 *
 * The caller has to hold the PT lock and the vma->vm_mm->write_protect_seq.
 */

static inline bool folio_needs_cow_for_dma(struct vm_area_struct *vma,
					  struct folio *folio)
{
	VM_BUG_ON(!(raw_read_seqcount(&vma->vm_mm->write_protect_seq) & 1));

	if (!test_bit(MMF_HAS_PINNED, &vma->vm_mm->flags))
		return false;

	return folio_maybe_dma_pinned(folio);
}

Regards,
Lang

ghost · 2024-02-08T01:50:03Z

Trust but verify.

I don't work on this area of the product. My plate is pretty full. I don't have time to write a repro. Somebody who does know this area is willing to look at it. I have a limited amount of political capital. This person helping owes me nothing and it's not their responsibility to do what I ask. I will pass this bug along to them if you give a repro that I can sanity check.

langyuxf · 2024-02-08T02:06:00Z

I'm working on closed-source code for AMD hardware, and it's hard for me to expose that code.

The bug is actually clear.

ghost · 2024-02-08T17:49:48Z

You're missing the point. I'm probably not going to dig into this. The person I'm talking to who has the capacity to, would like a sample repro.

For their sake please make one.

langyuxf · 2024-02-08T23:16:23Z

Thanks, I get your point.

But it's inconvenient for me to publish a repro sample, because it would include some closed-source parts.

Can I talk with that person directly, in case they have questions?

langyuxf · 2024-02-09T10:59:16Z

I created a repro demo that hides the hardware-driver-specific details. You will need an AMD gfx9 or newer GPU to run it.

https://github.com/langyuxf/dxgkrnl_linux.git

Thanks
Lang

langyuxf · 2024-03-01T10:26:13Z

Hi @pmartincic @iourit,

We observed similar issues under memory pressure. Pages acquired via get_user_pages_fast() are reclaimed by the Linux memory management system, but the GPU page tables are not updated accordingly, so the GPU keeps operating on the original pages.

Have you guys reproduced the issue? Thanks.

Regards,
Lang

benhillis · 2024-04-19T19:30:50Z

This should be fixed with https://github.com/microsoft/WSL/releases/tag/2.2.3.

ghost assigned iourit Jan 30, 2024

ghost added kernel WSL kernel and removed kernel WSL kernel labels Jan 30, 2024

ghost unassigned iourit Jan 30, 2024

ghost closed this as completed Jan 30, 2024

ghost added the feature label Jan 30, 2024

ghost reopened this Feb 1, 2024

ghost removed the feature label Feb 1, 2024

ghost self-assigned this Feb 6, 2024

ghost added GPU kernel WSL kernel labels Feb 12, 2024

ghost assigned iourit Feb 13, 2024

github-actions bot mentioned this issue May 6, 2024

add vma sanity check in create_existing_sysmem (drivers/hv/dxgkrnl/dxgvmbus.c) #11558

Open

2 tasks

Sarisan mentioned this issue Aug 20, 2024

GPU resource cannot be used in wsl2: misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -22/-2 #11293

Open

2 tasks
