Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CrateFile keeps file handle open after SdfLayer is destroyed #1766

Closed
nvidia-jomiller opened this issue Feb 10, 2022 · 7 comments
Closed

CrateFile keeps file handle open after SdfLayer is destroyed #1766

nvidia-jomiller opened this issue Feb 10, 2022 · 7 comments

Comments

@nvidia-jomiller
Copy link
Contributor

Description of Issue

We're running into an issue where CrateFile will keep a file handle open on Windows. The SdfLayer / SdfLayerRegistry are correctly cleaning up the layer but the file handle stays open until the calling process is killed. The issue only occurs when _CrateFile::_MmapAsset is used and can be mitigated by setting the environment variable USDC_USE_PREAD=1 (or using an ASCII .usda file). The offending code seems to start at CrateFile::Open.

Steps to Reproduce

  1. Download and extract scene.zip
  2. Run the following Python:
from pxr import Sdf
import os

# Change source_url to the extracted scene.usd file
source = "scene.usd"

source_layer = Sdf.Layer.FindOrOpen(source)
target_layer = Sdf.Layer.CreateAnonymous()

source_path = Sdf.Path("/Environment")
target_path = Sdf.Path("/Environment")

Sdf.CreatePrimInLayer(target_layer, target_path)
Sdf.CopySpec(source_layer, source_path, target_layer, target_path)

source_layer = None
os.remove(source)
  1. Or the following C++:
// Change source to the extracted scene.usd file
std::string source = "scene.usd";

SdfLayerRefPtr sourceLayer = SdfLayer::FindOrOpen(source);
SdfPath sourcePath("/Environment");

SdfLayerRefPtr destLayer = SdfLayer::CreateAnonymous();
SdfPath destPath("/Environment");

SdfCreatePrimInLayer(destLayer, destPath);
TF_AXIOM(SdfCopySpec(sourceLayer, sourcePath, destLayer, destPath));

sourceLayer.Reset();
TF_AXIOM(TfDeleteFile(source));

System Information (OS, Hardware)

OS: Windows 10 Enterprise
CPU: AMD Ryzen 9 5950X 16-Core Processor
GPU: NVIDIA GeForce RTX 3080

Package Versions

  • Tested against dev
@gitamohr
Copy link
Contributor

I suspect what's happening is that there are zero-copy arrays that are being held in the copied-to layer, so the underlying mapping is persisted beyond the lifetime of the usdc layer. When this happens we touch all the pages that the arrays refer to so they become dissociated from the file, but yeah Windows has its weird file locking thing so I imagine that so long as the mapping exists we'll have this issue.

Can you please test to see if running with USDC_ENABLE_ZERO_COPY_ARRAYS=0 works around the problem to confirm this hypothesis? (I don't have a Windows setup handy.)

@nvidia-jomiller
Copy link
Contributor Author

Yep, setting USDC_ENABLE_ZERO_COPY_ARRAYS=0 works around the problem so it looks like your hypothesis is correct

@gitamohr
Copy link
Contributor

Cool -- reading thru the win32 docs, I wonder if this is just a matter of us getting the files opened with the READ|WRITE|DELETE sharing modes on Windows.

@jilliene
Copy link

Filed as internal issue #USD-7200

@nvidia-jomiller
Copy link
Contributor Author

@gitamohr I've been looking into this more and found the following to be partially true:

I suspect what's happening is that there are zero-copy arrays that are being held in the copied-to layer, so the underlying mapping is persisted beyond the lifetime of the usdc layer. When this happens we touch all the pages that the arrays refer to so they become dissociated from the file

As you mentioned, the mapping is definitely being persisted beyond the lifetime of the usdc layer (from the example above, that would be sourceLayer). What I'm not seeing is the arrays being detached or dissociated from that file. Specifically, when sourceLayer.Reset() is called the corresponding CrateFile destructor is invoked and it goes to cleanup the mapping. When intrusive_ptr_release is called to delete that mapping it has a _refCount of 2 instead of 1 preventing ~_FileMapping() from being invoked. This subsequently does not cleanup ArchMapFileReadWrite which would call UnmapViewOfFile releasing the file lock.

So I think the question is, what should happen in this scenario? I do find it a little odd that calling _mmapSrc.reset() does not delete the file mapping since the CreateFile is being destroyed. It makes sense why it's not deleting it (the file mapping is still being referenced by something else) but it's not clear when that mapping will, or should, be cleaned up.

I think we could get around it by opening the file in FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE sharing mode as you suggested but I think that will introduce other problems with memory mapped files (i.e another process writes to the file that is currently memory mapped).

Obviously, this sort of behavior is complex so any additional insight you could provide would be much appreciated.

@gitamohr
Copy link
Contributor

Hi @nvidia-jomiller -- I'm looking into this now and there are a couple of issues. First, I think there is a question about how we want to handle the file sharing modes on Windows. To support zero-copy arrays, we may need to persist the mapping beyond the lifetime of the layer if outstanding arrays that reference the mapped memory still exist when the layer is destroyed. The "detaching" we do is to ensure that the referenced pages are swap-backed so that any changes to the underlying file have no effect on the array data. And we considered it important (on Linux, our platform) that once an SdfLayer is destroyed, it is okay to delete/replace/overwrite the file. We already have a rule that a process must not write to a usdc file that's being read by another process. If that happens, then all bets are off.

So that makes me think that SHARE_READ|WRITE|DELETE is a good choice for Windows, since that sounds pretty similar to what we have on Linux. I also looked thru the Win32 docs to see if there was a way to modify the sharing bits after the fact. If we could do that then we could potentially open it up for write/delete at the time the layer is destroyed, but it looks like this is not possible.

There is also a bug where we are not detaching the pages from the file at the right time that I have a fix for.

Let me know if you have thoughts on the sharing bits, etc. Thanks!

@nvidia-jomiller
Copy link
Contributor Author

And we considered it important (on Linux, our platform) that once an SdfLayer is destroyed, it is okay to delete/replace/overwrite the file. We already have a rule that a process must not write to a usdc file that's being read by another process. If that happens, then all bets are off.

I think that is totally fair. If someone has workflows where writes can happen from another process at the same time as reading there is usually additional infrastructure required to support it. Things like Perforce, custom version control systems or even different SdfFileFormat plugins

So that makes me think that SHARE_READ|WRITE|DELETE is a good choice for Windows, since that sounds pretty similar to what we have on Linux. I also looked thru the Win32 docs to see if there was a way to modify the sharing bits after the fact. If we could do that then we could potentially open it up for write/delete at the time the layer is destroyed, but it looks like this is not possible.

I agree. I think opening the file with SHARE_READ|WRITE|DELETE is a fine solution. I think the biggest problem with this is the behavior being different on Linux vs Windows. As long as the behavior is the same and understood everything should be good.

There is also a bug where we are not detaching the pages from the file at the right time that I have a fix for.

Awesome!

Thanks!

pixar-oss pushed a commit that referenced this issue Nov 8, 2022
…the wrong

time.  The nature of the fix led to some refactoring, which makes up the bulk of this
change.  The upside is that the internal API related to mmap sources is now
closer to that for pread & generic asset sources.

Fixes #1766

(Internal change: 2254424)
seando-adsk pushed a commit to autodesk-forks/USD that referenced this issue Mar 1, 2024
…the wrong

time.  The nature of the fix led to some refactoring, which makes up the bulk of this
change.  The upside is that the internal API related to mmap sources is now
closer to that for pread & generic asset sources.

Fixes PixarAnimationStudios#1766

(Internal change: 2254424)

(cherry picked from commit 6f5b22e)
seando-adsk pushed a commit to autodesk-forks/USD that referenced this issue Mar 1, 2024
…the wrong

time.  The nature of the fix led to some refactoring, which makes up the bulk of this
change.  The upside is that the internal API related to mmap sources is now
closer to that for pread & generic asset sources.

Fixes PixarAnimationStudios#1766

(Internal change: 2254424)

(cherry picked from commit 6f5b22e)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants