
UsdSkel _GetJointWorldInverseBindTransforms has a race, and may compute the transforms multiple times #1742

Open
williamkrick opened this issue Jan 14, 2022 · 11 comments


@williamkrick
Contributor

Description of Issue

_GetJointWorldInverseBindTransforms tries to limit itself to computing the inverse bind transforms once by testing whether the ComputeFlag has been set in UsdSkel_SkelDefinition::_flags before performing the calculation. However, if this code is executed in parallel, multiple threads could read the value of _flags before the computation finishes and the ComputeFlag is set. This could cause multiple threads to enter _ComputeJointWorldInverseBindTransforms and redo the computation.

There is a lock in _ComputeJointWorldInverseBindTransforms which prevents anything really bad from happening (as far as I can see; more on that later), but we do waste a little time on load redoing this calculation.
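
To make the shape of the race concrete, here is a minimal, self-contained sketch (hypothetical names; std::vector stands in for VtArray, and the real UsdSkel_SkelDefinition differs in detail):

```cpp
#include <atomic>
#include <mutex>
#include <vector>

class SkelDefinition {
public:
    // Called concurrently by multiple skinning queries.
    std::vector<double> GetJointWorldInverseBindTransforms() {
        // Unsynchronized check: several threads can all observe the flag
        // unset before the first finisher publishes it...
        if (!(_flags.load() & ComputedFlag)) {
            _ComputeJointWorldInverseBindTransforms();  // ...so all recompute.
        }
        return _xforms;  // copied while another thread may still be writing
    }

private:
    enum { ComputedFlag = 1 };

    void _ComputeJointWorldInverseBindTransforms() {
        std::lock_guard<std::mutex> lock(_mutex);
        // The lock serializes the writers, but it neither prevents the
        // redundant recomputation nor stops readers copying _xforms
        // outside the lock.
        _xforms.assign(16, 0.0);  // stand-in for the real computation
        _flags |= ComputedFlag;
    }

    std::atomic<int> _flags{0};
    std::mutex _mutex;
    std::vector<double> _xforms;
};
```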

How did I discover this? I'm working on CPU UsdSkel support for MayaUSD, and I occasionally crash on file load near here. Specifically, the calling code is UsdSkelSkeletonQuery::_ComputeSkinningTransforms, and I crash at line 378 in the inline dtor for the local variable inverseBindXforms. The dtor takes the VtArray's ref count to zero and crashes trying to delete the underlying control block. However, tracing the code tells me the reference count should never reach zero here, because the data underlying inverseBindXforms should always be held by the UsdSkel_SkelDefinition in _jointWorldInverseBindXforms. The key accident that keeps it safe is that when the transforms are recomputed, the resize call discovers the array is already the correct size and does nothing. This prevents the storage from changing and breaking other threads that are already accessing _jointWorldInverseBindXforms through their own local VtArray.

Clearly I'm missing something, so I'm hoping y'all can take a look and tell me if you see a race that could cause the crash.

I tried reproducing this in USDView and couldn't trigger the crash there. I didn't investigate whether multiple threads could redo _ComputeJointWorldInverseBindTransforms in that setting, but I think it probably can happen there too.

Sorry for the relative vagueness of this issue; I'd prefer to pin down exactly what the race is, but I'm stuck trying to figure out how the crash could occur and I don't see it.

Steps to Reproduce

  1. Sync and build my branch of MayaUSD: Autodesk/maya-usd@abd3eda
  2. Load a scene with USDSkel where multiple mesh rprims are bound to the same skeleton. You'll probably have to re-load a number of times to reproduce the crash. Depending on the scene it may happen once in ten tries or less.

System Information (OS, Hardware)

Windows

Package Versions

USD 21.11, the MayaUSD branch I linked above

Build Flags

@spiffmon
Member

Hi @williamkrick , sorry you're hitting this. I just wanted to be honest and let you know it's unlikely we'll be able to look into this anytime soon; our UsdSkel expertise is not high, currently, and the team that ostensibly supports it is small and Presto-based, not Maya-based.

@frankzhang11

@spiffmon I've hit the same problem @williamkrick describes. Basically, the lazy computation in _GetJointWorldInverseBindTransforms() makes every function that calls it non-thread-safe.

@williamkrick we had a workaround in our project: just precalculate these inverse bind transforms somewhere when the skeleton is loaded, to avoid the lazy computation (see the sketch below).
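
One hedged way to apply that, assuming the public UsdSkelSkeletonQuery::ComputeSkinningTransforms() pulls on the cached inverse bind transforms internally (the WarmSkelCaches helper is hypothetical; adapt it to your load path):

```cpp
#include "pxr/base/vt/types.h"
#include "pxr/usd/usd/timeCode.h"
#include "pxr/usd/usdSkel/skeletonQuery.h"

PXR_NAMESPACE_USING_DIRECTIVE

// Hypothetical helper: call once, from a single thread, right after load.
void WarmSkelCaches(const UsdSkelSkeletonQuery& skelQuery, UsdTimeCode time)
{
    VtMatrix4dArray xforms;
    // Discard the result; the point is the side effect of populating the
    // UsdSkel_SkelDefinition's cached inverse bind transforms so that later
    // parallel callers only take the read path.
    skelQuery.ComputeSkinningTransforms(&xforms, time);
}
```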

@williamkrick
Contributor Author

@spiffmon no worries, I just wish I had a solid answer on what the race is.

@frankzhang11 thanks for the idea! I'll add that and, fingers crossed, it'll fix my crash.

@cameronwhite
Contributor

I don't think I've hit this crash in my use of UsdSkel, but here's my guess at a possible cause:

  • After the first thread finishes _ComputeJointWorldInverseBindTransforms(), any threads entering UsdSkel_SkelDefinition::_GetJointWorldInverseBindTransforms() will grab the computed result from _jointWorldInverseBindXforms. Since it's a VtArray, this bumps the ref count and shares the underlying data.

  • Any threads that were still waiting on the mutex in _ComputeJointWorldInverseBindTransforms() will eventually enter the critical section and redo the computation. As one of the earlier comments pointed out, the _jointWorldInverseBindXforms array is already the right size and doesn't need to be reallocated. However, if some other thread(s) have already made a copy in the meantime (described above), then writing to _jointWorldInverseBindXforms would trigger a _DetachIfNotUnique().

I suspect there could be an issue if a VtArray instance is in the middle of _DetachIfNotUnique() while another thread is attempting to copy it, e.g.:

  • thread A has a local copy of the array, which shares the data (refcount == 2)
  • thread B tries to write to the primary instance, entering the body of _DetachIfNotUnique() since _IsUnique() is false
  • thread C starts to copy the primary instance, but hasn't bumped the refcount yet
  • thread A finishes and decrements the refcount (refcount == 1)
  • thread B decrements the refcount, taking it to zero and destroying the data, before switching to its new copy of the data
  • thread C now has an array with a reference to the deleted data block
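
A stripped-down copy-on-write array showing the shape of this hazard (an illustration only, not VtArray's actual implementation):

```cpp
#include <atomic>
#include <cstddef>
#include <cstring>

struct ControlBlock {
    std::atomic<int> refCount{1};
    std::size_t size = 0;
    double* data = nullptr;
};

class CowArray {
public:
    explicit CowArray(std::size_t n) : _cb(new ControlBlock) {
        _cb->size = n;
        _cb->data = new double[n]();
    }

    // Thread C runs this: between reading other._cb and bumping its
    // refcount, threads A and B can drop the count to zero and free the
    // block, leaving this copy dangling.
    CowArray(const CowArray& other) : _cb(other._cb) {
        _cb->refCount.fetch_add(1);
    }

    ~CowArray() { _Release(_cb); }  // thread A's decrement happens here

    // Thread B runs this before writing: the uniqueness check and the
    // detach are not atomic with respect to concurrent copies.
    void DetachIfNotUnique() {
        if (_cb->refCount.load() == 1) {  // "_IsUnique()"
            return;
        }
        ControlBlock* fresh = new ControlBlock;
        fresh->size = _cb->size;
        fresh->data = new double[_cb->size];
        std::memcpy(fresh->data, _cb->data, _cb->size * sizeof(double));
        _Release(_cb);  // may delete the block thread C is still copying from
        _cb = fresh;
    }

private:
    static void _Release(ControlBlock* cb) {
        if (cb->refCount.fetch_sub(1) == 1) {
            delete[] cb->data;
            delete cb;
        }
    }

    ControlBlock* _cb;
};
```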

@jilliene

Filed as internal issue #USD-7143

@williamkrick
Contributor Author

Wow @cameronwhite thank you! I had forgotten that VtArray has copy-on-write semantics. The order of events you've laid out here seems to plausibly cause a crash. I will try to use the debugger to force this order of events to occur and see if I crash.

@frankzhang11

VtArray writes are not thread-safe.

@spitzak

spitzak commented Jan 20, 2022

Is it a bug that it is not thread-safe? It seems like a thread-safe version could be made (assuming each thread has its own refcount pointer to the array). If two threads tried to write at the same time, it would make two copies and then throw away the original after the copies were made.

@spiffmon
Member

spiffmon commented Jan 20, 2022 via email

@williamkrick
Contributor Author

On my side I've found that I can force the issue to occur by using the debugger to tightly control the timing of the various threads. I don't think that's surprising given the technical discussion going on here, but it's nice to have practical backup for our ideas.

I also figured out that this race can't happen in UsdView. HdStExtComputation::Sync() calls GetExtComputationInput on each input, including the joint world inverse bind transforms. The compute is an sprim, so this sync occurs serially and avoids the race.

I can work around the issue by implementing my own class derived from HdExtComputation which also calls GetExtComputationInput. I can't reuse HdStExtComputation because it does Hydra-GL-specific things like allocating buffer array ranges. With this change the crash doesn't seem to occur.
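
A rough sketch of that workaround; MyExtComputation is a hypothetical name, and a real implementation would also need to manage outputs and dirty-bit details. Pulling each scene input during the (serial) sprim sync forces the lazy UsdSkel computation before parallel access begins:

```cpp
#include "pxr/imaging/hd/extComputation.h"
#include "pxr/imaging/hd/sceneDelegate.h"

PXR_NAMESPACE_USING_DIRECTIVE

class MyExtComputation final : public HdExtComputation {
public:
    explicit MyExtComputation(const SdfPath& id) : HdExtComputation(id) {}

    void Sync(HdSceneDelegate* sceneDelegate,
              HdRenderParam* renderParam,
              HdDirtyBits* dirtyBits) override
    {
        // Let the base class update input names, kernel, etc.
        HdExtComputation::Sync(sceneDelegate, renderParam, dirtyBits);

        // Pull every scene input now, while sprims sync serially, so the
        // joint world inverse bind transforms get computed exactly once.
        for (const TfToken& input :
                 sceneDelegate->GetExtComputationSceneInputNames(GetId())) {
            sceneDelegate->GetExtComputationInput(GetId(), input);
        }
    }
};
```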

@cameronwhite
Contributor

In this case I'd propose that _ComputeJointWorldInverseBindTransforms() should re-check the ComputeFlag after entering the critical section and skip the computation if another thread has already performed it (or use a similar pattern like std::call_once).

Aside from avoiding redundant work, having threads call non-const methods on the cached value while other threads may be in the middle of copying it seems like a bad idea in general.
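
A minimal sketch of that double-checked pattern, reusing the hypothetical SkelDefinition names from the sketch earlier in the thread (not the real USD code):

```cpp
void SkelDefinition::_ComputeJointWorldInverseBindTransforms()
{
    std::lock_guard<std::mutex> lock(_mutex);
    // Re-check under the lock: threads that lost the race return without
    // touching the cached array again (no resize, no copy-on-write detach).
    if (_flags.load() & ComputedFlag) {
        return;
    }
    _xforms.assign(16, 0.0);  // stand-in for the real computation
    _flags |= ComputedFlag;   // atomic read-modify-write publish
}
```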

cameronwhite added a commit to sideeffects/USD that referenced this issue Mar 28, 2023
- In methods like _ComputeJointWorldInverseBindTransforms(), check the
  compute flag again after acquiring the lock, to avoid recomputing the
  result if multiple threads were waiting on the mutex. Although the
  computed result would not change, it is not safe to call mutable member
  functions of the VtArray (which can cause a copy-on-write detach) while
  other threads may be in the middle of making a copy of it.

- Prefer using operator|= to atomically set the flag rather than
  doing a read -> bitwise OR -> atomic store sequence which could cause
  flags to be lost if there are concurrent writes.
  Currently the writes are all guarded by the same mutex so the previous
  approach was not problematic, but the new approach is safer if e.g. in the
  future there are separate locks for each cached array.

Bug: PixarAnimationStudios#1742
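
The commit's second point in standalone form (ComputedFlag is a hypothetical flag bit):

```cpp
#include <atomic>

constexpr int ComputedFlag = 1 << 0;  // hypothetical flag bit

std::atomic<int> flags{0};

void SetFlagSafely()
{
    // Single atomic read-modify-write: concurrent writers cannot lose bits.
    flags |= ComputedFlag;
}

void SetFlagRacily()
{
    // Load -> OR -> store: a bit set by another thread between the load
    // and the store is silently overwritten.
    int tmp = flags.load();
    tmp |= ComputedFlag;
    flags.store(tmp);
}
```
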
cameronwhite added a commit to sideeffects/USD that referenced this issue Mar 30, 2023
marktucker pushed a commit to sideeffects/USD that referenced this issue Apr 26, 2023