Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove DeviceModule::writeAfterOpen #144

Closed
5 tasks done
killenb opened this issue Apr 7, 2020 · 6 comments · Fixed by #149
Closed
5 tasks done

Remove DeviceModule::writeAfterOpen #144

killenb opened this issue Apr 7, 2020 · 6 comments · Fixed by #149
Assignees

Comments

@killenb
Copy link
Member

killenb commented Apr 7, 2020

Child of #123

Currently constants have a special treatment that they are written using writeAfterOpen(), while all other variables use the "recoveryAccessor" for it. In fact the list in "writeAfterOpen" is only written for the first open, which is wrong. Constant values are not recovered in the current implementation.

There are only four places where the list writeAfterOpen is used:

  • Remove the instance from DeviceModule.h
  • In DeviceModule.cc : iterate the list writeRecoveryOpen instead, but only write if the version number is not nullptr.
  • In ApplicationModule.cc: Just add the impl variable with addRecoveryAccessor, instead of adding it to writeAfterOpen
  • In ExceptionHandlingDecorator.cc: Just remove the line where the recovery accessor is added to writeAfterOpen. It already is registered with the device module.

Solve #133: Recovery accessors should only be written if version number is not nullptr. It is the same code as you just put into DeviceModule.cc. Just copy it. The duplication will be cleaned up in #129

  • Write a test that constants are written correctly (probably exists) AND are correctly recovered (does not exist yet because it is not working at the moment and nobody noticed so far)

  • Do not write tests that all recovery accessors are now only written if they have a valid version number. This is a separate ticket (Write tests that only valid recovery accessors are written to the device #145)

@phako
Copy link
Member

phako commented Apr 8, 2020

The required test should already be done in testExceptionHandler/testConstants - but after the changes it deadlocks in that test

@phako
Copy link
Member

phako commented Apr 8, 2020

This seems to be around the RecoverySharedLock().

Trace from helgrind exit status:

==1977== Thread #3: Exiting thread still holds 2 locks
==1977==    at 0xC7079F3: futex_wait_cancelable (futex-internal.h:88)
==1977==    by 0xC7079F3: __pthread_cond_wait_common (pthread_cond_wait.c:502)
==1977==    by 0xC7079F3: pthread_cond_wait@@GLIBC_2.3.2 (pthread_cond_wait.c:655)
==1977==    by 0x4C36DC3: pthread_cond_wait_WRK (hg_intercepts.c:1203)
==1977==    by 0x6A5D86: boost::condition_variable::wait(boost::unique_lock<boost::mutex>&) (condition_variable.hpp:81)
==1977==    by 0x6244F50: boost::shared_mutex::lock_shared() (shared_mutex.hpp:191)
==1977==    by 0x624A574: boost::shared_lock<boost::shared_mutex>::lock() (lock_types.hpp:645)
==1977==    by 0x624747A: boost::shared_lock<boost::shared_mutex>::shared_lock(boost::shared_mutex&) (lock_types.hpp:520)
==1977==    by 0x6243F5D: ChimeraTK::DeviceModule::getRecoverySharedLock() (DeviceModule.cc:490)
==1977==    by 0x62BAEFF: ChimeraTK::ExceptionHandlingDecorator<int>::doPreWrite() (ExceptionHandlingDecorator.cc:140)
==1977==    by 0x6AF449: ChimeraTK::TransferElement::preWrite() (TransferElement.h:377)
==1977==    by 0x6AF170: ChimeraTK::TransferElement::write(ChimeraTK::VersionNumber) (TransferElement.h:219)
==1977==    by 0x6242142: ChimeraTK::DeviceModule::handleException() (DeviceModule.cc:384)
==1977==    by 0x625358A: boost::_mfi::mf0<void, ChimeraTK::DeviceModule>::operator()(ChimeraTK::DeviceModule*) const (mem_fn_template.hpp:49)
==1977== 
==1977== 

@phako
Copy link
Member

phako commented Apr 8, 2020

Ah. Because we are trying to call write() on the accessor while holding the unique lock on the recoverySharedMutex, which then calles the preWrite() on the ExceptionHandlingDecoratorAccessor which in its preWrite tries to get the sharedLock on the same mutex. Since it never gets that, The write will never success.

@killenb
Copy link
Member Author

killenb commented Apr 8, 2020

Yes, that was a bug which was always there. A fully decorated Accessor was added to writeAfterOpen. But this was only called at the first open, where the recovery accessor was still waiting for the opening of the device. This is why it never locked.

Now it is in the list of recovery accessors, which must not be decorated with the ExceptionHandlingDecorator. This causes the deadlock.

Actually, the required test is already there, and the functionality always worked, because in Application.cc was calling createDeviceVariable, which creates a RecoveryAccessor. This accessor get the value when the write on the "impl" is executed in the inital open. I think it somewhat worked by accident.

I will fix the code, I think I have the solution.

@killenb
Copy link
Member Author

killenb commented Apr 8, 2020

Now testTrigger fails with a race condition. The timing must have changed slightly.
Also testExceptionHandling sometimes fails with a race condition in testShutdown.

@killenb
Copy link
Member Author

killenb commented Apr 9, 2020

I created two bug tickets #146 and #147. #147 is required before this ticket can be merged to the master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants