-
Notifications
You must be signed in to change notification settings - Fork 285
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issue in thread safety after #1462 #1468
Comments
I wonder if maybe the signals aren't being cloned correctly. When a skeleton is cloned, all of its connections internally and externally are supposed to be completely independent of the original. If events that are happening in parallel threads are affecting each other because of the signals, then that suggests to me that the signals are being shallow copied instead of deep copied when a cloning happens. One possible workaround you could try if you don't want to be blocked by this bug is to completely set up all of your simulation worlds before the parallel loops begin and keep all of them alive until all the threads are joined back together. I think that should prevent this bug from negatively impacting you until it can be fixed. |
@mxgrey Thanks for your reply. I share your intuition, as it is clear from the problem that I raised last week (#1460) that the signals are not independent between the clones and the original. i.e., making millions of clones makes the original to have millions of (mainly disconnected) connections. I thought this was made on purpose (even if I did not understand why), and thought that the most important thing was to clear the closed connections. I am using tens of millions of cloned robots, so instantiating them all at the same time would not be feasible. An alternative would be to instantiate just enough clones for the ones that will concurrently used, but that won't be trivial to implement. |
If all the clones are identical and don't need to be modified, then I think a "clone pool" would be a good solution. Just have something like a |
I am using a clone pool in all of my tests. This is working quite robustly and I do not have any big leaks. This is not very difficult to do. @Aneoshun I can share the code if you wish.
Indeed this is faster than re-cloning every-time. I can confirm this in my tests. But of course, this specific bug should be addressed. |
Yes, I think this bug should be addressed, mainly to avoid future problems. @costashatz your clone pool looks great. Could you please share your code? (you can send it by email if you prefer). |
Here's one way of doing the pool: class MyClonePool {
public:
static MyClonePool* instance()
{
static MyClonePool gdata;
return &gdata;
}
MyClonePool(const MyClonePool&) = delete;
void operator=(const MyClonePool&) = delete;
dart::dynamics::SkeletonPtr get_skeleton()
{
std::lock_guard<std::mutex> lock(_skeleton_mutex);
for (size_t i = 0; i < _num_skeletons; i++) {
if (_free[i]) {
_free[i] = false;
return _skeletons[i];
}
}
return nullptr;
}
void free_skeleton(const dart::dynamics::SkeletonPtr& skel)
{
std::lock_guard<std::mutex> lock(_skeleton_mutex);
for (size_t i = 0; i < _num_skeletons; i++) {
if (_skeletons[i] == skel) {
_set_init_pos(skel);
_free[i] = true;
break;
}
}
}
protected:
std::vector<dart::dynamics::SkeletonPtr> _skeletons;
std::vector<bool> _free;
size_t _num_skeletons = 10; // set this number to your maximum number of concurrent skeletons (or slightly bigger to be sure! ;))
std::mutex _skeleton_mutex;
MyClonePool() {
// load one skeleton from file and clone it _num_skeletons times
// OR load _num_skeletons different skeletons from file
// do not forget to fill _skeletons with your skeletons
// Initialize all skeletons to free
for (size_t i = 0; i < _num_skeletons; i++) {
_free.push_back(true);
}
}
void _set_init_pos(const dart::dynamics::SkeletonPtr& skel)
{
skel->resetPositions();
skel->resetVelocities();
skel->resetAccelerations();
skel->clearExternalForces();
// reset any altered properties (e.g., altered mass of a link) and set initial positions
}
}; Then you use it like this: void evaluation_function() {
// Acquire a freed skeleton
dart::dynamics::SkeletonPtr skel = nullptr;
while (skel == nullptr) skel = MyClonePool::instance()->get_skeleton();
// do your stuff with skel
// Free your skeleton so that a new function can take it
MyClonePool::instance()->free_skeleton(skel);
} If you do not like singletons, you can define a global namespace and put your I hope this is helpful. |
Thanks a lot. I will use this while the clone function of the skeleton get fixed. |
Hi,
It seems that #1462 makes dart less thread-safe.
I am currently unable to provide a lot of details at the moment (lack of time), but I thought I had to report this asap given that it has been merged to master recently.
Following the use case that I mentioned in #1460
I clone a lot of skeletons and then run simulations in parallel (in short, the for loop is replaced by a parallel_for from tbb).
This has been working for years and is still working with commit (3 commits ago) becbead
But using the commit associated with #1462 leads to systematic segfaults happening at a random point in time.
Would you have an idea of what in the #1462 fix could be causing this? Otherwise, I will investigate this as soon as I have some time.
The text was updated successfully, but these errors were encountered: