Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Models that contain Spawn cannot be reset #2956

Closed
kbenne opened this issue Apr 11, 2022 · 5 comments · Fixed by #2984
Closed

Models that contain Spawn cannot be reset #2956

kbenne opened this issue Apr 11, 2022 · 5 comments · Fixed by #2984
Assignees
Labels
spawn Development for Spawn of EnergyPlus
Milestone

Comments

@kbenne
Copy link
Contributor

kbenne commented Apr 11, 2022

We discovered this in the context of BOPTEST where it is common to instantiate a FMU, run a simulation, reset the FMU, and then run another simulation. If you do this with a model that contains Spawn, there will be a reliable segfault.

In BOPTEST we use PyFMI, but I will post a unit test that uses plain fmi2 API calls to demonstrate this issue. The key sequence of events is to call reset and then setupExperiment.

I actually already have a fix for this, but I want to file an issue to keep a record of what is going on.

@kbenne kbenne added the spawn Development for Spawn of EnergyPlus label Apr 11, 2022
@kbenne kbenne self-assigned this Apr 11, 2022
@kbenne
Copy link
Contributor Author

kbenne commented Apr 11, 2022

Refer to the linked commit which contains a test about this.

Consider the sequence of events.

  1. instantiate (The first time) - The main FMU shared library is loaded. The spawn related data structures are allocated (a external C function constructor function is called), but the EnergyPlus FMU is not yet created.
  2. setupExperiment is called - Spawn's EnergyPlus FMU is generated and the FMU's EnergyPlus related shared library is loaded.
  3. ... a simulation is performed as usual.
  4. terminate - No problem. EnergyPlus shutdown.
  5. reset - The destructor external C function related to Spawn objects is called and the EnergyPlus FMU and corresponding DLL is unloaded (e.g. dlclose). Note that the main (outer) FMU library is not unloaded.
  6. setupExperiment is called again - Boom. Program segfaults. Spawn exe is called again to generate a new EnergyPlus FMU. That goes fine. But then the program tries to load the newly generated FMU and the program crashes.

After an excessive amount of debugging, I finally know why. I'll describe the change in the next commit. (I thought this issue might beat me.)

kbenne added a commit to NREL/Spawn that referenced this issue Apr 11, 2022
Each generated EnergyPlus FMU is now given a unique id.

ref lbl-srg/modelica-buildings#2956
@kbenne
Copy link
Contributor Author

kbenne commented Apr 11, 2022

Refer to the fix in NREL/Spawn@a8ff1e5.

Now every EnergyPlus FMU shared library that is generated is given a unique filename. What I found is that dlopen (and related dlsym) which are used behind the scenes, get confused by the previously described sequence of events.

  1. load a dll
  2. close a dll
  3. remove that dll and generate another one in the exact same file location
  4. load again

Linux thinks it is the same library in step 4, and seems to try to take a shortcut and not actually load the new library. As soon as you try to load symbols out of the library (dlsym) the program crashes. My fix is to give each EnergyPlus dll that is generated a unique name, and the dlopen will perform a clean reload.

Note that the conventional FMU workflow does not suffer this problem, because generally the FMU is not closed (dlclose) even after reset is called. Spawn's inner EnergyPlus FMU is used in somewhat of an unconventional way and so this problem appears.

kbenne added a commit to ibpsa/project1-boptest that referenced this issue Apr 11, 2022
Spawn-0.3.0-391771b478 addresses an issue resetting the simulation.
lbl-srg/modelica-buildings#2956
@mwetter
Copy link
Member

mwetter commented Apr 13, 2022

@kbenne : Not sure if this is for now just a FYI or do you want me to take any actions such as integrating new binaries.

@kbenne
Copy link
Contributor Author

kbenne commented Apr 13, 2022

At this point it is mostly an FYI. If you are comfortable with my approach then I suggest we move forward. There could be alternative solutions such as avoiding the dll regenerate/reload from the MBL side, but I think what I have proposed is the most robust.

I do hope this change is included in the MBL 9.0 release. I am working on the other GitHub issues that are tagged 9.0 and will provide new installers as soon as possible.

@mwetter mwetter added this to the Release 9.0 milestone Apr 13, 2022
@mwetter
Copy link
Member

mwetter commented Apr 13, 2022

@kbenne I think that fix is fine. I will wait until you have the new binaries (no rush as I have other issues to go through).

kbenne added a commit to NREL/Spawn that referenced this issue Apr 14, 2022
A recent change to make the epfmi library unique caused the possiblity
of the library name starting with a numeric character, which triggers a
warning.

This change is to prefix the library name with "epfmi_"

ref lbl-srg/modelica-buildings#2956
ref lbl-srg/modelica-buildings#2220
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
spawn Development for Spawn of EnergyPlus
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants