-
Notifications
You must be signed in to change notification settings - Fork 204
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Potential for app exit failure when not passing back RunStatus #480
Comments
Is this a recent regression? From a high level perspective, it seems architecturally kind of odd for ES to keep a status in its internal table, pass it back to an app, which passes it back to ES. ES already knows the status as it is the source of truth anyway. Why does the app need the value anyway? |
It is an issue that was masked by another issue that was fixed in #375. So could have been introduced anytime since 6.6 (I haven't dug into the history).
I'm guessing the design pattern is to allow the App to do something based on a request if it wants. Likely "special" cases, maybe a critical app that should never be restarted externally or something. Given the current pattern the app could handle the response and not exit if desired. Maybe @acudmore or @jwilmot have more history? |
There was a different bug in this area - #89 - which addressed an endless loop when CFE_ES_ExitApp was used with unexpected codes. As part of this it restricted the exit status here to be either a normal exit or error exit. IMO this function should already know that there is a restart request pending - as it is in the ES global state. In this case it shouldn't matter what value the app passes in. The bug is probably here: cFE/fsw/cfe-core/src/es/cfe_es_api.c Line 404 in 7251cbc
This is overwriting the existing request in the ES internal state, which is probably inadvertently resetting/overwriting the restart request in this case. We should probably only write to the A simple if check around this assignment might fix this. |
From @acudmore: I’m trying to recall the logic of this, but I may have started a little too late before a 3-day weekend. I’ll have to think about it.. but my current thoughts: Looking at the AppExit code, I think my intention was to allow the app to request an exit through the parameter that it passes in, but maybe a better separation needs to be done with what it is requesting vs the internal app state. I am having a hard time thinking of why we need to:
There are a couple of cases for calling the AppExit that I can think of right now:
In the first case, the parameter is not needed, the system already knows why it told the app to exit. Is the parameter to AppExit needed at all? If the app is running fine, but has some sort of internal error, or just wants to quit, it may want to indicate why it’s exiting. Otherwise, it might not be needed. Does that make sense? |
Based on my previous investigation, the My recommendation would be to simplify things:
For any other status then ES would already know about it. We just need to make sure the request isn't overwritten (which I think is the cause of this issue). |
Sounds good to me. @acudmore? |
There may even be a possibility of further simplification. After another look, even with the distinct exit status (APP_EXIT vs. APP_ERROR) there is not a substantial difference in how its handled during the clean up phase. The only difference is that it generates a different event (CFE_ES_EXIT_APP_INF_EID vs. CFE_ES_ERREXIT_APP_INF_EID) but otherwise the clean up is identical. Depending on whether this different EID value is useful to anyone, it might be possible to deprecate the CFE_ES_ExitApp() function entirely and simply have a single app exit event. Note that in the vast majority of FSW applications they run forever, so the only reason they would exit is either from an error or from being commanded to do so. I'm not sure of any real FSW purpose for a non-error, voluntary APP_EXIT status. |
Maybe poll the community for any unique use cases? It all sounds good to me as long as people aren't dependent on the behavior. @ejtimmon - any GSFC special use of the exit codes in apps? |
Let's discuss this one quickly, also ping @ejtimmon for any concerns w/ suggestion. |
CCB 2020-03-25: Discussed, no concerns with approach. @ejtimmon will double check GSFC apps to see if functionality is needed. |
I agree with this, it seems like a reasonable fix with minimal impact.
I'm not sure I understand this. Are you talking about deprecating the CFE_ES_ExitApp function and just having the RunLoop call handle the app exit? If that is the case, what if an app has a resource open that the OSAL does not know about, and cannot clean up before exiting the app? In that case the CFE_ES_ExitApp function may still have some utility. |
Move scan to a background job
Create a new background job to handle the maintenance tasks that had been performed in the ES main task as part of the CFE_ES_ScanAppTable() routine. All app state changes, including those invoked by messages, are now handled by this job. This also slightly changes the semantics of CFE_ES_RunLoop and CFE_ES_ExitApp. Now, the CFE_ES_RunLoop routine no longer requires a RunStatus buffer. Instead, the only thing that matters is the RunStatus value that is eventually passed to CFE_ES_ExitApp after the shutdown is complete. This should be mostly backward compatible, as the recommended app pattern would pass the same value to both functions. This commit also fixes nasa#480, as the value passed to CFE_ES_ExitApp will not override a request that was already pending.
Describe the bug
CFE_ES_ExitApp reports error when passed in ExitStatus is CFE_ES_RunStatus_APP_RUN.
Typical app pattern (see https://github.com/nasa/sample_app/blob/master/fsw/src/sample_app.c) is:
But CFE_ES_RunLoop does not update RunStatus on internal request to stop:
cFE/fsw/cfe-core/src/es/cfe_es_api.c
Lines 545 to 552 in 7251cbc
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Set passed in RunStatus to the control request for the case above:
RunStatus = CFE_ES_Global.AppTable[AppID].ControlReq.AppControlRequest
Allows the App to take appropriate action.
System observed on:
Additional context
Fails build verification testing
Reporter Info
Jacob Hageman - NASA/GSFC
The text was updated successfully, but these errors were encountered: