Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential for app exit failure when not passing back RunStatus #480

Closed
skliper opened this issue Jan 15, 2020 · 11 comments · Fixed by #598
Closed

Potential for app exit failure when not passing back RunStatus #480

skliper opened this issue Jan 15, 2020 · 11 comments · Fixed by #598
Assignees
Labels
Milestone

Comments

@skliper
Copy link
Contributor

skliper commented Jan 15, 2020

Describe the bug
CFE_ES_ExitApp reports error when passed in ExitStatus is CFE_ES_RunStatus_APP_RUN.

Typical app pattern (see https://github.com/nasa/sample_app/blob/master/fsw/src/sample_app.c) is:

  while (CFE_ES_RunLoop(&Sample_AppData.RunStatus) == TRUE) {do stuff}
 
  CFE_ES_ExitApp(Sample_AppData.RunStatus);

But CFE_ES_RunLoop does not update RunStatus on internal request to stop:

if ( CFE_ES_Global.AppTable[AppID].ControlReq.AppControlRequest != CFE_ES_RunStatus_APP_RUN )
{
/*
** We have an external request to stop
*/
ReturnCode = false;
}

To Reproduce
Steps to reproduce the behavior:

  1. Send RestartApp, will error and fail to restart.

Expected behavior
Set passed in RunStatus to the control request for the case above:
RunStatus = CFE_ES_Global.AppTable[AppID].ControlReq.AppControlRequest

Allows the App to take appropriate action.

System observed on:

  • cFS Dev Server
  • OS: Ubuntu 18.04
  • Versions: current dev (6.7.4)

Additional context
Fails build verification testing

Reporter Info
Jacob Hageman - NASA/GSFC

@skliper skliper added the bug label Jan 15, 2020
@skliper skliper added this to the 6.8.0 milestone Jan 15, 2020
@jphickey
Copy link
Contributor

Is this a recent regression?

From a high level perspective, it seems architecturally kind of odd for ES to keep a status in its internal table, pass it back to an app, which passes it back to ES. ES already knows the status as it is the source of truth anyway. Why does the app need the value anyway?

@skliper
Copy link
Contributor Author

skliper commented Jan 16, 2020

Is this a recent regression?

It is an issue that was masked by another issue that was fixed in #375. So could have been introduced anytime since 6.6 (I haven't dug into the history).

From a high level perspective, it seems architecturally kind of odd for ES to keep a status in its internal table, pass it back to an app, which passes it back to ES. ES already knows the status as it is the source of truth anyway. Why does the app need the value anyway?

I'm guessing the design pattern is to allow the App to do something based on a request if it wants. Likely "special" cases, maybe a critical app that should never be restarted externally or something. Given the current pattern the app could handle the response and not exit if desired. Maybe @acudmore or @jwilmot have more history?

@jphickey
Copy link
Contributor

There was a different bug in this area - #89 - which addressed an endless loop when CFE_ES_ExitApp was used with unexpected codes. As part of this it restricted the exit status here to be either a normal exit or error exit.

IMO this function should already know that there is a restart request pending - as it is in the ES global state. In this case it shouldn't matter what value the app passes in. The bug is probably here:

CFE_ES_Global.AppTable[AppID].ControlReq.AppControlRequest = ExitStatus;

This is overwriting the existing request in the ES internal state, which is probably inadvertently resetting/overwriting the restart request in this case. We should probably only write to the AppControlRequest field if its current value is CFE_ES_RunStatus_APP_RUN, and do not overwrite other values that could be here.

A simple if check around this assignment might fix this.

@skliper
Copy link
Contributor Author

skliper commented Jan 27, 2020

From @acudmore:

I’m trying to recall the logic of this, but I may have started a little too late before a 3-day weekend. I’ll have to think about it.. but my current thoughts:

Looking at the AppExit code, I think my intention was to allow the app to request an exit through the parameter that it passes in, but maybe a better separation needs to be done with what it is requesting vs the internal app state.

I am having a hard time thinking of why we need to:

  • pass the status from ES in the run loop call back to the app
  • pass the status from the app to the AppExit call
  • have the AppExit call replace the app’s status with what is passed in.

There are a couple of cases for calling the AppExit that I can think of right now:

  • The app just falls out of its run loop because it was requested to exit or had an error of some sort – I was told by the system to exit
  • The app decides to break out of its run loop and exit – I decided on my own to exit

In the first case, the parameter is not needed, the system already knows why it told the app to exit.
In the second case, the app may wish to inform the system why it decided to exit. In this case it acts more like a parameter, and not the global status.

Is the parameter to AppExit needed at all? If the app is running fine, but has some sort of internal error, or just wants to quit, it may want to indicate why it’s exiting. Otherwise, it might not be needed.

Does that make sense?

@jphickey
Copy link
Contributor

Based on my previous investigation, the CFE_ES_ExitApp call is relevant to distinguish between a voluntary exit that was normal, versus an error of some sort. For all other exit requests, where it is based on a ES command, then ES already knows why the app should be exiting. In particular the app might get an error and decide to exit before it ever reaches the Run Loop part.

My recommendation would be to simplify things:

  1. deprecate the in/out status parameter on CFE_ES_RunLoop(). It is already (more or less) a don't care value.
  2. Document that when the app exits itself, it should call CFE_ES_AppExit() with either the CFE_ES_RunStatus_APP_EXIT or CFE_ES_RunStatus_APP_ERROR codes to indicate if it is exiting in a normal state or errored state, respectively.

For any other status then ES would already know about it. We just need to make sure the request isn't overwritten (which I think is the cause of this issue).

@skliper
Copy link
Contributor Author

skliper commented Jan 27, 2020

Sounds good to me. @acudmore?

@jphickey
Copy link
Contributor

There may even be a possibility of further simplification. After another look, even with the distinct exit status (APP_EXIT vs. APP_ERROR) there is not a substantial difference in how its handled during the clean up phase.

The only difference is that it generates a different event (CFE_ES_EXIT_APP_INF_EID vs. CFE_ES_ERREXIT_APP_INF_EID) but otherwise the clean up is identical. Depending on whether this different EID value is useful to anyone, it might be possible to deprecate the CFE_ES_ExitApp() function entirely and simply have a single app exit event.

Note that in the vast majority of FSW applications they run forever, so the only reason they would exit is either from an error or from being commanded to do so. I'm not sure of any real FSW purpose for a non-error, voluntary APP_EXIT status.

@skliper
Copy link
Contributor Author

skliper commented Jan 28, 2020

Maybe poll the community for any unique use cases? It all sounds good to me as long as people aren't dependent on the behavior. @ejtimmon - any GSFC special use of the exit codes in apps?

@skliper skliper added the CCB:Ready Ready for discussion at the Configuration Control Board (CCB) label Mar 18, 2020
@skliper
Copy link
Contributor Author

skliper commented Mar 18, 2020

Let's discuss this one quickly, also ping @ejtimmon for any concerns w/ suggestion.

@astrogeco
Copy link
Contributor

CCB 2020-03-25: Discussed, no concerns with approach. @ejtimmon will double check GSFC apps to see if functionality is needed.

@ghost
Copy link

ghost commented Mar 25, 2020

My recommendation would be to simplify things:

  1. deprecate the in/out status parameter on CFE_ES_RunLoop(). It is already (more or less) a don't care value.
  2. Document that when the app exits itself, it should call CFE_ES_AppExit() with either the CFE_ES_RunStatus_APP_EXIT or CFE_ES_RunStatus_APP_ERROR codes to indicate if it is exiting in a normal state or errored state, respectively.

For any other status then ES would already know about it. We just need to make sure the request isn't overwritten (which I think is the cause of this issue).

I agree with this, it seems like a reasonable fix with minimal impact.

There may even be a possibility of further simplification. After another look, even with the distinct exit status (APP_EXIT vs. APP_ERROR) there is not a substantial difference in how its handled during the clean up phase.

The only difference is that it generates a different event (CFE_ES_EXIT_APP_INF_EID vs. CFE_ES_ERREXIT_APP_INF_EID) but otherwise the clean up is identical. Depending on whether this different EID value is useful to anyone, it might be possible to deprecate the CFE_ES_ExitApp() function entirely and simply have a single app exit event.

Note that in the vast majority of FSW applications they run forever, so the only reason they would exit is either from an error or from being commanded to do so. I'm not sure of any real FSW purpose for a non-error, voluntary APP_EXIT status.

I'm not sure I understand this. Are you talking about deprecating the CFE_ES_ExitApp function and just having the RunLoop call handle the app exit? If that is the case, what if an app has a resource open that the OSAL does not know about, and cannot clean up before exiting the app? In that case the CFE_ES_ExitApp function may still have some utility.

@astrogeco astrogeco removed the CCB:Ready Ready for discussion at the Configuration Control Board (CCB) label Mar 27, 2020
jphickey added a commit to jphickey/cFE that referenced this issue Apr 3, 2020
jphickey added a commit to jphickey/cFE that referenced this issue Apr 9, 2020
Create a new background job to handle the maintenance tasks
that had been performed in the ES main task as part of the
CFE_ES_ScanAppTable() routine.

All app state changes, including those invoked by messages,
are now handled by this job.

This also slightly changes the semantics of CFE_ES_RunLoop and
CFE_ES_ExitApp.  Now, the CFE_ES_RunLoop routine no longer requires
a RunStatus buffer.  Instead, the only thing that matters is the
RunStatus value that is eventually passed to CFE_ES_ExitApp after
the shutdown is complete.  This should be mostly backward compatible,
as the recommended app pattern would pass the same value to
both functions.

This commit also fixes nasa#480, as the value passed to CFE_ES_ExitApp
will not override a request that was already pending.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants