rest: optimise include_progress functionality #486

Open
audrium opened this issue Nov 24, 2022 · 2 comments

@audrium
Member

audrium commented Nov 24, 2022

Right now we have the include_progress flag in the get_workflows endpoint, and we also use it by default in the get_workflow_status endpoint. To return workflow progress data, we mainly use the job_progress JSON field of the Workflow DB table. In addition, we query for the command and step name of the most recent Job.

This is an example of what the progress field of the get_workflows response looks like for the bsm workflow:

{
  "current_command": "source /usr/local/bin/thisroot.sh\nvariations=$(echo nominal weight_var1_up weight_var1_dn|sed 's| |,|g')\npython /code/histogram.py /var/reana/users/00000000-0000-0000-0000-000000000000/workflows/cbe7f350-17f0-45ce-a173-90ad2fcdec80/all_bkg_mc/run_mc_1/select_signal_merge_0/merged.root /var/reana/users/00000000-0000-0000-0000-000000000000/workflows/cbe7f350-17f0-45ce-a173-90ad2fcdec80/all_bkg_mc/run_mc_1/select_signal_hist_0/hist.root mc2 0.0125 $variations\n",
  "current_step_name": "select_signal_hist_0",
  "failed": {
    "job_ids": [],
    "total": 0
  },
  "finished": {
    "job_ids": [...],
    "total": 44
  },
  "run_finished_at": null,
  "run_started_at": "2022-11-24T09:41:45",
  "running": {
    "job_ids": [],
    "total": 0
  },
  "total": {
    "job_ids": [],
    "total": 65
  }
}
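To make the structure above concrete, here is a minimal Python sketch of how a client could summarise it as a finished/total string, similar to the PROGRESS column (for example 2/2) in the reana-client status output further down this thread. The helper format_progress is hypothetical, and the assumption that PROGRESS is computed as finished/total is ours, not taken from the reana-client source.

```python
from typing import Any, Dict


def format_progress(progress: Dict[str, Any]) -> str:
    """Return a 'finished/total' summary, assuming the progress structure above."""
    finished = progress.get("finished", {}).get("total", 0)
    total = progress.get("total", {}).get("total", 0)
    return f"{finished}/{total}"


example = {
    "current_step_name": "select_signal_hist_0",
    "failed": {"job_ids": [], "total": 0},
    "finished": {"job_ids": [], "total": 44},  # job IDs omitted in this sketch
    "running": {"job_ids": [], "total": 0},
    "total": {"job_ids": [], "total": 65},
}
print(format_progress(example))  # 44/65
```

Only the cheap job_progress counters are needed for this summary; current_command and current_step_name play no part in it.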

This additional DB query to get current_command and current_step_name could be optimised. These fields are not used in reana-ui at all; their only usage is in reana-client, when running reana-client status -w workflow -v. Note that in this case we call the get_workflow_status endpoint, so the same data returned by get_workflows is never used.

One way to optimise this would be to introduce an additional flag to fetch this information, and to use it only in the reana-client status --verbose command. This would save an additional DB query for each workflow (especially on the UI home page), as well as the bandwidth spent transferring current_command over the network.
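A minimal sketch of the proposed optimisation, assuming the extra query can be wrapped in a callable: the job_progress data already stored on the workflow row is always returned, while the query for the last command and step name only runs when the caller opts in. The names build_progress, fetch_last_command and include_last_command are hypothetical and not the actual reana-workflow-controller API; the fake fetcher simply counts how often the "query" runs.

```python
from typing import Any, Callable, Dict, Optional, Tuple

# Hypothetical stand-in for the extra DB query: returns (command, step_name)
# of the most recent job.
LastCommandFetcher = Callable[[], Tuple[Optional[str], Optional[str]]]


def build_progress(
    job_progress: Dict[str, Any],
    fetch_last_command: LastCommandFetcher,
    include_last_command: bool = False,
) -> Dict[str, Any]:
    """Build the progress payload, querying the last command only on request."""
    progress = dict(job_progress)  # already available, no extra query needed
    if include_last_command:
        command, step_name = fetch_last_command()  # the extra DB query
        progress["current_command"] = command
        progress["current_step_name"] = step_name
    return progress


calls = []


def fake_fetch() -> Tuple[Optional[str], Optional[str]]:
    """Pretend DB query that records each call."""
    calls.append(1)
    return "python run.py", "select_signal_hist_0"


build_progress({"total": {"total": 65}}, fake_fetch)  # e.g. UI list: no query
build_progress({"total": {"total": 65}}, fake_fetch, include_last_command=True)
print(len(calls))  # 1
```

With the flag defaulting to off, listing many workflows costs no per-workflow command lookup; only the verbose status path pays for it.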

@giuseppe-steduto giuseppe-steduto self-assigned this Oct 31, 2023
giuseppe-steduto added a commit to giuseppe-steduto/reana-workflow-controller that referenced this issue Nov 1, 2023
giuseppe-steduto added a commit to giuseppe-steduto/reana-client that referenced this issue Nov 1, 2023
Add `--include-command` flag to the `list` and `status` commands that,
when set, will display info about the command currently being executed
by the workflow (or the last command). In case there is no info about
the command, the step name will be displayed, if possible.

Closes reanahub/reana-workflow-controller#486.
giuseppe-steduto added a commit to giuseppe-steduto/reana-client that referenced this issue Nov 2, 2023
giuseppe-steduto added a commit to giuseppe-steduto/reana-workflow-controller that referenced this issue Nov 2, 2023
@tiborsimko
Member

Several considerations:

(1) Naming-wise, "command" may not be self-explanatory. It would be better to call it "last-command", "last-run-command", "last-started-command", "most-recently-started-command", or similar.

(2) The need to "show which command runs" originates mainly from the Serial workflow engine, where showing the progress via a single command makes the most sense. For CWL/Snakemake/Yadage workflows, many commands can run concurrently, possibly for hours. Hence the notion of a single "command" to show could be very ambiguous. This underlines the need for naming clarity.

For now, changing the naming to something more self-explanatory, such as "last-started-command", should be enough.

For the future, though, we may want to display the progress of all the "commands" of the workflow differently, via ASCII tables such as:

STEP         STATUS    DURATION   COMMAND
processing1  running        118   python myprocessing.py --process-file 1
processing2  finished        79   python myprocessing.py --process-file 2
processing3  running        111   python myprocessing.py --process-file 3
processing4  pending          0   python myprocessing.py --process-file 4
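One possible way to render such a per-step table, as a sketch only: the function render_table, the two-space column gap and the choice to right-align DURATION are assumptions for illustration, not part of reana-client.

```python
from typing import Iterable, List, Sequence


def render_table(
    headers: Sequence[str],
    rows: Iterable[Sequence[object]],
    right_align: Sequence[str] = ("DURATION",),
) -> str:
    """Render rows as a plain ASCII table, one line per step."""
    table: List[List[str]] = [[str(h) for h in headers]]
    table += [[str(cell) for cell in row] for row in rows]
    # Each column is as wide as its longest cell (header included).
    widths = [max(len(row[i]) for row in table) for i in range(len(headers))]
    lines = []
    for row in table:
        cells = [
            cell.rjust(widths[i]) if headers[i] in right_align else cell.ljust(widths[i])
            for i, cell in enumerate(row)
        ]
        # Two spaces between columns; drop the padding after the last column.
        lines.append("  ".join(cells).rstrip())
    return "\n".join(lines)


steps = [
    ("processing1", "running", 118, "python myprocessing.py --process-file 1"),
    ("processing2", "finished", 79, "python myprocessing.py --process-file 2"),
    ("processing4", "pending", 0, "python myprocessing.py --process-file 4"),
]
table_text = render_table(("STEP", "STATUS", "DURATION", "COMMAND"), steps)
print(table_text)
```

Keeping COMMAND as the last column means long commands only extend each line to the right instead of pushing other columns out of view.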

Anyway, this is something for a separate issue later; I am just illustrating the full picture of where we might be heading.

(3) The "command" information can be multi-line, for example:

$ rcg status -watlas-recast-yadage-kubernetes  --verbose
NAME                             RUN_NUMBER   CREATED               STARTED               ENDED                 STATUS     PROGRESS   ID                                     USER                                   COMMAND                                                                                                                                                                                                                                                                                                                                 DURATION
atlas-recast-yadage-kubernetes   31           2023-09-28T06:05:16   2023-09-28T06:05:42   2023-09-28T06:06:48   finished   2/2        3f064760-04e8-403c-b4dd-6379ca016f78   eeeeeeee-dddd-cccc-bbbb-aaaaaaaaaaaa   source /home/atlas/release_setup.sh                                                                                                                                                                                                                                                                                                     66
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                    python /code/make_ws.py /code/data/data.root /var/reana/users/eeeeeeee-dddd-cccc-bbbb-aaaaaaaaaaaa/workflows/3f064760-04e8-403c-b4dd-6379ca016f78/eventselection/submitDir/hist-sample.root /code/data/background.root                                                                                  
                                                                                                                                                                                                                    mkdir -p /var/reana/users/eeeeeeee-dddd-cccc-bbbb-aaaaaaaaaaaa/workflows/3f064760-04e8-403c-b4dd-6379ca016f78/statanalysis/fitresults                                                                                                                                                                   
                                                                                                                                                                                                                    python /code/plot.py results/meas_combined_meas_model.root /var/reana/users/eeeeeeee-dddd-cccc-bbbb-aaaaaaaaaaaa/workflows/3f064760-04e8-403c-b4dd-6379ca016f78/statanalysis/fitresults/pre.png /var/reana/users/eeeeeeee-dddd-cccc-bbbb-aaaaaaaaaaaa/workflows/3f064760-04e8-403c-b4dd-6379ca016f78/statanalysis/fitresults/post.png
                                                                                                                                                                                                                    python /code/set_limit.py results/meas_combined_meas_model.root \                                                                                                                                                                                                                                       
                                                                                                                                                                                                                           /var/reana/users/eeeeeeee-dddd-cccc-bbbb-aaaaaaaaaaaa/workflows/3f064760-04e8-403c-b4dd-6379ca016f78/statanalysis/fitresults/limit.png /var/reana/users/eeeeeeee-dddd-cccc-bbbb-aaaaaaaaaaaa/workflows/3f064760-04e8-403c-b4dd-6379ca016f78/statanalysis/fitresults/limit_data.json \            
                                                                                                                                                                                                                           /var/reana/users/eeeeeeee-dddd-cccc-bbbb-aaaaaaaaaaaa/workflows/3f064760-04e8-403c-b4dd-6379ca016f78/statanalysis/fitresults/limit_data_nomsignal.json   

This is not good, especially for the list command that lists many rows.

There are several possible solutions to the formatting issue, such as:

  • Keep the output as is, including newlines, since --include-last-command would be fully optional and rarely used, so it won't break anything significant.

  • Make the output single-line, first by joining the lines, and then possibly even replacing any whitespace with underscores. This would make the output fully CSV-friendly, at the price of making the last-command information harder to read.

  • Defer offering the new "--include-last-command" option for list, since we will most probably replace it later with something more suitable for showing parallel commands.
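The single-line option could look roughly like the following sketch. The helper to_single_line is hypothetical; it also drops trailing shell line-continuation backslashes, an extra step assumed here so that commands like the set_limit.py one above collapse cleanly. Joining with plain spaces loses the boundaries between separate commands, so a separator such as "; " might be preferable in practice.

```python
import re


def to_single_line(command: str, underscores: bool = False) -> str:
    """Collapse a multi-line command into one line (hypothetical helper)."""
    parts = []
    for line in command.splitlines():
        # Strip surrounding whitespace and a trailing "\" continuation marker.
        part = line.strip().rstrip("\\").strip()
        if part:  # skip empty lines
            parts.append(part)
    single = " ".join(parts)
    if underscores:
        # Replace every run of whitespace so the value is a single token.
        single = re.sub(r"\s+", "_", single)
    return single


cmd = "source /home/atlas/release_setup.sh\n\npython /code/set_limit.py in.root \\\n    out.png\n"
print(to_single_line(cmd))
print(to_single_line(cmd, underscores=True))
```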

(4) Note also that DURATION is displayed after COMMAND, as the last column, which makes it hard to see. COMMAND should probably be shown last.
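Moving COMMAND to the end is a simple permutation applied to the headers and to every row; a sketch with a hypothetical move_column_last helper (not an existing reana-client function):

```python
from typing import List, Sequence, Tuple


def move_column_last(
    headers: Sequence[str],
    rows: Sequence[Sequence[object]],
    name: str = "COMMAND",
) -> Tuple[List[str], List[List[object]]]:
    """Return headers and rows with the column `name` moved to the end."""
    idx = list(headers).index(name)  # raises ValueError if the column is absent
    order = [i for i in range(len(headers)) if i != idx] + [idx]
    new_headers = [headers[i] for i in order]
    new_rows = [[row[i] for i in order] for row in rows]
    return new_headers, new_rows


headers = ["NAME", "STATUS", "COMMAND", "DURATION"]
rows = [["atlas-recast-yadage-kubernetes", "finished", "source /home/atlas/release_setup.sh", 66]]
new_headers, new_rows = move_column_last(headers, rows)
print(new_headers)
```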

(5) We don't want to break the API much around the 0.9.2 release, so "command" and "current_command" are probably here to stay in the JSON output as is. However, we can perhaps at least name the new reana-client CLI options more reasonably?

giuseppe-steduto added a commit to giuseppe-steduto/reana-workflow-controller that referenced this issue Nov 17, 2023
giuseppe-steduto added a commit to giuseppe-steduto/reana-client that referenced this issue Nov 17, 2023
Add `--include-last-command` flag to the `list` and `status` commands
that, when set, will display info about the command currently being
executed by the workflow (or the last submitted command). In case there
is no info about the command, the step name will be displayed, if
possible.

Closes reanahub/reana-workflow-controller#486.
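The fallback described in this commit message (show the command, otherwise the step name) can be sketched as below. The function last_command_display and the "-" placeholder for when neither value is available are assumptions, not taken from the actual reana-client implementation; the keys match the progress JSON shown at the top of this issue.

```python
from typing import Any, Dict


def last_command_display(progress: Dict[str, Any], placeholder: str = "-") -> str:
    """Prefer the last command, fall back to the step name, then a placeholder."""
    command = progress.get("current_command")
    if command:
        return command
    step_name = progress.get("current_step_name")
    if step_name:
        return step_name
    return placeholder


print(last_command_display({"current_command": None, "current_step_name": "select_signal_hist_0"}))
```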
giuseppe-steduto added a commit to giuseppe-steduto/reana-client that referenced this issue Nov 20, 2023
giuseppe-steduto added a commit to giuseppe-steduto/reana-client-go that referenced this issue Nov 20, 2023
@mdonadoni mdonadoni self-assigned this Nov 21, 2023
giuseppe-steduto added a commit to giuseppe-steduto/reana-workflow-controller that referenced this issue Nov 22, 2023
giuseppe-steduto added a commit to giuseppe-steduto/reana-client that referenced this issue Nov 22, 2023
giuseppe-steduto added a commit to giuseppe-steduto/reana-workflow-controller that referenced this issue Nov 24, 2023
giuseppe-steduto added a commit to giuseppe-steduto/reana-client that referenced this issue Nov 24, 2023
@mdonadoni
Member

Note that the situation has improved after the performance improvements of reanahub/reana-db#213
