Better understanding the dataflow within a pipeline #4119
Comments
Until fairly recently, I was under the impression that a job is a collection of actions that have been run, but I now know that this is a JobRequest.
@benbc also pointed out that this issue is related to opensafely-core/job-runner#196.
This could potentially be a fairly big piece of work. Is there something smaller we can do to help the user in question for now?
I think a candidate solution would be to display a table on the workspace detail page that contained the set of jobs associated with the workspace in one column, and their last successfully completed time in another column. This table should be sortable by either column. Sorting by last successfully completed time would help the user determine whether action [...]

From a user's perspective, this solution would involve them remembering an edge in the dependency graph -- the edge between vertices [...]

From a technical perspective, this solution would involve joining [...]

This solution may require special handling of the [...]

This solution would require a sortable table, but a sortable table already exists as a UI component.

Ultimately, this solution isn't small (although it isn't large, either!) and it should involve user-testing; it would be sensible to create a mock-up, before working on the implementation, to facilitate this. However, this solution could be a first step to actually visualizing the dependency graph and the dataflow, which may satisfy opensafely-actions/.github#7.
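The table described above could be derived along these lines. This is only a sketch: `JobRecord`, its fields, the `"succeeded"` status value, and both function names are hypothetical and are not taken from job-server's actual models. It assumes each job records the action it ran, a status, and a completion time.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class JobRecord:
    """Hypothetical, simplified stand-in for one job in a workspace."""

    action: str
    status: str  # assumed values: "succeeded" or "failed"
    completed_at: Optional[datetime]


def last_success_by_action(jobs):
    """Map each action name to the time of its latest successful job."""
    latest = {}
    for job in jobs:
        if job.status != "succeeded" or job.completed_at is None:
            continue  # only successful, finished jobs count
        if job.action not in latest or job.completed_at > latest[job.action]:
            latest[job.action] = job.completed_at
    return latest


def table_rows(jobs, sort_by="action"):
    """Return (action, last_success) rows sorted by the chosen column."""
    rows = list(last_success_by_action(jobs).items())
    index = {"action": 0, "last_success": 1}[sort_by]
    return sorted(rows, key=lambda row: row[index])
```

Calling `table_rows(jobs, sort_by="last_success")` lists the actions from least to most recently completed. Actions that have never succeeded do not appear in this sketch, which is one of the cases a real table would need to handle.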
@lucyb and I discussed this issue on Slack and agreed to move it to Later. Although there are several candidate solutions, it's clear we need to know more about the problem to be able to choose a candidate solution with confidence. Indeed, it may be that the separation of actions, jobs, and job requests needs rethinking.
For when you think about this again, #3566 highlights a slightly different (and I think more common) use case for surfacing action-specific logs.
That's really useful, thanks @LFISHER7.
In a recent Bristol-Cambridge-Oxford meeting, a researcher said that it was hard to determine when an action was last run, especially when there were many actions in a pipeline. The researcher said this was important because it was hard to determine whether a downstream action, let's call it `B`, needed to be rerun because an upstream action, let's call it `A`, had been rerun. In other words, it wasn't clear that the timestamp of `A` was after the timestamp of `B`, and hence that outputs from `B` may not reflect outputs from `A`.

I've summarized the issue as "Better understanding the dataflow within a pipeline", but should emphasize that what's hard to determine is when the dataflow is invalid with respect to the dependency graph represented by the pipeline.
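To make "invalid with respect to the dependency graph" concrete, here is a minimal sketch of the check the researcher was doing by hand. The function name and both arguments are hypothetical: `needs` is assumed to map each action to its direct upstream actions, and `last_run` to map each action to the time it last ran successfully. Only direct edges are checked; staleness is not propagated further downstream.

```python
from datetime import datetime


def stale_actions(needs, last_run):
    """Return actions that last ran before one of their direct upstream actions.

    needs: maps each action to the list of actions it depends on directly.
    last_run: maps each action to the time it last ran successfully.
    """
    stale = set()
    for action, upstream in needs.items():
        if action not in last_run:
            continue  # never run, so there are no outputs to be out of date
        for dep in upstream:
            if dep in last_run and last_run[dep] > last_run[action]:
                stale.add(action)
                break
    return sorted(stale)
```

With `needs = {"A": [], "B": ["A"]}` and `A` last run after `B`, the function reports `B`, which is the situation described above.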
This discussion also surfaced a related issue: the researcher asked about the difference between an action and a job. (Conceptually, an action is the class; a job is the instance of the class.)
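The analogy, together with the earlier comment about JobRequest, might be sketched as follows. The class and field names here are illustrative assumptions, not job-server's actual models.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class Action:
    """A step defined once in the pipeline (the "class" in the analogy)."""

    name: str


@dataclass
class Job:
    """One run of an action (the "instance" in the analogy)."""

    action: Action
    started_at: datetime


@dataclass
class JobRequest:
    """Groups the jobs created when a set of actions is requested to run."""

    jobs: List[Job] = field(default_factory=list)
```

Running the same action twice yields two distinct `Job` objects that share one `Action`, each of which would belong to its own `JobRequest`.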