Assign backfills a run status based on their sub-run statuses #23702
if any(status == "CANCELED" for status in sub_run_statuses):
    return GrapheneRunStatus.FAILURE

# can't import this because two deserializers get registered for PipelineRunStatsSnapshot
I'd rather use the code below to check the statuses of the runs, but if I import DagsterRunStatus I get the error dagster._serdes.errors.SerdesUsageError: Multiple deserializers registered for storage name 'PipelineRunStatsSnapshot'
I think this is because the storage_name for DagsterRunStatsSnapshot
@whitelist_for_serdes(storage_name="PipelineRunStatsSnapshot")
class DagsterRunStatsSnapshot(
...
collides with GraphenePipelineRunStatsSnapshot.
Not sure what there is to do about this other than move DagsterRunStatus into a separate module.
Not sure what there is to do about this other than move DagsterRunStatus into a separate module
Surprised that that would help with the serdes error, but moving it to a separate module seems like a nice thing to me.
We might want to store the run status in the DB at some point to facilitate filtering
Curious to hear @prha's thoughts, but I do think we should prioritize storing this in the DB. Backfills can have lots of runs, so computing these at read time could make loading the runs page much slower.
What is this run status used for? I think the run status coercion is a bit odd, to be honest.
Is it because we didn't want to use a different type in the frontend?
That's true... if we wanted to store this in the DB, we'd have to add an aggregate run status column (or something like that)... we'd probably update it in the event log consumer daemon, as runs complete.
I mostly don't like calling it
Pretty much. The reasoning is that if we have a consolidated list of runs and backfills, we want to display consistent status info about each entry.
I think we should maybe rename to something else... maybe aggregateRunStatus or groupedRunStatus or something like that. Doesn't have to be this diff, but I think we should also consider having a separate column in the DB on the bulk actions table for it, just to minimize data reads on runs page loads. Just like we update the
Is there a world where we would consider replacing the bulk action status column with these statuses? The existing bulk action statuses for an asset backfill are already largely based on the aggregate status of runs within the backfill.
I think it depends on what actions we need to take based on the status. There's an external status, which is what we would show to the user. I think that's fine to completely shift to this new aggregate status. But I think there may be some configurable policies in terms of what we internally need to kick off, w.r.t. launching runs, retries, etc. It might still be useful to have an internal-only concept of status/state of the backfill.
I think we do ourselves a disservice by calling everything status.
To help myself get a stronger grip on what we're talking about here, is the main issue with the current set of statuses that it doesn't allow us to distinguish between these two different outcomes?
The separation you're talking about makes sense @prha, but I think I'm reacting negatively to the name
I see that we're effectively querying "status" for 2 different purposes.
I don't really care what we call them, but I want to make sure we're not coalescing two things that should actually stay separate. Separately, I prefer that we don't call #2 the same thing for both runs and backfills, because they represent two separate things. It feels like a liability for a future bug where we think we can infer from this value which actions we can take on the object. I don't have a strong attachment to the specific naming of it though.
I did look into converting to storing
We could lean into the aggregated status being only about communicating externally and call it something like
Thinking about this a little bit more, one thing I could imagine is wanting to add functionality that automatically retries failed backfills (failed in the sense of submitted all runs but some failed). Yesterday I was chatting with a customer who wanted this. So I think this is more than just a cosmetic status. From the other direction, the backfill daemon has both COMPLETED and FAILED statuses. Both of these mean "backfill daemon doesn't need to do more work", but they're valuable for reporting purposes. Also, the run statuses on runs are there for a mix of operational and reporting purposes. E.g. the difference between FAILURE and CANCELED is mainly for reporting purposes, but QUEUED has operational value. This makes me wonder whether we should separate out the statuses. A third option to consider here would be to add a column called something like "completed_outcome", which is SUCCESS or FAILURE if the bulk action status is COMPLETED.
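To make the "completed_outcome" option concrete, here is a rough sketch of how an operational status and a reporting outcome could sit side by side. Everything in it is hypothetical: BackfillRow, CompletedOutcome, needs_daemon_work and should_auto_retry are illustrative names, BulkActionStatus is a trimmed stand-in rather than the real enum, and the nullable column is an assumption made so that older rows without an outcome stay valid.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class BulkActionStatus(Enum):
    """Trimmed stand-in for the operational status (not the real enum)."""
    REQUESTED = "REQUESTED"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


class CompletedOutcome(Enum):
    """Hypothetical reporting-only outcome, set once the daemon is done."""
    SUCCESS = "SUCCESS"
    FAILURE = "FAILURE"


@dataclass(frozen=True)
class BackfillRow:
    """Hypothetical row shape: operational status plus an optional outcome."""
    status: BulkActionStatus
    # Assumption: nullable, so rows written before the column existed stay valid.
    completed_outcome: Optional[CompletedOutcome] = None

    def __post_init__(self) -> None:
        # The outcome is only meaningful once the bulk action is COMPLETED.
        if (
            self.completed_outcome is not None
            and self.status is not BulkActionStatus.COMPLETED
        ):
            raise ValueError("completed_outcome requires status COMPLETED")


def needs_daemon_work(row: BackfillRow) -> bool:
    """Operational question: does the backfill daemon still have work to do?"""
    return row.status is BulkActionStatus.REQUESTED


def should_auto_retry(row: BackfillRow) -> bool:
    """Reporting-driven policy: all runs were submitted but some failed."""
    return row.completed_outcome is CompletedOutcome.FAILURE
```

With this shape the daemon only ever reads status, while features such as automatic retries or runs-page filters read completed_outcome.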
Oo yeah backcompat definitely something we need to think through if we don't want to compute the aggregate run status at read time. I think the easiest thing would be to say that backfills that completed prior to this change won't show up when someone filters for "run status=SUCCESS" or "run status=FAILURE" on the runs page. My strong suspicion is that people mostly filter on these statuses to monitor recent runs, so omitting some historical backfills is likely not a big deal.
if converted_status is BulkActionStatus.REQUESTED:
    # if no runs have been launched:
    if len(self._get_records(_graphene_info)) == 0:
        return GrapheneRunStatus.QUEUED
Should this be NOT_STARTED? STARTING? Or just map to STARTED for everything in the REQUESTED state?
I think STARTING might make sense here, but if it's STARTED, it's not a big deal.
Closing in favor of #24365
Summary & Motivation
Computes the DagsterRunStatus for a backfill based on the statuses of the sub-runs and the BulkAction status of the backfill, rather than just mapping BulkActionStatus -> DagsterRunStatus in a one-to-one fashion.

We might want to store the run status in the DB at some point to facilitate filtering, but as a first step I'm just adding it to the GQL layer where it will be faster to iterate on how sub-run statuses inform overall status and won't potentially require migrating old data if we change how we determine status.
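As a rough illustration of the computation described above, the sketch below derives one status from the backfill's bulk action status plus its sub-run statuses. It is not the code in this PR: RunStatus and BulkActionStatus are trimmed stand-ins for the real enums, aggregate_run_status is a hypothetical name, and the precedence rules (any failed or canceled sub-run yields FAILURE, REQUESTED with no runs yields QUEUED) are assumptions pieced together from the snippets quoted in the review.

```python
from enum import Enum
from typing import Sequence


class RunStatus(Enum):
    """Trimmed stand-in for DagsterRunStatus (only the members used here)."""
    QUEUED = "QUEUED"
    STARTED = "STARTED"
    SUCCESS = "SUCCESS"
    FAILURE = "FAILURE"
    CANCELED = "CANCELED"


class BulkActionStatus(Enum):
    """Trimmed stand-in for the backfill's bulk action status."""
    REQUESTED = "REQUESTED"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


def aggregate_run_status(
    backfill_status: BulkActionStatus,
    sub_run_statuses: Sequence[RunStatus],
) -> RunStatus:
    """Hypothetical helper: collapse a backfill into a single run status."""
    if backfill_status is BulkActionStatus.REQUESTED:
        # No runs launched yet: report QUEUED (the review also floats STARTING).
        return RunStatus.STARTED if sub_run_statuses else RunStatus.QUEUED
    if backfill_status is BulkActionStatus.FAILED:
        return RunStatus.FAILURE
    # COMPLETED: the daemon is done submitting, so the sub-runs decide.
    if any(s in (RunStatus.FAILURE, RunStatus.CANCELED) for s in sub_run_statuses):
        return RunStatus.FAILURE
    # all() is True for an empty sequence, so a COMPLETED backfill with no
    # runs is reported as SUCCESS under these assumed rules.
    if all(s is RunStatus.SUCCESS for s in sub_run_statuses):
        return RunStatus.SUCCESS
    # Some runs are still queued or executing.
    return RunStatus.STARTED
```

Because the result is computed from the sub-run list on every call, the rules can change without a data migration, at the cost of reading all sub-runs at query time.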
How I Tested These Changes
Added assertions on run status to existing tests.