Users want to know how to run backfill jobs. The docs should include an example of how to trigger a launch plan across a range of dates. The launch plan should take a datetime as an input so that each date in the range can be applied to it.
This is different from dynamic and map tasks because backfill jobs likely want to run on their own nodes, and the set of runs should be evaluated at compile time.
A similar use case is workflows that need look-back. When users run daily jobs, it is convenient to quickly evaluate the state of the last couple of days. If the data paths of the previous days' outputs are well known, a simple gsutil ls (or the equivalent on other file systems) is enough to check this. If a previous day is missing its output data, that day should be processed and the workflow should proceed with the heavy compute for it.
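The look-back check described above could be sketched roughly as follows. This is only an illustration, not an existing Flyte API: `missing_days` and the `output_exists` callback are hypothetical names, and the callback stands in for whatever storage check (for example a `gsutil ls` on a dated path) applies in practice.

```python
from datetime import date, timedelta
from typing import Callable, List


def missing_days(
    today: date, lookback_days: int, output_exists: Callable[[date], bool]
) -> List[date]:
    """Return the days in the look-back window whose output is missing, oldest first."""
    # The window covers the `lookback_days` days before `today`, excluding `today`.
    window = [today - timedelta(days=offset) for offset in range(lookback_days, 0, -1)]
    return [day for day in window if not output_exists(day)]


# Hypothetical stand-in for a storage check on a well-known, dated output path.
completed = {date(2022, 1, 3), date(2022, 1, 5)}
to_process = missing_days(date(2022, 1, 6), 4, lambda day: day in completed)
```

Only the days returned in `to_process` would then need the heavy compute.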
Being able to put the workflow iterator on a cron schedule would make this possible. I haven't been able to prove this case yet, but believe it is technically possible.
Method to iterate over dates on a launch plan and return a workflow:
```python
from datetime import datetime

from croniter import croniter
from flytekit import CronSchedule, LaunchPlan, Workflow


def generate_backfill_workflow(
    start_date: datetime, end_date: datetime, base_lp: LaunchPlan
) -> Workflow:
    if base_lp.schedule is None:
        raise ValueError("Backfill can only be created for scheduled launchplans")
    if not isinstance(base_lp.schedule, CronSchedule):
        raise NotImplementedError("The launchplan schedule needs to be a cron schedule")
    if start_date >= end_date:
        raise ValueError("Start date should be earlier than end date")

    print(f"Generating backfill for {start_date} to {end_date}")
    wf = Workflow(name=f"backfill-{base_lp.name}")
    # Walk the launch plan's cron schedule, starting from start_date.
    lp_iter = croniter(
        base_lp.schedule.cron_schedule.schedule, start_time=start_date, ret_type=datetime
    )
    while True:
        next_start_date = lp_iter.get_next()
        if next_start_date > end_date:
            break
        print(f"Adding -> {next_start_date}")
        # Add one launch plan node per scheduled time in the range.
        wf.add_launch_plan(base_lp, kickoff_time=next_start_date)
    return wf
```
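To make the range semantics of the loop concrete, here is a standard-library sketch for the special case of a daily midnight schedule. It is an approximation and does not use croniter: `daily_kickoff_times` is a hypothetical helper, and it assumes the cron iterator yields times strictly after its start time, which I believe matches croniter's `get_next` but have not verified for every schedule.

```python
from datetime import datetime, timedelta
from typing import Iterator


def daily_kickoff_times(start: datetime, end: datetime) -> Iterator[datetime]:
    """Yield midnight kickoff times after `start`, up to and including `end`."""
    if start >= end:
        raise ValueError("start must be earlier than end")
    # First midnight strictly after `start` (assumed to mirror the cron iterator).
    current = start.replace(hour=0, minute=0, second=0, microsecond=0) + timedelta(days=1)
    while current <= end:
        yield current
        current += timedelta(days=1)


kickoffs = list(daily_kickoff_times(datetime(2022, 1, 1), datetime(2022, 1, 4)))
```

Under that assumption, a backfill from January 1 to January 4 covers January 2, 3, and 4: the start time itself is skipped and the end time is included.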
Are you sure this issue hasn't been raised already?
Yes
Have you read the Code of Conduct?
Yes
@ariefrahmansyah / @pradithya / @frelyf This is now merged and should be part of the 1.4.0b releases. Please try it out and file any issues that you see in use.