[Core] Backfill and look-back #3212

Closed
2 tasks done
frelyf opened this issue Jan 6, 2023 · 2 comments · Fixed by flyteorg/flytekit#1420
@frelyf

frelyf commented Jan 6, 2023

Description

Users want to know how to run backfill jobs. The docs should include an example of how to trigger a launch plan across a range of dates. The launch plan should take a datetime as an input so that each date in the range can be applied to it.
This is different from dynamic and map tasks because backfill jobs likely want to run on their own nodes and to be evaluated at compile time.

A similar use case is workflows that want look-back. When users run daily jobs, it is convenient to quickly evaluate the state of the last couple of days. If the data paths of the previous days' outputs are well known, a simple gsutil ls (or the equivalent on other file systems) is enough to check this. If a previous day is missing its output data, that day should be processed and the workflow will proceed with the heavy compute.
Putting the workflow iterator on a cron schedule would make this possible. I haven't been able to prove this case yet, but I believe it is technically possible.
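
The look-back check described above could be sketched roughly as follows. This is only an illustration using the standard library: `missing_days` is a hypothetical helper, and the `completed` dates stand in for whatever a listing such as gsutil ls would report for the known output paths.

```python
from datetime import date, timedelta
from typing import Iterable, List


def missing_days(today: date, look_back_days: int, completed: Iterable[date]) -> List[date]:
    """Return the days in the look-back window that have no output yet, oldest first."""
    done = set(completed)
    # The window covers the look_back_days days before today (today itself is excluded).
    window = [today - timedelta(days=offset) for offset in range(look_back_days, 0, -1)]
    return [day for day in window if day not in done]


if __name__ == "__main__":
    # Hypothetical listing result: the output for Jan 3 is missing.
    completed = [date(2023, 1, 2), date(2023, 1, 4), date(2023, 1, 5)]
    print(missing_days(date(2023, 1, 6), 4, completed))
```

Each date returned would then be a candidate for reprocessing before the workflow proceeds with the current day.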

Method to iterate over dates on a launch plan and return a workflow:

`

from datetime import datetime

from croniter import croniter
from flytekit import CronSchedule, LaunchPlan, Workflow


def generate_backfill_workflow(
    start_date: datetime, end_date: datetime, base_lp: LaunchPlan
) -> Workflow:
    """Build a workflow with one launch plan node per scheduled run in the range."""
    if base_lp.schedule is None:
        raise ValueError("Backfill can only be created for scheduled launchplans")

    if not isinstance(base_lp.schedule, CronSchedule):
        raise NotImplementedError("The launchplan schedule needs to be a cron schedule")

    if start_date >= end_date:
        raise ValueError("Start date should be earlier than end date")

    print(f"Generating backfill for {start_date} to {end_date}")
    wf = Workflow(name=f"backfill-{base_lp.name}")
    lp_iter = croniter(
        base_lp.schedule.cron_schedule.schedule, start_time=start_date, ret_type=datetime
    )
    while True:
        next_start_date = lp_iter.get_next()
        if next_start_date > end_date:
            break
        print(f"Adding -> {next_start_date}")
        # Assumes the launch plan's datetime input is named kickoff_time.
        wf.add_launch_plan(base_lp, kickoff_time=next_start_date)

    return wf

`
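
To make the loop's boundaries concrete, here is a standard-library sketch that mimics it for a fixed one-day interval instead of a cron expression. `daily_kickoff_times` is a hypothetical helper, not part of flytekit or croniter. It assumes the first kickoff falls one interval after the start and that the end is inclusive, matching the `>` comparison in the loop above.

```python
from datetime import datetime, timedelta
from typing import List


def daily_kickoff_times(start: datetime, end: datetime) -> List[datetime]:
    """List one kickoff time per day, after start and up to and including end."""
    if start >= end:
        raise ValueError("Start date should be earlier than end date")

    times = []
    current = start + timedelta(days=1)  # assumption: start itself is not a kickoff
    while current <= end:
        times.append(current)
        current += timedelta(days=1)
    return times


if __name__ == "__main__":
    for kickoff in daily_kickoff_times(datetime(2023, 1, 1), datetime(2023, 1, 4)):
        print(f"Adding -> {kickoff}")
```

Under these assumptions a range from Jan 1 to Jan 4 yields three kickoff times, so the resulting backfill workflow would contain three launch plan nodes.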

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes
@frelyf frelyf added documentation Improvements or additions to documentation untriaged This issues has not yet been looked at by the Maintainers labels Jan 6, 2023
@welcome

welcome bot commented Jan 6, 2023

Thank you for opening your first issue here! 🛠

@cosmicBboy cosmicBboy changed the title [Docs] Backfill and look-back [Core] Backfill and look-back Jan 25, 2023
@kumare3 kumare3 self-assigned this Jan 26, 2023
@kumare3 kumare3 added flytekit FlyteKit Python related issue and removed untriaged This issues has not yet been looked at by the Maintainers labels Jan 26, 2023
@cosmicBboy cosmicBboy added this to the 1.4.0 milestone Jan 29, 2023
@kumare3
Contributor

kumare3 commented Feb 3, 2023

@ariefrahmansyah / @pradithya / @frelyf This is now merged and should be part of the 1.4.0b releases. Please try it out and file any issues that you see in use.
