Extra DagRuns Scheduled After Backfill #1266

Closed
jjfine opened this issue Mar 30, 2016 · 5 comments

@jjfine

jjfine commented Mar 30, 2016

Dear Airflow Maintainers,

Before I tell you about my issue, let me describe my environment:

Environment

  • We see this issue on our production Ubuntu machine running Airflow 1.6.2 with the Celery executor.
  • I can now reproduce it with Airflow 1.7 installed via pip on my OS X dev box, using the default configuration except that examples are disabled.

Now that you know a little about me, let me tell you about the issue I am having:

Description of Issue

  • When running a backfill for 1 day, I expect only 1 dag run to start.
  • Instead, many additional DagRuns are started, covering 4 days before the start of the backfill up to the current day.
  • Here is how you can reproduce this issue on your machine:

Reproduction Steps

  1. Copy example_bash_operator.py to your dags folder.
  2. Change the - to a + on this line so that the dag isn't scheduled immediately.
  3. Start a scheduler and webserver. The dag isn't triggered, as expected.
  4. Run airflow backfill -s 2016-03-27 -e 2016-03-27 example_bash_operator

Once the backfill finishes, more DagRuns will be scheduled:

[Screenshot, 2016-03-30 at 6:21:51 pm]

By the time the dust settles, we have DagRuns for 2016-03-23 through 2016-03-29 (I'm writing this on 2016-03-30):

[Screenshot, 2016-03-30 at 6:34:04 pm]

Is this expected behavior? Is there any way around this?

@bolkedebruin
Contributor

Please try #1271 and see if that fixes your issue

@jlowin
Member

jlowin commented Apr 3, 2016

@jjfine are you running a Scheduler?

The scheduler uses the last run, not the start_date, when scheduling dags. So when you backfilled the dag, the scheduler observed the last run time and began scheduling. The clue here is that the Scheduler actually picks a starting date 5 days BEFORE the last run time. I'm not totally sure why -- perhaps @mistercrunch has a specific reason? In #1271, I added a check so that the scheduler won't schedule any runs prior to the start_date (though it still tries to go back 5 days), so that should clear up some of the confusion.
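To make the explanation above concrete, here is a toy model in plain Python. It is not Airflow's scheduler code: the function name, the `LOOKBACK_DAYS` constant, and the two boundary rules (resume on the day after the lookback point, and only create a daily run once that day has ended) are assumptions chosen for illustration. The optional `start_date` argument mimics the kind of check described for #1271.

```python
from datetime import date, timedelta

LOOKBACK_DAYS = 5  # figure quoted in the comment above; not read from Airflow


def simulated_execution_dates(last_run, today, start_date=None):
    """Toy model of daily scheduling after a backfill (hypothetical helper)."""
    # Assumption: scheduling resumes on the day after (last_run - lookback).
    first = last_run - timedelta(days=LOOKBACK_DAYS - 1)
    # Assumption: a daily run for day D is only created once D has ended.
    last = today - timedelta(days=1)
    if start_date is not None:
        # Check in the spirit of #1271: nothing earlier than start_date.
        first = max(first, start_date)
    days = (last - first).days + 1
    return [first + timedelta(days=i) for i in range(max(days, 0))]


# Backfill ran for 2016-03-27; the issue was written on 2016-03-30.
runs = simulated_execution_dates(date(2016, 3, 27), date(2016, 3, 30))
print(runs[0], runs[-1], len(runs))
```

Under these assumptions the model yields runs for 2016-03-23 through 2016-03-29, which matches the range reported in the issue. Passing a `start_date` in the future returns an empty list, which is the effect the added check is meant to have.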

@jjfine
Author

jjfine commented Apr 3, 2016

@jlowin yeah, I'm running a scheduler. I'll try out #1271 on Monday. Sounds like it should help.

@jlowin
Member

jlowin commented Apr 3, 2016

Sorry I just saw that you said very clearly you're running a scheduler :) Late night!

jlowin self-assigned this Apr 4, 2016

@jlowin
Member

jlowin commented Apr 6, 2016

@jjfine keep an eye on #1291. Schedulers will still run tasks following the last backfill date, but only if it's after the start date.
