-
Notifications
You must be signed in to change notification settings - Fork 14.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Relax mandatory requirement for start_date when schedule=None #35356
Conversation
tests/models/test_dag.py
Outdated
dag = DAG("dag_without_start_date") | ||
dag.add_task(BaseOperator(task_id="task_without_start_date")) | ||
dagrun = dag.create_dagrun( | ||
state=State.RUNNING, run_type=DagRunType.SCHEDULED, execution_date=DEFAULT_DATE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How can DagRunType
be scheduled ? With scheduled runs we must have start_date
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How can DagRunType be scheduled ? With scheduled runs we must have start_date
IMHO it should be ok if catchup is False
🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The scheduler creates runs with type scheduled. The fix in this PR is for manual runs that are created by the user.
catchup parameter has no affect on manual runs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated DagRunType.SCHEDULED
to DagRunType.MANUAL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but I wonder if we can do the same thing for dags with schedule!=None
if the catchup is set to False
, wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont think so because the first run depends on the start_date. If you set it in the past and interval passed when dag is activated it will create a run. If you set it for a future date then interval is not completed thus no run will be created when dag is set to active.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, but we can consider that this condition is always met when the user does not provide a start date.
Do we have a test for a non-None schedule raising an exception? |
@uranusjr Do we need to fail explicitly for non None Schedule and empty start date? In this case, it will start the job on the next schedule |
Also want to mention that previously start date and task date were checked to throw the start date error which is part of add task. But schedule will not be empty then as it will be the default schedule interval value. |
@uranusjr @eladkal @hussein-awala |
If I remember correctly, we currently fail explicitly for that combination, so it’s better to continue the behaviour. Having a DAG created silently but can never fire can be confusing for users. Task start dates are fine, just consider the DAG date. |
Thanks for the input. Changes are added. |
Tests are failing:
|
Fixed the failing tests |
* Relax mandatory requirement for start_date when schedule=None * Updated run_type in unit tests * Added check for empty start_date and non empty schedule * Fix the build failures * Fix the build failures * Update based on review comments (cherry picked from commit 930f165)
`start_date` is now optional, if there is no schedule for the DAG (apache#35356). Tasks, however, inherit the `start_date` of their DAG by default. So it's now possible for tasks to have no `start_date` as well. Closes: apache#40828
As a part of this PR, existing
start_date
validation is removed to handleschedule=None
Closes: #35199