Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gantt chart flickering and constant rescaling #42215

Closed
1 of 2 tasks
adamgorkaextbi opened this issue Sep 13, 2024 · 18 comments · Fixed by #44488
Closed
1 of 2 tasks

Gantt chart flickering and constant rescaling #42215

adamgorkaextbi opened this issue Sep 13, 2024 · 18 comments · Fixed by #44488
Labels
affected_version:2.10 Issues Reported for 2.10 area:core area:UI Related to UI/UX. For Frontend Developers. good first issue kind:bug This is a clearly a bug

Comments

@adamgorkaextbi
Copy link

Apache Airflow version

2.10.1

If "Other Airflow 2 version" selected, which one?

No response

What happened?

Gantt chart is flickering due constant rescaling
"Queued at" time is computed incorrectly +2h to start and end time of DAG
image
image

What you think should happen instead?

I should see correct Gantt chart or at lease not flickering

How to reproduce

We migrate from airflow 2.7 to 2.9.3(same Gantt issue) and 2.10.1

Operating System

airflow docker release

Versions of Apache Airflow Providers

No response

Deployment

Official Apache Airflow Helm Chart

Deployment details

Helm chart

Anything else?

We migrate from airflow 2.7 to 2.9.3(same Gantt issue) and 2.10.1

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@adamgorkaextbi adamgorkaextbi added area:core kind:bug This is a clearly a bug needs-triage label for new issues that we didn't triage yet labels Sep 13, 2024
Copy link

boring-cyborg bot commented Sep 13, 2024

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.

@dosubot dosubot bot added the area:UI Related to UI/UX. For Frontend Developers. label Sep 13, 2024
@eladkal eladkal added good first issue affected_version:2.10 Issues Reported for 2.10 and removed needs-triage label for new issues that we didn't triage yet labels Sep 15, 2024
@Shlomixg
Copy link

Same here, occurs after upgrading from 2.8.2 to 2.10.1

@adamgorkaextbi
Copy link
Author

still same issue on 2.10.2 version

@adamgorkaextbi
Copy link
Author

nice video in duplicated issue:
#42243

width of

's representing bars is changing in infinite loop
class name is also change in loop, generating break point on attribute modification lead to this code:
image

@dannyl1u
Copy link
Contributor

dannyl1u commented Oct 2, 2024

Hi @adamgorkaextbi, are there any specific steps to reproduce this? I'm looking into this but can't seem to reproduce the issue on my end. Thanks!

@adamgorkaextbi
Copy link
Author

@dannyl1u
How to reproduce
Use airflow official image and helm chart. Database backend is PostgreSQL. Airflow is using UTC time. Server is running at UTC. users work in CEST (+2h versus UTC time). Externally trigger is used to run DAGs.
Migrate from airflow 2.7 (that has its own Gantt chart issues) to 2.9.3 (where flickering appear) and then 2.10.1 and then 2.10.2
what in my opinion lead to situation where
incorrect values of Queued at (old values and new once) are generated (since airflow 2.7 Queued at has incorrect value +2h after start date, Queued at should occurs before started date)
and this may lead Front End to incorrectly and constantly re-compute positions (padding and width) of Tasks on Gantt chart

@adamgorkaextbi
Copy link
Author

I have checked and
queued_dttm alias Queued at in database has correct value stored (datetime with timezone)
I suspect fronted end is processing or receiving queued_dttm that is used and displayed "Queued at" without taking into account time zone for this field this is why on our www we see +2h and why Gantt chart is getting crazy

@adamgorkaextbi
Copy link
Author

task details and Gantt chart both use reacts useGridData() to get data

@adamgorkaextbi
Copy link
Author

read this:

queued_when?: string | null;

then read this:
from marshmallow_sqlalchemy import SQLAlchemySchema, auto_field

then read this:
queued_dttm = auto_field(data_key="queued_when")

then read this:
https://github.com/marshmallow-code/marshmallow-sqlalchemy/blob/bd7d9edaac005c3aa6ac0c5e0b58a6d42619c8b6/src/marshmallow_sqlalchemy/schema.py#L191
then read this:
https://github.com/marshmallow-code/marshmallow-sqlalchemy/blob/bd7d9edaac005c3aa6ac0c5e0b58a6d42619c8b6/src/marshmallow_sqlalchemy/schema.py#L191-L213
then read this:
https://github.com/marshmallow-code/marshmallow/blob/6a0389b6aca79e95c923adb55f497f20e75e0d86/src/marshmallow/fields.py#L90-L92
https://github.com/marshmallow-code/marshmallow/blob/6a0389b6aca79e95c923adb55f497f20e75e0d86/src/marshmallow/fields.py#L710-L729
I suspect the bug could be in marshmallow repository
So I guess solution would be to not rename queued_dttm to queued_when (string) with auto_field(data_key="queued_when")
start_date and end_date are correctly converted to string while queued_dttm is not - only one difference I find out till is usage of data_key

@dannyl1u
Copy link
Contributor

dannyl1u commented Oct 8, 2024

@adamgorkaextbi @Shlomixg
I've tried it on my DAGs (also on 2.10.0) and the Gantt charts are fine on my end. Could you please share your DAG file? If it contains sensitive information, a sanitized or similar version would be fine, as long as it still reproduces the Gantt chart problem

@adamgorkaextbi
Copy link
Author

@dannyl1u
I guess you need to apply airflow migration scripts to reproduce issue.

I check one more time airflow schema after migration
before I checked only task_instance table where
queued_dttm has correct type timestamptz
image
but today I also check DAG_run table schema
image
according to airflow db model queued_at schould be timestamptz
@dannyl1u
Can you double check schema on your side?

I guess we will try manually change this column type in our DB and let you know if this fix issue.
Still I guess one of migration scripts will require fixing

@adamgorkaextbi
Copy link
Author

If you are running airflow in + timezone EUROPE/ASIA (USA timezones are not affected with flickering, but still data has incorrect type db)
SOLUTION:
"""
ALTER TABLE .dag_run ALTER COLUMN queued_at TYPE timestamptz USING queued_at AT TIME ZONE '';
"""
TIME ZONE VALDIATION SELECT QUERY BEFORE ALTER TABLE:
"""
SELECT id, dag_id, execution_date, state, run_id, end_date, start_date, queued_at, queued_at at TIME zone '' as queued_at_new
FROM dag_run where start_date is not null order by start_date desc limit 100;
"""

TODO:
Correct airflow db migration script with setting correct types of this columns during migration
FIX UI react logic to deal with incorrect timestamp order (start_date, queued_at, end_date) or to check timestamp order and report exception instead of flickering or other unexpected behavior
Add unittests for processing incorrect data on frontend in case of Gantt Chart

@CharlieJ15420
Copy link

Also getting this issue on 2.10.3. Happens on seemingly random DAGs with the gantt UI timescales constantly flickering.

@leetdavid
Copy link

To me, this happens when I have multiple retries for tasks.

@sokokoluhumbu
Copy link

this work for me : in paris ALTER TABLE dag_run ALTER COLUMN queued_at TYPE timestamptz USING queued_at AT TIME ZONE 'Europe/Paris';

@hongshaoyang
Copy link

Airflow Version: v2.10.0
Git Version: .release:e001b88f5875cfd7e295891a0bbdbc75a3dccbfb
Deployment: Official Apache Airflow Helm Cart

Screen.Recording.2024-11-28.at.3.29.11.PM.mov

@darkag
Copy link
Contributor

darkag commented Nov 28, 2024

Same problem on 2.10.3, but as @leetdavid said, it occurs only when there is a retried task. We're using MySQL for the database, so it doesn't seem to be just a timezone issue.

For me, it seems more related to the fact that the Gantt graph attempts to show task executions present in task_instance for a run_id, but within the period between the queued_at and end_date of the dag_run. (If I manually change queued_at to encompass all task_instance, the flickering stops.)

The minimum date for the Gantt graph shouldn't be queued_at but rather the minimum start_date of task_instance

darkag added a commit to darkag/airflow that referenced this issue Nov 29, 2024
@darkag
Copy link
Contributor

darkag commented Nov 29, 2024

The closed pull request above should solve the problem but since I don't understand what to do the validation error message...

basically this part of the airflow/www/static/js/dag/details/gantt/index.tsx file

// Reset state when the dagrun changes
  useEffect(() => {
    if (startDate !== dagRun?.queuedAt && startDate !== dagRun?.startDate) {
      setStartDate(dagRun?.queuedAt || dagRun?.startDate);
    }
    if (!endDate || endDate !== dagRun?.endDate) {
      // @ts-ignore
      setEndDate(dagRun?.endDate ?? moment().add(1, "s").toString());
    }
  }, [
    dagRun?.queuedAt,
    dagRun?.startDate,
    dagRun?.endDate,
    startDate,
    endDate,
  ]);

is triggered when startDate/endDate and set dagRun?.queuedAt || dagRun?.startDate as start date but doing so it seems to trigger a redraw of the graph which call setGanttDuration for each task instance that could lead to a change of the startDate/endDate entering in an infinite loop of date changes.

my pull request tried to solve the issue by removing the startDate/endDate in the "reset" declaration, it works if I rebuild javascript on my local airflow instance, but doesn't pass the github validation process. It seems that just removing startDate/endDate make the file inconsistent and since my knowledge in reactjs is almost 0, I will let someone with a better understanding fix this error

darkag added a commit to darkag/airflow that referenced this issue Nov 29, 2024
jscheffl pushed a commit that referenced this issue Nov 30, 2024
github-actions bot pushed a commit that referenced this issue Nov 30, 2024
(cherry picked from commit 0c354e7)

Co-authored-by: darkag <[email protected]>
jscheffl pushed a commit that referenced this issue Nov 30, 2024
(cherry picked from commit 0c354e7)

Co-authored-by: darkag <[email protected]>
utkarsharma2 pushed a commit that referenced this issue Dec 4, 2024
(cherry picked from commit 0c354e7)

Co-authored-by: darkag <[email protected]>
utkarsharma2 pushed a commit that referenced this issue Dec 9, 2024
(cherry picked from commit 0c354e7)

Co-authored-by: darkag <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affected_version:2.10 Issues Reported for 2.10 area:core area:UI Related to UI/UX. For Frontend Developers. good first issue kind:bug This is a clearly a bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants