Tasks taking too long after the Airflow 2.7.0 update #33688
From the discussion: Hi @jaetma,
Answers:
Today we downgraded to 2.6.3 again, and the issue seems to be resolved!
I converted that discussion to an issue, as I think it is something we should at least try to diagnose before 2.7.1 - and maybe @jaetma you could help with it and (assuming we find a root cause) you could test it with the release candidate of 2.7.1 that should go out soon. @jaetma - would it be possible to get some logs from the time when it was slow - namely logs from workers and tasks, ideally logs from the same task from before and after the change to 2.7.0 - showing what's going on, so we can compare what was happening? Seeing the same logs from 2.7.0 and 2.6.3 could help us form a hypothesis about what was wrong. Ideally two gists showing logs from similar runs of the same task?
Also @jaetma, is it that the tasks remain queued for a long time before they begin execution? Can you check whether the scheduler logs say anything as well?
Thanks for your response, I can test in our dev environment! After the downgrade the task run time is normal. The following lines are from Airflow task logs on 2.6.3 and 2.7.0:
Hi @pankajkoti! Sadly, I cannot obtain the scheduler logs from the time it was slow :/ This resulted in a DAG run time of 10 minutes!
OK. That gives us some clue. It seems that there are huge delays (20-30 seconds) between some steps that in 2.6.3 were almost immediate. 2.7.0:
Compare it with 2.6.3
So at least we might come up with some hypotheses. I will take a closer look, but maybe others might have some ideas.
@jaetma could you share the DAG you are using? That way I can try to replicate it locally. As @pankajkoti mentioned, in our deployments I cannot see any slowness.
Yes. I do not suspect it's a "widespread" problem - IMHO this is something environmental on your side that triggers it: some dependency that introduces a lot of overhead, the specific database connectivity you have, some resource limiting in your environment @jaetma. A few questions: can you share a very detailed description of your environment - including the OS versions you run it on, whether the machines run in some cloud or on bare metal, whether you use any virtualisation techniques, how your docker compose is configured, whether you have any special resource limits applied (memory, sockets, I/O, CPU), what database you are using and whether the database has some limits, and whether you use PgBouncer if you use Postgres. My question is also: is the Airflow version the ONLY thing that changes when you migrate? Do you also migrate other things together with Airflow (docker/docker-compose/different DB instance/different account with different resource limits etc.)? I want to rule out that Airflow is the culprit - maybe together with the Airflow migration you also migrated other things, and they are the guilty ones?
I took a quick look. The logs below come from the same forked process. Between this:
and this:
the following things happen:
From those actions, I could likely exclude the first 4 (unless your filesystem is broken), and by the deduction method - if we exclude the impossible, what remains must be the reason - it's either 5) or 6). My best guess @jaetma is one of the following.

Hypothesis 1: Your on_start listener is hanging on something -> it's not very likely that you already have some listener, but since you were on 2.6, it's possible.

Hypothesis 2: Parsing your DAGs inside the Celery worker is hanging on something. The most likely reason is that you have some top-level code that (for example) makes a networking call that hangs. My educated guess - and the 30 seconds suggest it - is that your DNS is misconfigured/broken, or networking prevents it from responding quickly. Hanging on a DNS call is a quite plausible hypothesis: from what I remember, 30 seconds is often the default DNS resolution timeout. So my best guess is that somewhere during your migration your environment's networking got broken in your Docker Compose setup and your DNS is not working properly, thus making whatever you do at the top level of your DAG (which BTW you should not do) slow to run. A minimal timing check for this is sketched below.

Hypothesis 2a: Another variant of Hypothesis 2: if you are using Airflow Variables at the top of your DAG code, or any other database access (which BTW you should not do), this leads to a database connection. And opening a new DB connection might be problematic from your server's point of view if there are already many connections open; you will see this on your database server as a high number of open connections. Postgres does not cope well with the number of connections Airflow opens, so if you use Postgres and do not have PgBouncer between Airflow and Postgres, this might be the reason. I would love it if you could check this, because I have reason to believe we could open many more connections in Airflow 2.7 (I have just a suspicion about it). So if this could be checked - you should see a much larger number of connections to your DB when you run 2.7 if my guess is right. If you could check that hypothesis, that would be great.
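A minimal way to test the DNS part of Hypothesis 2, assuming it is run inside the affected worker container; the host names in the loop are placeholders for whatever your top-level DAG code actually contacts. A lookup taking ~30 seconds would match the default DNS resolution timeout mentioned above.

import socket
import time

def time_dns_lookup(hostname: str) -> float:
    """Return the number of seconds a name resolution for `hostname` takes."""
    start = time.monotonic()
    try:
        socket.getaddrinfo(hostname, None)
    except socket.gaierror as exc:
        print(f"Resolution failed for {hostname}: {exc}")
    return time.monotonic() - start

# Hypothetical host names - replace with hosts your DAG files actually reach.
for host in ("postgres", "redis", "api.example.com"):
    print(f"{host}: {time_dns_lookup(host):.2f}s")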
I have encountered the same problem in our production environment, which uses KubernetesExecutor. We have 150+ DAGs, and these DAGs are generated by a single generator DAG file. The interesting part is that worker pod memory usage reached 3x the memory of previous versions. Previously we gave 500 MB of memory to the worker pod; now we had to give 1.5 GB, and in general the worker pod doesn't do anything heavy, so I was shocked when I saw the memory usage :( Also, the Airflow scheduler dies frequently, which I didn't really understand. I monitored the database, scheduler pod memory usage, etc. Everything is normal, but the scheduler dies frequently and DAGBag import time increased. So I downgraded to the previous version 2.6.3 and the problem is solved for now 👍 P.S. We are using 3 PgBouncer replicas in front of our AWS RDS instance.
Hello @sunank200 This is the DAG, it's just a simple query insertion:
This is my docker-compose file, running on Red Hat Enterprise Linux Server, in the cloud:
Hi @potiuk! Our hypothesis is that this issue might be related to changes in Python's pickle library. Airflow 2.6.3 was primarily used with Python 3.7 in our environment. For the downgrade to work correctly, we had to update our image to use Python 3.8 due to the changes in the pickle library between these Python versions. We suspect that these changes might be affecting Airflow's internal handling of task serialization and deserialization, thus causing the performance degradation.
@jaetma and @mpolatcan So can you reiterate and explain again what the differences were between your environments? It's clear that you seem to hit the problem when you upgrade Airflow to 2.7.0, but it seems this is not the only thing you are upgrading. @mpolatcan you mentioned that you also upgraded the Python version, and you mentioned the pickle library, but this is a bit vague. Can you please help us narrow it down and maybe even do some experimenting to help us?
Also - are you using pickling? https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#donot-pickle - airflow's "donot_pickle" is set to "True" by default. Similarly https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#enable-xcom-pickling is set to "False" - XCom pickling is disabled by default. You mentioned the pickle library, but (except for the Python Virtualenv/External Python operators) we are not really using pickling for anything in Airflow. We do use serialization, but without pickling, so I'd be surprised to see the pickle library have an effect. But maybe you have those parameters changed? Can you also (if you get to the point that you have an installation that exhibits the problem) try to change those values (if your current configuration uses pickling)? I am trying to narrow down the problem - we already know that in some cases the Airflow 2.7.0 upgrade might trigger some problems. But we also know that it does not happen in all environments, only some. So we need to narrow down and track what is really causing it - is it just Airflow, or the Python version, or some dependencies? I'd really love it if we can do these remote experiments with you to see if we can track down the root cause of the problem. Many thanks for the reports so far - it is already helpful to see that there is something we need to track down.
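For reference, those two settings can be inspected with Airflow's configuration API; a minimal sketch (both settings live in the [core] section and default to the non-pickling behaviour mentioned above):

from airflow.configuration import conf

# Defaults: donot_pickle=True, enable_xcom_pickling=False.
print("core.donot_pickle         =", conf.getboolean("core", "donot_pickle"))
print("core.enable_xcom_pickling =", conf.getboolean("core", "enable_xcom_pickling"))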
@jaetma Just wondering, are you using this ENV VAR in production?
I'm not sure whether this is the problem here; however, the base recommendation is: do not use
This is an exceptionally good point @Taragolis. NEVER EVER, UNDER ANY CIRCUMSTANCES, should you use this option in anything close to production. Please build your custom image instead and see if you continue having the problem.
Enabling DEBUG logging might help verify this. |
... test, development. IMHO, this variable should only be used if life or career depends on it 😥 I hope one day we will remove this env variable completely (Airflow 3?). In my past experience with companies who already use Airflow, chances are that they also use it. And I can't see any better solution other than to disable it (neutral) or terminate the container (chaotic evil 👿) if this variable is found; on the Internet there are a lot of suggestions to use it 😞 😢
I'd bet that Airflow 2.7 uses
Hypothesis 3: So it might be that
(BTW, it's not A LOT @Taragolis - just 4 pages :D)
@potiuk and I've also got 131 results from SO: https://www.google.com/search?q=_PIP_ADDITIONAL_REQUIREMENTS+site:stackoverflow.com 🤣 I've just looked at a couple of them; hopefully not all of them suggest using it - sometimes it's even the opposite: "Please do not use it". This one is funny: https://stackoverflow.com/questions/76424725/airflow-docker-pip-additional-requirements-argument-why-there-is-a-comment-to Additionally, Google does not return results for some locale-specific resources, but I definitely know some resources where this solution is suggested.
Hi again @potiuk, I didn't mention anything about the pickle library - @jaetma mentioned it.

from airflow.models import DAG
from datetime import datetime, timedelta
from utils import AirflowCallbacks
from operators import GetirKubernetesCronJobOperator
default_args = {
"owner": "Getir",
"catchup": False,
"depends_on_past": False,
"retries": 1,
"retry_delay": timedelta(seconds=15),
"start_date": datetime(year=2019, month=7, day=10)
}
dag = DAG(
dag_id="client_list",
default_args=default_args,
schedule_interval=None,
tags=["client-list", "reports"]
)
GetirKubernetesCronJobOperator(
name="cl-",
task_id="clientlist",
namespace="airflow",
service_account_name="airflow-sa-role-sa",
image="164762854291.dkr.ecr.eu-west-1.amazonaws.com/getir-python:3.7-buster",
image_pull_policy="IfNotPresent",
repo_name="client-list-cron",
exec_env="eks",
eks_nodegroup="ng-airflow-reports",
eks_cpu_limit="500m",
eks_memory_limit="16000Mi",
cmds=[
"bash", "-c",
(
"mkdir -p /usr/src/app && cp -r /repo/* /usr/src/app && "
"cd /usr/src/app && pip install -r requirements.txt -q && "
"python -u main.py {{ dag_run.conf.get('parameters') }}"
)
],
labels={"report": "yes"},
annotations={
"report_type": "panel",
"report_name": "client-list",
"exec_env": "{{dag_run.conf.get('exec_env', 'fargate')}}",
},
startup_timeout_seconds=600,
is_delete_operator_pod=False,
reattach_on_restart=False,
pool="client_list_pool",
on_failure_callback=AirflowCallbacks.client_list_failed_callback,
dag=dag
)
@mpolatcan Just wondering, is there any chance that you use some DB performance analyzer on top of your RDS instance? pgBadger, Amazon RDS Performance Insights, etc. Maybe it could show some anomalous activity?
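If no such analyzer is available, a rough manual check of the open-connection count (the thing Hypothesis 2a asks to verify) can be run against the metadata DB directly; a sketch, assuming Postgres as the metadata database and a machine with Airflow installed and DB access:

from sqlalchemy import text
from airflow.settings import Session  # session factory bound to the metadata DB

session = Session()
try:
    # One row per connection state; an unusually high total on 2.7 compared
    # to 2.6.3 would support the "many more connections" suspicion above.
    for state, count in session.execute(
        text("SELECT state, count(*) FROM pg_stat_activity GROUP BY state")
    ):
        print(state or "unknown", count)
finally:
    session.close()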
@raphaelsimeon Any chance that you have enabled Performance Insights on your prod/preprod RDS instances?
Hello @potiuk, here is the code of the DAG, including the logs before and after the update to Airflow 2.7.0. Currently debug logging is still deactivated, but I will have a look at that in a moment. Is there a good way to enable debug logging only for this one specific DAG? DAG config:

import itertools
import os
from datetime import datetime, timedelta
from pathlib import Path
from typing import Iterator

import etl_env as env
from airflow import DAG
from airflow.operators.latest_only import LatestOnlyOperator
from airflow_extensions.macros.teams_plugin import on_failure_message_to_teams
from airflow_extensions.operators.cli_task_builder.cli_task_builder import (
    CLIOperator,
    CliTaskBuilder,
    ConnectionBuilder,
)
from bda_prj_self_service_cas_upload.table_config import config as self_service
from etl_env import PEX_BASE_DIR

default_args = {
    "owner": "CK",
    "depends_on_past": False,
    "start_date": datetime(2021, 6, 1, 7, 0, 0),
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
}

PEX_PATH = os.path.join(PEX_BASE_DIR, "viyacli.pex")
CASLIB_COUNTRIES = [country.upper() for country in env.COUNTRIES_XX]

def get_self_service_cas_upload_configs() -> Iterator[Path]:
    table_config_path = os.path.dirname(self_service.__file__)
    return Path(table_config_path).glob(
        f"{self_service.SELF_SERVICE_CONFIG_PREFIX}*.yaml"
    )

def get_caslib_teams() -> list:
    caslib_teams = ["CK"]
    for config in get_self_service_cas_upload_configs():
        team_name = self_service.get_team_name_from_config_filename(config.name)
        caslib_teams.append(team_name.upper())
    return list(set(caslib_teams))

def get_all_caslibs() -> Iterator:
    yield "PUBLIC"
    caslib_teams = get_caslib_teams()
    for country, team in itertools.product(CASLIB_COUNTRIES, caslib_teams):
        yield f"{country}_{team}"

def create_cas_table_monitoring_task(dag: DAG, caslib: str) -> CLIOperator:
    return (
        CliTaskBuilder()
        .with_task_args(
            task_id=f"collect_CAS_table_metrics_{caslib.lower()}",
            dag=dag,
        )
        .with_command(f"{PEX_PATH} cas-monitoring")
        .with_connection(
            ConnectionBuilder("sas_viya", env_prefix="VIYA")
            .with_login("USERNAME")
            .with_password("PASSWORD")
            .with_host("HOST_URL")
        )
        .with_connection(
            ConnectionBuilder("mssql_db", env_prefix="MSSQL")
            .with_login("USERNAME")
            .with_password("PASSWORD")
            .with_host("HOST_URL")
            .with_port("HOST_PORT")
            .with_schema("DATABASE")
        )
        .with_cli_arg(f"--ssl_cert_file {SSL_CERT_PATH}")
        .with_cli_arg(f"--viya_caslib {caslib}")
        .with_cli_arg(f"--viya_client_secret {CLIENT_SECRET}")
        .with_cli_arg("--monitoring_table sas_cas_table_status")
        .create()
    )

def create_dag():
    return DAG(
        "prj_ck_cas_table_monitoring",
        description=__doc__,
        default_args=default_args,
        schedule="*/20 * * * *",
        max_active_tasks=3,
        max_active_runs=1,
        on_failure_callback=on_failure_message_to_teams,
    )

dag = create_dag()
latest_only = LatestOnlyOperator(task_id="latest_only", dag=dag)
for caslib in get_all_caslibs():
    cas_table_metrics_tasks = create_cas_table_monitoring_task(dag, caslib)
    latest_only >> cas_table_metrics_tasks

and here are the logs for one collect_CAS_table_metrics_* task:

Airflow 2.6.3:
*** Found local files:
*** * /var/log/airflow/dags/dag_id=prj_ck_cas_table_monitoring/run_id=scheduled__2023-08-28T09:20:00+00:00/task_id=collect_CAS_table_metrics_public/attempt=1.log
[2023-08-28, 11:40:02 CEST] {taskinstance.py:1103} INFO - Dependencies all met for dep_context=non-requeueable deps ti=<TaskInstance: prj_ck_cas_table_monitoring.collect_CAS_table_metrics_public scheduled__2023-08-28T09:20:00+00:00 [queued]>
[2023-08-28, 11:40:02 CEST] {taskinstance.py:1103} INFO - Dependencies all met for dep_context=requeueable deps ti=<TaskInstance: prj_ck_cas_table_monitoring.collect_CAS_table_metrics_public scheduled__2023-08-28T09:20:00+00:00 [queued]>
[2023-08-28, 11:40:02 CEST] {taskinstance.py:1308} INFO - Starting attempt 1 of 3
[2023-08-28, 11:40:02 CEST] {taskinstance.py:1327} INFO - Executing <Task(CLIOperator): collect_CAS_table_metrics_public> on 2023-08-28 09:20:00+00:00
[2023-08-28, 11:40:02 CEST] {standard_task_runner.py:57} INFO - Started process 3717615 to run task
[2023-08-28, 11:40:02 CEST] {standard_task_runner.py:84} INFO - Running: ['airflow', 'tasks', 'run', 'prj_ck_cas_table_monitoring', 'collect_CAS_table_metrics_public', 'scheduled__2023-08-28T09:20:00+00:00', '--job-id', '8827796', '--raw', '--subdir', 'DAGS_FOLDER/prj_ck_cas_table_monitoring/prj_ck_cas_table_monitoring.py', '--cfg-path', '/tmp/tmp73qdvuk2']
[2023-08-28, 11:40:02 CEST] {standard_task_runner.py:85} INFO - Job 8827796: Subtask collect_CAS_table_metrics_public
[2023-08-28, 11:40:02 CEST] {task_command.py:410} INFO - Running <TaskInstance: prj_ck_cas_table_monitoring.collect_CAS_table_metrics_public scheduled__2023-08-28T09:20:00+00:00 [running]> on host OBSCURED_HOSTNAME
[2023-08-28, 11:40:02 CEST] {taskinstance.py:1545} INFO - Exporting env vars: AIRFLOW_CTX_DAG_OWNER='CK' AIRFLOW_CTX_DAG_ID='prj_ck_cas_table_monitoring' AIRFLOW_CTX_TASK_ID='collect_CAS_table_metrics_public' AIRFLOW_CTX_EXECUTION_DATE='2023-08-28T09:20:00+00:00' AIRFLOW_CTX_TRY_NUMBER='1' AIRFLOW_CTX_DAG_RUN_ID='scheduled__2023-08-28T09:20:00+00:00'
[2023-08-28, 11:40:02 CEST] {logging_mixin.py:150} INFO - Execute cli task
[2023-08-28, 11:40:02 CEST] {base.py:73} INFO - Using connection ID 'sas_viya_conn' for task execution.
[2023-08-28, 11:40:02 CEST] {base.py:73} INFO - Using connection ID 'cxa_mssql_db_domain_user_format' for task execution.
[2023-08-28, 11:40:02 CEST] {logging_mixin.py:150} INFO - Environment variables: ['LANG', 'PATH', 'VIYA_USERNAME', 'VIYA_PASSWORD', 'VIYA_HOST_URL', 'MSSQL_USERNAME', 'MSSQL_PASSWORD', 'MSSQL_HOST_URL', 'MSSQL_HOST_PORT', 'MSSQL_DATABASE']
[2023-08-28, 11:40:02 CEST] {logging_mixin.py:150} INFO - Bash command: /srv/team-workspaces/ck/pex/viyacli.pex cas-monitoring --ssl_cert_file OBSCURED_SSL_CERT_PATH --viya_caslib PUBLIC --viya_client_secret OBSCURED_VIYA_CLIENT_SECRET --monitoring_table sas_cas_table_status
[2023-08-28, 11:40:02 CEST] {subprocess.py:63} INFO - Tmp dir root location: /tmp
[2023-08-28, 11:40:02 CEST] {subprocess.py:75} INFO - Running command: ['/usr/bin/bash', '-c', '/srv/team-workspaces/ck/pex/viyacli.pex cas-monitoring --ssl_cert_file OBSCURED_SSL_CERT_PATH --viya_caslib PUBLIC --viya_client_secret OBSCURED_VIYA_CLIENT_SECRET --monitoring_table sas_cas_table_status']
[2023-08-28, 11:40:02 CEST] {subprocess.py:86} INFO - Output:
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - 2023-08-28 11:40:06.260 | WARNING | viya.cas_monitoring.cas_monitoring:_transform_data:39 - Item missing key created - skipping for monitoring:
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - 2023-08-28 11:40:06.260 | WARNING | viya.cas_monitoring.cas_monitoring:_transform_data:40 - {
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "version": 3,
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "name": "OBSCURED_USER_MAPPINGTEST",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "tableReference": {
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "version": 2,
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "tableUri": "/dataTables/dataSources/cas~fs~cas-shared-default~fs~PUBLIC/tables/OBSCURED_USER_MAPPINGTEST",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "sessionId": "OBSCURED_SESSION_ID",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "sourceTableName": "OBSCURED_USER_MAPPINGTEST.sashdat",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "sourceCaslibName": "PUBLIC"
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - },
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "state": "unloaded",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "repeated": false,
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "sourceLastModified": "2023-06-09T09:40:45.761Z",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "serverName": "cas-shared-default",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "caslibName": "PUBLIC",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "attributes": {
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "owner": "cas",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "size": 8448,
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "encryption": "NONE",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "time": "2023-06-09T10:40:46+01:00",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "group": "sas"
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - },
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "links": [
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - {
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "method": "GET",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "rel": "up",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "href": "/casManagement/servers/cas-shared-default/caslibs/PUBLIC/tables",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "uri": "/casManagement/servers/cas-shared-default/caslibs/PUBLIC/tables",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "type": "application/vnd.sas.collection",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "itemType": "application/vnd.sas.cas.table"
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - },
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - {
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "method": "GET",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "rel": "self",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "href": "/casManagement/servers/cas-shared-default/caslibs/PUBLIC/tables/OBSCURED_USER_MAPPINGTEST",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "uri": "/casManagement/servers/cas-shared-default/caslibs/PUBLIC/tables/OBSCURED_USER_MAPPINGTEST",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "type": "application/vnd.sas.cas.table"
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - },
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - {
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "method": "DELETE",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "rel": "delete",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "href": "/casManagement/servers/cas-shared-default/caslibs/PUBLIC/tables/OBSCURED_USER_MAPPINGTEST",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "uri": "/casManagement/servers/cas-shared-default/caslibs/PUBLIC/tables/OBSCURED_USER_MAPPINGTEST"
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - },
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - {
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "method": "PUT",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "rel": "updateState",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "href": "/casManagement/servers/cas-shared-default/caslibs/PUBLIC/tables/OBSCURED_USER_MAPPINGTEST/state",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "uri": "/casManagement/servers/cas-shared-default/caslibs/PUBLIC/tables/OBSCURED_USER_MAPPINGTEST/state",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "responseType": "application/json;text/plain"
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - },
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - {
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "method": "GET",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "rel": "columns",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "href": "/casManagement/servers/cas-shared-default/caslibs/PUBLIC/tables/OBSCURED_USER_MAPPINGTEST/columns",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "uri": "/casManagement/servers/cas-shared-default/caslibs/PUBLIC/tables/OBSCURED_USER_MAPPINGTEST/columns",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "type": "application/vnd.sas.collection",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "itemType": "application/vnd.sas.cas.column"
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - },
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - {
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "method": "GET",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "rel": "dataTable",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "href": "/dataTables/dataSources/cas~fs~cas-shared-default~fs~PUBLIC/tables/OBSCURED_USER_MAPPINGTEST",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "uri": "/dataTables/dataSources/cas~fs~cas-shared-default~fs~PUBLIC/tables/OBSCURED_USER_MAPPINGTEST",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "type": "application/vnd.sas.data.table"
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - }
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - ]
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - }
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - 2023-08-28 11:40:06.260 | WARNING | viya.cas_monitoring.cas_monitoring:_transform_data:39 - Item missing key created - skipping for monitoring:
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - 2023-08-28 11:40:06.260 | WARNING | viya.cas_monitoring.cas_monitoring:_transform_data:40 - {
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "version": 3,
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "name": "MAP_DEVICE_TYPE_FIXED",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "tableReference": {
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "version": 2,
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "tableUri": "/dataTables/dataSources/cas~fs~cas-shared-default~fs~PUBLIC/tables/MAP_DEVICE_TYPE_FIXED",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "sessionId": "OBSCURED_SESSION_ID",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "sourceTableName": "MAP_DEVICE_TYPE_FIXED.sashdat",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "sourceCaslibName": "PUBLIC"
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - },
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "state": "unloaded",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "repeated": false,
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "sourceLastModified": "2023-06-09T14:18:52.223Z",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "serverName": "cas-shared-default",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "caslibName": "PUBLIC",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "attributes": {
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "owner": "cas",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "size": 8544,
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "encryption": "NONE",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "time": "2023-06-09T15:18:52+01:00",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "group": "sas"
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - },
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "links": [
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - {
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "method": "GET",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "rel": "up",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "href": "/casManagement/servers/cas-shared-default/caslibs/PUBLIC/tables",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "uri": "/casManagement/servers/cas-shared-default/caslibs/PUBLIC/tables",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "type": "application/vnd.sas.collection",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "itemType": "application/vnd.sas.cas.table"
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - },
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - {
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "method": "GET",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "rel": "self",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "href": "/casManagement/servers/cas-shared-default/caslibs/PUBLIC/tables/MAP_DEVICE_TYPE_FIXED",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "uri": "/casManagement/servers/cas-shared-default/caslibs/PUBLIC/tables/MAP_DEVICE_TYPE_FIXED",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "type": "application/vnd.sas.cas.table"
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - },
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - {
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "method": "DELETE",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "rel": "delete",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "href": "/casManagement/servers/cas-shared-default/caslibs/PUBLIC/tables/MAP_DEVICE_TYPE_FIXED",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "uri": "/casManagement/servers/cas-shared-default/caslibs/PUBLIC/tables/MAP_DEVICE_TYPE_FIXED"
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - },
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - {
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "method": "PUT",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "rel": "updateState",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "href": "/casManagement/servers/cas-shared-default/caslibs/PUBLIC/tables/MAP_DEVICE_TYPE_FIXED/state",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "uri": "/casManagement/servers/cas-shared-default/caslibs/PUBLIC/tables/MAP_DEVICE_TYPE_FIXED/state",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "responseType": "application/json;text/plain"
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - },
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - {
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "method": "GET",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "rel": "columns",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "href": "/casManagement/servers/cas-shared-default/caslibs/PUBLIC/tables/MAP_DEVICE_TYPE_FIXED/columns",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "uri": "/casManagement/servers/cas-shared-default/caslibs/PUBLIC/tables/MAP_DEVICE_TYPE_FIXED/columns",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "type": "application/vnd.sas.collection",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "itemType": "application/vnd.sas.cas.column"
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - },
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - {
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "method": "GET",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "rel": "dataTable",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "href": "/dataTables/dataSources/cas~fs~cas-shared-default~fs~PUBLIC/tables/MAP_DEVICE_TYPE_FIXED",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "uri": "/dataTables/dataSources/cas~fs~cas-shared-default~fs~PUBLIC/tables/MAP_DEVICE_TYPE_FIXED",
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - "type": "application/vnd.sas.data.table"
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - }
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - ]
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - }
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - 2023-08-28 11:40:06.260 | WARNING | viya.cas_monitoring.cas_monitoring:_transform_data:45 - The following tables have been skipped: ['OBSCURED_USER_MAPPINGTEST', 'MAP_DEVICE_TYPE_FIXED']
[2023-08-28, 11:40:06 CEST] {subprocess.py:93} INFO - 2023-08-28 11:40:06.260 | WARNING | viya.mssql_utils:write_to_mssql:35 - No data was passed to insert into the Database.
[2023-08-28, 11:40:06 CEST] {subprocess.py:97} INFO - Command exited with return code 0
[2023-08-28, 11:40:06 CEST] {taskinstance.py:1345} INFO - Marking task as SUCCESS. dag_id=prj_ck_cas_table_monitoring, task_id=collect_CAS_table_metrics_public, execution_date=20230828T092000, start_date=20230828T094002, end_date=20230828T094006
[2023-08-28, 11:40:06 CEST] {local_task_job_runner.py:225} INFO - Task exited with return code 0
[2023-08-28, 11:40:06 CEST] {taskinstance.py:2653} INFO - 0 downstream tasks scheduled from follow-on schedule check

Airflow 2.7.0:
*** Found local files:
*** * /var/log/airflow/dags/dag_id=prj_ck_cas_table_monitoring/run_id=scheduled__2023-08-28T09:40:00+00:00/task_id=collect_CAS_table_metrics_public/attempt=1.log
[2023-08-28, 12:16:16 CEST] {taskinstance.py:1159} INFO - Dependencies all met for dep_context=non-requeueable deps ti=<TaskInstance: prj_ck_cas_table_monitoring.collect_CAS_table_metrics_public scheduled__2023-08-28T09:40:00+00:00 [queued]>
[2023-08-28, 12:16:16 CEST] {taskinstance.py:1159} INFO - Dependencies all met for dep_context=requeueable deps ti=<TaskInstance: prj_ck_cas_table_monitoring.collect_CAS_table_metrics_public scheduled__2023-08-28T09:40:00+00:00 [queued]>
[2023-08-28, 12:16:16 CEST] {taskinstance.py:1361} INFO - Starting attempt 1 of 3
[2023-08-28, 12:16:16 CEST] {taskinstance.py:1382} INFO - Executing <Task(CLIOperator): collect_CAS_table_metrics_public> on 2023-08-28 09:40:00+00:00
[2023-08-28, 12:16:16 CEST] {standard_task_runner.py:57} INFO - Started process 3763639 to run task
[2023-08-28, 12:16:16 CEST] {standard_task_runner.py:84} INFO - Running: ['airflow', 'tasks', 'run', 'prj_ck_cas_table_monitoring', 'collect_CAS_table_metrics_public', 'scheduled__2023-08-28T09:40:00+00:00', '--job-id', '8827969', '--raw', '--subdir', 'DAGS_FOLDER/prj_ck_cas_table_monitoring/prj_ck_cas_table_monitoring.py', '--cfg-path', '/tmp/tmpvgdhcef6']
[2023-08-28, 12:16:16 CEST] {standard_task_runner.py:85} INFO - Job 8827969: Subtask collect_CAS_table_metrics_public
[2023-08-28, 12:16:24 CEST] {task_command.py:415} INFO - Running <TaskInstance: prj_ck_cas_table_monitoring.collect_CAS_table_metrics_public scheduled__2023-08-28T09:40:00+00:00 [running]> on host OBSCURED_HOSTNAME
[2023-08-28, 12:16:32 CEST] {taskinstance.py:1660} INFO - Exporting env vars: AIRFLOW_CTX_DAG_OWNER='CK' AIRFLOW_CTX_DAG_ID='prj_ck_cas_table_monitoring' AIRFLOW_CTX_TASK_ID='collect_CAS_table_metrics_public' AIRFLOW_CTX_EXECUTION_DATE='2023-08-28T09:40:00+00:00' AIRFLOW_CTX_TRY_NUMBER='1' AIRFLOW_CTX_DAG_RUN_ID='scheduled__2023-08-28T09:40:00+00:00'
[2023-08-28, 12:16:32 CEST] {logging_mixin.py:151} INFO - Execute cli task
[2023-08-28, 12:16:32 CEST] {base.py:73} INFO - Using connection ID 'sas_viya_conn' for task execution.
[2023-08-28, 12:16:32 CEST] {base.py:73} INFO - Using connection ID 'cxa_mssql_db_domain_user_format' for task execution.
[2023-08-28, 12:16:32 CEST] {logging_mixin.py:151} INFO - Environment variables: ['LANG', 'PATH', 'VIYA_USERNAME', 'VIYA_PASSWORD', 'VIYA_HOST_URL', 'MSSQL_USERNAME', 'MSSQL_PASSWORD', 'MSSQL_HOST_URL', 'MSSQL_HOST_PORT', 'MSSQL_DATABASE']
[2023-08-28, 12:16:32 CEST] {logging_mixin.py:151} INFO - Bash command: /srv/team-workspaces/ck/pex/viyacli.pex cas-monitoring --ssl_cert_file OBSCURED_SSL_CERT_PATH --viya_caslib PUBLIC --viya_client_secret OBSCURED_VIYA_CLIENT_SECRET --monitoring_table sas_cas_table_status
[2023-08-28, 12:16:32 CEST] {subprocess.py:63} INFO - Tmp dir root location: /tmp
[2023-08-28, 12:16:32 CEST] {subprocess.py:75} INFO - Running command: ['/usr/bin/bash', '-c', '/srv/team-workspaces/ck/pex/viyacli.pex cas-monitoring --ssl_cert_file OBSCURED_SSL_CERT_PATH --viya_caslib PUBLIC --viya_client_secret OBSCURED_VIYA_CLIENT_SECRET --monitoring_table sas_cas_table_status']
[2023-08-28, 12:16:32 CEST] {subprocess.py:86} INFO - Output:
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - 2023-08-28 12:16:35.935 | WARNING | viya.cas_monitoring.cas_monitoring:_transform_data:39 - Item missing key created - skipping for monitoring:
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - 2023-08-28 12:16:35.936 | WARNING | viya.cas_monitoring.cas_monitoring:_transform_data:40 - {
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "version": 3,
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "name": "OBSCURED_USER_MAPPINGTEST",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "tableReference": {
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "version": 2,
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "tableUri": "/dataTables/dataSources/cas~fs~cas-shared-default~fs~PUBLIC/tables/OBSCURED_USER_MAPPINGTEST",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "sessionId": "OBSCURED_SESSION_ID",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "sourceTableName": "OBSCURED_USER_MAPPINGTEST.sashdat",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "sourceCaslibName": "PUBLIC"
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - },
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "state": "unloaded",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "repeated": false,
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "sourceLastModified": "2023-06-09T09:40:45.761Z",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "serverName": "cas-shared-default",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "caslibName": "PUBLIC",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "attributes": {
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "owner": "cas",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "size": 8448,
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "encryption": "NONE",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "time": "2023-06-09T10:40:46+01:00",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "group": "sas"
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - },
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "links": [
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - {
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "method": "GET",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "rel": "up",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "href": "/casManagement/servers/cas-shared-default/caslibs/PUBLIC/tables",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "uri": "/casManagement/servers/cas-shared-default/caslibs/PUBLIC/tables",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "type": "application/vnd.sas.collection",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "itemType": "application/vnd.sas.cas.table"
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - },
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - {
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "method": "GET",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "rel": "self",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "href": "/casManagement/servers/cas-shared-default/caslibs/PUBLIC/tables/OBSCURED_USER_MAPPINGTEST",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "uri": "/casManagement/servers/cas-shared-default/caslibs/PUBLIC/tables/OBSCURED_USER_MAPPINGTEST",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "type": "application/vnd.sas.cas.table"
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - },
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - {
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "method": "DELETE",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "rel": "delete",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "href": "/casManagement/servers/cas-shared-default/caslibs/PUBLIC/tables/OBSCURED_USER_MAPPINGTEST",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "uri": "/casManagement/servers/cas-shared-default/caslibs/PUBLIC/tables/OBSCURED_USER_MAPPINGTEST"
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - },
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - {
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "method": "PUT",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "rel": "updateState",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "href": "/casManagement/servers/cas-shared-default/caslibs/PUBLIC/tables/OBSCURED_USER_MAPPINGTEST/state",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "uri": "/casManagement/servers/cas-shared-default/caslibs/PUBLIC/tables/OBSCURED_USER_MAPPINGTEST/state",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "responseType": "application/json;text/plain"
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - },
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - {
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "method": "GET",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "rel": "columns",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "href": "/casManagement/servers/cas-shared-default/caslibs/PUBLIC/tables/OBSCURED_USER_MAPPINGTEST/columns",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "uri": "/casManagement/servers/cas-shared-default/caslibs/PUBLIC/tables/OBSCURED_USER_MAPPINGTEST/columns",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "type": "application/vnd.sas.collection",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "itemType": "application/vnd.sas.cas.column"
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - },
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - {
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "method": "GET",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "rel": "dataTable",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "href": "/dataTables/dataSources/cas~fs~cas-shared-default~fs~PUBLIC/tables/OBSCURED_USER_MAPPINGTEST",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "uri": "/dataTables/dataSources/cas~fs~cas-shared-default~fs~PUBLIC/tables/OBSCURED_USER_MAPPINGTEST",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "type": "application/vnd.sas.data.table"
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - }
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - ]
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - }
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - 2023-08-28 12:16:35.936 | WARNING | viya.cas_monitoring.cas_monitoring:_transform_data:39 - Item missing key created - skipping for monitoring:
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - 2023-08-28 12:16:35.936 | WARNING | viya.cas_monitoring.cas_monitoring:_transform_data:40 - {
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "version": 3,
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "name": "MAP_DEVICE_TYPE_FIXED",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "tableReference": {
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "version": 2,
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "tableUri": "/dataTables/dataSources/cas~fs~cas-shared-default~fs~PUBLIC/tables/MAP_DEVICE_TYPE_FIXED",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "sessionId": "OBSCURED_SESSION_ID",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "sourceTableName": "MAP_DEVICE_TYPE_FIXED.sashdat",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "sourceCaslibName": "PUBLIC"
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - },
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "state": "unloaded",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "repeated": false,
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "sourceLastModified": "2023-06-09T14:18:52.223Z",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "serverName": "cas-shared-default",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "caslibName": "PUBLIC",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "attributes": {
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "owner": "cas",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "size": 8544,
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "encryption": "NONE",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "time": "2023-06-09T15:18:52+01:00",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "group": "sas"
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - },
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "links": [
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - {
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "method": "GET",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "rel": "up",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "href": "/casManagement/servers/cas-shared-default/caslibs/PUBLIC/tables",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "uri": "/casManagement/servers/cas-shared-default/caslibs/PUBLIC/tables",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "type": "application/vnd.sas.collection",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "itemType": "application/vnd.sas.cas.table"
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - },
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - {
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "method": "GET",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "rel": "self",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "href": "/casManagement/servers/cas-shared-default/caslibs/PUBLIC/tables/MAP_DEVICE_TYPE_FIXED",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "uri": "/casManagement/servers/cas-shared-default/caslibs/PUBLIC/tables/MAP_DEVICE_TYPE_FIXED",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "type": "application/vnd.sas.cas.table"
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - },
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - {
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "method": "DELETE",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "rel": "delete",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "href": "/casManagement/servers/cas-shared-default/caslibs/PUBLIC/tables/MAP_DEVICE_TYPE_FIXED",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "uri": "/casManagement/servers/cas-shared-default/caslibs/PUBLIC/tables/MAP_DEVICE_TYPE_FIXED"
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - },
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - {
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "method": "PUT",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "rel": "updateState",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "href": "/casManagement/servers/cas-shared-default/caslibs/PUBLIC/tables/MAP_DEVICE_TYPE_FIXED/state",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "uri": "/casManagement/servers/cas-shared-default/caslibs/PUBLIC/tables/MAP_DEVICE_TYPE_FIXED/state",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "responseType": "application/json;text/plain"
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - },
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - {
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "method": "GET",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "rel": "columns",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "href": "/casManagement/servers/cas-shared-default/caslibs/PUBLIC/tables/MAP_DEVICE_TYPE_FIXED/columns",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "uri": "/casManagement/servers/cas-shared-default/caslibs/PUBLIC/tables/MAP_DEVICE_TYPE_FIXED/columns",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "type": "application/vnd.sas.collection",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "itemType": "application/vnd.sas.cas.column"
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - },
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - {
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "method": "GET",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "rel": "dataTable",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "href": "/dataTables/dataSources/cas~fs~cas-shared-default~fs~PUBLIC/tables/MAP_DEVICE_TYPE_FIXED",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "uri": "/dataTables/dataSources/cas~fs~cas-shared-default~fs~PUBLIC/tables/MAP_DEVICE_TYPE_FIXED",
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - "type": "application/vnd.sas.data.table"
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - }
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - ]
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - }
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - 2023-08-28 12:16:35.936 | WARNING | viya.cas_monitoring.cas_monitoring:_transform_data:45 - The following tables have been skipped: ['OBSCURED_USER_MAPPINGTEST', 'MAP_DEVICE_TYPE_FIXED']
[2023-08-28, 12:16:35 CEST] {subprocess.py:93} INFO - 2023-08-28 12:16:35.936 | WARNING | viya.mssql_utils:write_to_mssql:35 - No data was passed to insert into the Database.
[2023-08-28, 12:16:36 CEST] {subprocess.py:97} INFO - Command exited with return code 0
[2023-08-28, 12:16:36 CEST] {taskinstance.py:1400} INFO - Marking task as SUCCESS. dag_id=prj_ck_cas_table_monitoring, task_id=collect_CAS_table_metrics_public, execution_date=20230828T094000, start_date=20230828T101616, end_date=20230828T101636
[2023-08-28, 12:16:36 CEST] {local_task_job_runner.py:228} INFO - Task exited with return code 0
[2023-08-28, 12:16:36 CEST] {taskinstance.py:2784} INFO - 0 downstream tasks scheduled from follow-on schedule check

This is one of the faster running tasks in the DAG. But as you can clearly see, the running times have increased roughly fivefold, from ~4 seconds to ~20 seconds.
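One way to locate where such a slowdown happens is to diff the timestamps of consecutive log lines. A small sketch that assumes the "[YYYY-MM-DD, HH:MM:SS TZ]" prefix format shown in the logs above and prints every gap of 5 seconds or more (the timezone suffix is ignored, so run it on one log file at a time):

import re
import sys
from datetime import datetime

# Matches e.g. "[2023-08-28, 11:40:02 CEST]" at the start of a log line.
TS = re.compile(r"^\[(\d{4}-\d{2}-\d{2}), (\d{2}:\d{2}:\d{2})")

prev = None
with open(sys.argv[1]) as log:
    for line in log:
        m = TS.match(line)
        if not m:
            continue
        ts = datetime.strptime(" ".join(m.groups()), "%Y-%m-%d %H:%M:%S")
        if prev is not None and (ts - prev).total_seconds() >= 5:
            print(f"{(ts - prev).total_seconds():5.0f}s gap before: {line.rstrip()}")
        prev = ts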
@Taragolis yes, here is the performance activity for the last 5 hours. The peak we see at 3 AM also represents a peak in activity in terms of DAGs running. Anything in particular you want to check?
@Github-dm-CDE, we would like to also see the scheduler logs if that's possible, so enabling logging for only one DAG won't be a solution. And no, there's no straightforward way of enabling debug logging only for a specific DAG.
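There is, however, a global switch: exporting AIRFLOW__LOGGING__LOGGING_LEVEL=DEBUG in a component's environment before it starts raises the level for that whole component. A sketch of the equivalent check from Python:

import os

# Must be set before any airflow module reads the configuration.
os.environ["AIRFLOW__LOGGING__LOGGING_LEVEL"] = "DEBUG"

from airflow.configuration import conf
print(conf.get("logging", "logging_level"))  # -> DEBUG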
According to Average Active Sessions (AAS), the most time-consuming operations are locks. Additionally, I locally ran a simple DAG on Airflow 2.6.3 and Airflow 2.7.0. Important: I ran my "performance comparison" on a local machine, so latency to the DB was almost zero, the executor was Local, and the DAG/task was pretty simple with nothing else running at that moment - so the result might be far, far away from the actual problem.

from airflow.decorators import task
from airflow import DAG
import pendulum

with DAG(
    dag_id="performance-check",
    start_date=pendulum.datetime(2021, 1, 1, tz='UTC'),
    end_date=None,
    schedule="@daily",
    catchup=True,
    tags=["performance"],
    max_active_runs=64,
) as dag:
    @task
    def sample_task(test_data=None, ti=None):
        print(f"{ti.dag_id}-{ti.run_id}-{ti.task_id}[{ti.map_index}]: {test_data}")

    sample_task()

Also, I turned on most of the logging on Postgres, cleaned the Postgres log file before turning on the DAG, and after all 970 DAG runs completed, ran pgBadger on the Postgres log: pgbadger-report-airflow-2-results.zip
The main difference was in obtaining information about the previous DAG run. Airflow 2.7.0: first, with a total cumulative execution time of 24s820ms for 9700 queries. Airflow 2.6.3: second (let's ignore COMMIT), with a total cumulative execution time of 6s525ms for 9700 queries. This behaviour is fixed and should be part of 2.7.1; in some circumstances it could be a reason for performance degradation, if the DB backend is far away from Airflow (high latency) and quite a few previous DAG runs exist. It could also be a reason why RAM/CPU usage increased. However, in a deployment far from prod usage, the impact was just about an additional 100ms-1s.
@Taragolis indeed, it does not look like the DB is the problem here. Here is the top SQL in terms of latency in the last 24 hrs, and the performance metrics (there is one latency peak at 8:45 UTC, but not during other "problematic hours").
Hi @potiuk & @ephraimbuddy,

*** Found local files:
*** * /var/log/airflow/dags/dag_id=prj_ck_cas_table_monitoring/run_id=manual__2023-08-30T14:22:51.684136+00:00/task_id=collect_CAS_table_metrics_public/attempt=1.log
[2023-08-30, 14:23:49 UTC] {taskinstance.py:1094} DEBUG - previous_execution_date was called
[2023-08-30, 14:23:54 UTC] {__init__.py:51} DEBUG - Loading core task runner: StandardTaskRunner
[2023-08-30, 14:23:58 UTC] {taskinstance.py:1094} DEBUG - previous_execution_date was called
[2023-08-30, 14:24:02 UTC] {base_task_runner.py:68} DEBUG - Planning to run as the user
[2023-08-30, 14:24:02 UTC] {taskinstance.py:844} DEBUG - Refreshing TaskInstance <TaskInstance: prj_ck_cas_table_monitoring.collect_CAS_table_metrics_public manual__2023-08-30T14:22:51.684136+00:00 [queued]> from DB
[2023-08-30, 14:24:02 UTC] {taskinstance.py:1168} DEBUG - <TaskInstance: prj_ck_cas_table_monitoring.collect_CAS_table_metrics_public manual__2023-08-30T14:22:51.684136+00:00 [queued]> dependency 'Task Instance State' PASSED: True, Task state queued was valid.
[2023-08-30, 14:24:02 UTC] {taskinstance.py:1168} DEBUG - <TaskInstance: prj_ck_cas_table_monitoring.collect_CAS_table_metrics_public manual__2023-08-30T14:22:51.684136+00:00 [queued]> dependency 'Previous Dagrun State' PASSED: True, The task did not have depends_on_past set.
[2023-08-30, 14:24:02 UTC] {taskinstance.py:1168} DEBUG - <TaskInstance: prj_ck_cas_table_monitoring.collect_CAS_table_metrics_public manual__2023-08-30T14:22:51.684136+00:00 [queued]> dependency 'Task Instance Not Running' PASSED: True, Task is not in running state.
[2023-08-30, 14:24:02 UTC] {taskinstance.py:1168} DEBUG - <TaskInstance: prj_ck_cas_table_monitoring.collect_CAS_table_metrics_public manual__2023-08-30T14:22:51.684136+00:00 [queued]> dependency 'Not In Retry Period' PASSED: True, The task instance was not marked for retrying.
[2023-08-30, 14:24:02 UTC] {taskinstance.py:1159} INFO - Dependencies all met for dep_context=non-requeueable deps ti=<TaskInstance: prj_ck_cas_table_monitoring.collect_CAS_table_metrics_public manual__2023-08-30T14:22:51.684136+00:00 [queued]>
[2023-08-30, 14:24:02 UTC] {taskinstance.py:1168} DEBUG - <TaskInstance: prj_ck_cas_table_monitoring.collect_CAS_table_metrics_public manual__2023-08-30T14:22:51.684136+00:00 [queued]> dependency 'Previous Dagrun State' PASSED: True, The task did not have depends_on_past set.
[2023-08-30, 14:24:02 UTC] {taskinstance.py:1168} DEBUG - <TaskInstance: prj_ck_cas_table_monitoring.collect_CAS_table_metrics_public manual__2023-08-30T14:22:51.684136+00:00 [queued]> dependency 'Task Concurrency' PASSED: True, Task concurrency is not set.
[2023-08-30, 14:24:02 UTC] {taskinstance.py:1168} DEBUG - <TaskInstance: prj_ck_cas_table_monitoring.collect_CAS_table_metrics_public manual__2023-08-30T14:22:51.684136+00:00 [queued]> dependency 'Pool Slots Available' PASSED: True, There are enough open slots in default_pool to execute the task
[2023-08-30, 14:24:02 UTC] {taskinstance.py:1168} DEBUG - <TaskInstance: prj_ck_cas_table_monitoring.collect_CAS_table_metrics_public manual__2023-08-30T14:22:51.684136+00:00 [queued]> dependency 'Not In Retry Period' PASSED: True, The task instance was not marked for retrying.
[2023-08-30, 14:24:02 UTC] {taskinstance.py:1159} INFO - Dependencies all met for dep_context=requeueable deps ti=<TaskInstance: prj_ck_cas_table_monitoring.collect_CAS_table_metrics_public manual__2023-08-30T14:22:51.684136+00:00 [queued]>
[2023-08-30, 14:24:02 UTC] {taskinstance.py:1361} INFO - Starting attempt 1 of 3
[2023-08-30, 14:24:02 UTC] {taskinstance.py:1382} INFO - Executing <Task(CLIOperator): collect_CAS_table_metrics_public> on 2023-08-30 14:22:51.684136+00:00
[2023-08-30, 14:24:02 UTC] {standard_task_runner.py:57} INFO - Started process 2355015 to run task
[2023-08-30, 14:24:02 UTC] {standard_task_runner.py:84} INFO - Running: ['airflow', 'tasks', 'run', 'prj_ck_cas_table_monitoring', 'collect_CAS_table_metrics_public', 'manual__2023-08-30T14:22:51.684136+00:00', '--job-id', '2428156', '--raw', '--subdir', 'DAGS_FOLDER/prj_ck_cas_table_monitoring/prj_ck_cas_table_monitoring.py', '--cfg-path', '/tmp/tmpo_h27u9p']
[2023-08-30, 14:24:02 UTC] {standard_task_runner.py:85} INFO - Job 2428156: Subtask collect_CAS_table_metrics_public
[2023-08-30, 14:24:02 UTC] {cli_action_loggers.py:67} DEBUG - Calling callbacks: [<function default_action_log at 0x7f880eea13a0>]
[2023-08-30, 14:24:07 UTC] {taskinstance.py:1094} DEBUG - previous_execution_date was called
[2023-08-30, 14:24:08 UTC] {taskinstance.py:844} DEBUG - Refreshing TaskInstance <TaskInstance: prj_ck_cas_table_monitoring.collect_CAS_table_metrics_public manual__2023-08-30T14:22:51.684136+00:00 [running]> from DB
[2023-08-30, 14:24:08 UTC] {job.py:216} DEBUG - [heartbeat]
[2023-08-30, 14:24:11 UTC] {task_command.py:415} INFO - Running <TaskInstance: prj_ck_cas_table_monitoring.collect_CAS_table_metrics_public manual__2023-08-30T14:22:51.684136+00:00 [running]> on host OBSCURED_HOSTNAME
[2023-08-30, 14:24:11 UTC] {settings.py:353} DEBUG - Disposing DB connection pool (PID 2355015)
[2023-08-30, 14:24:11 UTC] {settings.py:212} DEBUG - Setting up DB connection pool (PID 2355015)
[2023-08-30, 14:24:11 UTC] {settings.py:285} DEBUG - settings.prepare_engine_args(): Using NullPool
[2023-08-30, 14:24:11 UTC] {taskinstance.py:844} DEBUG - Refreshing TaskInstance <TaskInstance: prj_ck_cas_table_monitoring.collect_CAS_table_metrics_public manual__2023-08-30T14:22:51.684136+00:00 [running]> from DB
[2023-08-30, 14:24:13 UTC] {taskinstance.py:844} DEBUG - Refreshing TaskInstance <TaskInstance: prj_ck_cas_table_monitoring.collect_CAS_table_metrics_public manual__2023-08-30T14:22:51.684136+00:00 [running]> from DB
[2023-08-30, 14:24:13 UTC] {job.py:216} DEBUG - [heartbeat]
[2023-08-30, 14:24:16 UTC] {taskinstance.py:1094} DEBUG - previous_execution_date was called
[2023-08-30, 14:24:18 UTC] {taskinstance.py:844} DEBUG - Refreshing TaskInstance <TaskInstance: prj_ck_cas_table_monitoring.collect_CAS_table_metrics_public manual__2023-08-30T14:22:51.684136+00:00 [running]> from DB
[2023-08-30, 14:24:18 UTC] {job.py:216} DEBUG - [heartbeat]
[2023-08-30, 14:24:20 UTC] {taskinstance.py:925} DEBUG - Clearing XCom data
[2023-08-30, 14:24:20 UTC] {retries.py:92} DEBUG - Running RenderedTaskInstanceFields.write with retries. Try 1 of 3
[2023-08-30, 14:24:20 UTC] {retries.py:92} DEBUG - Running RenderedTaskInstanceFields._do_delete_old_records with retries. Try 1 of 3
[2023-08-30, 14:24:20 UTC] {taskinstance.py:1660} INFO - Exporting env vars: AIRFLOW_CTX_DAG_OWNER='CK' AIRFLOW_CTX_DAG_ID='prj_ck_cas_table_monitoring' AIRFLOW_CTX_TASK_ID='collect_CAS_table_metrics_public' AIRFLOW_CTX_EXECUTION_DATE='2023-08-30T14:22:51.684136+00:00' AIRFLOW_CTX_TRY_NUMBER='1' AIRFLOW_CTX_DAG_RUN_ID='manual__2023-08-30T14:22:51.684136+00:00'
[2023-08-30, 14:24:20 UTC] {__init__.py:117} DEBUG - Preparing lineage inlets and outlets
[2023-08-30, 14:24:20 UTC] {__init__.py:158} DEBUG - inlets: [], outlets: []
[2023-08-30, 14:24:20 UTC] {logging_mixin.py:151} INFO - Execute cli task
[2023-08-30, 14:24:20 UTC] {base.py:73} INFO - Using connection ID 'sas_viya_conn' for task execution.
[2023-08-30, 14:24:20 UTC] {base.py:73} INFO - Using connection ID 'cxa_mssql_db_domain_user_format' for task execution.
[2023-08-30, 14:24:20 UTC] {logging_mixin.py:151} INFO - Environment variables: ['LANG', 'PATH', 'VIYA_USERNAME', 'VIYA_PASSWORD', 'VIYA_HOST_URL', 'MSSQL_USERNAME', 'MSSQL_PASSWORD', 'MSSQL_HOST_URL', 'MSSQL_HOST_PORT', 'MSSQL_DATABASE']
[2023-08-30, 14:24:20 UTC] {logging_mixin.py:151} INFO - Bash command: /srv/team-workspaces/ck/pex/viyacli.pex cas-monitoring --ssl_cert_file OBSCURED_SSL_CERT_PATH --viya_caslib PUBLIC --viya_client_secret OBSCURED_CLIENT_SECRET --monitoring_table sas_cas_table_status
[2023-08-30, 14:24:20 UTC] {bash.py:186} DEBUG - Exporting env vars: AIRFLOW_CTX_DAG_OWNER='CK' AIRFLOW_CTX_DAG_ID='prj_ck_cas_table_monitoring' AIRFLOW_CTX_TASK_ID='collect_CAS_table_metrics_public' AIRFLOW_CTX_EXECUTION_DATE='2023-08-30T14:22:51.684136+00:00' AIRFLOW_CTX_TRY_NUMBER='1' AIRFLOW_CTX_DAG_RUN_ID='manual__2023-08-30T14:22:51.684136+00:00'
[2023-08-30, 14:24:20 UTC] {subprocess.py:63} INFO - Tmp dir root location: /tmp
[2023-08-30, 14:24:20 UTC] {subprocess.py:75} INFO - Running command: ['/usr/bin/bash', '-c', '/srv/team-workspaces/ck/pex/viyacli.pex cas-monitoring --ssl_cert_file OBSCURED_SSL_CERT_PATH --viya_caslib PUBLIC --viya_client_secret OBSCURED_CLIENT_SECRET --monitoring_table sas_cas_table_status']
[2023-08-30, 14:24:20 UTC] {subprocess.py:86} INFO - Output:
[2023-08-30, 14:24:23 UTC] {taskinstance.py:844} DEBUG - Refreshing TaskInstance <TaskInstance: prj_ck_cas_table_monitoring.collect_CAS_table_metrics_public manual__2023-08-30T14:22:51.684136+00:00 [running]> from DB
[2023-08-30, 14:24:23 UTC] {job.py:216} DEBUG - [heartbeat]
[2023-08-30, 14:24:23 UTC] {subprocess.py:93} INFO - 2023-08-30 16:24:23.723 | WARNING | viya.mssql_utils:write_to_mssql:35 - No data was passed to insert into the Database.
[2023-08-30, 14:24:23 UTC] {subprocess.py:97} INFO - Command exited with return code 0
[2023-08-30, 14:24:23 UTC] {__init__.py:75} DEBUG - Lineage called with inlets: [], outlets: []
[2023-08-30, 14:24:23 UTC] {taskinstance.py:844} DEBUG - Refreshing TaskInstance <TaskInstance: prj_ck_cas_table_monitoring.collect_CAS_table_metrics_public manual__2023-08-30T14:22:51.684136+00:00 [running]> from DB
[2023-08-30, 14:24:23 UTC] {taskinstance.py:1458} DEBUG - Clearing next_method and next_kwargs.
[2023-08-30, 14:24:23 UTC] {taskinstance.py:1400} INFO - Marking task as SUCCESS. dag_id=prj_ck_cas_table_monitoring, task_id=collect_CAS_table_metrics_public, execution_date=20230830T142251, start_date=20230830T142402, end_date=20230830T142423
[2023-08-30, 14:24:23 UTC] {taskinstance.py:2436} DEBUG - Task Duration set to 20.913392
[2023-08-30, 14:24:23 UTC] {cli_action_loggers.py:85} DEBUG - Calling callbacks: []
[2023-08-30, 14:24:23 UTC] {local_task_job_runner.py:228} INFO - Task exited with return code 0
[2023-08-30, 14:24:23 UTC] {dagrun.py:740} DEBUG - number of tis tasks for <DagRun prj_ck_cas_table_monitoring @ 2023-08-30 14:22:51.684136+00:00: manual__2023-08-30T14:22:51.684136+00:00, state:running, queued_at: 2023-08-30 16:22:51.722581+00:00. externally triggered: True>: 0 task(s)
[2023-08-30, 14:24:23 UTC] {taskinstance.py:2784} INFO - 0 downstream tasks scheduled from follow-on schedule check |
Like other users above, when upgrading I also noticed major increases in the time taken to execute a dag_run that previously took < 1 minute (scheduled every minute). It led to a cascade of unfinished dag_runs across this DAG, and subsequently all my other DAGs were affected.

Like @raphaelsimeon, the performance insights from my RDS instance showed that the query to get the DAG run history was taking a long time to return. I also noticed that each running dag_run triggers a query to get this data (all entries before the current dag_run), and re-requests it at multiple points while the DAG is running. This seemed to be the root of the issue: even when the first query does not return in a timely manner, a subsequent one is automatically triggered by the scheduler, and so on, until there is a log-jam of pending queries (up to max_active_runs) which degrades database performance for every other process and DAG.

For context I run Experiment 1: Experiment 2:

So I have found a work-around for my particular case by deleting rows from the dag_run metadata table (see the sketch after this comment). I would be interested to know what state the dag_run tables are in for the users who have posted above, and whether they are performing any scheduled maintenance on those tables? |
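A minimal sketch of that kind of dag_run cleanup, assuming direct writes to the metadata database are acceptable in your deployment (the function name and 180-day cutoff are arbitrary examples, not anything from this thread):

from datetime import datetime, timedelta, timezone

from airflow.models import DagRun
from airflow.utils.session import provide_session

@provide_session
def purge_old_dag_runs(keep_days=180, session=None):
    # Delete dag_run rows older than the cutoff so history queries have
    # fewer rows to scan; keep_days is a placeholder retention period.
    cutoff = datetime.now(timezone.utc) - timedelta(days=keep_days)
    deleted = (
        session.query(DagRun)
        .filter(DagRun.execution_date < cutoff)
        .delete(synchronize_session=False)
    )
    session.commit()
    return deleted

On Airflow 2.3+ the supported route for the same maintenance is the airflow db clean CLI command, which can archive and purge old rows from tables such as dag_run.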
@nick-msk-ai you mentioned that the query to get the DAG history was taking a long time to return. Can you please paste the exact query that you mentioned here, so that we can see if there were any recent changes to that query? |
For a 10 minute interval DAG run scheduled at
I'm not 100% sure, but I think the query itself must be dynamically generated in |
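When the generated SQL cannot be pinned down from the ORM code, one rough way to capture it when running Airflow components locally is to raise SQLAlchemy's statement logging; this is a debugging sketch, far too verbose to leave enabled in production:

import logging

# Emit every SQL statement SQLAlchemy executes, so the dynamically
# generated dag_run history query can be copied out of the logs and
# EXPLAIN-ed against the database directly.
logging.basicConfig()
logging.getLogger("sqlalchemy.engine").setLevel(logging.INFO)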
@nick-msk-ai actually, performance insights in RDS don't seem to show very high latency as you suggested. The slowest query looks like it has 100ms latency on average, which does not sound overwhelming, does it? |
This query is fixed in |
@jaetma @raphaelsimeon @nick-msk-ai @Github-dm-CDE could you check whether task execution has become more reliable and faster in Airflow 2.7.1 compared with 2.7.0? |
@Taragolis,
|
Can others here confirm it? I am going to close this one as it seems that the fix is working, but having others confirm it would be great. |
I can confirm that. The update from 2.7.0 to 2.7.1 seems to have solved the problem. |
I can confirm that my Kubernetes instance on 2.7.1, upgraded from 2.6.3, no longer has that problem. |
Thanks everyone. Great job @Taragolis on getting to the bottom of it and fixing it :). That was pretty far from what I would have expected the reason to be ;) |
Late to the party here, but same for us: the upgrade to 2.7.1 did the trick. Thanks all for having a look :) |
Hello everyone, I just got to this thread because I'm facing the same issue on AF 2.7.2. I'm using this version because it is the latest available on AWS MWAA. I'm experiencing huge run times on simple tasks that used to take seconds. |
@BenoCharlo Do you have a local Airflow env setup? If yes, could you try running your DAGs locally with Airflow 2.7.2 vs your previous version and see if you are able to reproduce it locally? Would be nice to check first that it is not an infra issue on your cloud deployment. |
@pankajkoti yes, I have a local setup with mwaa-local-runner. My previous version was AF 2.2.2 (a bit outdated). I do experience this issue with the local setup as well |
Thanks @BenoCharlo. Since you have a local setup, you would be able to help with some more testing. Maybe instead of directly upgrading from 2.2.2 to 2.7.2, could you try upgrading to 2.6.3 first and check whether you observe such slowness with 2.6.3 too? Previously, it was observed that performance decreased between 2.6.3 and 2.7. Also, would you be able to share some insight into what your DAGs are doing? Sharing a small reproducible DAG that causes this perf degradation would be nice too (see the probe sketch after this comment). |
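For comparing versions, a minimal probe DAG along these lines might be enough to surface the overhead; the DAG id, schedule, and start date below are placeholders (use schedule_interval instead of schedule on pre-2.4 versions):

from datetime import datetime

from airflow.decorators import dag, task

@dag(schedule="* * * * *", start_date=datetime(2023, 1, 1), catchup=False)
def perf_probe():
    # A no-op task: on a healthy deployment each run should finish in
    # seconds, so anything beyond that is scheduler/worker overhead.
    @task
    def noop():
        return "ok"

    noop()

perf_probe()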
@pankajkoti I've tried the 2.6.3 version today. A lot of DAGs are starting to run as expected. It has gotten a bit better, but I am still experiencing the same delay problem for some DAGs: some tasks are delayed on 1 out of 2 runs, and the delays are actually the task never finishing. I suspect that writing to an S3 bucket using pandas is causing this issue. Thanks for the workaround. |
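One way to confirm the S3-write suspicion would be to time that write in isolation inside the task, along the lines of this sketch (assumes pandas with s3fs installed; the bucket and key are placeholders):

import time

import pandas as pd

def write_with_timing(df: pd.DataFrame) -> None:
    # Time only the upload, so a hang here can be separated from
    # Airflow scheduling overhead.
    start = time.monotonic()
    df.to_csv("s3://example-bucket/example.csv", index=False)
    print(f"S3 write took {time.monotonic() - start:.1f}s")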
Discussed in #33664
Originally posted by jaetma August 23, 2023
Hi community!
We have been using Airflow for quite a long time, and right after updating from version 2.6.3 to 2.7.0, running times increased dramatically. Tasks that used to take 15 seconds to complete are now taking 10 minutes!
This is problematic because tasks are being queued faster than they finish.
We've detected this issue in 3 projects running with Airflow, across 2 instances in Kubernetes and 1 with Docker.
Illustrating image: