Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Masking configuration values irrelevant to DAG author #43040

Merged
merged 16 commits into from
Oct 23, 2024

Conversation

amoghrajesh
Copy link
Contributor

Some configurations are irrelevant to DAG authors and hence we need to mask those to avoid it from getting logged unknowingly.


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

airflow/settings.py Outdated Show resolved Hide resolved
@potiuk
Copy link
Member

potiuk commented Oct 15, 2024

Just adding a unit test, and I think it's ready to go :)

@amoghrajesh
Copy link
Contributor Author

Struggling with some test related setup. I am running into this as of now:

../../tests_common/test_utils/db.py:23: in <module>
    from airflow.models import (
../../airflow/models/__init__.py:78: in __getattr__
    val = import_string(f"{path}.{name}")
../../airflow/utils/module_loading.py:39: in import_string
    module = import_module(module_path)
/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
../../airflow/models/dag.py:95: in <module>
    from airflow.models.abstractoperator import AbstractOperator, TaskStateChangeCallback
../../airflow/models/abstractoperator.py:33: in <module>
    from airflow.template.templater import Templater
../../airflow/template/templater.py:23: in <module>
    from airflow.io.path import ObjectStoragePath
../../airflow/io/__init__.py:30: in <module>
    from airflow.providers_manager import ProvidersManager
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

    """Manages all providers."""
    
    from __future__ import annotations
    
    import fnmatch
    import functools
    import inspect
    import json
    import logging
    import os
    import sys
    import traceback
    import warnings
    from dataclasses import dataclass
    from functools import wraps
    from time import perf_counter
    from typing import TYPE_CHECKING, Any, Callable, MutableMapping, NamedTuple, TypeVar
    
    from packaging.utils import canonicalize_name
    
    from airflow.exceptions import AirflowOptionalProviderFeatureException
>   from airflow.providers.standard.hooks.filesystem import FSHook
E   ModuleNotFoundError: No module named 'airflow.providers.standard.hooks'

I see some discussions around, does anyone have any suggestions for this? I use Pycharm

@potiuk
Copy link
Member

potiuk commented Oct 16, 2024

I see some discussions around, does anyone have any suggestions for this? I use Pycharm

I believe (I have not yet had time to look at it) this requires to use uv workspace and the latest uv installed - @ashb - are there any instructions for that already somewhere ?

I guess those should be updated ?

@potiuk
Copy link
Member

potiuk commented Oct 16, 2024

I think we generally should provide rather comprehensive guide (at least with some links) to our contributors how to setup the venv now (Breeze is covered because it uses workspace installation internally I believe)? Am I right @ashb ?

airflow/settings.py Outdated Show resolved Hide resolved
tests/core/test_settings.py Outdated Show resolved Hide resolved
airflow/configuration.py Outdated Show resolved Hide resolved
@ashb
Copy link
Member

ashb commented Oct 16, 2024

Breeze is all set up yes. UV workspace isn't 100% required yet (though I don't think anyone tested running tests from inside pycharm), but for pycharm to find imports you need to set up some paths as the right kind of folder in the UI

I'm not a pycharm user, but @kaxil @dstandish can provide some insight.

airflow/settings.py Outdated Show resolved Hide resolved
airflow/settings.py Outdated Show resolved Hide resolved
airflow/settings.py Outdated Show resolved Hide resolved
@ashb
Copy link
Member

ashb commented Oct 16, 2024

#42951 should help Pycharm too. Remaining "manual" steps as pycharm user are:

  1. We marked providers/src as Source Folder and providers/src/airflow as Namespace package.
  2. Verify that there is nothing on old provider path: rm -rf airflow/providers 🙂
  3. Trigger Rescan Project Indexes from File -> Repair IDE.

@amoghrajesh
Copy link
Contributor Author

I just pulled in #42951, it doesn't seem to fix it for me. I still keep getting


    """Manages all providers."""
    
    from __future__ import annotations
    
    import fnmatch
    import functools
    import inspect
    import json
    import logging
    import os
    import sys
    import traceback
    import warnings
    from dataclasses import dataclass
    from functools import wraps
    from time import perf_counter
    from typing import TYPE_CHECKING, Any, Callable, MutableMapping, NamedTuple, TypeVar
    
    from packaging.utils import canonicalize_name
    
    from airflow.exceptions import AirflowOptionalProviderFeatureException
>   from airflow.providers.standard.hooks.filesystem import FSHook
E   ModuleNotFoundError: No module named 'airflow.providers.standard'

@dstandish
Copy link
Contributor

I'm not a pycharm user, but @kaxil @dstandish can provide some insight.

Anything in particular you wanted me to look at?

@amoghrajesh
Copy link
Contributor Author

@ashb @potiuk @kaxil I changed the logic to incorporate what was suggested above. Now I am able to get this result:

[2024-10-16, 18:39:13 IST] {logging_mixin.py:190} INFO - environ({'SHELL': '/bin/bash', 'AIRFLOW__SMTP__SMTP_PORT': '587', 'VERSION_SUFFIX_FOR_PYPI': '', 'NUM_RUNS': '', 'AIRFLOW__CELERY__RESULT_BACKEND': 'db+***', 'AIRFLOW_AUTH_MANAGER_CREDENTIAL_DIRECTORY': '/files', 'USE_AIRFLOW_VERSION': '', 'AIRFLOW__SMTP__SMTP_PASSWORD': 'dummy_smtp_password', 'AIRFLOW__SMTP__SMTP_USER': 'dummy_user', 'DB_RESET': 'false', 'TERM_PROGRAM_VERSION': '3.3a', 'CASS_DRIVER_NO_CYTHON': '1', 'DOCKER_IS_ROOTLESS': 'false', 'TMUX': '/root/.tmux/tmp/tmux-0/default,96,0', 'DEPENDENCIES_EPOCH_NUMBER': '11', 'HOSTNAME': '22a3f86e4c95', 'COLOR_RESET': '\x1b[0m', 'AIRFLOW__WEBSERVER__SECRET_KEY_SECRET': '***_secret', 'PYTHON_VERSION': '3.9.20', 'LANGUAGE': 'C.UTF-8', 'HOST_USER_ID': '20', 'FASTAPI_API_HOST_PORT': '29091', 'USE_PACKAGES_FROM_DIST': 'false', 'COLLECT_ONLY': 'false', 'PACKAGE_FORMAT': 'wheel', 'AIRFLOW__CORE__PLUGINS_FOLDER': '/files/plugins', 'AIRFLOW_CI_BUILD_EPOCH': '10', 'CHICKEN_EGG_PROVIDERS': '', 'MSSQL_HOST_PORT': '21433', 'SQLALCHEMY_WARN_20': 'true', 'CELERY_BROKER_URLS_MAP': "{'rabbitmq': 'amqp://guest:guest@rabbitmq:5672', 'redis': 'redis://redis:6379/0'}", 'AIRFLOW_CONSTRAINTS_LOCATION': '', 'DEV_MODE': 'false', 'AIRFLOW__CORE__DAGS_FOLDER': '/files/dags', 'COLOR_RED': '\x1b[31m', 'RUN_SYSTEM_TESTS': 'false', 'COLOR_BLUE': '\x1b[34m', 'AIRFLOW_ENABLE_AIP_44': 'true', 'AIRFLOW_CI_IMAGE': 'ghcr.io/apache/airflow/main/ci/python3.9', 'TEST_TYPE': '', 'ADDITIONAL_PATH': '~/.local/bin', 'EAGER_UPGRADE_ADDITIONAL_REQUIREMENTS': '', 'PROVIDERS_SKIP_CONSTRAINTS': 'false', 'BREEZE_INIT_COMMAND': '', 'PWD': '/opt/airflow', 'AIRFLOW_SKIP_CONSTRAINTS': 'false', 'AIRFLOW__CELERY__BROKER_URL': 'redis://redis:6379/0', 'REMOVE_ARM_PACKAGES': 'false', 'AIRFLOW_VERSION': '3.0.0.dev0', 'ENABLE_COVERAGE': 'false', 'VERBOSE_COMMANDS': 'false', 'AIRFLOW__CORE__LOAD_EXAMPLES': 'false', 'GITHUB_ACTIONS': 'false', 'INSTALL_MSSQL_CLIENT': 'true', 'CASS_DRIVER_BUILD_CONCURRENCY': '8', 'CI_EVENT_TYPE': 'pull_request', 'UPGRADE_IF_NEEDED': '--upgrade', 'INSTALL_MYSQL_CLIENT_TYPE': 'mariadb', 'BACKEND': 'mysql', 'CI_TARGET_BRANCH': 'main', 'PROVIDERS_CONSTRAINTS_MODE': 'constraints-source-providers', 'GUNICORN_CMD_ARGS': '--worker-tmp-dir /dev/shm/', 'AIRFLOW_EXTRAS': '', 'LD_PRELOAD': '/usr/lib/aarch64-linux-gnu/libstdc++.so.6', 'PIP_PROGRESS_BAR': 'on', 'AIRFLOW__DATABASE__SQL_ALCHEMY_CONN_SECRET': 'dummy_sql_alchemy_conn', 'HOME': '/root', 'QUIET': 'false', 'UV_NO_CACHE': 'true', 'LANG': 'C.UTF-8', 'SUSPENDED_PROVIDERS_FOLDERS': '', 'HELM_TEST_PACKAGE': '', 'TMUX_TMPDIR': '/root/.tmux/tmp', 'SQLITE_URL': 'sqlite:////root/airflow/sqlite/airflow.db', 'ISSUE_ID': '', 'AIRFLOW_HOME': '/root/airflow', 'MYSQL_VERSION': '8.0', 'AIRFLOW_BREEZE_CONFIG_DIR': '/files/airflow-breeze-config', 'POSTGRES_HOST_PORT': '25433', 'EXTRA_UNINSTALL_FLAGS': '--python /usr/local/bin/python', 'AIRFLOW_VERSION_SPECIFICATION': '', 'GPG_KEY': 'E3FF2839C048B25C084DEBE9B26995E310250568', 'AIRFLOW__DATABASE__SQL_ALCHEMY_CONN': '***', 'AIRFLOW_USE_UV': 'true', 'AIRFLOW_PRE_CACHED_PIP_PACKAGES': 'true', 'ADDITIONAL_DEV_APT_DEPS': 'bash-completion dumb-init git graphviz krb5-user less libenchant-2-2 libgcc-11-dev libgeos-dev libpq-dev net-tools netcat-openbsd openssh-server postgresql-client software-properties-common rsync tmux unzip vim xxd', 'COLOR_YELLOW': '\x1b[33m', 'ONLY_MIN_VERSION_UPDATE': 'false', 'AIRFLOW__CELERY__WORKER_CONCURRENCY': '8', 'STANDALONE_DAG_PROCESSOR': 'false', 'AIRFLOW__CORE__EXECUTOR': 'LocalExecutor', 'INSTALL_SELECTED_PROVIDERS': '', 'USE_XDIST': 'false', 'COMMIT_SHA': 'c03ccfc0716fdbf88eb778c920767b4d3953ac5b', 'CONSTRAINTS_GITHUB_REPOSITORY': 'apache/airflow', 'AIRFLOW__DATABASE__SQL_ALCHEMY_ENGINE_ARGS_SECRET': 'dummy_sql_engine_args', 'EXTRA_INSTALL_FLAGS': '--python /usr/local/bin/python', 'AIRFLOW_CI_IMAGE_WITH_TAG': 'ghcr.io/apache/airflow/main/ci/python3.9:latest', 'SKIP_SSH_SETUP': 'false', 'TMUX_SESSION': 'Airflow', 'AIRFLOW_REPO': 'apache/airflow', 'DEV_APT_COMMAND': '', 'AIRFLOW__WEBSERVER__SECRET_KEY': '***', 'AIRFLOW_PIP_VERSION': '24.2', 'ADDITIONAL_PIP_INSTALL_FLAGS': '', 'SKIP_ENVIRONMENT_INITIALIZATION': 'false', 'ADDITIONAL_DEV_APT_COMMAND': '', 'AIRFLOW_ENV': 'development', 'START_AIRFLOW': 'true', 'COLOR_GREEN': '\x1b[32m', 'AIRFLOW_BRANCH': 'main', 'CLEAN_AIRFLOW_INSTALLATION': 'false', 'FORCE_LOWEST_DEPENDENCIES': 'false', 'AIRFLOW__CORE__INTERNAL_API_SECRET_KEY_SECRET': '***_secret', 'HOST_GROUP_ID': '20', 'DEFAULT_BRANCH': 'main', 'INIT_SCRIPT_FILE': 'init.sh', 'AIRFLOW_CONSTRAINTS_MODE': 'constraints-source-providers', 'PYTHONPATH': '/opt/airflow', 'TERM': 'tmux-256color', 'UPGRADE_BOTO': 'false', 'DOWNGRADE_PENDULUM': 'false', 'USER': 'root', 'HOST_OS': 'darwin', 'USE_UV': 'true', 'TMUX_PANE': '%1', 'WEBSERVER_HOST_PORT': '28080', 'AIRFLOW_UV_VERSION': '0.4.22', 'AIRFLOW__CORE__INTERNAL_API_SECRET_KEY': '***', 'INSTALL_POSTGRES_CLIENT': 'true', 'COMPOSE_FILE': '/Users/adesai/Documents/OSS/airflow-repos/airflow/scripts/ci/docker-compose/base.yml:/Users/adesai/Documents/OSS/airflow-repos/airflow/scripts/ci/docker-compose/docker-socket.yml:/Users/adesai/Documents/OSS/airflow-repos/airflow/scripts/ci/docker-compose/backend-mysql.yml:/Users/adesai/Documents/OSS/airflow-repos/airflow/scripts/ci/docker-compose/backend-mysql-port.yml:/Users/adesai/Documents/OSS/airflow-repos/airflow/scripts/ci/docker-compose/files.yml:/Users/adesai/Documents/OSS/airflow-repos/airflow/scripts/ci/docker-compose/base-ports.yml:/Users/adesai/Documents/OSS/airflow-repos/airflow/scripts/ci/docker-compose/local.yml', 'SHLVL': '2', 'AIRFLOW_CONSTRAINTS_REFERENCE': 'constraints-main', 'DEFAULT_CONSTRAINTS_BRANCH': 'constraints-main', 'FILES_DIR': '/files', 'CI_BUILD_ID': '0', 'INSTALL_AIRFLOW_VERSION': '', 'LC_MESSAGES': 'C.UTF-8', 'SYSTEM_TESTS_ENV_ID': '', 'MOUNT_SOURCES': 'selected', 'PYTHONDONTWRITEBYTECODE': 'true', 'KUBECONFIG': '/files/.kube/config', 'UPGRADE_INVALIDATION_STRING': '', 'MYSQL_HOST_PORT': '23306', 'LC_CTYPE': 'C.UTF-8', 'AIRFLOW_IMAGE_KUBERNETES': 'ghcr.io/apache/airflow/main/kubernetes/python3.9', 'ANSWER': '', 'RUN_TESTS': 'false', 'LOAD_EXAMPLES': 'false', '_AIRFLOW_SKIP_DB_TESTS': 'false', 'DOWNGRADE_SQLALCHEMY': 'false', 'AIRFLOW_INSTALLATION_METHOD': '.', 'REDIS_HOST_PORT': '26379', 'LC_ALL': 'C.UTF-8', 'CI_TARGET_REPO': 'apache/airflow', 'CI_JOB_ID': '0', 'DATABASE_ISOLATION': 'false', 'INSTALL_MYSQL_CLIENT': 'true', 'SSH_PORT': '12322', 'PATH': '/files/bin/:/opt/airflow/scripts/in_container/bin/:/root/.local/bin:/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/airflow', 'PYTHON_MAJOR_MINOR_VERSION': '3.9', '_AIRFLOW_RUN_DB_TESTS_ONLY': 'false', 'AIRFLOW__CORE__DATASET_MANAGER_KWARGS_SECRET': 'dummy_dataset_manager_kwargs', 'UPGRADE_EAGERLY': '--upgrade --resolution highest', 'CELERY_FLOWER': 'false', 'INSTALL_AIRFLOW_WITH_CONSTRAINTS': 'true', 'CI': 'false', 'PACKAGING_TOOL': '', 'PACKAGING_TOOL_CMD': 'uv pip', 'LOAD_DEFAULT_CONNECTIONS': 'false', 'FLOWER_HOST_PORT': '25555', 'PIP_NO_CACHE_DIR': 'true', 'BASE_BRANCH': 'main', 'PYTHON_BASE_IMAGE': 'python:3.9-slim-bookworm', 'ENABLED_SYSTEMS': '', 'AIRFLOW__LOGGING__REMOTE_TASK_HANDLER_KWARGS_SECRET': 'dummy_remote_task_handler_kwargs', 'AIRFLOW__SMTP__SMTP_PASSWORD_SECRET': 'dummy_smtp_password_secret', 'UV_HTTP_TIMEOUT': '300', 'BREEZE': 'true', 'DRILL_HOST_PORT': '28047', 'BUILD_ID': '0', 'AIRFLOW__SENTRY__SENTRY_DSN_SECRET': 'dummy_sentry_dsn', 'AIRFLOW__CORE__FERNET_KEY': '***', 'POSTGRES_VERSION': '12', 'DEBIAN_FRONTEND': 'noninteractive', 'EXCLUDED_PROVIDERS': '', 'UV_CONCURRENT_DOWNLOADS': '10', 'OLDPWD': '/opt/airflow', 'AIRFLOW__SMTP__SMTP_SSL': 'True', 'AIRFLOW_SOURCES': '/opt/airflow', 'VERBOSE': 'false', 'TERM_PROGRAM': 'tmux', 'AIRFLOW__SMTP__SMTP_HOST': 'smtp.dummy.com', 'REGENERATE_MISSING_DOCS': 'false', '_': '/usr/local/bin/airflow', '_AIRFLOW_PARSING_CONTEXT_DAG_ID': 'os-env-variable', '_AIRFLOW_PARSING_CONTEXT_TASK_ID': 'test_me', 'AIRFLOW_CTX_DAG_OWNER': 'airflow', 'AIRFLOW_CTX_DAG_ID': 'os-env-variable', 'AIRFLOW_CTX_TASK_ID': 'test_me', 'AIRFLOW_CTX_EXECUTION_DATE': '2024-10-16T13:09:12.855444+00:00', 'AIRFLOW_CTX_TRY_NUMBER': '1', 'AIRFLOW_CTX_DAG_RUN_ID': 'manual__2024-10-16T13:09:12.855444+00:00'})

From my original list, these aren't masked:

    "AIRFLOW__SMTP__SMTP_PORT",
    "AIRFLOW__SMTP__SMTP_PASSWORD",
    "AIRFLOW__SMTP__SMTP_PASSWORD_SECRET",
    "AIRFLOW__SMTP__SMTP_USER",
    "AIRFLOW__SMTP__SMTP_SSL",
    "AIRFLOW__SMTP__SMTP_HOST",
    "AIRFLOW__SENTRY__SENTRY_DSN_SECRET",
    "AIRFLOW__DATABASE__SQL_ALCHEMY_CONN_SECRET",
    "AIRFLOW__DATABASE__SQL_ALCHEMY_ENGINE_ARGS_SECRET",
    "AIRFLOW__CORE__DATASET_MANAGER_KWARGS_SECRET",
    "AIRFLOW__LOGGING__REMOTE_TASK_HANDLER_KWARGS_SECRET",

As per the comments above, looks like AIRFLOW__SMTP__SMTP_PORT, AIRFLOW__SMTP__SMTP_USER (removed), AIRFLOW__SMTP__SMTP_PASSWORD (removed), arent relevant.

I think:

  • AIRFLOW__SMTP__SMTP_USER
  • AIRFLOW__SMTP__SMTP_SSL
  • AIRFLOW__SMTP__SMTP_HOST

Are ok to be unmasked. That leaves us with:

    "AIRFLOW__SENTRY__SENTRY_DSN_SECRET",
    "AIRFLOW__DATABASE__SQL_ALCHEMY_CONN_SECRET",
    "AIRFLOW__DATABASE__SQL_ALCHEMY_ENGINE_ARGS_SECRET",
    "AIRFLOW__CORE__DATASET_MANAGER_KWARGS_SECRET",
    "AIRFLOW__LOGGING__REMOTE_TASK_HANDLER_KWARGS_SECRET",

Should these be masked or its alright to keep them unmasked?

@ashb
Copy link
Member

ashb commented Oct 16, 2024

Should these be masked or its alright to keep them unmasked?

#43040 (comment)

@amoghrajesh
Copy link
Contributor Author

Okay, in that case, this should be ok

@kaxil
Copy link
Member

kaxil commented Oct 16, 2024

I just pulled in #42951, it doesn't seem to fix it for me. I still keep getting


    """Manages all providers."""
    
    from __future__ import annotations
    
    import fnmatch
    import functools
    import inspect
    import json
    import logging
    import os
    import sys
    import traceback
    import warnings
    from dataclasses import dataclass
    from functools import wraps
    from time import perf_counter
    from typing import TYPE_CHECKING, Any, Callable, MutableMapping, NamedTuple, TypeVar
    
    from packaging.utils import canonicalize_name
    
    from airflow.exceptions import AirflowOptionalProviderFeatureException
>   from airflow.providers.standard.hooks.filesystem import FSHook
E   ModuleNotFoundError: No module named 'airflow.providers.standard'

Fixed in #43082

airflow/configuration.py Outdated Show resolved Hide resolved
tests/core/test_configuration.py Outdated Show resolved Hide resolved
tests/core/test_settings.py Outdated Show resolved Hide resolved
@amoghrajesh
Copy link
Contributor Author

@ashb does it look good to go? Got a green CI

@potiuk
Copy link
Member

potiuk commented Oct 23, 2024

Looks like all the comments are resolved and address by @amoghrajesh . LGTM. @kaxil @ashb - are you ok with merging it?

@potiuk potiuk added this to the Airflow 2.10.3 milestone Oct 23, 2024
@ashb
Copy link
Member

ashb commented Oct 23, 2024

Computer says no

Screenshot_20241023-164334.png

But yes, good to merge

@kaxil kaxil merged commit 0b030c5 into apache:main Oct 23, 2024
52 checks passed
harjeevanmaan pushed a commit to harjeevanmaan/airflow that referenced this pull request Oct 23, 2024
Some configurations are irrelevant to DAG authors and hence we need to mask those to avoid it from getting logged unknowingly.


Co-authored-by: adesai <[email protected]>
Co-authored-by: Ash Berlin-Taylor <[email protected]>
@kaxil
Copy link
Member

kaxil commented Oct 23, 2024

This change has caused the @suppress_logs_and_warning decorator to not work for CLI commands and hence CLI commands now show warnings. Example: #43334

@kaxil
Copy link
Member

kaxil commented Oct 23, 2024

Fix here: #43335

potiuk pushed a commit to potiuk/airflow that referenced this pull request Oct 23, 2024
Some configurations are irrelevant to DAG authors and hence we need to mask those to avoid it from getting logged unknowingly.

Co-authored-by: adesai <[email protected]>
Co-authored-by: Ash Berlin-Taylor <[email protected]>
(cherry picked from commit 0b030c5)
potiuk added a commit that referenced this pull request Oct 24, 2024
Some configurations are irrelevant to DAG authors and hence we need to mask those to avoid it from getting logged unknowingly.

Co-authored-by: adesai <[email protected]>
Co-authored-by: Ash Berlin-Taylor <[email protected]>
(cherry picked from commit 0b030c5)

Co-authored-by: Amogh Desai <[email protected]>
PaulKobow7536 pushed a commit to PaulKobow7536/airflow that referenced this pull request Oct 24, 2024
Some configurations are irrelevant to DAG authors and hence we need to mask those to avoid it from getting logged unknowingly.


Co-authored-by: adesai <[email protected]>
Co-authored-by: Ash Berlin-Taylor <[email protected]>
@utkarsharma2 utkarsharma2 added the type:bug-fix Changelog: Bug Fixes label Oct 24, 2024
@saurabhb-dev
Copy link
Contributor

@amoghrajesh , shouldn't the list of sensitive_config_values be updated with the list of keys that need to redact ?
Line: https://github.com/amoghrajesh/airflow/blob/840ea3efb9533837e9f36b75fa527a0fbafeb23a/airflow/configuration.py#L121

@potiuk
Copy link
Member

potiuk commented Oct 24, 2024

@amoghrajesh , shouldn't the list of sensitive_config_values be updated with the list of keys that need to redact ? Line: https://github.com/amoghrajesh/airflow/blob/840ea3efb9533837e9f36b75fa527a0fbafeb23a/airflow/configuration.py#L121

Not really @saurabhb-dev , It's even better to mask values in this case, rather than keys.

The secrets_masker works in two modes:

  • for known structures it can mask sensitive key values (for example if a key in connection is "password" - it will mask the value when it prints the structure (when it is aware of the structure and acts on it)

  • but this is impossible when you log messages when the structures or values have been already converted to log string - by the time secrets masker in logger (installed as filter) gets the message, the structure is gone already (converted to string representation) and we are not aware that particular key corresponds to particular value. So our secrets masker has the second mode - where it can mask specific values. It will scan the string for all the values that are registered upfront and mask them if it finds any of them in the string.

This is what happens here - we retrieve all the secret config values (whether they come by env vars or by other means) and we add values (i.e. actual secrets) to be masked. This way when any secret is printed anywhere where secrets_masker is used, it will automatically mask it - regardless if it is a structure (secrets_masker checks values of dicts for example) or whether it's already converted to string.

utkarsharma2 pushed a commit that referenced this pull request Oct 24, 2024
Some configurations are irrelevant to DAG authors and hence we need to mask those to avoid it from getting logged unknowingly.

Co-authored-by: adesai <[email protected]>
Co-authored-by: Ash Berlin-Taylor <[email protected]>
(cherry picked from commit 0b030c5)

Co-authored-by: Amogh Desai <[email protected]>
@zachliu
Copy link
Contributor

zachliu commented Nov 7, 2024

figure i'd ask here before creating a new issue

for section, key in self.sensitive_config_values:
try:
value = self.get(section, key, suppress_warnings=True)

self.sensitive_config_values returns both ('database', 'sql_alchemy_conn') and ('core', 'sql_alchemy_conn')

hence the self.get() raises a warning:

airflow/configuration.py:859 FutureWarning: section/key [core/sql_alchemy_conn] has been deprecated, you should use[database/sql_alchemy_conn] instead. Please update your `conf.get*` call to use the new name

which is a bit confusing since i removed all references to core.sql_alchemy_conn long ago

@amoghrajesh
Copy link
Contributor Author

@zachliu we do not return ('database', 'sql_alchemy_conn') as part of self.sensitive_config_values. Here are the sensitive values: https://github.com/amoghrajesh/airflow/blob/840ea3efb9533837e9f36b75fa527a0fbafeb23a/airflow/configuration.py#L122-L129

The warning comes from configuration.py#919

@zachliu
Copy link
Contributor

zachliu commented Nov 7, 2024

@amoghrajesh i beg to differ

  1. the warning says configuration.py:859
  2. i tested with a print statement at these locations:
    • configuration.py:859
    • configuration.py:323

i'm on https://github.com/apache/airflow/tree/2.10.3

@zachliu
Copy link
Contributor

zachliu commented Nov 7, 2024

i think i know what's going on here. somehow 7dea23f (#42126) didn't make the cut for 2.10.3, which caused the self.sensitive_config_values to return both ('database', 'sql_alchemy_conn') and ('core', 'sql_alchemy_conn')

@amoghrajesh
Copy link
Contributor Author

@zachliu from the PR it doesnt look like it was targetted for 2.10.3.
I will leave a comment however

@zachliu
Copy link
Contributor

zachliu commented Nov 7, 2024

I created a new issue to track this: #43794

ellisms pushed a commit to ellisms/airflow that referenced this pull request Nov 13, 2024
Some configurations are irrelevant to DAG authors and hence we need to mask those to avoid it from getting logged unknowingly.


Co-authored-by: adesai <[email protected]>
Co-authored-by: Ash Berlin-Taylor <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type:bug-fix Changelog: Bug Fixes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants