Change logging level details of connection info in get_connection() #21162

Merged (2 commits) on Feb 15, 2022

josh-fell
Contributor

Related: #19883

Currently, task logs can contain all of the connection details, depending on how the connection associated with the task is configured (i.e. if host is a provided connection attr). These details are logged at the INFO level but seem more appropriate for debugging.

This PR intends to clean up this connection logging a little. The INFO level logging will contain only the connection ID that is used, while the details of the connection are moved to the DEBUG level (and are still masked). Additionally, the connection ID is logged regardless of the provided connection attrs (i.e. the host check is removed). Lastly, this change has a small added benefit: connection info that users do not want in their logs is no longer exposed accidentally or unknowingly by default. Previously the details were exposed first, and users then had to set up configuration to mask them later (assuming the exposure was noticed at all).
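To illustrate the intended split, here is a minimal sketch using only the standard `logging` module. It is not the actual diff: the logger name, the `log_connection_usage` helper, and the exact message wording are assumptions made for illustration, and the details dict is assumed to be masked already before it reaches this function.

```python
import logging

# Hypothetical logger name, used only for this sketch.
log = logging.getLogger("connection_logging_sketch")


def log_connection_usage(conn_id: str, masked_details: dict) -> None:
    """Log the connection ID at INFO and the (already masked) details at DEBUG."""
    # INFO: only the connection ID, no matter which attributes
    # (host, schema, login, ...) the connection defines.
    log.info("Using connection ID '%s' for task execution.", conn_id)
    # DEBUG: the details, emitted only when the logger is set to DEBUG.
    log.debug("Connection details for '%s': %s", conn_id, masked_details)
```

With the logger at INFO, a call such as `log_connection_usage("my_postgres", {"host": "db.example.com"})` emits only the connection ID line; the details line appears only once the level is lowered to DEBUG.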

@boring-cyborg boring-cyborg bot added the area:core-operators Operators, Sensors and hooks within Core Airflow label Jan 27, 2022
@josh-fell josh-fell force-pushed the connection-info-logging branch from df63357 to f899932 Compare January 27, 2022 17:42
@github-actions

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Jan 27, 2022
@josh-fell josh-fell force-pushed the connection-info-logging branch from f899932 to 13da255 Compare January 29, 2022 02:30
@potiuk potiuk closed this Jan 29, 2022
@potiuk potiuk reopened this Jan 29, 2022
@josh-fell
Contributor Author

Is it expected that the Providers tests in the "Tests: Always API Core Other CLI Providers Integration" suite don't run for MySQL and MSSQL? I was surprised that only Postgres and Sqlite failed.

image

@josh-fell josh-fell force-pushed the connection-info-logging branch from 13da255 to c2ccaf7 Compare January 31, 2022 16:26
@potiuk
Member

potiuk commented Jan 31, 2022

Is it expected that the Providers tests in the "Tests: Always API Core Other CLI Providers Integration" suite don't run for MySQL and MSSQL? I was surprised that only Postgres and Sqlite failed.

Yes. That was done as part of stabilizing our flaky CI tests.

Both MySQL and MSSQL (despite very aggressive optimisation of the configuration of the dockerized versions of those) require much more memory to run than Postgres and SQLite. That led to jobs failing quite often with exit code 137 (meaning they ran out of memory) when they were run on Public Runners. That's why this type of test is now disabled for those two databases (but only on Public Runners).
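As a side note on reading such failures: by shell and Docker convention, an exit code above 128 means the process was terminated by signal number `code - 128`, so 137 corresponds to signal 9 (SIGKILL), which is what the kernel OOM killer sends. A minimal sketch of that decoding, assuming a POSIX system (`describe_exit_code` is a hypothetical helper, not part of the CI scripts):

```python
import signal


def describe_exit_code(code: int) -> str:
    """Translate a shell-style exit code into a short description."""
    if code > 128:
        try:
            # Codes above 128 conventionally encode "killed by signal (code - 128)".
            sig = signal.Signals(code - 128)
            return f"killed by {sig.name}"
        except ValueError:
            # Not a known signal number on this platform.
            pass
    return f"exited with status {code}"


print(describe_exit_code(137))  # killed by SIGKILL
```

SIGKILL can have other senders too, so 137 is a strong hint of memory exhaustion on a constrained runner rather than proof of it.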

In fact, this is even printed in the job output. If you unfold "Determine how to run the tests" in those tests you will see this:

image

You will find the logic controlling it here

function run_all_test_types_in_parallel() {

Also, "all" tests will run in "main" after the change is merged. Those tests are run on our self-hosted machines with 64 GB of memory, which have enough CPUs and memory to always run all the tests in parallel. So we will see a failing main if those "Providers" tests fail on MySQL or MSSQL (which is highly unlikely, because Provider tests are not supposed to be "Metadata-DB" dependent). Still, "just in case", we always run all tests in main.

@potiuk
Member

potiuk commented Jan 31, 2022

The same tests on self-hosted runners look like this:

image

@josh-fell
Contributor Author

Oh that makes perfect sense. Thanks for all the context @potiuk 🚀

@potiuk
Member

potiuk commented Jan 31, 2022

Oh that makes perfect sense. Thanks for all the context @potiuk 🚀

No problem. I am writing "Architecture decision records" in https://github.com/apache/airflow/tree/main/dev/breeze/doc/adr as part of the Breeze2 rewrite project, so this is actually a good next one to add.

@josh-fell josh-fell force-pushed the connection-info-logging branch from c2ccaf7 to 8a20191 Compare February 8, 2022 04:30
@potiuk
Member

potiuk commented Feb 15, 2022

Just an old docker-compose problem that is already fixed. Merging.

@potiuk potiuk merged commit 00f0025 into apache:main Feb 15, 2022
@josh-fell josh-fell deleted the connection-info-logging branch February 15, 2022 17:31
@jedcunningham jedcunningham added the type:improvement Changelog: Improvements label Feb 28, 2022
Labels
area:core-operators (Operators, Sensors and hooks within Core Airflow)
full tests needed (We need to run full set of tests for this PR to merge)
type:improvement (Changelog: Improvements)

4 participants