Attaching a logGroomerSidecar to dag processor #30623
Conversation
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
I think scheduler/* logs are mainly there because the standalone DagFileProcessor has been extracted from the scheduler (normally it runs there). It might be a good idea (@mhenc?) to update the default log configuration so that when the dag file processor is run standalone, its logs are created in a different folder. Not a big issue, but worth improving.
Can you add test coverage? Follow this example.
Right :)
Thanks for the example! I've added the test code.
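For readers following along, the added coverage follows roughly this shape. This is a minimal sketch in the style of the Airflow chart test suite; the template path, values keys, and container name are assumptions rather than the exact code in this PR:

```python
import jmespath

from tests.charts.helm_template_generator import render_chart


def test_log_groomer_attached_to_dag_processor():
    # Render only the dag-processor deployment with the standalone
    # dag processor enabled (path and values keys are assumptions).
    docs = render_chart(
        values={"dagProcessor": {"enabled": True}},
        show_only=["templates/dag-processor/dag-processor-deployment.yaml"],
    )
    containers = jmespath.search("spec.template.spec.containers", docs[0])
    # Expect the main container plus the log groomer sidecar.
    assert len(containers) == 2
    assert "dag-processor-log-groomer" in [c["name"] for c in containers]
```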
@potiuk |
I believe it leaves logs there, but possibly they are much smaller and not that annoying. The feature to remove local logs automatically when you use remote logging has just recently been merged and is being released (within the next 2 weeks) as part of 2.6.0 #29772
Ah, my mistake. I think local file deletion was implemented before, but individually for the ES task handler only. The thing is that dag file processor logs are NOT logged remotely by default; only "task" logs are. Scheduler/dag file processor logs are stored locally by default, and you need to separately configure them using the Advanced configuration https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/logging-monitoring/logging-tasks.html#advanced-configuration if you want to send them to a remote logging system.
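For context, that "advanced configuration" route boils down to shipping a custom logging config class. A minimal sketch, assuming the default config still exposes a `processor` handler with a `base_log_folder` key (check this against your Airflow version); the folder path is just an example:

```python
# log_config.py -- point Airflow at this via
# [logging] logging_config_class = log_config.LOGGING_CONFIG
from copy import deepcopy

from airflow.config_templates.airflow_local_settings import DEFAULT_LOGGING_CONFIG

LOGGING_CONFIG = deepcopy(DEFAULT_LOGGING_CONFIG)

# Write dag file processor logs to a dedicated folder instead of the
# default scheduler/ location (the path here is an assumption).
LOGGING_CONFIG["handlers"]["processor"]["base_log_folder"] = (
    "/opt/airflow/logs/dag-processor"
)
```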
After a series of unit test failures, I realized that in LogGroomerTest, the …
Also, I moved the test file as part of a speed optimization, so you will have to move it too after rebasing (you will have a conflict).
* Remove gauge scheduler.tasks.running
* Add significant.rst file
* Update newsfragments/30374.significant.rst

Co-authored-by: Niko Oliveira <[email protected]>
Recover from `too old resource version exception` by retrieving the latest `resource_version` (apache#30425)

* Recover from `too old resource version exception` by retrieving the latest `resource_version`
* Update airflow/executors/kubernetes_executor.py

Co-authored-by: Tzu-ping Chung <[email protected]>
Co-authored-by: eladkal <[email protected]>
* re-adding after I borked it
* pre-commit
* finally fixing after the GitHub issue last week
* push fix
* feedback from Hussein
The new moto (4.1.7) performs additional validation on the queues created during tests, and it fails the tests when content deduplication is not specified. Explicitly setting the deduplication mode fixes the problem and allows the new moto to be installed.
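An illustrative sketch of the kind of fixture change described above: for a FIFO queue, the deduplication mode is stated explicitly so the stricter moto validation passes. The queue name and the test function are example values, not the exact test code:

```python
import boto3
from moto import mock_sqs


@mock_sqs
def create_test_queue():
    client = boto3.client("sqs", region_name="us-east-1")
    # New moto validates FIFO queue attributes, so state the
    # deduplication mode explicitly instead of relying on defaults.
    return client.create_queue(
        QueueName="test-queue.fifo",
        Attributes={
            "FifoQueue": "true",
            "ContentBasedDeduplication": "true",
        },
    )
```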
* Fix possible race condition when refreshing DAGs
* Merge the two queries into one
* Remove provide_session from internal function

Since get_latest_version_hash_and_updated_datetime is internal and we always pass in the session anyway, the provide_session decorator is redundant and only introduces the possibility of developer errors.

Co-authored-by: Sébastien Brochet <[email protected]>
Co-authored-by: Tzu-ping Chung <[email protected]>
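A minimal sketch of the provide_session point for readers unfamiliar with the pattern; the helper name mirrors the commit message, but the entrypoint name and the bodies here are placeholders:

```python
from sqlalchemy.orm import Session

from airflow.utils.session import NEW_SESSION, provide_session


@provide_session
def public_entrypoint(dag_id: str, session: Session = NEW_SESSION):
    # The decorator injects a session only when the caller omits one.
    return get_latest_version_hash_and_updated_datetime(dag_id, session=session)


def get_latest_version_hash_and_updated_datetime(dag_id: str, *, session: Session):
    # Internal helper: every caller already passes a session explicitly,
    # so decorating it with @provide_session would be redundant.
    ...
```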
In some cases, when the machine has been reused across builds, pipx-installed twine might seem both installed and removed (this happens when builds are cancelled while installing twine). Installing twine with --force should fix the problem.
… (apache#30682)

When packages for the "Other" test type are calculated, the list of all test folders is generated and compared with the packages previously selected by the "predefined" test types. This is done via the `find` method, which returns the folders in arbitrary order, mostly depending on the sequence in which the folders were created. In case the tests from some packages have side-effects that impact tests in other packages (obviously not something that is desired), the tests might succeed in one environment but fail in another. This happened, for example, in the case of apache#30362, which had a cross-package side-effect later fixed in apache#30588. There, the results of the "Other" test type depended on where the tests were executed.

This PR sorts the `find` output so it is always in consistent order. We are using ASCII for package names and the test types are derived in the same Docker CI image with the same LOCALE, so the output of packages for the "Other" test type should always be consistent.
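Conceptually, the fix is just a deterministic sort over the discovered folders. A Python sketch of the idea (the actual change lives in the CI selective-checks scripts; these names are made up for illustration):

```python
from pathlib import Path


def other_test_folders(tests_root: Path, predefined: set[str]) -> list[str]:
    # sorted() makes the "Other" test-type package list independent of
    # filesystem creation order, unlike a bare `find`.
    return sorted(
        p.name
        for p in tests_root.iterdir()
        if p.is_dir() and p.name not in predefined
    )
```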
Upgrading to the latest (released a week ago) MyPy in the hope it will fix some more problems with attrs after upgrading new packages, but it seems that even the latest MyPy does not know about the new typing changes introduced in attrs (traditionally MyPy has an attrs plugin that injects appropriate typing, but apparently it needs to catch up with those changes).
Helm unit tests use template rendering, and the rendering uses a lot of CPU for the `helm template` command. We have a lot of those rendering tests (>800), so even running the tests in parallel on a multi-CPU machine does not decrease the elapsed time to execute them. However, each of the tests runs entirely independently, so we should be able to achieve a much faster elapsed time if we run subsets of tests on separate multi-CPU machines. This will not lower the job build time, but it might speed up the elapsed time and thus give faster feedback. This PR achieves that.
…de (apache#30690)

* Add a new argument to raise a skip exception when the python callable exits with the same value
* Add unit tests for skip_exit_code
…le codes (apache#30692)

* Rename skip_exit_code to skip_on_exit_code and allow providing multiple codes
* Replace list type with Container
Co-authored-by: Mark Richman <[email protected]>
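Based on the two skip_on_exit_code commits above, usage ends up looking roughly like this. This is a sketch assuming the parameter landed on PythonVirtualenvOperator with Container semantics; check the 2.6 docs for the exact operators it applies to:

```python
from airflow.operators.python import PythonVirtualenvOperator


def maybe_skip():
    import sys

    sys.exit(100)  # any exit code listed below marks the task as skipped


task = PythonVirtualenvOperator(
    task_id="maybe_skip",
    python_callable=maybe_skip,
    requirements=[],
    # Renamed from skip_exit_code; now accepts a Container of codes.
    skip_on_exit_code=(100, 101),
)
```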
This change separates out the policies we have for providers into a separate PROVIDERS.rst file. It also clearly documents the process and policy we have for accepting new community-managed providers, explaining the conditions that have to be fulfilled and stating a very strong preference for keeping providers maintained by 3rd-party teams when such teams exist.
Oh, sorry for the noise. My mistake :(
I accidentally rebased the commit, sorry.
Closing this one then |
Description
This PR attaches the logGroomer sidecar pod when using a standalone `dag processor`. This is to prevent scheduler logs from growing indefinitely in the `logs` directory of that `dag processor` pod.

In fact, one of the Airflow clusters my team uses had about 3.5 TiB of scheduler logs accumulated in the emptyDir of a `dag processor` pod, which reduced the ephemeral-storage availability on that node to the point of pod eviction.

We haven't figured out why the standalone `dag processor` was accumulating scheduler logs, but we think it's a good idea to attach a logGroomer sidecar, like any other scheduler or worker pod, to prevent this from happening in the first place.

In this PR, we've modified the helm chart to attach a logGroomer to the standalone `dag processor`.
.Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named `{pr_number}.significant.rst` or `{issue_number}.significant.rst`, in newsfragments.