
updating master from apache/airflow repo #1

Merged: 38 commits, May 5, 2021

Conversation

@levyitay levyitay (Owner) commented May 5, 2021

No description provided.

ephraimbuddy and others added 30 commits April 30, 2021 20:37
This change adds tests for the create user job and ensures that changing the uid is detected

Co-authored-by: Kaxil Naik <[email protected]>
Closes: #15374
This pull request follows #14776. 

Clearing a subdag with Downstream+Recursive does not automatically set the state of the parent dag so that the downstream parent tasks can execute.
While we generally suggest using Postgres, we should support using a
non-chart-provisioned mysql database as well.

closes: #15558
This PR builds off of and supersedes @jaydesl's work on his [PR](#11769) to move forward with properly following [helm's rbac best practices](https://helm.sh/docs/chart_best_practices/rbac/). This PR updates every potential pod that can be deployed to include the option to either create or use an existing service account. This is the first step towards supporting environments where users have the [PodSecurityPolicy](https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#podsecuritypolicy) admission controller enabled without forcing such users to provide any additional permissions to the default service account in the namespace this is deployed to.

closes: #11755
related: #13643 

Co-authored-by: jaydesl <[email protected]>
Co-authored-by: Ian Stanton <[email protected]>
Co-authored-by: Kaxil Naik <[email protected]>
Minor refactor of how the results backend connection secret is built. This moves logic out of an already long line, as is already done in the metadata connection secret.
This PR updates changelog and bumps version of providers to be
released after we reached PIP 21 compatibility. It is necessary
for two reasons:

1) We need it in order to get constraints for PyPI-released
   providers updated automatically (PIP 21 makes master airflow
   conflict with a few released providers).

2) We want to release Airflow 2.0.3, which will be PIP 21 installable
   with those providers.
The image tagging is now fully automated within the build
dockerhub script, including the :<VERSION> and :latest tags.
Some of the tests in `kubernetes_tests/test_kubernetes_pod_operator_backcompat.py` were redundant: they didn't test anything related to deprecated classes and were already covered by either `kubernetes_tests/test_kubernetes_pod_operator.py` or `tests/providers/cncf/kubernetes/operators/test_kubernetes_pod.py`.
Some of the docs advised using 'airflow[azure]' whereas it should be
'apache-airflow[azure]'.
When building images for production we use docker-context-files,
where we build the packages to install. However, if those context
files are not cleaned up, they unnecessarily increase the size and
time needed to build the image, and they invalidate the COPY . layer
of the image.

This PR checks that the docker-context-files folder contains just a
readme when the Breeze build-image command is run (for cases where
images are not built from docker-context-files). Conversely, it also
checks that there are some files when the image is built with the
--install-from-docker-context-files switch.

This PR also adds a --cleanup-docker-context-files switch to clean up
the folder automatically. The error messages also instruct the user
what to do.
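The check described above can be sketched in Python. The folder layout and the flags come from the PR description; the function name and error messages here are illustrative, not the real Breeze script:

```python
from pathlib import Path


def check_docker_context_files(folder: Path, install_from_context: bool) -> None:
    """Sketch of the Breeze sanity check (illustrative names).

    Without --install-from-docker-context-files the folder must contain
    only a readme, otherwise stale files bloat the image and invalidate
    the COPY . layer; with the switch there must be packages to install.
    """
    extras = [p for p in folder.iterdir() if not p.name.lower().startswith("readme")]
    if install_from_context and not extras:
        raise SystemExit(
            "No packages found in docker-context-files; build them first."
        )
    if not install_from_context and extras:
        raise SystemExit(
            "docker-context-files is not empty; remove the files or pass "
            "--cleanup-docker-context-files or --install-from-docker-context-files."
        )
```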
* Better description of UID/GID behaviour in image and quickstart

Following the discussion in
#15579
it seems that the AIRFLOW_UID/GID parameters were not clearly
explained in the Docker Quick-start guide and some users could
find them confusing.

This PR attempts to clarify them.

* fixup! Better description of UID/GID behaviour in image and quickstart
* Add quotes to extraConfigMaps

* Update values.yaml (add quotes)
KubernetesExecutor is also in the list of executors that need a worker
ServiceAccount.
Bumps [ssri](https://github.com/npm/ssri) from 6.0.1 to 6.0.2.
- [Release notes](https://github.com/npm/ssri/releases)
- [Changelog](https://github.com/npm/ssri/blob/v6.0.2/CHANGELOG.md)
- [Commits](npm/ssri@v6.0.1...v6.0.2)

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* add support for include_headers in QuboleHook get_results
K8s clusters can use a different cluster domain than the default
`cluster.local`. This change will make the chart compatible with those
clusters as well.
Use Requests' session.request factory for HTTP request initiation; this will use
environment variables and sensible defaults for requests.

Also use verify option only if it is provided to run method, as requests library
already defaults to True.

Our organization uses firewalls and custom SSL certificates to communicate
between systems, this can be achieved via `CURL_CA_BUNDLE` and
`REQUESTS_CA_BUNDLE` environment variables.  Requests library takes both into
account and uses them as default value for verify option when sending request to
remote system.

The current implementation sets verify to True, which overwrites the defaults,
and as a result requests cannot be made due to SSL verification issues. This PR
fixes the problem.
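The fix can be sketched as a small helper (the function name is illustrative; the actual change lives in the HTTP hook's run method): forward `verify` only when the caller supplied it, so the Requests library's own defaults, including `REQUESTS_CA_BUNDLE`, stay in effect.

```python
def build_request_kwargs(verify=None, **extra):
    """Build kwargs for session.request (illustrative helper).

    Forward ``verify`` only when the caller supplied it, so that
    requests' own defaults (True, or the REQUESTS_CA_BUNDLE /
    CURL_CA_BUNDLE environment variables) remain in effect.
    """
    kwargs = dict(extra)
    if verify is not None:
        kwargs["verify"] = verify
    return kwargs


# Would be used roughly as:
#   session.request(method, url, **build_request_kwargs(verify=user_verify))
```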
…5644)

Currently, we use the experimental REST API to run the Kubernetes executor integration tests.
This PR changes this to use the stable REST API for these tests
This change adds a delimiter to the delete_file method of Wasbhook. This way, users will be able to locate the files they want to delete using a delimiter.

Co-authored-by: Ephraim Anierobi <[email protected]>
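As a rough illustration of the idea (pure-Python selection logic only, not the Azure SDK or the actual WasbHook code), a delimiter lets users narrow which blobs under a prefix are matched for deletion:

```python
def match_blobs(blob_names, prefix, delimiter=""):
    """Illustrative selection sketch: pick blobs under ``prefix`` whose
    names end with ``delimiter`` (e.g. '.csv'). An empty delimiter
    matches every blob under the prefix."""
    return [n for n in blob_names if n.startswith(prefix) and n.endswith(delimiter)]
```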
#15657)

This prevents a gitsync container from being created in the
KubernetesExecutor worker if DAG persistence is enabled, as the DAG will
already be on the volume. This also only mounts the DAGs volume once in
the worker.
If you aren't familiar with the history of the APIs, then from looking
at the titles it looks like the Experimental API might be the "next"
one.

We don't want people to think that!
According to the current codecov.io assessment of the Airflow code base, test coverage for
the kubernetes_executor.py module is about 63%. This metric definitely
needs to be improved.

This PR addresses unit test coverage for the delete_pod method in the AirflowKubernetesScheduler class in the kubernetes_executor.py module, and when merged will help improve the test coverage metric.

fixes part of #15523
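A unit test for such a method typically patches the Kubernetes client and asserts the call is forwarded. A minimal sketch follows; all names are illustrative stand-ins, except `delete_namespaced_pod`, which is a real method of the kubernetes Python client:

```python
from unittest import mock


def test_delete_pod_forwards_to_kube_client():
    # Stand-in for the patched Kubernetes API client.
    kube_client = mock.MagicMock()

    def delete_pod(pod_name, namespace):
        # Stand-in for the method under test: it should forward the
        # pod name and namespace to the Kubernetes API.
        kube_client.delete_namespaced_pod(pod_name, namespace)

    delete_pod("my-pod", "default")
    kube_client.delete_namespaced_pod.assert_called_once_with("my-pod", "default")
```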
KubernetesExecutor workers also need the log volume mounted.
bbovenzi and others added 8 commits May 5, 2021 01:28
Adds a false bottom to the table element so we can have overlap on the final row.

Fixes #15656
Error occurs because `webpack` is not installed, because `yarn install --frozen-lockfile` was not run:

```
root@f5fc5cfc9a43:/opt/airflow# cd /opt/airflow/airflow/www/; yarn dev
yarn run v1.22.5
$ NODE_ENV=dev webpack --watch --colors --progress --debug --output-pathinfo --devtool eval-cheap-source-map -
-mode development
/bin/sh: 1: webpack: not found
error Command failed with exit code 127.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
root@f5fc5cfc9a43:/opt/airflow/airflow/www#
```

This commit adds `yarn install --frozen-lockfile` to the command, which fixes it.

This was missed in https://github.com/apache/airflow/pull/13313/files
This change will enable us to easily deploy Airflow to a Kubernetes cluster
and test it using different executors.
Example usage:
   ./breeze kind-cluster --executor CeleryExecutor deploy
Currently it just shows "TimeoutError: There are still unapplied migrations after 60 seconds." which is not that helpful.

This commit adds more info to show the difference in migrations in DB and source code running `check_migrations`:

Now:

```
TimeoutError: There are still unapplied migrations after 60 seconds. Migration Head(s) in DB: {'e165e7455d70'} | Migration Head(s) in Source Code: {'a13f7613ad25'}
```

closes #15650
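The improved message can be sketched as below. This is a simplified stand-in: the real check obtains the migration heads from the database and the source code via Alembic before raising the TimeoutError.

```python
def describe_migration_gap(db_heads, source_heads, timeout=60):
    """Illustrative sketch of the friendlier error text: show both the
    DB heads and the source-code heads so the mismatch is visible."""
    return (
        f"There are still unapplied migrations after {timeout} seconds. "
        f"Migration Head(s) in DB: {db_heads} | "
        f"Migration Head(s) in Source Code: {source_heads}"
    )
```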
This masks sensitive values in logs for Connections and Variables.

It behaves as follows:

- Connection passwords are always masked, wherever they appear.

  This means, if a connection has a password of `a`, then _every_ `a` in
  log messages would get replaced with `***`

- "Sensitive" keys from extra_dejson are also masked. Sensitive is
  defined by the "existing" mechanism that the UI used, based upon the
  name of the key.

- "Sensitive" Variables are also masked.
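A minimal sketch of value-based masking with a logging filter (illustrative only, not Airflow's actual implementation; it assumes a non-empty set of secret strings):

```python
import logging
import re


class SecretMaskingFilter(logging.Filter):
    """Replace every occurrence of known secret values in log messages
    with '***'. Sketch only; assumes ``secrets`` is non-empty."""

    def __init__(self, secrets):
        super().__init__()
        # Longest secrets first so overlapping values mask correctly.
        ordered = sorted(secrets, key=len, reverse=True)
        self._pattern = re.compile("|".join(re.escape(s) for s in ordered))

    def filter(self, record):
        record.msg = self._pattern.sub("***", str(record.msg))
        return True  # never drop the record, only rewrite it
```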
We are allowing doc-only changes when releasing providers,
therefore we might want to regenerate documentation for the latest
version of the provider packages when there are doc-only changes.

The new --override-versioned flag enables that.
…nks` (#15673)

Without this change it is impossible for one of the providers to depend
upon the "dev"/current version of Airflow; pip would instead try to
go out to PyPI to find the version (which almost certainly won't exist,
as it hasn't been released yet).
This PR fixes a case where a task would not call the on_failure_callback
in the event of an OOM kill. The issue was that the task pid was being set
in the wrong place, and the local task job heartbeat was not checking the
correct pid of the process runner and task.

Now, instead of setting the task pid in check_and_change_state_before_execution,
it is set correctly in the _run_raw_task method.
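The heartbeat check can be sketched as a pure predicate (names are illustrative, not the actual LocalTaskJob code): the pid recorded when the raw task actually started must match the process runner's pid, otherwise failure handling should run.

```python
def heartbeat_pid_ok(recorded_task_pid, runner_pid):
    """Illustrative sketch: return True only when the pid recorded at
    _run_raw_task time matches the runner's pid; a missing or mismatched
    pid indicates the task process died (e.g. OOM-killed) and the
    failure path, including on_failure_callback, should be taken."""
    return recorded_task_pid is not None and recorded_task_pid == runner_pid
```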
@levyitay levyitay merged commit e426a09 into levyitay:master May 5, 2021