Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Fixes
Locally, most of our Airflow connections are set via environment variables in
catalog/.env
. In production they are generally set through the Airflow Admin UI. This caused the following issue to be missed in local testing:We have a script in
entrypoint.sh
which reformats all airflow connection environment variables (containing the stringhttp
) by urlencoding them and prepending 'http://'. The purpose of this was for convenience with setting up the Slack connections locally (otherwise you have to urlencode part of the Slack uri).It is not necessary for any of our other connection strings and in fact was causing issues with the Elasticsearch connections -- the reformatted string was not parsed correctly, parsing the entire uri as the
host
.Description
This PR:
entrypoint
script to only modify Slack connection strings, so that they will continue to workget_es_host
logic for connecting to Elasticsearch to correctly return the entire URI, rather than just the host (which was previously a workaround for the buggy local behavior, that is not compatible with the connections set in the Admin UI). This should now work for connections set in.env
AND in the Airflow UI.Testing Instructions
Check out this branch and first verify that the connections as currently set in your local environment still work:
data_refresh
andelasticsearch_http_production
connectionsstaging_elasticsearch_cluster_healthcheck
to test theelasticsearch_http_staging
connectionNow re-test, setting the connection variables through the Admin UI. You can delete them from your
.env
and rebuild, or more simply just edit the code inelasticsearch_cluster/shared.py
andcommon/ingestion_server.py
to use new conn_ids (elasticsearch_test
andingestion_server_test
. Then in the airflow UI, navigate to Admin > Connections and add two new connections:ingestion_server_test
(ordata_refresh
if you deleted the local environment variable)HTTP
ingestion_server
http
8001
elasticsearch_test
(orelasticsearch_http_staging
if you deleted the local environment variable)HTTP
es
9200
elasticsearch_http_production
as wellThen re-run the data refresh and the
staging_elasticsearch_cluster_healthcheck
and ensure they still work.Note that when we merge this we need to drop the 'schema' configuration for the two elasticsearch http connections in productiotn.
Checklist
Update index.md
).main
) or a parent feature branch.Developer Certificate of Origin
Developer Certificate of Origin