Airflow ElasticSearch provider issue #25177
Thanks for opening your first issue here! Be sure to follow the issue template!
Hmm, there is an on-going change #21942 - @millin, maybe you could take a look at the issue here and implement it as part of the improvements in #21942? And then @PatrykKlimowicz, you could test whether the change works? Might be a good cooperation - I have little to no experience with Elasticsearch, so maybe you could test each other's changes? It's actually very easy to prepare a new provider. This:
breeze prepare-provider-packages elasticsearch --version-suffix-for-pypi post1
should build it.
I think this mistake is already fixed in my PR here.
Ha! There you go! @PatrykKlimowicz - how about checking out the code from #21942 and testing whether it works for you? :)
@potiuk I'll try to test 😄 Will be back with some feedback soon |
@potiuk I followed this to set up the env with breeze, but I got stuck on this error: (myvenv) ➜ ~/dev/airflow git:(main) ✗ breeze --force-build prepare-provider-packages elasticsearch --version-suffix-for-pypi post1
Good version of Docker: 20.10.12.
Good version of docker-compose: 2.2.3
Good Docker context used: default.
Docker image build is not needed for CI build as no important files are changed! You can add --force-build to force it
Requirement already satisfied: pip==22.2 in /usr/local/lib/python3.7/site-packages (22.2)
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Get all providers
Copy sources
===================================================================================
Copying sources for provider packages
===================================================================================
/opt/airflow /opt/airflow/dev/provider_packages
/opt/airflow/dev/provider_packages
-----------------------------------------------------------------------------------
Package Version of providers suffix set for PyPI version: post1
-----------------------------------------------------------------------------------
########## Generate setup files for 'elasticsearch' ##########
Traceback (most recent call last):
File "/opt/airflow/dev/provider_packages/prepare_provider_packages.py", line 2001, in <module>
cli()
File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/rich_click/rich_group.py", line 21, in main
rv = super().main(*args, standalone_mode=False, **kwargs)
File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.7/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/opt/airflow/dev/provider_packages/prepare_provider_packages.py", line 1541, in generate_setup_files
current_tag = get_current_tag(provider_package_id, version_suffix, git_update, verbose)
File "/opt/airflow/dev/provider_packages/prepare_provider_packages.py", line 1553, in get_current_tag
make_sure_remote_apache_exists_and_fetch(git_update, verbose)
File "/opt/airflow/dev/provider_packages/prepare_provider_packages.py", line 715, in make_sure_remote_apache_exists_and_fetch
stderr=subprocess.DEVNULL,
File "/usr/local/lib/python3.7/subprocess.py", line 363, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['git', 'fetch', '--tags', '--force', 'apache-https-for-providers']' returned non-zero exit status 128.
===================================================================================
Summary of prepared packages:
Errors:
elasticsearch
==================================================================================
==================================================================================
There were errors when preparing packages. Exiting!
Any ideas?
Interesting. Do you happen to work in a worktree, maybe?
Because that would happen if you run this command in the worktree.
And there is a flag to disable this, I think - just run it with --help.
Nope. I do not see any special flag. I tried to fix the ownership, but I still get the error. To be more precise, this is what I see in debug mode:
I disabled the SSL and it's "OK" now.
Cool. I will add the flag - it used to be there in old breeze (and I will just turn this error into a warning - it's not necessary to run it, it's more to make sure we have the latest version of the tags :) ). Re: using it in K8S - you really need to update your image. See https://airflow.apache.org/docs/docker-stack/entrypoint.html#installing-additional-requirements
Then whenever any of the components starts, it will install the package before running anything.
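The mechanism described on that page is the _PIP_ADDITIONAL_REQUIREMENTS environment variable, which makes the image entrypoint pip-install the listed packages on container start (intended for quick testing, not production). A sketch of how that might look in a compose/values file - the exact version pin here is only an illustrative assumption:

```yaml
# Illustrative only: pin whatever provider build you want to test.
# _PIP_ADDITIONAL_REQUIREMENTS is installed by the entrypoint at startup.
environment:
  _PIP_ADDITIONAL_REQUIREMENTS: "apache-airflow-providers-elasticsearch==4.2.0.post1"
```

For production, the same page recommends baking the dependency into a custom image instead, since installation at startup is repeated on every container restart.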
COOOL. I am merging it now then :). We release providers roughly monthly; the last release was last week, so expect this one in ~3 weeks or so.
#25236 skips the fetch error and turns it into a warning, @PatrykKlimowicz.
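The shape of that fix can be sketched as follows - this is illustrative, not the actual code of #25236; the function name is hypothetical, while the git command and remote name are taken from the traceback above:

```python
# Sketch: treat a failed optional tag fetch as a warning instead of a
# hard error, so package preparation can continue.
import subprocess
import warnings


def fetch_tags(remote: str = "apache-https-for-providers") -> None:
    """Refresh tags from the given remote; warn (don't abort) on failure."""
    try:
        subprocess.check_call(
            ["git", "fetch", "--tags", "--force", remote],
            stderr=subprocess.DEVNULL,
        )
    except subprocess.CalledProcessError as err:
        # The tag refresh only ensures local tags are up to date, so a
        # failure (e.g. missing remote, worktree quirks) is non-fatal.
        warnings.warn(f"Could not fetch tags from remote '{remote}': {err}")
```

The key point is catching CalledProcessError at the call site instead of letting it propagate up through the click command and kill the whole run.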
Apache Airflow version
2.3.3 (latest released)
What happened
During usage of Airflow v2.1.3 in my project this issue appeared, and it was solved by adding Offset_Key to the Fluent Bit configuration. Offset_Key appends an offset field to the logs, so we can retrieve the logs in the correct order. We specified AIRFLOW__ELASTICSEARCH__OFFSET_FIELD="custom_offset", and logs were retrieved correctly based on custom_offset and then displayed in the Airflow UI.
Now I updated to v2.3.3 and this behavior is no longer valid. I tested some combinations:
Due to backward compatibility I need a config in which custom_offset has higher precedence than the one Airflow inserts. As suggested here, I tried lowering the elasticsearch provider version to see which one works for this scenario. It turned out that the version we used with Airflow v2.1.3 was OK: apache-airflow-providers-elasticsearch==2.0.2. I think that this change broke our use case, as version 2.0.3 is the first that does not work for us - changelog. With version 2.0.2 I can see that both custom_offset and Airflow's offset are added to the logs, but thanks to AIRFLOW__ELASTICSEARCH__OFFSET_FIELD="custom_offset" the logs are displayed in the correct order.
What you think should happen instead
The offset from Airflow should not conflict with the offset added by a third-party tool, since Airflow does not support sending logs to Elasticsearch - it only supports reading from it.
Most probably the issue lies in the flow of the logs. Right now it is:
Airflow -> LogFile <- Fluent Bit -> ElasticSearch <- Airflow
so Airflow does not know about the (in this specific case) Fluent Bit config and its offset name.
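The ordering problem can be illustrated with a minimal sketch (not Airflow's actual implementation - the function and field values below are made up): the log reader sorts Elasticsearch hits by whichever field name the offset setting points at, so it matters which offset field reflects the true emission order.

```python
# Minimal sketch of offset-based log ordering. "custom_offset" is the
# field Fluent Bit's Offset_Key writes (true emission order); "offset"
# stands in for a field added by another component, with unrelated values.

def read_logs(hits, offset_field="offset"):
    """Return log messages ordered by the configured offset field."""
    return [h["message"] for h in sorted(hits, key=lambda h: h[offset_field])]


hits = [
    {"message": "line 2", "custom_offset": 2, "offset": 30},
    {"message": "line 1", "custom_offset": 1, "offset": 99},
    {"message": "line 3", "custom_offset": 3, "offset": 10},
]

print(read_logs(hits, offset_field="custom_offset"))  # true order
print(read_logs(hits, offset_field="offset"))         # scrambled order
```

If both fields exist and the reader sorts by the wrong one (or overwrites the right one), the displayed order breaks even though the documents themselves are intact - which matches the observed behavior after the provider upgrade.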
It would be nice to make the change in version 2.0.3 I linked above optional, so we can instruct Airflow whether it should create an offset field with the given AIRFLOW__ELASTICSEARCH__OFFSET_FIELD name, or just use that name to retrieve logs (I do not know the whole logic behind Airflow's log retrieval, so I am not sure this is a good idea). A bool flag like AIRFLOW__ELASTICSEARCH__ADD_OFFSET_FIELD could determine whether Airflow creates its own offset field, and AIRFLOW__ELASTICSEARCH__OFFSET_FIELD could determine which name to use to either create and retrieve logs, or just retrieve them.
How to reproduce
Use Airflow in v2.3.3.
Use Fluent Bit in v1.9.6 and add the Offset_Key to its INPUT config.
Use ElasticSearch to store logs and read logs from ElasticSearch in the Airflow UI.
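For the Fluent Bit step, the INPUT section with Offset_Key might look like the fragment below - the tail path and the key name are illustrative assumptions, not the reporter's actual config:

```
# Fluent Bit tail input sketch; Offset_Key appends the file offset of
# each record under the given key, giving a sortable per-line order.
[INPUT]
    Name        tail
    Path        /opt/airflow/logs/*/*/*/*.log
    Offset_Key  custom_offset
```

With AIRFLOW__ELASTICSEARCH__OFFSET_FIELD set to the same key name, the Airflow log reader sorts retrieved documents by the value Fluent Bit wrote.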
Operating System
AKS
Versions of Apache Airflow Providers
Working case (Airflow 2.1.3):
Not working case (Airflow v2.3.3):
Airflow v2.3.3 is working with apache-airflow-providers-elasticsearch==2.0.2
Deployment
Other 3rd-party Helm chart
Deployment details
We are using Airflow Community Helm chart + Azure Kubernetes Service
Anything else
No response
Are you willing to submit PR?
Code of Conduct