-
Notifications
You must be signed in to change notification settings - Fork 14.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deprecate the 2 non-official elasticsearch libraries #31920
Deprecate the 2 non-official elasticsearch libraries #31920
Conversation
Remove deprecated libraries in package import
Looks fantastic. Will open up for Elasticsearch 8 migration. What we will also need is a note in the elasticsearch Changelog.rst and bumping the version there (and in provider.yaml) to 5.* - this is not a breaking change per-se, but it removes the dsl dependencies, that some users might rely on, so IMHO at the very least we need to raise MAJOR version of the elasticsearch provider and make people aware that if they want to to continue using the libraries they have to install them on their own. I think this also means (see the change in the dockerfile test) that we need to make a similar note in the |
cc: @eladkal |
If we can split this PR int 2: core part and provider part that would be better as providers and core don't have same release cycle for the provider part please add 5.0.0 entry in provider yaml
and add the following block to change log
https://github.com/apache/airflow/blob/main/airflow/providers/elasticsearch/CHANGELOG.rst |
Oh yeah. I am curious how it will work out to split it. Might be a nice experiment to see if there are any problems caused. |
Added |
"elasticsearch": ["elasticsearch", "es.elastic", "elasticsearch_dsl"], | ||
"elasticsearch": ["elasticsearch"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we split the PR I guess this is the only change that needs to be extracted to a followup PR? (not sure if doable though? in terms of getting green CI)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we won't be able to merge this one without this change. (tests will fail).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those are the kinds of problems I mentioned when "experimenting". It would have have to be extracted and merged before, effectively making dependencies between PRs. Easier for release manager, more difficult for contributor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah so we need also to set milestone on this one for Airflow 2.7.0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess.. No need because we aren't going to cherry-pick that one I guess. It would have been more complex if we actually decided to cherry-pick because we woud have to remember to cherry-pick both together (that's what you loose when you split such related PRs that they become non-cherry-pickable as single PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eladkal -> do you still want to split it ? And @Owen-CH-Leung - do you understand what we are asking for here?
I have personally a feeling, the need for splitting such PR will be difficult to explain to the contributors, because they have no context and might be a bit difficult to understand what we are asking for and why and how to do it.
I am - honestly - not sure if this is a better solution, taking into account that the only side-effect of having this ina a single PR is that the same change will appear in both provider and the core.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@potiuk pretty hard for me to understand as I don't have experience with releasing airflow & airflow provider package. My guess is : it would be better to have one PR for code change (including tests), and another PR for deprecating the dependencies ? So that means with the code-change PR being merged, we still have non-official elasticsearch library as dependency, just that we are not using them ?
More specifically , the changes to CHANGELOG.rst
, test_prod_image.py
, provider_dependencies.json
, index,rst
and etc...All go to another PR ?
I hope I'm getting it correctly but if not please explain to me. I can accommodate it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eladkal -> do you still want to split it
I'm OK with keeping it as is.
668daab
to
5a4106d
Compare
Can you rebase @Owen-CH-Leung ? We had some issues with building images that need rebase |
@potiuk done |
@potiuk looks like multiple CI jobs fail at the |
@potiuk The CI job fatal: bad revision 'apache-https-for-providers/main...providers-elasticsearch/4.5.1'
Traceback (most recent call last): Is it's due to the fact that |
Yeah. @eladkal works on it as we speak. We will re-run it after. |
Woooho! |
--------- Co-authored-by: eladkal <[email protected]>
hi, I am trying to use Elasticsearch remote logging with Airflow Breeze locally. The following configuration used to work for me previously (apache-airflow-providers-elasticsearch==4.5.1) to fetch and render remote logs in the webserver UI task logs using the below setting in
However, on main (I think about to be released provider with this change), I am getting the below error when the webserver tries to fetch the logs from ES,
Do I need to change my I am also wondering why it is making a POST http://host.docker.internal:9200/_all/_count and giving 400 as response. Should it not be a GET request? |
@pankajkoti Thanks for letting me know about your issue. I had a look at the log and it seems that the http post request is being sent to the
this endpoint is used to count the number of documents matching the given query, and the _count endpoint does not support sorting defined in the query below: but - how come your webserver will trigger a post request ? And why a post request will trigger the es_read function (if it's posting something I guess it should trigger something like Would you be able to share your full |
The steps to reproduce this would be following:
Run the task and go to the task logs to fetch the logs from this remote ES host. I am using Filebeat to ship logs to ES. I am attaching my airflow.cfg for your reference. I had to append the .txt prefix to the file for it to be able to attached here. |
Alternatively, could you please share a config that works for you for using ES remote logging? |
Hi @pankajkoti , Thanks for your info. Yes I can reproduce the error and my initial finding suggests to me that it's a bug at the existing code (rather than configuration issue). The error comes from the fact that the program tries to make a http request to (The GET / POST request doesn't really matter as this endpoint accepts both GET and POST request afaik, and POST is used because it can handle larger request body) Hence you are seeing the exception I'm still trouble shooting to find out the exact root cause of the error. will keep you posted on this |
Thank you for verifying @Owen-CH-Leung . It used to work fine with Elasticsearch provider 4.5.1 release https://pypi.org/project/apache-airflow-providers-elasticsearch/ and is failing after this change I believe. |
As part of the solution for #31256, it's necessary to first deprecate the 2 non-official elasticsearch libraries since both of them didn't support elasticsearch>=8. See their github repository for details:
https://github.com/preset-io/elasticsearch-dbapi/blob/master/requirements.txt#L9
https://github.com/elastic/elasticsearch-dsl-py/blob/main/setup.py#L32
Hence, this PR refactors the existing elasticsearch provider codebase to replace these 2 libraries with the official elasticsearch library. With this PR merged, we can then attempt to upgrade elasticsearch >= 8