Fix Failing ES Remote Logging #32438
@pankajkoti Could you use this branch to test on your end and see if the remote logging works? I tested locally (using the
hi @Owen-CH-Leung Thanks for trying to provide a quick resolution here. I can see that the error for sort has gone away. But, unfortunately, it is not able to fetch the corresponding logs for the task for me (it used to work fine previously). I can see that logs are getting shipped successfully to ES, but the search query somehow is not able to match and find those. In the meantime I am trying to see if I can somehow fix it by comparing the previous code https://github.com/apache/airflow/pull/31920/files#diff-dd898ab2ed4bca853f1ce5cf52b6fbb37d5fc3545f28967c03dc499fabd3a746 and the current query, but wanted to inform you about the current progress.
@pankajkoti Ok I tried to use filebeat to ship logs to ES and also encountered the same issue. Logs are not being fetched. Looking at it now
@pankajkoti Hey - could you also share how you set up filebeat to ship logs to ES? I was using 4.5.1 and still I don't have luck fetching the logs. FYI I was using the tag let me know. Thanks!
hi @Owen-CH-Leung . Please find below steps I used to set up Filebeat to ship logs. @jedcunningham helped me to set that up, so thanks a ton to him. I am using Docker Desktop's Kubernetes cluster on MacOS, so you might unfortunately need Docker Desktop's Kubernetes cluster for the below steps to work. I have attached 4 files to be used, please remove the
There might be a slight delay in shipping the logs but you will see them eventually in a minute using Kibana UI's Logs feature. es.yaml.txt Airflow Config changes for this stack:
@pankajkoti Can you check on your side like in Kibana dashboard, does the documents have the field I set up ES, filebeat and kibana accordingly and inside kibana dashboard, I can see the logs are streamed to ES, including logs of the new runs that I triggered in airflow web UI. My I think if the field
@pankajkoti No worries it's just my mistake - I can see
@pankajkoti -> would love to get confirmation that this one fixes the issue :)
@potiuk Sorry Sir. Unfortunately, it does not solve the issue for me yet
I might try to see if I can get it working by making a few changes here, but unfortunately I have an occupied day today, so I cannot commit to solving this today
@potiuk @pankajkoti The current code change still didn't solve the issue I think. I'm making a few changes locally to test out if it fixes the stuff.
@pankajkoti I finally have it working locally after making some more code changes. I'll do some clean up and push the code to this PR in the coming few days. Will let you know once it's ready for you to test.
Okay thank you @Owen-CH-Leung
@pankajkoti Can you test again? It should work now.
@Owen-CH-Leung Thanks I tested now and it seems to work fine 🎉
cc: @dstandish @sunank200
@Owen-CH-Leung Great job! I think it would be good to have some test coverage for the changes so we can avoid regressions
@eladkal Sure. Let me add some tests for the
Would be nice to add type annotations in method signatures and also some docstrings for the methods to understand what the methods do :)
@pankajkoti Sure thing. Will do =D
@eladkal @pankajkoti @potiuk Added test, docstrings & method signature. However the
LGTM
A previous PR (link here) removed all non-official elasticsearch libraries so that only the official elasticsearch library is used. After that PR was merged, reading remote logs from ES failed due to a bug it introduced.
In particular, when reading remote logs, the webserver produced the following elasticsearch exception when sending a POST request to the ES endpoint /_all/count:
elasticsearch.exceptions.RequestError: RequestError(400, 'parsing_exception', 'request does not support [sort]')
which basically says the endpoint didn't accept the sort parameter from the query.
Hence this PR aims to fix the bug and re-enable reading remote logs. With this PR merged, airflow should resume its ability to read remote logs from ES.
Local test result:
Additional implementation details:
The only change I made was to remove the sort parameter from the query posted to elasticsearch.
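As a rough illustration of that change, a count request body can be derived from a search body by dropping the sort key. This is only a sketch: the helper name build_count_body and the sample query are hypothetical and are not the provider's actual code.

```python
from copy import deepcopy
from typing import Any, Dict


def build_count_body(search_body: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a search body with the top-level "sort" key removed.

    Hypothetical helper: the error quoted above shows the count endpoint
    rejecting a body that contains "sort", so the key is dropped here.
    """
    body = deepcopy(search_body)  # leave the caller's dict untouched
    body.pop("sort", None)        # no-op when "sort" is absent
    return body


# Illustrative search body; the field names are assumptions.
search_body = {
    "query": {"match_phrase": {"log_id": "example-log-id"}},
    "sort": ["offset"],
}
count_body = build_count_body(search_body)
```

Working on a copy keeps the original body available for the search request, where sort is still accepted.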
I inspected the original source code in provider 4.5.1 :
https://github.com/apache/airflow/blob/providers-elasticsearch/4.5.1/airflow/providers/elasticsearch/log/es_task_handler.py#L290-L294
While the sort() method was called from elasticsearch_dsl, the actual query posted to elasticsearch NEVER includes the sort parameter. So even in 4.5.1, the result returned is not sorted. The sort() method in 4.5.1 just provides a false sense that the returned result is sorted, but it is actually not.
Here's how the function code gets called:
The count() method from elasticsearch_dsl, class Search, gets called:
https://github.com/apache/airflow/blob/providers-elasticsearch/4.5.1/airflow/providers/elasticsearch/log/es_task_handler.py#L297
Inside the count() method, the to_dict() method gets called with count = True:
https://github.com/elastic/elasticsearch-dsl-py/blob/main/elasticsearch_dsl/search.py#L694
Inside the to_dict() method, since count = True, the sort parameter doesn't get appended:
https://github.com/elastic/elasticsearch-dsl-py/blob/main/elasticsearch_dsl/search.py#L640-L664
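The call chain above can be sketched with a simplified stand-in for the Search class. MiniSearch below is a hypothetical toy written for illustration, not the elasticsearch_dsl implementation. It only mirrors the behavior described here: sort() records the sort keys, and to_dict(count=True) leaves them out of the request body.

```python
class MiniSearch:
    """Toy stand-in for a search object (hypothetical, not elasticsearch_dsl)."""

    def __init__(self, query=None):
        self._query = query
        self._sort = []

    def sort(self, *keys):
        # Return a new object with the sort keys recorded, leaving self unchanged.
        clone = MiniSearch(self._query)
        clone._sort = list(keys)
        return clone

    def to_dict(self, count=False):
        body = {}
        if self._query:
            body["query"] = self._query
        # A count request does not use sorting, so "sort" is only added
        # when the body is built for a regular search.
        if not count and self._sort:
            body["sort"] = self._sort
        return body

    def count_body(self):
        # Mirrors the step where count() builds its body with count=True.
        return self.to_dict(count=True)


search = MiniSearch({"match_phrase": {"log_id": "example-log-id"}}).sort("offset")
search_request = search.to_dict()    # body for a regular search
count_request = search.count_body()  # body for a count request
```

In this sketch, calling sort() before count_body() has no effect on the count request, which matches the observation that the sort() call in 4.5.1 never reached the query posted for the count.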