Fleet: error: could not decode the response #25405

Closed
mtojek opened this issue Apr 12, 2021 · 15 comments

@mtojek (Contributor) commented Apr 12, 2021

The goal of this issue is to root-cause problems with Elastic Agent/Fleet/Kibana on the latest snapshots. The stack boots up correctly, and then a new policy should be reassigned to the agent, but it seems that this never happens.

Artifacts/logs: https://beats-ci.elastic.co/blue/organizations/jenkins/Ingest-manager%2Felastic-package/detail/PR-319/1/artifacts

fleet-server_1 | 2021-04-12T08:52:08.659Z	ERROR	fleet/fleet_gateway.go:203	Could not communicate with Checking API will retry, error: could not decode the response, raw response:
fleet-server_1 | 2021-04-12T08:53:42.756Z	ERROR	fleet/fleet_gateway.go:203	Could not communicate with Checking API will retry, error: could not decode the response, raw response:
fleet-server_1 | 2021-04-12T08:56:45.556Z	ERROR	fleet/fleet_gateway.go:203	Could not communicate with Checking API will retry, error: could not decode the response, raw response:

We deployed an emergency fix to pin the last known stable revisions. Here is the PR reverting that change: elastic/elastic-package#319. It is expected to go back to green once this issue is resolved.

This has been impacting elastic/integrations for ~5 days now.
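
For reference, the empty `raw response:` suffix in the log lines above is what you get when the checkin response body comes back empty or as non-JSON. This is only an illustrative sketch of that failure mode (not the agent's actual code), assuming the gateway JSON-decodes the body:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
)

// decode mimics how a client typically decodes a checkin-style response,
// and shows what happens when the server returns an empty body.
func decode(body io.Reader) error {
	raw, err := io.ReadAll(body)
	if err != nil {
		return err
	}
	var out map[string]interface{}
	if err := json.Unmarshal(raw, &out); err != nil {
		// With an empty body this yields "unexpected end of JSON input",
		// and the wrapped message carries an empty "raw response:" suffix,
		// matching the log lines above.
		return fmt.Errorf("could not decode the response, raw response: %s: %w", raw, err)
	}
	return nil
}

func main() {
	fmt.Println(decode(bytes.NewReader(nil)))
}
```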

@elasticmachine (Collaborator)

Pinging @elastic/fleet (Team:Fleet)

@mtojek (Contributor, Author) commented Apr 13, 2021

@blakerouse @nchaulet Would you mind sharing the status? Do you need more reference data/logs?

For clarity: due to this issue we are unable to test integrations with the latest snapshots (we are currently using ones published a week ago).

@blakerouse (Contributor)

I see that fleet-server is having issues with the following error:

{"log.level":"info","error":"elastic fail 404:index_not_found_exception:no such index [.fleet-actions]","id":"9db653bd-0679-4db1-9419-397be8e22933","code":400,"@timestamp":"2021-04-13T20:36:27.251Z","message":"fail checkin"}

I did some more digging and this is not an error on 8.0. After further investigation, I see that Fleet Server missed a backport to 7.x.

elastic/fleet-server#205
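
For anyone who wants to confirm the diagnosis on an affected stack, here is a small sketch using the official go-elasticsearch client to check whether the `.fleet-actions` index exists; the address and credentials are placeholders for the local setup, not values taken from this thread:

```go
package main

import (
	"fmt"
	"log"

	"github.com/elastic/go-elasticsearch/v7"
)

func main() {
	// Placeholder address/credentials for the affected stack.
	es, err := elasticsearch.NewClient(elasticsearch.Config{
		Addresses: []string{"https://localhost:9200"},
		Username:  "elastic",
		Password:  "changeme",
	})
	if err != nil {
		log.Fatalf("error creating client: %v", err)
	}

	// HEAD /.fleet-actions returns 200 if the index exists, 404 otherwise,
	// which matches the index_not_found_exception seen in the checkin error.
	res, err := es.Indices.Exists([]string{".fleet-actions"})
	if err != nil {
		log.Fatalf("request failed: %v", err)
	}
	defer res.Body.Close()
	fmt.Println(".fleet-actions exists:", res.StatusCode == 200)
}
```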

@blakerouse (Contributor)

Backport PR: elastic/fleet-server#231

Once it lands and a new 7.x snapshot is published, I expect this to be fixed.

@simitt (Contributor) commented Apr 16, 2021

Testing with the latest 7.13 snapshot (2021/04/16) and still seeing the following error:

{"log.level":"error","@timestamp":"2021-04-16T08:54:34.425Z","log.origin":{"file.name":"fleet/fleet_gateway.go","file.line":203},"message":"Could not communicate with Checking API will retry, error: could not decode the response, raw response: \n","ecs.version":"1.6.0"}

@jen-huang

Are we still seeing this behavior with recent snapshots? If so, @blakerouse @nchaulet any guess on which side this is on?

@mtojek (Contributor, Author) commented Apr 27, 2021

I haven't seen it recently, but this error is so generic that when it pops up it usually means something entirely different. Is it possible to improve logging in this case? For example, dump the raw byte content (if debug is enabled).
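
A minimal sketch of the kind of change suggested here: keep the raw bytes around so a failed decode reports a (truncated) payload instead of an empty `raw response:` suffix. The names below are illustrative, not the agent's actual API:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
)

// decodeResponse reads the whole body up front so that, on a decode failure,
// the error can include a truncated snippet of the raw payload for debugging.
func decodeResponse(body io.Reader, out interface{}) error {
	raw, err := io.ReadAll(body)
	if err != nil {
		return fmt.Errorf("could not read the response: %w", err)
	}
	if err := json.Unmarshal(raw, out); err != nil {
		// Include the raw payload (truncated) so the log line is actionable.
		const max = 1024
		snippet := raw
		if len(snippet) > max {
			snippet = snippet[:max]
		}
		return fmt.Errorf("could not decode the response, raw response: %q: %w", snippet, err)
	}
	return nil
}

func main() {
	var v map[string]interface{}
	// An empty body reproduces the unhelpful error seen in this issue.
	fmt.Println(decodeResponse(bytes.NewReader(nil), &v))
}
```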

@ruflin (Contributor) commented Apr 27, 2021

I also did not see this recently. ++ on improving logging. I filed #25230; please feel free to edit it and add entries for the specific things that should be improved.

@simitt (Contributor) commented Apr 27, 2021

It's back - seen with the latest snapshots today.

@mtojek (Contributor, Author) commented Apr 28, 2021

I can give you some hints. I think the CI managed to reproduce it, but only when the Fleet Server health status fluctuates (see #25341).
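
To see whether the health status is actually flapping while the agent reports the decode error, a throwaway poller along these lines can be left running against the stack; the local port and the /api/status path are assumptions about the setup, not something taken from this thread:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

func main() {
	// Hypothetical local Fleet Server status endpoint; adjust to wherever
	// Fleet Server is exposed in the CI stack.
	const statusURL = "http://localhost:8220/api/status"

	client := &http.Client{Timeout: 5 * time.Second}
	for {
		resp, err := client.Get(statusURL)
		if err != nil {
			fmt.Printf("%s unreachable: %v\n", time.Now().Format(time.RFC3339), err)
		} else {
			body, _ := io.ReadAll(resp.Body)
			resp.Body.Close()
			fmt.Printf("%s %d %s\n", time.Now().Format(time.RFC3339), resp.StatusCode, body)
		}
		time.Sleep(2 * time.Second)
	}
}
```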

@ruflin

This comment has been minimized.

jen-huang transferred this issue from elastic/kibana Apr 28, 2021
jen-huang added the Agent, bug, and Team:Elastic-Agent (Label for the Agent team) labels Apr 28, 2021
@elasticmachine (Collaborator)

Pinging @elastic/agent (Team:Agent)

@jen-huang

I went ahead and moved this and put it in the agent Iteration board :)

ph added the v7.13.0 label Apr 28, 2021
@ph (Contributor) commented Apr 28, 2021

This is indeed an agent issue. @elastic/agent, could anyone take a look at this one?

ph assigned faec May 12, 2021
@faec (Contributor) commented May 13, 2021

Confirmed that this is no longer happening in local snapshots or nightly builds. Closing, though we can reopen if this is spotted again independent of #25341.

faec closed this as completed May 13, 2021