Fleet: error: could not decode the response #25405

Closed
mtojek opened this issue Apr 12, 2021 · 15 comments

@mtojek (Contributor) commented Apr 12, 2021

The goal of this issue is to root-cause problems with Elastic Agent/Fleet/Kibana on the latest snapshots. The stack boots up correctly, and then a new policy should be reassigned to the agent, but it seems that this never happens.

Artifacts/logs: https://beats-ci.elastic.co/blue/organizations/jenkins/Ingest-manager%2Felastic-package/detail/PR-319/1/artifacts

fleet-server_1 | 2021-04-12T08:52:08.659Z	ERROR	fleet/fleet_gateway.go:203	Could not communicate with Checking API will retry, error: could not decode the response, raw response:
fleet-server_1 | 2021-04-12T08:53:42.756Z	ERROR	fleet/fleet_gateway.go:203	Could not communicate with Checking API will retry, error: could not decode the response, raw response:
fleet-server_1 | 2021-04-12T08:56:45.556Z	ERROR	fleet/fleet_gateway.go:203	Could not communicate with Checking API will retry, error: could not decode the response, raw response:

We deployed an emergency fix to pin the last known stable revisions. Here is the PR reverting that change: elastic/elastic-package#319. It is expected to go back to green once this issue is resolved.

This has been impacting elastic/integrations for ~5 days now.
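
For reference, the empty `raw response:` suffix in the log lines above is what you get when the checkin response body comes back empty or as non-JSON. This is only an illustrative sketch of that failure mode (not the agent's actual code), assuming the gateway JSON-decodes the body:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
)

// decode mimics how a client typically decodes a checkin-style response,
// and shows what happens when the server returns an empty body.
func decode(body io.Reader) error {
	raw, err := io.ReadAll(body)
	if err != nil {
		return err
	}
	var out map[string]interface{}
	if err := json.Unmarshal(raw, &out); err != nil {
		// With an empty body this yields "unexpected end of JSON input",
		// and the wrapped message carries an empty "raw response:" suffix,
		// matching the log lines above.
		return fmt.Errorf("could not decode the response, raw response: %s: %w", raw, err)
	}
	return nil
}

func main() {
	fmt.Println(decode(bytes.NewReader(nil)))
}
```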

@elasticmachine (Collaborator)

Pinging @elastic/fleet (Team:Fleet)

@mtojek (Contributor, Author) commented Apr 13, 2021

@blakerouse @nchaulet Would you mind sharing the status? Do you need more reference data/logs?

For clarity: due to this issue we are unable to test integrations with the latest snapshots (we are currently using ones published a week ago).

@blakerouse (Contributor)

I see that fleet-server is having issues with the following error:

{"log.level":"info","error":"elastic fail 404:index_not_found_exception:no such index [.fleet-actions]","id":"9db653bd-0679-4db1-9419-397be8e22933","code":400,"@timestamp":"2021-04-13T20:36:27.251Z","message":"fail checkin"}

I did some more digging and this is not an error on 8.0. After further investigation, I see that Fleet Server missed a backport to 7.x.

elastic/fleet-server#205
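
For anyone who wants to confirm the diagnosis on an affected stack, here is a small sketch using the official go-elasticsearch client to check whether the `.fleet-actions` index exists; the address and credentials are placeholders for the local setup, not values taken from this thread:

```go
package main

import (
	"fmt"
	"log"

	"github.com/elastic/go-elasticsearch/v7"
)

func main() {
	// Placeholder address/credentials for the affected stack.
	es, err := elasticsearch.NewClient(elasticsearch.Config{
		Addresses: []string{"https://localhost:9200"},
		Username:  "elastic",
		Password:  "changeme",
	})
	if err != nil {
		log.Fatalf("error creating client: %v", err)
	}

	// HEAD /.fleet-actions returns 200 if the index exists, 404 otherwise,
	// which matches the index_not_found_exception seen in the checkin error.
	res, err := es.Indices.Exists([]string{".fleet-actions"})
	if err != nil {
		log.Fatalf("request failed: %v", err)
	}
	defer res.Body.Close()
	fmt.Println(".fleet-actions exists:", res.StatusCode == 200)
}
```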

@blakerouse (Contributor)

Backport PR: elastic/fleet-server#231

Once it lands and a new 7.x snapshot is published, I expect this to be fixed.

@simitt (Contributor) commented Apr 16, 2021

Testing with the latest 7.13 snapshot (2021/04/16) and still seeing the following error:

{"log.level":"error","@timestamp":"2021-04-16T08:54:34.425Z","log.origin":{"file.name":"fleet/fleet_gateway.go","file.line":203},"message":"Could not communicate with Checking API will retry, error: could not decode the response, raw response: \n","ecs.version":"1.6.0"}

@jen-huang

Are we still seeing this behavior with recent snapshots? If so, @blakerouse @nchaulet any guess on which side this is on?

@mtojek (Contributor, Author) commented Apr 27, 2021

I haven't seen it recently, but this error is so generic that when it pops up it usually means something entirely different. Is it possible to improve logging in this case? For example, dump the raw byte content (if debug is enabled).
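
A minimal sketch of the kind of change suggested here: keep the raw bytes around so a failed decode reports a (truncated) payload instead of an empty `raw response:` suffix. The names below are illustrative, not the agent's actual API:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
)

// decodeResponse reads the whole body up front so that, on a decode failure,
// the error can include a truncated snippet of the raw payload for debugging.
func decodeResponse(body io.Reader, out interface{}) error {
	raw, err := io.ReadAll(body)
	if err != nil {
		return fmt.Errorf("could not read the response: %w", err)
	}
	if err := json.Unmarshal(raw, out); err != nil {
		// Include the raw payload (truncated) so the log line is actionable.
		const max = 1024
		snippet := raw
		if len(snippet) > max {
			snippet = snippet[:max]
		}
		return fmt.Errorf("could not decode the response, raw response: %q: %w", snippet, err)
	}
	return nil
}

func main() {
	var v map[string]interface{}
	// An empty body reproduces the unhelpful error seen in this issue.
	fmt.Println(decodeResponse(bytes.NewReader(nil), &v))
}
```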

@ruflin (Contributor) commented Apr 27, 2021

I also did not see this recently. ++ on improving logging. I filed #25230; please feel free to edit it and add entries for the specific things that should be improved.

@simitt (Contributor) commented Apr 27, 2021

It's back - seen with the latest snapshots today.

@mtojek (Contributor, Author) commented Apr 28, 2021

I can give you some hints. I think the CI managed to reproduce it, but only when the Fleet Server health status fluctuates (see #25341).
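
To see whether the health status is actually flapping while the agent reports the decode error, a throwaway poller along these lines can be left running against the stack; the local port and the /api/status path are assumptions about the setup, not something taken from this thread:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

func main() {
	// Hypothetical local Fleet Server status endpoint; adjust to wherever
	// Fleet Server is exposed in the CI stack.
	const statusURL = "http://localhost:8220/api/status"

	client := &http.Client{Timeout: 5 * time.Second}
	for {
		resp, err := client.Get(statusURL)
		if err != nil {
			fmt.Printf("%s unreachable: %v\n", time.Now().Format(time.RFC3339), err)
		} else {
			body, _ := io.ReadAll(resp.Body)
			resp.Body.Close()
			fmt.Printf("%s %d %s\n", time.Now().Format(time.RFC3339), resp.StatusCode, body)
		}
		time.Sleep(2 * time.Second)
	}
}
```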

@ruflin

This comment has been minimized.

jen-huang transferred this issue from elastic/kibana Apr 28, 2021
jen-huang added the Agent, bug, and Team:Elastic-Agent (Label for the Agent team) labels Apr 28, 2021
@elasticmachine (Collaborator)

Pinging @elastic/agent (Team:Agent)

@jen-huang

I went ahead and moved this and put it in the agent Iteration board :)

ph added the v7.13.0 label Apr 28, 2021
@ph (Contributor) commented Apr 28, 2021

This is indeed an agent issue. @elastic/agent, could anyone take a look at this one?

ph assigned faec May 12, 2021
@faec (Contributor) commented May 13, 2021

Confirmed that this is no longer happening in local snapshots or nightly builds. Closing, though we can reopen if this is spotted again independent of #25341.

faec closed this as completed May 13, 2021