Fix downloading agent manifest from upstream #461

MichaelKatsoulis · 2021-08-03T11:28:37Z

This PR tries to fix the bug introduced by #452.

Closes Issue: Bug: can't download elastic-agent-managed-kubernetes.yaml #458

This PR adds a retry when downloading the Elastic Agent manifest from upstream. If the response status code is not 200, or fewer than 2000 bytes are received, the code retries fetching the file up to 5 times before failing.
This way, short network issues can be tolerated.

elasticmachine · 2021-08-03T12:07:54Z

💚 Build Succeeded

Build stats

Start Time: 2021-08-04T10:21:56.763+0000
Duration: 45 min 56 sec
Commit: 127f587

Test stats 🧪

Test	Results
Failed	0
Passed	425
Skipped	4
Total	429

MichaelKatsoulis · 2021-08-03T12:12:14Z

/test

mtojek · 2021-08-03T13:38:14Z

internal/testrunner/runners/system/servicedeployer/kubernetes.go

+	defer resp.Body.Close()
+	logger.Debugf("status code when downloading elastic-agent-managed-kubernetes.yaml is %d", resp.StatusCode)
+	if resp.StatusCode != 200 {
+		return nil, errors.Wrapf(err, "downloading failed due to status code %d", resp.StatusCode)


Maybe also print the body here?
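
A hedged sketch of what this suggestion could look like follows. `checkStatus`, `fakeResponse` and the 512-byte cap are hypothetical names and values, not part of the PR. The sketch uses `fmt.Errorf` instead of `errors.Wrapf`: if `err` is nil at that point (an assumption, since the preceding lines are not shown), `Wrapf` from `github.com/pkg/errors` returns nil and the failure would be silently dropped.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// maxBodySnippet caps how much of the response body ends up in the error
// message (arbitrary value chosen for this sketch).
const maxBodySnippet = 512

// checkStatus (hypothetical helper) returns nil for a 200 response and
// otherwise an error containing the status code and the start of the body.
func checkStatus(resp *http.Response) error {
	if resp.StatusCode == http.StatusOK {
		return nil
	}
	// Best effort: a read error here only shortens the snippet.
	snippet, _ := io.ReadAll(io.LimitReader(resp.Body, maxBodySnippet))
	return fmt.Errorf("downloading failed due to status code %d, body: %q", resp.StatusCode, snippet)
}

// fakeResponse (demonstration fixture only) builds an *http.Response without
// touching the network.
func fakeResponse(code int, body string) *http.Response {
	rec := httptest.NewRecorder()
	rec.WriteHeader(code)
	rec.WriteString(body)
	return rec.Result()
}
```

For example, `checkStatus(fakeResponse(404, "not found"))` yields an error that names both the status code and the quoted body, while a 200 response yields nil.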

mtojek · 2021-08-03T13:40:57Z

internal/testrunner/runners/system/servicedeployer/kubernetes.go

+				return elasticAgentManagedYaml, nil
+			}
+			err = fmt.Errorf("bytes downloaded should be more than 2000 but where: %d", len(elasticAgentManagedYaml))
+			logger.Debugf("failed because of %s", err)


Did you check these debugf calls (for example with an invalid URL)?

mtojek · 2021-08-03T13:42:46Z

internal/testrunner/runners/system/servicedeployer/kubernetes.go

+			}
+			err = fmt.Errorf("bytes downloaded should be more than 2000 but where: %d", len(elasticAgentManagedYaml))
+			logger.Debugf("failed because of %s", err)
+			logger.Debugf("file downloaded is %s", string(elasticAgentManagedYaml))


I discourage printing the doc here. If you really want to check if the content is right, maybe print just MD5?
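
One way to follow this suggestion, sketched with the Go standard library only; `md5Hex` is a hypothetical helper name, not something from the PR. The digest is meant for eyeballing whether two downloads match, not for security.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
)

// md5Hex (hypothetical helper) returns the hex-encoded MD5 digest of b, which
// is short enough to log instead of the whole document.
func md5Hex(b []byte) string {
	sum := md5.Sum(b)
	return hex.EncodeToString(sum[:])
}

// describeDownload formats a compact debug line for a downloaded file.
func describeDownload(doc []byte) string {
	return fmt.Sprintf("downloaded %d bytes, md5: %s", len(doc), md5Hex(doc))
}
```

The resulting string could be passed to a debug logger in place of the full file content.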

mtojek · 2021-08-03T13:43:11Z

internal/testrunner/runners/system/servicedeployer/kubernetes.go

+			err = fmt.Errorf("bytes downloaded should be more than 2000 but where: %d", len(elasticAgentManagedYaml))
+			logger.Debugf("failed because of %s", err)


I think you can combine these two lines.

I need the err variable to be set here in order to be returned later.

mtojek · 2021-08-03T13:43:59Z

scripts/test-check-packages.sh

@@ -10,7 +10,7 @@ cleanup() {

  # Dump kubectl details
  kubectl describe pods --all-namespaces > build/kubectl-dump.txt
-  kubectl logs -l app=kind-fleet-agent-clusterscope -n kube-system >> build/kubectl-dump.txt


Please grep the codebase to check whether there are any remaining references to kind-fleet-agent-clusterscope.

There are only references in the tests and docs of the kubernetes test package.

MichaelKatsoulis · 2021-08-03T14:32:26Z

I updated the error handling and error messages. I tested it with scenarios where the URL is invalid and where fewer bytes than expected are downloaded. The errors are the expected ones.

mtojek · 2021-08-03T16:15:44Z

This is unrelated to this PR, but maybe worth defining in the beats repo:

[2021-08-03T14:50:34.562Z] 2021/08/03 14:50:34 DEBUG downloading elastic-agent-managed-kubernetes.yaml from https://raw.githubusercontent.com/elastic/beats/7.x/deploy/kubernetes/elastic-agent-managed-kubernetes.yaml
[2021-08-03T14:50:34.827Z] 2021/08/03 14:50:34 DEBUG status code when downloading elastic-agent-managed-kubernetes.yaml is 200
[2021-08-03T14:50:34.827Z] 2021/08/03 14:50:34 DEBUG downloaded 5084 bytes
[2021-08-03T14:50:34.827Z] 2021/08/03 14:50:34 DEBUG Apply Kubernetes stdin
[2021-08-03T14:50:34.827Z] 2021/08/03 14:50:34 DEBUG run command: /var/lib/jenkins/workspace/t-manager_elastic-package_PR-461/bin/kubectl apply -f - -o yaml
[2021-08-03T14:50:35.399Z] 2021/08/03 14:50:35 DEBUG Handle "apply" command output
[2021-08-03T14:50:35.399Z] 2021/08/03 14:50:35 DEBUG Extract resources from command output
[2021-08-03T14:50:35.399Z] 2021/08/03 14:50:35 DEBUG Wait for ready resources
[2021-08-03T14:50:35.399Z] 2021/08/03 14:50:35 DEBUG Sync resource info: elastic-agent (kind: DaemonSet, namespace: kube-system)
[2021-08-03T14:50:35.399Z] 2021/08/03 14:50:35 DEBUG Sync resource info: elastic-agent (kind: ClusterRoleBinding, namespace: )
[2021-08-03T14:50:35.659Z] 2021/08/03 14:50:35 DEBUG Sync resource info: elastic-agent (kind: RoleBinding, namespace: kube-system)
[2021-08-03T14:50:35.659Z] 2021/08/03 14:50:35 DEBUG Sync resource info: elastic-agent-kubeadm-config (kind: RoleBinding, namespace: kube-system)
[2021-08-03T14:50:35.659Z] 2021/08/03 14:50:35 DEBUG Sync resource info: elastic-agent (kind: ClusterRole, namespace: )
[2021-08-03T14:50:35.659Z] 2021/08/03 14:50:35 DEBUG Sync resource info: elastic-agent (kind: Role, namespace: kube-system)
[2021-08-03T14:50:35.659Z] 2021/08/03 14:50:35 DEBUG Sync resource info: elastic-agent-kubeadm-config (kind: Role, namespace: kube-system)
[2021-08-03T14:50:35.659Z] 2021/08/03 14:50:35 DEBUG Sync resource info: elastic-agent (kind: ServiceAccount, namespace: kube-system)
[2021-08-03T14:50:35.659Z] 2021/08/03 14:50:35 DEBUG beginning wait for 8 resources with timeout of 10m0s
[2021-08-03T14:50:35.659Z] 2021/08/03 14:50:35 DEBUG install custom Kubernetes definitions (directory: /var/lib/jenkins/workspace/t-manager_elastic-package_PR-461/src/github.com/elastic/elastic-package/test/packages/kubernetes/data_stream/apiserver/_dev/deploy/k8s)
[2021-08-03T14:50:35.659Z] 2021/08/03 14:50:35 DEBUG no custom definitions found (directory: /var/lib/jenkins/workspace/t-manager_elastic-package_PR-461/src/github.com/elastic/elastic-package/test/packages/kubernetes/data_stream/apiserver/_dev/deploy/k8s). Nothing else will be installed.
[2021-08-03T14:50:35.660Z] 2021/08/03 14:50:35 DEBUG GET http://127.0.0.1:5601/api/fleet/agents
[2021-08-03T14:50:35.660Z] 2021/08/03 14:50:35 DEBUG filter agents using criteria: NamePrefix=kind-control-plane
[2021-08-03T14:50:35.660Z] 2021/08/03 14:50:35 DEBUG found 0 enrolled agent(s)
[2021-08-03T14:50:36.597Z] 2021/08/03 14:50:36 DEBUG GET http://127.0.0.1:5601/api/fleet/agents
[2021-08-03T14:50:36.597Z] 2021/08/03 14:50:36 DEBUG filter agents using criteria: NamePrefix=kind-control-plane
[2021-08-03T14:50:36.597Z] 2021/08/03 14:50:36 DEBUG found 0 enrolled agent(s)
[2021-08-03T14:50:37.552Z] 2021/08/03 14:50:37 DEBUG GET http://127.0.0.1:5601/api/fleet/agents
[2021-08-03T14:50:37.811Z] 2021/08/03 14:50:37 DEBUG filter agents using criteria: NamePrefix=kind-control-plane
[2021-08-03T14:50:37.811Z] 2021/08/03 14:50:37 DEBUG found 0 enrolled agent(s)
[2021-08-03T14:50:38.752Z] 2021/08/03 14:50:38 DEBUG GET http://127.0.0.1:5601/api/fleet/agents
[2021-08-03T14:50:38.752Z] 2021/08/03 14:50:38 DEBUG filter agents using criteria: NamePrefix=kind-control-plane
[2021-08-03T14:50:38.752Z] 2021/08/03 14:50:38 DEBUG found 0 enrolled agent(s)

As you can see, the wait logic doesn't work for these created resources the way it does for kube-state-metrics:

[2021-08-03T14:56:34.402Z] 2021/08/03 14:56:34 DEBUG Handle "apply" command output
[2021-08-03T14:56:34.402Z] 2021/08/03 14:56:34 DEBUG Extract resources from command output
[2021-08-03T14:56:34.402Z] 2021/08/03 14:56:34 DEBUG Wait for ready resources
[2021-08-03T14:56:34.402Z] 2021/08/03 14:56:34 DEBUG Sync resource info: kube-state-metrics (kind: ClusterRoleBinding, namespace: )
[2021-08-03T14:56:34.402Z] 2021/08/03 14:56:34 DEBUG Sync resource info: kube-state-metrics (kind: ClusterRole, namespace: )
[2021-08-03T14:56:34.402Z] 2021/08/03 14:56:34 DEBUG Sync resource info: hello (kind: CronJob, namespace: default)
[2021-08-03T14:56:34.402Z] W0803 14:56:34.222531  100343 warnings.go:70] batch/v1beta1 CronJob is deprecated in v1.21+, unavailable in v1.25+; use batch/v1 CronJob
[2021-08-03T14:56:34.402Z] 2021/08/03 14:56:34 DEBUG Sync resource info: kube-state-metrics (kind: Deployment, namespace: kube-system)
[2021-08-03T14:56:34.402Z] 2021/08/03 14:56:34 DEBUG Sync resource info: hello (kind: Job, namespace: default)
[2021-08-03T14:56:34.402Z] 2021/08/03 14:56:34 DEBUG Sync resource info: task-pv-volume (kind: PersistentVolume, namespace: )
[2021-08-03T14:56:34.402Z] 2021/08/03 14:56:34 DEBUG Sync resource info: task-pv-claim (kind: PersistentVolumeClaim, namespace: default)
[2021-08-03T14:56:34.402Z] 2021/08/03 14:56:34 DEBUG Sync resource info: pods-high (kind: ResourceQuota, namespace: default)
[2021-08-03T14:56:34.402Z] 2021/08/03 14:56:34 DEBUG Sync resource info: kube-state-metrics (kind: ServiceAccount, namespace: kube-system)
[2021-08-03T14:56:34.402Z] 2021/08/03 14:56:34 DEBUG Sync resource info: kube-state-metrics (kind: Service, namespace: kube-system)
[2021-08-03T14:56:34.402Z] 2021/08/03 14:56:34 DEBUG Sync resource info: web (kind: StatefulSet, namespace: default)
[2021-08-03T14:56:34.402Z] 2021/08/03 14:56:34 DEBUG beginning wait for 11 resources with timeout of 10m0s
[2021-08-03T14:56:34.402Z] 2021/08/03 14:56:34 DEBUG Deployment is not ready: kube-system/kube-state-metrics. 0 out of 1 expected pods are ready
[2021-08-03T14:56:36.307Z] 2021/08/03 14:56:36 DEBUG Deployment is not ready: kube-system/kube-state-metrics. 0 out of 1 expected pods are ready
[2021-08-03T14:56:38.871Z] 2021/08/03 14:56:38 DEBUG Deployment is not ready: kube-system/kube-state-metrics. 0 out of 1 expected pods are ready
[2021-08-03T14:56:40.780Z] 2021/08/03 14:56:40 DEBUG Deployment is not ready: kube-system/kube-state-metrics. 0 out of 1 expected pods are ready
[2021-08-03T14:56:42.693Z] 2021/08/03 14:56:42 DEBUG Deployment is not ready: kube-system/kube-state-metrics. 0 out of 1 expected pods are ready
[2021-08-03T14:56:44.604Z] 2021/08/03 14:56:44 DEBUG GET http://127.0.0.1:5601/api/fleet/agents
[2021-08-03T14:56:44.604Z] 2021/08/03 14:56:44 DEBUG filter agents using criteria: NamePrefix=kind-control-plane
[2021-08-03T14:56:44.604Z] 2021/08/03 14:56:44 DEBUG found 1 enrolled agent(s)
[2021-08-03T14:56:44.604Z] 2021/08/03 14:56:44 DEBUG creating test policy...
[2021-08-03T14:56:44.604Z] 2021/08/03 14:56:44 DEBUG POST http://127.0.0.1:5601/api/fleet/agent_policies

I was wondering if this is something we can improve in Beats.

MichaelKatsoulis · 2021-08-04T06:55:49Z

[2021-08-03T14:56:34.402Z] 2021/08/03 14:56:34 DEBUG beginning wait for 11 resources with timeout of 10m0s
[2021-08-03T14:56:34.402Z] 2021/08/03 14:56:34 DEBUG Deployment is not ready: kube-system/kube-state-metrics. 0 out of 1 expected pods are ready
[2021-08-03T14:56:36.307Z] 2021/08/03 14:56:36 DEBUG Deployment is not ready: kube-system/kube-state-metrics. 0 out of 1 expected pods are ready
[2021-08-03T14:56:38.871Z] 2021/08/03 14:56:38 DEBUG Deployment is not ready: kube-system/kube-state-metrics. 0 out of 1 expected pods are ready
[2021-08-03T14:56:40.780Z] 2021/08/03 14:56:40 DEBUG Deployment is not ready: kube-system/kube-state-metrics. 0 out of 1 expected pods are ready
[2021-08-03T14:56:42.693Z] 2021/08/03 14:56:42 DEBUG Deployment is not ready: kube-system/kube-state-metrics. 0 out of 1 expected pods are ready

Do you mean that the part that checks the readiness of the pods is missing?

mtojek · 2021-08-04T07:15:51Z

Yes. As you can see here, we're using Helm internals (the same logic) to wait for resources. I'm wondering what is missing that keeps it from waiting. Is it just the lack of a healthcheck or a deployment?

Please compare it with: https://beats-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/Ingest-manager/pipelines/integrations/branches/master/runs/647/nodes/304/steps/5473/log/?start=0

mtojek

To be honest, this sounds like a blocker, because with the remote YAML elastic-package doesn't wait for the agent deployment.

mtojek

Please also remove this file. I understand that it won't be used anymore.

mtojek

LGTM. Please merge it if CI is happy.

MichaelKatsoulis and others added 7 commits August 3, 2021 12:10

Revert "Revert: Get elastic-agent-managed-daemonset.yaml from upstrea…

5601bce

…m 7.x instead of using a local static file (elastic#459)" This reverts commit 6531362.

Revert "Revert: Get elastic-agent-managed-daemonset.yaml from upstrea…

3687155

…m 7.x instead of using a local static file (elastic#459)" This reverts commit 6531362.

Merge remote-tracking branch 'marcin/revert-rev' into revert-rev

a18a38c

Merge remote-tracking branch 'upstream/master' into revert-rev

d0e15d9

Retry when downloading fails

ab9128f

Remove unused debug

5f6cccf

Add debug

5bfd3d4

MichaelKatsoulis marked this pull request as draft August 3, 2021 11:28

MichaelKatsoulis added 3 commits August 3, 2021 16:04

Simplify debug messages

6b59a7e

Add comment

2866387

Fix error

d533e8d

MichaelKatsoulis marked this pull request as ready for review August 3, 2021 13:26

MichaelKatsoulis added 2 commits August 3, 2021 16:35

Refactor debug messages

7adee35

Refactor debug messages

ac82a52

mtojek reviewed Aug 3, 2021

Updated error handling

9a26d5c

mtojek approved these changes Aug 3, 2021

Merge branch 'master' into fix_downloading_agent_manifest_from_upstream

a34d601

Merge branch 'master' into fix_downloading_agent_manifest_from_upstream

6fd9b2c

mtojek self-requested a review August 4, 2021 07:17

mtojek suggested changes Aug 4, 2021

mtojek self-requested a review August 4, 2021 07:20

mtojek suggested changes Aug 4, 2021

Merge branch 'master' into fix_downloading_agent_manifest_from_upstream

c4755c4

MichaelKatsoulis added 3 commits August 4, 2021 13:07

Add comment about wait of daemonset not working

f4c8b46

Merge branch 'fix_downloading_agent_manifest_from_upstream' of github…

57fc2a4

….com:MichaelKatsoulis/elastic-package into fix_downloading_agent_manifest_from_upstream

Delete old static agent file

127f587

mtojek approved these changes Aug 4, 2021

MichaelKatsoulis merged commit 6e53ef0 into elastic:master Aug 4, 2021

This was referenced Aug 4, 2021

Add support for multi node kubernetes cluster #465

Open

Bug: can't download elastic-agent-managed-kubernetes.yaml #458

Closed
