Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Surface config parsing error under EA managed mode #14574

Merged
merged 15 commits into from
Nov 22, 2024

Conversation

carsonip
Copy link
Member

@carsonip carsonip commented Nov 7, 2024

Motivation/summary

By wrapping reloader errors in a way that EA manager understands (i.e. in a MultiError of UnitError), newRunner initialization errors, including config parsing errors, will be surfaced and logged.

Checklist

- [ ] Update CHANGELOG.asciidoc I expect this to go into 8.17 first, then 8.16, but there may be uncertainty around this. We can add changelog before the release.

  • Documentation has been updated

How to test these changes

Start apm-server in EA managed mode (e.g. on ESS), pass in an invalid integration policy, e.g. setting apm-server.rum.response_headers to a string instead of a list. Check that in fleet UI that EA input unit has failed. Check that an error is logged.

Related issues

Fixes #14560

@carsonip carsonip changed the title Return error how BeatV2Manager prefers Surface config parsing error under EA managed mode Nov 7, 2024
Copy link
Contributor

mergify bot commented Nov 7, 2024

This pull request does not have a backport label. Could you fix it @carsonip? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-7.17 is the label to automatically backport to the 7.17 branch.
  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit.
  • backport-8.x is the label to automatically backport to the 8.x branch.

Copy link
Contributor

mergify bot commented Nov 7, 2024

backport-8.x has been added to help with the transition to the new branch 8.x.
If you don't need it please use backport-skip label.

@mergify mergify bot added the backport-8.x Automated backport to the 8.x branch with mergify label Nov 7, 2024
@carsonip carsonip added the backport-8.16 Automated backport with mergify label Nov 11, 2024
@carsonip carsonip added the backport-8.17 Automated backport with mergify label Nov 21, 2024
@carsonip carsonip marked this pull request as ready for review November 21, 2024 19:46
@carsonip carsonip requested a review from a team as a code owner November 21, 2024 19:46
1pkg
1pkg previously approved these changes Nov 21, 2024
@carsonip carsonip requested a review from 1pkg November 22, 2024 17:35
@carsonip carsonip merged commit 1b21d1d into elastic:main Nov 22, 2024
11 checks passed
mergify bot pushed a commit that referenced this pull request Nov 22, 2024
By wrapping reloader errors in a way that EA manager understands (i.e. in a MultiError of UnitError), newRunner initialization errors, including config parsing errors, will be surfaced and logged.

(cherry picked from commit 1b21d1d)
mergify bot pushed a commit that referenced this pull request Nov 22, 2024
By wrapping reloader errors in a way that EA manager understands (i.e. in a MultiError of UnitError), newRunner initialization errors, including config parsing errors, will be surfaced and logged.

(cherry picked from commit 1b21d1d)

# Conflicts:
#	go.mod
#	go.sum
mergify bot added a commit that referenced this pull request Nov 22, 2024
By wrapping reloader errors in a way that EA manager understands (i.e. in a MultiError of UnitError), newRunner initialization errors, including config parsing errors, will be surfaced and logged.

(cherry picked from commit 1b21d1d)

Co-authored-by: Carson Ip <[email protected]>
mergify bot added a commit that referenced this pull request Nov 25, 2024
…14574) (#14718)

* Surface config parsing error under EA managed mode (#14574)

By wrapping reloader errors in a way that EA manager understands (i.e. in a MultiError of UnitError), newRunner initialization errors, including config parsing errors, will be surfaced and logged.

(cherry picked from commit 1b21d1d)

# Conflicts:
#	go.mod
#	go.sum

* Resolve conflict

---------

Co-authored-by: Carson Ip <[email protected]>
Co-authored-by: Carson Ip <[email protected]>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
@1pkg 1pkg mentioned this pull request Nov 25, 2024
7 tasks
@carsonip
Copy link
Member Author

carsonip commented Dec 3, 2024

@mergify backport 8.17

Copy link
Contributor

mergify bot commented Dec 3, 2024

backport 8.17

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request Dec 3, 2024
By wrapping reloader errors in a way that EA manager understands (i.e. in a MultiError of UnitError), newRunner initialization errors, including config parsing errors, will be surfaced and logged.

(cherry picked from commit 1b21d1d)
mergify bot added a commit that referenced this pull request Dec 3, 2024
By wrapping reloader errors in a way that EA manager understands (i.e. in a MultiError of UnitError), newRunner initialization errors, including config parsing errors, will be surfaced and logged.

(cherry picked from commit 1b21d1d)

Co-authored-by: Carson Ip <[email protected]>
v1v added a commit to v1v/apm-server that referenced this pull request Dec 3, 2024
…-with-main

* upstream/main: (100 commits)
  feat: bump beats to current main (elastic#14781)
  build(deps): bump github.com/elastic/elastic-agent-client/v7 from 7.16.0 to 7.17.0 (elastic#14757)
  PGO: Update default.pgo from benchmarks https://github.com/elastic/apm-server/actions/runs/12024199004/attempts/1. (elastic#14734)
  build(deps): bump slackapi/slack-github-action from 1.27.0 to 2.0.0 in the github-actions group (elastic#14656)
  build(deps): bump the otel group with 3 updates (elastic#14746)
  build(deps): bump the otel group across 1 directory with 4 updates (elastic#14681)
  testing: remove missing Make target dep (elastic#14523)
  build(deps): bump the dependencies group in /systemtest with 3 updates (elastic#14731)
  build(deps): bump chainguard/static in /packaging/docker (elastic#14732)
  build(deps): bump github.com/stretchr/testify from 1.9.0 to 1.10.0 (elastic#14733)
  chore: Update to elastic/beats@4278366ab032 (elastic#14725)
  changelog: add 8.15.5 entry (elastic#14720)
  chore: Update to elastic/beats@06ba17caf9ed (elastic#14711)
  Surface config parsing error under EA managed mode (elastic#14574)
  release-automation: update changelog template to add bug fixes section (elastic#14693)
  PGO: Update default.pgo from benchmarks https://github.com/elastic/apm-server/actions/runs/11946993444/attempts/1. (elastic#14699)
  changelog: add changelog entry for 8.16.1 (elastic#14690)
  chore: Update k8s stack yaml files (elastic#14686)
  docs: update release doc around documentation PR (elastic#14694)
  terraform: Use non-deprecated deployment template (elastic#14682)
  ...
v1v added a commit to v1v/apm-server that referenced this pull request Dec 3, 2024
* feature/support-8.x-with-main: (196 commits)
  support github label backport
  chore
  it should be main
  feat: bump beats to current main (elastic#14781)
  build(deps): bump github.com/elastic/elastic-agent-client/v7 from 7.16.0 to 7.17.0 (elastic#14757)
  PGO: Update default.pgo from benchmarks https://github.com/elastic/apm-server/actions/runs/12024199004/attempts/1. (elastic#14734)
  build(deps): bump slackapi/slack-github-action from 1.27.0 to 2.0.0 in the github-actions group (elastic#14656)
  build(deps): bump the otel group with 3 updates (elastic#14746)
  build(deps): bump the otel group across 1 directory with 4 updates (elastic#14681)
  testing: remove missing Make target dep (elastic#14523)
  build(deps): bump the dependencies group in /systemtest with 3 updates (elastic#14731)
  build(deps): bump chainguard/static in /packaging/docker (elastic#14732)
  build(deps): bump github.com/stretchr/testify from 1.9.0 to 1.10.0 (elastic#14733)
  chore: Update to elastic/beats@4278366ab032 (elastic#14725)
  changelog: add 8.15.5 entry (elastic#14720)
  chore: Update to elastic/beats@06ba17caf9ed (elastic#14711)
  Surface config parsing error under EA managed mode (elastic#14574)
  release-automation: update changelog template to add bug fixes section (elastic#14693)
  PGO: Update default.pgo from benchmarks https://github.com/elastic/apm-server/actions/runs/11946993444/attempts/1. (elastic#14699)
  changelog: add changelog entry for 8.16.1 (elastic#14690)
  ...
@rubvs
Copy link
Contributor

rubvs commented Dec 12, 2024

Hi @carsonip, I only get the error message shown below when testing for these changes. The main concern I have is the "source": "elastic-agent" field. Should the source not be apm-server?

{
  "log.level": "error",
  "@timestamp": "2024-12-11T22:56:28.639Z",
  "log.origin": {
    "function": "github.com/elastic/elastic-agent/internal/pkg/agent/application/coordinator.(*Coordinator).watchRuntimeComponents",
    "file.name": "coordinator/coordinator.go",
    "file.line": 663
  },
  "message": "Unit state changed apm-default-1e9d923a-9507-414e-bc8e-1737cc7ca6fc (CONFIGURING->FAILED): failed to load input config: Error processing configuration: required object, but found string in field apm-server.rum.response_headers",
  "log": {
    "source": "elastic-agent"
  },
  "component": {
    "id": "apm-default",
    "state": "HEALTHY"
  },
  "unit": {
    "id": "apm-default-1e9d923a-9507-414e-bc8e-1737cc7ca6fc",
    "type": "input",
    "state": "FAILED",
    "old_state": "CONFIGURING"
  },
  "ecs.version": "1.6.0"
}

Steps Used

  • Create a deployment in cloud.elastic.co
  • Add a new Policy via: Management -> Integrations -> Elastic APM
  • Open a Linux VM in Multipass to create a new Elastic Agent.
  • Within this VM Terminal 1:
    • curl -L -O https://artifacts.elastic.co/downloads/beats/elastic-agent/elastic-agent-8.16.1-linux-arm64.tar.gz
    • tar xzvf elastic-agent-8.16.1-linux-arm64.tar.gz
    • cd elastic-agent-8.16.1-linux-arm64
    • sudo ./elastic-agent run
  • VM Terminal 2:
    • Copy config from ESS and replace content in VM's elastic-agent.yml
    • Change the rum field to a string as suggested.
    • Observe the logs in sudo nvim data/elastic-agent-b6da7f/logs/elastic-agent-20241211-1.ndjson

Thanks @1pkg for the help on this

@carsonip
Copy link
Member Author

elastic-agent-8.16.1-linux-arm64.tar.gz

8.16.1 is not a version that contains this fix, but I've confirmed with @rubvs offline that the apm-server binary is swapped to 8.17 for testing.

Should the source not be apm-server?

The log you see is expected. The source is EA because apm-server returns an error to EA (over grpc), and EA is the one responsible for reporting a failure. This log line confirms the bug fix is working as intended.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-8.x Automated backport to the 8.x branch with mergify backport-8.16 Automated backport with mergify backport-8.17 Automated backport with mergify
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Config parsing error causes EA-managed apm-server to fail silently
3 participants