
Multiple endpoints #880

Merged (8 commits) on Nov 23, 2021

Conversation

michalpristas (Contributor) commented Nov 16, 2021

What is the problem this PR solves?

This PR fixes a problem where the agent gets unenrolled under heavier load: the agent managing Fleet Server cannot check in to its own server, so it falls back to unenrolling.
Closes #741

How does this PR solve the problem?

The problem is solved by adding an internal endpoint used for communication on the local network (with the agent handling Fleet Server).
This lets Fleet Server spin up two sets of handlers: one on the public port 8220 and one on a port defined in the config, as sketched below.
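A minimal sketch of the approach, assumed from the description above rather than taken from the PR's diff (the runServers function, the internalPort parameter, and the 8221 value are illustrative):

package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"net/http"
)

func runServers(ctx context.Context, handler http.Handler, internalPort int) error {
	errChan := make(chan error, 2)
	addrs := []string{
		":8220", // public endpoint, reachable by remote agents
		fmt.Sprintf("localhost:%d", internalPort), // internal endpoint for the local agent
	}
	for _, addr := range addrs {
		ln, err := net.Listen("tcp", addr)
		if err != nil {
			return err
		}
		srv := &http.Server{Handler: handler}
		go func() {
			// Serve returns http.ErrServerClosed after Shutdown; report only real errors.
			if err := srv.Serve(ln); err != nil && !errors.Is(err, http.ErrServerClosed) {
				errChan <- err
			}
		}()
		go func() {
			<-ctx.Done()
			_ = srv.Shutdown(context.Background())
		}()
	}
	select {
	case err := <-errChan:
		return err
	case <-ctx.Done():
		return ctx.Err()
	}
}

func main() {
	err := runServers(context.Background(), http.NotFoundHandler(), 8221)
	fmt.Println(err)
}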

How to test this PR locally

This needs to be tested together with the corresponding elastic-agent work: elastic/beats#28993

  • Start the stack
  • Install an agent with Fleet Server in its policy
  • Check the listening ports:
sh-3.2# lsof -i -P | grep LISTEN | grep fleet
fleet-ser  7056            root   19u  IPv4 0xba7881a9227099a5      0t0    TCP localhost:{random_port} (LISTEN)
fleet-ser  7056            root   21u  IPv6 0xba7881a91284721d      0t0    TCP *:8220 (LISTEN)
  • Run Wireshark with the filter set to the random port; there should be some traffic (a tcpdump alternative follows this list)
  • Set the filter to port 8220; there should be no traffic
  • Enroll a new agent from another VM
  • There should now be traffic on both ports
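As an alternative to Wireshark, a tcpdump capture shows the same split; the interface names below (lo0 for loopback, en0 for the external interface) are host-dependent assumptions, and {random_port} stands in for the port from the lsof output:

# internal endpoint: expect periodic check-in traffic on loopback
tcpdump -n -i lo0 port {random_port}
# public endpoint: expect silence until a remote agent enrolls
tcpdump -n -i en0 port 8220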

Checklist

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

@michalpristas added the bug (Something isn't working) and backport-v7.16.0 (Automated backport with mergify) labels Nov 16, 2021
@michalpristas self-assigned this Nov 16, 2021
elasticmachine (Contributor) commented Nov 16, 2021

💚 Build Succeeded


Build stats

  • Start Time: 2021-11-23T08:02:41.102+0000

  • Duration: 9 min 23 sec

  • Commit: c1daf50

Test stats 🧪

Test Results: Passed 213, Failed 0, Skipped 0, Total 213

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

@jlind23 added the backport-v8.0.0 (Automated backport with mergify) label Nov 17, 2021
Review thread on cmd/fleet/server.go (outdated, resolved):
log.Debug().Msgf("Listening on %s", addr)

go func(ctx context.Context, errChan chan error, ln net.Listener) {
if err := server.Serve(ln); err != nil && err != http.ErrServerClosed {
Contributor:
Should we do a direct comparison of the error (err != http.ErrServerClosed), or use the errors.Is() method?

michalpristas (Author):

You mean replacing err != nil && err != http.ErrServerClosed with err != http.ErrServerClosed?
If so, you might end up sending nil to the channel.
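For illustration, a minimal sketch of how the errors.Is() variant could keep the nil guard (same shape as the hunk above, not necessarily the PR's final code):

if err := server.Serve(ln); err != nil && !errors.Is(err, http.ErrServerClosed) {
	errChan <- err
}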

Comment on lines +85 to +86
case <-forceCh:
log.Debug().Msg("go routine forced closed on exit")
Contributor:

I'm a bit confused about what the purpose of forceCh and this message are. Does this occur when the server closes with a non-context.Canceled error?

michalpristas (Author):

Yeah, I'm not really sure why this is here. I understand that it closes the goroutine along with the main func, but I'm not sure why that was a problem or why it wasn't addressed with cancel contexts.
Not touching this at the moment.
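For context, a sketch of the pattern under discussion, reconstructed from the hunk above rather than quoted from the PR: the goroutine blocks in a select, so either context cancellation or the main func closing forceCh on exit lets it return.

select {
case <-ctx.Done():
	log.Debug().Msg("context cancelled")
case <-forceCh: // closed by the main func on exit
	log.Debug().Msg("go routine forced closed on exit")
}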

mergify bot commented Nov 23, 2021

This pull request is now in conflict. Could you fix it @michalpristas? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b multiple-endpoints upstream/multiple-endpoints
git merge upstream/master
git push upstream multiple-endpoints

@michalpristas merged commit f49e5da into elastic:master Nov 23, 2021
michalpristas added a commit to michalpristas/fleet-server that referenced this pull request Dec 7, 2021
michalpristas added a commit that referenced this pull request Dec 7, 2021
Labels
backport-v7.16.0 (Automated backport with mergify), backport-v8.0.0 (Automated backport with mergify), bug (Something isn't working)
Development

Successfully merging this pull request may close these issues.

Fleet server is unexpectedly unenrolled under load
4 participants