-
Notifications
You must be signed in to change notification settings - Fork 527
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Push onboarding doc on startup #117
Conversation
4abf8c6
to
f66c5e5
Compare
68f38be
to
feffb7e
Compare
feffb7e
to
6d28ca2
Compare
6d28ca2
to
c70c7af
Compare
beater/onboarding.go
Outdated
|
||
var isServerUp = func() bool { | ||
secure := config.SSL.isEnabled() | ||
return isServerUp(secure, config.Host, 10, time.Second*1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the *1
for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leftover
@@ -37,11 +36,13 @@ func (bt *beater) Run(b *beat.Beat) error { | |||
} | |||
defer pub.Stop() | |||
|
|||
go notifyListening(bt.config, pub.Send) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we trigger this after err = run(bt.server, bt.config.SSL)
to ensure the server is up instead of polling for 10 seconds.
The problem I see with the current handling is that the server has to be started within n seconds otherwise we never send a document to ES.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As run
is blocking, we can't start it afterwards. I think it's pretty save that the server is starting in 10s or otherwise returns an error (I hope ).
if err == http.ErrServerClosed { | ||
logp.Info("Listener stopped: %s", err.Error()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do send a document when the server is started, but we only log an information when it is stopped. Hence, having a document that the server was started in ES is not a relieable indicator for the server actually running.
I think the original idea was to push a document and regulary update this document with the information that the server is running. When the last update of the document is too long ago this would indicate the server having stoped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Such additional features will be covered by monitoring: elastic/beats#3422
c70c7af
to
9b0b87c
Compare
|
For the UI an onboarding doc is needed to see when the server is actually started. A doc is pushed with the field `listening: "host"` and the `@timestamp` when it was pushed. This happens on each startup. The reason the listening field was picked with the host inside, this makes it very easy for the agent creator to detect if perhaps the server is listening on the wrong host or port. The code checks that the HTTP Server is running and as soon as this is the case, sends the doc. This PR takes over the previous work from elastic#110
|
||
event := beat.Event{ | ||
Timestamp: time.Now(), | ||
Fields: common.MapStr{"listening": config.Host}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hey, why was this field added?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jalvz See PR message. Sorry, it was not there before, only in the commit message. Forgot to update the PR message too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@formgeist is relevant to show in the on-boarding page in which port is the APM Server running?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jalvz I don't think so, we'll just check that it's there and move on. Won't be printing the port to the user in the UI. Unless you feel that might be a good thing to do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, if not used for on-boarding is better to remove it, this is more maintenance work potentially duplicated in beats monitoring. Are you ok with removing it @ruflin ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I put this in as I think it will help us and the users to debug potential problems with the apm-server. So in case a user does not have access to apm-server and see it running, but his agent can't send data to it, we can ask him to check this doc about the host config that is set. I expect this to be one of the most common getting started errors to happen that the wrong host / port is set. I would keep it in as it does not add any complexity as far as I can see but brings potential value. I don't think we need it in the UI for now as the doc is always accessible through discovery.
For the UI an onboarding doc is needed to see when the server is actually started. A doc is pushed with the field
listening: "host"
and the@timestamp
when it was pushed. This happens on each startup. The reason the listening field was picked with the host inside, this makes it very easy for the agent creator to detect if perhaps the server is listening on the wrong host or port.The code checks that the HTTP Server is running and as soon as this is the case, sends the doc.
This PR takes over the previous work from #110