Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for customized monitoring API #22605

Merged
merged 6 commits into from
Dec 21, 2020
Merged

Conversation

newly12
Copy link
Contributor

@newly12 newly12 commented Nov 17, 2020

What does this PR do?

allow customized monitoring api to be added

Why is it important?

allow customized monitoring api to be added

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
    - [ ] I have made corresponding changes to the documentation
    - [ ] I have made corresponding change to the default configuration files
    - [ ] I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Author's Checklist

N/A

How to test this PR locally

N/A

Use cases

Leverage beats' http server, same port to be used, to expose any other info that user likes to do in a beat implementation or customization.

@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Nov 17, 2020
@newly12 newly12 changed the title allow customized monitoring api to be added Add support for customized monitoring API Nov 17, 2020
@newly12
Copy link
Contributor Author

newly12 commented Nov 17, 2020

test failure does not seem to be related to this PR..

@andresrc andresrc added the Team:Services (Deprecated) Label for the former Integrations-Services team label Nov 17, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/integrations-services (Team:Services)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Nov 17, 2020
@elasticmachine
Copy link
Collaborator

elasticmachine commented Nov 17, 2020

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Build Cause: urso commented: jenkins run the tests please

  • Start Time: 2020-12-19T19:11:28.449+0000

  • Duration: 55 min 27 sec

Test stats 🧪

Test Results
Failed 0
Passed 17504
Skipped 1383
Total 18887

Steps errors 2

Expand to view the steps failures

Terraform Apply on x-pack/metricbeat/module/aws
  • Took 0 min 15 sec . View more details on here
  • Description: terraform apply -auto-approve
Terraform Apply on x-pack/metricbeat/module/aws
  • Took 0 min 16 sec . View more details on here
  • Description: terraform apply -auto-approve

💚 Flaky test report

Tests succeeded.

Expand to view the summary

Test stats 🧪

Test Results
Failed 0
Passed 17504
Skipped 1383
Total 18887

@sayden
Copy link
Contributor

sayden commented Nov 18, 2020

Please, can you explain why is this necessary for Beats? In different words so maybe I'm more helpful: Why Beats needs this feature? Why Beats will need a customized monitoring API, who can benefit from it and how that user will gain benefit from it?

Also, how a user is expected to use it?

@newly12
Copy link
Contributor Author

newly12 commented Nov 18, 2020

Hi @sayden ,

Sorry for filing the PR directly, maybe I should start a thread to discuss further on this first, let me know.

I currently have one use case that I am trying to add some additional info in filebeat such as how many events it sends out having specific labels that we care about, though libbeat/monitoring package provides API to register any metrics given a json path, it seems not quite suit for the labels case, IMHO in the ingestion phase, in order to have metricbeat to convert sub path(s) to label(s), sounds like a bit more tricky and the paths probably need to be explicitly given in processor config if I am not wrong, and they could change from server to server. That being said, what I want to achieve is exposing these additional metrics in prometheus format so metricbeat processes them naturally.

This might be very specific edge case that rare user might face, but this change allows user to do this without forking and maintaining own version of beats and saves potential effort of solving conflicts if any.

This change in beats appears to be the easiest approach that I can think of, after this is added, in our implementation I could initialize prometheus registry and hook its handler in filebeat's http server before execute BeatsRootCmd, and afterwards in any other beat component implementation(i.e. input, processor, output) have needed metrics registered.

Please let me know if any comment or suggestion.

@vjsamuel
Copy link
Contributor

@sayden, we are trying to come up with a processor that can:

  • read log lines and come up with some basic parsing/manipulations and do counting
  • expose the counted data as a prometheus endpoint

This would allow us to use metricbeat to send data to the backend as metrics which has better query performance.

This is a pattern that is widely adopted by other agents like grafana cloud agent, otel collector. For us to be able to accomplish such a use case, we would need to bind a metrics endpoint into the http server. How would you recommend us moving forward on this? Do you think that this PR would make sense in that aspect or do you think that the processor should start up the http server itself on a different port?

Thanks in advance.

cc: @exekias what do you think>

@exekias
Copy link
Contributor

exekias commented Nov 26, 2020

Thanks for the explanations! I chatted a little bit about this with @urso. This API will probably be refactored at some point, so take into account that may affect you, but for now, this change sounds good to me

@newly12
Copy link
Contributor Author

newly12 commented Nov 30, 2020

@exekias thanks for your comment.

@sayden could you please comment and advise how should we move forward on this? Thanks in advance.


for api, h := range handlerFuncMap {
mux.HandleFunc(api, h)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if we can dynamically add/remove routes. E.g. when starting/removing sub-systems. The problem with a "global" like this one is initialization order. You can not tell when NewWithDefaultRoutes is called. This requires anyone to call AddHandlerFunc from within a init function. The response for routes can be changed (via globals), but the presence/behavior can hardly be configured.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @urso ! it makes sense for adding/removing routes. I had an implementation for this, please have a check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is still a "global" routes in the implementation that shares its http.ServeMux with the server initialized in NewWithDefaultRoutes. The reason for this is that I think there needs to be a "global" thing, either routes or a "global" api.Server, in order to allow register/deregister routes, and though there are other ways to initialize api.Server such as its New function, but I don't see it's being used except NewWithDefaultRoutes.


// AddHandlerFunc provides interface to add customized handlerFunc
func AddHandlerFunc(api string, h handlerFunc) {
handlerFuncMap[api] = h
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should error/panic if the api is already in use. A new handlerFunc should not be allowed to replace an older handlerFunc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now considered in the new implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reverted to previous implementation and added check in this function..

Copy link

@urso urso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All in all I'm not sure about this change. How do you plan to use these hooks?

Although it is nice to be able to deregister handlers, I wonder if the introduction of this + still having a global routes table with absolute URL prefixes is a good "solution". The "global" mutex might be a problem if you have more than one client accessing the API.

(Long term) If possible I'd like to pass some "context" to sub-systems in Beats via an interface that manipulates the "local" routing path. Something like this (kind similar to gorilla mux, but gorilla does not support removal):

type Router interface {
  Handle(path string, handler http.Handler) (Route, error)
  HandleFunc(path string, fn http.HandleFunc) (Route, error)
}

type Route interface {
  Close() error // remove the route
  SubRoute() Router // allow more handlers to be registered based on current path as prefix, sub handlers get removed when this route is closed.
}

The "root-router" would be passed to the server as parameter. Yet without actual use-cases in mind I'd rather not introduce that much code.

}
}
if log == nil {
log = logp.NewLogger("")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logger needs a name. The names are used as selectors when enabling debug logging. How about api?

log = logp.NewLogger("")
}
if routes.log == nil {
routes.log = log
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, to me this is a red flag. Using structured logging we don't know which logger context is the correct one. Either we have a singleton pattern or not. But having a global routes + a local server instance is kind of contradictory to me.

In order to make this PR not to big I understand that we still need globals. Although I would like to see something passed as contextual parameter to other subsystems...

Edit: Actually checking the Routes implementation I don't think we need the logger.

}
if _, exist := d.routes[api]; exist {
err := fmt.Errorf("route %s is already in use", api)
d.log.Error(err.Error())
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We return errors from different places, but this is the only place we log. How about removing this line, then we don't need the logger here.

d.mux.Lock()
defer d.mux.Unlock()
if !strings.HasPrefix(api, "/") {
return fmt.Errorf("route should starts with /")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is quite unfortunate that we need to use absolute addresses when registering. This seems to be a byproduct of having Routes global.

At some point we need to think about some proper API for sub-systems to register/deregister routes. Kind of similar to gorilla/mux SubRoutes.


func (d *Routes) handle(w http.ResponseWriter, r *http.Request) {
d.mux.RLock()
defer d.mux.RUnlock()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these locks are "heavy". Although we have an http server, this look will allow to have only one active connection at a time.

@urso
Copy link

urso commented Dec 17, 2020

/test

@urso
Copy link

urso commented Dec 19, 2020

jenkins run the tests please

@urso urso added the needs_backport PR is waiting to be backported to other branches. label Dec 21, 2020
@urso urso merged commit 82c11d4 into elastic:master Dec 21, 2020
@newly12
Copy link
Contributor Author

newly12 commented Dec 22, 2020

Thanks for merging this PR!

@newly12 newly12 deleted the customzied_api branch December 22, 2020 00:55
@urso urso added v7.12.0 and removed needs_backport PR is waiting to be backported to other branches. labels Dec 24, 2020
urso pushed a commit to urso/beats that referenced this pull request Dec 24, 2020
@@ -38,6 +40,10 @@ func NewWithDefaultRoutes(log *logp.Logger, config *common.Config, ns lookupFunc
mux.HandleFunc("/state", makeAPIHandler(ns("state")))
mux.HandleFunc("/stats", makeAPIHandler(ns("stats")))
mux.HandleFunc("/dataset", makeAPIHandler(ns("dataset")))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this PR is already merged but I just noticed this while reviewing the backport PR. Should these three API routes be added via AddHandlerFunc too? That would ensure that someone doesn't try to override these routes when registering a custom one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comments.

As mentioned in this reply, the usage of AddHandlerFunc would be registering other routes in a module's "init" phase, that makes customized routes to be shown up before "default" routes, and the reason "default" routes can not be done in "init" phase is the handler functions relies on ns lookupFunc which is passed only in NewWithDefaultRoutes.

I think what can be done is

  • call AddHandlerFunc for default routes as well so at least errors can be logged when there are conflicts.
  • or move default routes registration to init phase and hardcode the ns to be monitoring.GetNamespace and change the interface of NewWithDefaultRoutes .

Any thoughts?

urso pushed a commit that referenced this pull request Dec 30, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Team:Services (Deprecated) Label for the former Integrations-Services team v7.12.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants