
[receiver/mongodbatlas] Add support for consuming logs #12347

Closed · cpheps opened this issue Jul 13, 2022 · 12 comments

@cpheps (Contributor) commented Jul 13, 2022

This would be a logging addition to the current MongoDB Atlas receiver. To monitor logs for MongoDB Atlas, you need to pull from the Logging API, which returns a gzipped log file. The SDK currently used in the receiver provides a LogsService to retrieve logs.
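
A minimal sketch of what that pull could look like, assuming a hypothetical logsGetter interface that mirrors the rough shape of the SDK's LogsService (the receiver's actual types and wiring will differ):

package atlaslogs

import (
	"bufio"
	"bytes"
	"compress/gzip"
	"context"
	"fmt"
	"io"
)

// logsGetter is a hypothetical stand-in for the SDK's LogsService: it writes
// the gzipped log file for one host to the supplied io.Writer.
type logsGetter interface {
	Get(ctx context.Context, projectID, hostName, logName string, out io.Writer) error
}

// fetchLogLines pulls one gzipped log file and returns its lines.
func fetchLogLines(ctx context.Context, svc logsGetter, projectID, hostName, logName string) ([]string, error) {
	var buf bytes.Buffer
	if err := svc.Get(ctx, projectID, hostName, logName, &buf); err != nil {
		return nil, fmt.Errorf("pull %s: %w", logName, err)
	}

	// The Logging API returns the file gzip-compressed, so decompress before scanning.
	gz, err := gzip.NewReader(&buf)
	if err != nil {
		return nil, fmt.Errorf("open gzip: %w", err)
	}
	defer gz.Close()

	var lines []string
	scanner := bufio.NewScanner(gz)
	for scanner.Scan() {
		lines = append(lines, scanner.Text())
	}
	return lines, scanner.Err()
}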

Following the current configuration split with metrics and alerts:

Config

receivers:
  mongodbatlas:
    # existing metrics settings
    public_key: ${MONGODB_ATLAS_PUBLIC_KEY}
    private_key: ${MONGODB_ATLAS_PRIVATE_KEY}
    granularity: PT1M
    retry_on_failure:
      enabled: false
    # existing alerts settings
    alerts:
      enabled: false # default false
      endpoint: "0.0.0.0:9999" # webhook endpoint, no default
      secret: "some_secret_string" # webhook secret, no default
    # new logging settings
    logs:
      enabled: true # default true
      projects: # List of projects to monitor logs from
        - name: project_1 # Human-readable name of the project
          include_clusters: [] # optional list of clusters in a project to include for log collection, default empty
          exclude_clusters: [] # optional list of clusters in a project to exclude from log collection, default empty
          collect_audit_logs: false # default false; audit logging must also be enabled in Atlas, so it must be opted in here

The current metrics receiver monitors all projects within the organization. Since log volume can grow much faster than metric volume at scale, the user must specify the projects they wish to collect logs from via the projects list.

As suggested by @djaglowski, here's how include_clusters and exclude_clusters would work (a validation sketch follows the list):

  • Both include_clusters and exclude_clusters are unspecified (the default) => Collect all clusters
  • Only include_clusters is specified => Only collect the named clusters
  • Only exclude_clusters is specified => Collect all but the named clusters
  • Both include_clusters and exclude_clusters are specified => Config error
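
A minimal sketch of that validation and filtering logic, assuming hypothetical ProjectConfig and shouldCollect names (not the receiver's actual types):

package atlaslogs

import "errors"

// ProjectConfig is a hypothetical stand-in for one entry in the projects list.
type ProjectConfig struct {
	Name            string   `mapstructure:"name"`
	IncludeClusters []string `mapstructure:"include_clusters"`
	ExcludeClusters []string `mapstructure:"exclude_clusters"`
}

// Validate enforces the mutual exclusivity described above.
func (p ProjectConfig) Validate() error {
	if len(p.IncludeClusters) > 0 && len(p.ExcludeClusters) > 0 {
		return errors.New("include_clusters and exclude_clusters are mutually exclusive")
	}
	return nil
}

// shouldCollect decides whether a cluster's logs are pulled: no lists =>
// collect all; include list => opt-in; exclude list => opt-out.
func (p ProjectConfig) shouldCollect(cluster string) bool {
	if len(p.IncludeClusters) > 0 {
		return contains(p.IncludeClusters, cluster)
	}
	if len(p.ExcludeClusters) > 0 {
		return !contains(p.ExcludeClusters, cluster)
	}
	return true
}

func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}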

Overloading the Logging Receiver

Since there is already a logs receiver for collecting alerts, we'll have to implement a single logs receiver that collects both alerts and logs using the same consumer. The logs collected from the Logging API will have attributes that distinguish them from the alert-generated logs.

Since we can reuse much of the client setup and configuration values in the current receiver, this would mean adding a few functions to that receiver code geared toward collecting logs.
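
A rough sketch of that shape, assuming current collector pdata APIs; the combinedReceiver type, its method names, and the mongodb_atlas.cluster attribute are illustrative, not the receiver's actual code:

package atlaslogs

import (
	"context"

	"go.opentelemetry.io/collector/consumer"
	"go.opentelemetry.io/collector/pdata/plog"
)

// combinedReceiver feeds both webhook alerts and polled log files into the
// same consumer, as described above.
type combinedReceiver struct {
	next consumer.Logs // shared destination for alerts and logs
}

// handleAlert would be called by the existing alerts webhook path.
func (r *combinedReceiver) handleAlert(ctx context.Context, ld plog.Logs) error {
	return r.next.ConsumeLogs(ctx, ld)
}

// handlePolledLogs would be called by the new log-polling path; its records
// carry attributes that distinguish them from alert-generated logs.
func (r *combinedReceiver) handlePolledLogs(ctx context.Context, lines []string, cluster string) error {
	ld := plog.NewLogs()
	rl := ld.ResourceLogs().AppendEmpty()
	rl.Resource().Attributes().PutStr("mongodb_atlas.cluster", cluster)
	sl := rl.ScopeLogs().AppendEmpty()
	for _, line := range lines {
		sl.LogRecords().AppendEmpty().Body().SetStr(line)
	}
	return r.next.ConsumeLogs(ctx, ld)
}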

@BinaryFissionGames (Contributor)

The config and overview generally look good to me. One note: I think the current behavior is that alerts are disabled by default, which I had done because I thought this use case (scraping actual logs) is what someone would expect when placing the component in a logs pipeline.

One question I have is whether the logs should have a configurable polling interval, since this would be a pull-based receiver, or whether it makes sense to hard-code it (it looks like the logs update every 5 minutes, so that would be the "default" interval).

@cpheps (Contributor, Author) commented Jul 13, 2022

@BinaryFissionGames

I think the current behavior is that alerts are disabled by default

You're right, I missed this. I'll update the proposal to reflect that. I still think it's possible to have alerts and logs active at the same time, as long as the generated logs don't overlap or conflict; I don't think that's a real concern, though.

One question I have is whether the logs should have a configurable polling interval, since this would be a pull-based receiver, or whether it makes sense to hard-code it (it looks like the logs update every 5 minutes, so that would be the "default" interval).

I wanted to avoid a separate polling interval setting for logs, as I thought it would conflict with the overall scraper interval. The API documentation does recommend an interval of 5 minutes:

If you are polling the API for log files, polling every five minutes is recommended.

I was just going to hardcode this to avoid confusion; 5 minutes seems like a large enough interval that performance shouldn't be an issue.
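
A minimal sketch of that hardcoded cadence, assuming a hypothetical pollOnce helper that fetches and forwards one batch of logs:

package atlaslogs

import (
	"context"
	"time"
)

// logPollInterval follows the Atlas API recommendation to poll log files
// every five minutes; it is intentionally not configurable in this proposal.
const logPollInterval = 5 * time.Minute

// poll runs until the context is cancelled, pulling logs on each tick.
func poll(ctx context.Context, pollOnce func(context.Context) error) {
	ticker := time.NewTicker(logPollInterval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			_ = pollOnce(ctx) // real code would log or handle the error
		}
	}
}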

@cpheps (Contributor, Author) commented Jul 13, 2022

@zenmoto Pinging you as code owner if you have any objections to this.

@zenmoto (Contributor) commented Jul 13, 2022

This looks great to me.

@djaglowski (Member)

Regarding the exclude_clusters option - should we anticipate that some users may have very large projects in which this would not be sufficient? It seems we could easily support a mutually exclusive include_clusters option which would allow clusters to be opted-in.

@cpheps (Contributor, Author) commented Jul 13, 2022

@djaglowski

Regarding the exclude_clusters option - should we anticipate that some users may have very large projects in which this would not be sufficient? It seems we could easily support a mutually exclusive include_clusters option which would allow clusters to be opted-in.

Thinking through this: if you have an empty include_clusters, is nothing collected? I'm not opposed to this; I do want to make sure that whatever the configuration is, it's easy for a customer. I'm working on figuring out the size of a typical environment. If a typical environment is large, meaning a lot of clusters, then I would say include_clusters makes sense. If it's small and only has a cluster or two, then I think exclude_clusters makes more sense.

I'll follow up here once I get a better idea.

@djaglowski (Member)

I would expect it to work like this:

  • Both include and exclude are unspecified (the default) => Collect all clusters
  • Only include is specified => Only collect the named clusters
  • Only exclude is specified => Collect all but the named clusters
  • Both include and exclude are specified => Config error

@cpheps (Contributor, Author) commented Jul 13, 2022

@djaglowski Makes sense. I prefer a single exclude list for simplicity of configuration. It's usually easier to turn off the things you don't want than to figure out what you do want on. Also, code-wise, a single list is easier to filter on; it isn't hard either way, just fewer "moving parts".

I'm curious whether there is a standard (or pattern) of exclude list, include list, or both across receivers. I know filelog has both include and exclude, but those lists aren't mutually exclusive.

I don't feel strongly enough to push back on this; at this point I'm more wondering whether there's a standard.

@cpheps (Contributor, Author) commented Jul 13, 2022

I'll update the config proposal to include both include and exclude, but I'll change it if it turns out this conflicts with a standard in other components.

@djaglowski (Member)

Not sure if there is a well-established standard. Comparing this to filelog, though, the difference is only, I think, that its lists support a notion of pattern matching, whereas these lists would be exact matches only. When you have pattern matching, there is a sensible way to apply both (find all matches based on include patterns, then remove those that match exclude patterns). With exact matching, there isn't really a use for applying both.
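
For illustration, a tiny sketch of those pattern-matching semantics using Go's path.Match globs; the names here are illustrative, not filelog's implementation:

package atlaslogs

import "path"

// filterByPatterns applies filelog-style semantics: keep names matching any
// include pattern, then drop those that also match an exclude pattern.
func filterByPatterns(names, includes, excludes []string) []string {
	var out []string
	for _, name := range names {
		if matchesAny(includes, name) && !matchesAny(excludes, name) {
			out = append(out, name)
		}
	}
	return out
}

func matchesAny(patterns []string, name string) bool {
	for _, p := range patterns {
		if ok, _ := path.Match(p, name); ok {
			return true
		}
	}
	return false
}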

@github-actions (Contributor)

Pinging code owners: @zenmoto

@evan-bradley (Contributor)

Closing #12347 as completed by #12800.
