New component: Flex Metrics Receiver #14753

Closed
cristianciutea opened this issue Oct 6, 2022 · 9 comments

@cristianciutea
Contributor

cristianciutea commented Oct 6, 2022

The purpose and use-cases of the new component

Description:

Flex Metrics Receiver will be an application-agnostic, cross-platform receiver with which users can instrument any application that exposes metrics over a standard protocol (HTTP, file, shell) in any format (for example, JSON or plain text). The receiver will scrape and parse the data and transform it into the OTEL metrics format, based on the rules defined in the YAML configuration.

This receiver would be best suited for monitoring custom applications for which no dedicated receiver exists.


Context:

This solution is inspired by the New Relic flex integration, an application-agnostic, all-in-one tool that collects metric data from a wide variety of services. It comes bundled with the New Relic Infrastructure agent. You can instrument any app that exposes metrics over a standard protocol (HTTP, file, shell) in a standard format (for example, JSON or plain text): you create a YAML config file, start the Infrastructure agent, and your data is reported to New Relic.

New Relic flex already has 200+ example YAML config files that can be shared, and these contributions would increase the surface area of OTEL instrumentation. Each config YAML file can be viewed as an independent receiver.
Also, once users learn the flex syntax, they can create new receivers without writing code or releasing a new Collector version.
The community can contribute new examples without knowing the Collector internals.

Example configuration for the component

The following examples illustrate the most common use cases: HTTP, db and shell. This configuration structure is inspired by the logs operator pattern.

HTTP example

---
flexreceiver:
  config:
      name: http_example
      vars:
        client_id: 'XYZ'
      inputs:
        - url:
            run: http://endpoint/{vars.client_id}/token/
            method: POST
            payload: >-
              client_id={vars.client_id}
            # Ignore the output, this call is just for obtaining the token for next calls
            ignore_output: true
            store_var:
              # Assuming that the endpoint will return {"atoken": "XYZ"}
              token: atoken
        - url:
            run: http://endpoint/{vars.client_id}/data/
            method: GET
            headers:
              Authorization: Bearer {vars.token}

            # Processors should be allowed on both levels: input and collector
      operators:
        - jq: '.[0][0]["server_metrics"]' # output e.g.: {"uptime": 12}
      metrics:
        - name: uptime
          value_from: http_example.uptime
          unit: s
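To make the intended semantics of this config concrete, here is a minimal Python sketch of how a receiver might evaluate it. It is not flex's or the Collector's implementation, and every function name in it is hypothetical. It assumes that all placeholders use the `{vars.NAME}` form, that `store_var` copies a top-level key of the JSON response into the variables, that the `jq` expression is hand-translated into plain indexing, and that `value_from` addresses extracted keys as `<config name>.<key>`. The HTTP calls go through an injected `fetch` callable so the example runs offline against canned responses.

```python
import json
import re

# Hypothetical placeholder syntax: {vars.NAME} is replaced with variables[NAME].
PLACEHOLDER = re.compile(r"\{vars\.(\w+)\}")


def substitute(template, variables):
    """Replace every {vars.NAME} placeholder with its value from `variables`."""
    return PLACEHOLDER.sub(lambda m: str(variables[m.group(1)]), template)


def run_http_example(fetch):
    """Evaluate http_example; `fetch(method, url, headers, payload)` returns a response body."""
    variables = {"client_id": "XYZ"}  # from the config's vars section

    # Input 1: output is ignored, but store_var copies "atoken" into the "token" variable.
    token_body = fetch(
        "POST",
        substitute("http://endpoint/{vars.client_id}/token/", variables),
        {},
        substitute("client_id={vars.client_id}", variables),
    )
    variables["token"] = json.loads(token_body)["atoken"]

    # Input 2: fetch the data, authenticating with the stored bearer token.
    data_body = fetch(
        "GET",
        substitute("http://endpoint/{vars.client_id}/data/", variables),
        {"Authorization": substitute("Bearer {vars.token}", variables)},
        None,
    )

    # Operator: hand translation of the jq expression .[0][0]["server_metrics"].
    extracted = json.loads(data_body)[0][0]["server_metrics"]

    # Namespace extracted keys with the config name so value_from can refer to them.
    fields = {f"http_example.{key}": value for key, value in extracted.items()}

    # Metric rule: name uptime, value_from http_example.uptime, unit s.
    return [{"name": "uptime", "value": fields["http_example.uptime"], "unit": "s"}]


def fake_fetch(method, url, headers, payload):
    """Offline stand-in for the two HTTP endpoints in the example."""
    if method == "POST" and url == "http://endpoint/XYZ/token/":
        return json.dumps({"atoken": "secret"})
    if method == "GET" and headers.get("Authorization") == "Bearer secret":
        return json.dumps([[{"server_metrics": {"uptime": 12}}]])
    raise ValueError(f"unexpected request: {method} {url}")


if __name__ == "__main__":
    print(run_http_example(fake_fetch))
```

Injecting `fetch` keeps the evaluation logic separate from transport, which is one way shared config files could be unit-tested without live endpoints.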

Database query example

---
flexreceiver:
  config:
    name: db_example
    inputs:
      - db:
          db_driver: mysql
          db_conn: newrelic:Password@tcp(rds-name.region.rds.amazonaws.com:3306)/sys
          queries:
            - run: SHOW VARIABLES;
  # If no operator or metric rules are specified, a default db operator would be used.
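The proposal does not define what the default db operator would emit. The following Python sketch assumes one plausible behavior: the first column of each row names the metric, the second holds its value, and non-numeric values are skipped. SQLite from the standard library stands in for MySQL, with a table shaped like the `Variable_name`/`Value` rows that `SHOW VARIABLES` returns; `default_db_operator` and `demo` are hypothetical names.

```python
import sqlite3


def default_db_operator(config_name, cursor):
    """Hypothetical default: column 1 names the metric, column 2 holds its value.

    Rows whose value is not numeric (e.g. 'ON' or 'utf8mb4') are skipped.
    """
    metrics = []
    for name, raw_value, *_ in cursor.fetchall():
        try:
            value = float(raw_value)
        except (TypeError, ValueError):
            continue
        metrics.append({"name": f"{config_name}.{name}", "value": value})
    return metrics


def demo():
    # SQLite stands in for MySQL; SHOW VARIABLES returns (Variable_name, Value) rows.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE variables (Variable_name TEXT, Value TEXT)")
    conn.executemany(
        "INSERT INTO variables VALUES (?, ?)",
        [("max_connections", "151"), ("autocommit", "ON"), ("wait_timeout", "28800")],
    )
    cursor = conn.execute("SELECT Variable_name, Value FROM variables ORDER BY Variable_name")
    try:
        return default_db_operator("db_example", cursor)
    finally:
        conn.close()


if __name__ == "__main__":
    print(demo())
```

Here `autocommit` is dropped because `ON` is not numeric; a real default operator would need an explicit rule for such values.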

Shell command example

flexreceiver:
  config:
    name: cmd_example
    inputs:
      - command:
          # Example output:
          # 35092301.29 34692781.58
          run: |
            cat /proc/uptime | awk '{print $0}'
    operators:
      - regex_parser: '(?P<secondsUptime>.*)\s+(?P<secondsIdleCores>.*)'
      - math:
          metric_name: newMetric
          formula: "cmd_example.secondsUptime / cmd_example.secondsIdleCores"
    metrics:
      - name: secondsUptime
        value_type: double
        value_from: cmd_example.secondsUptime
        unit: s
        attributes:
          server_name: server_name
      - name: newMetric
        value_from: cmd_example.newMetric
        value_type: double
        unit: s
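The shell example chains two operators. This Python sketch shows one way they could behave, under assumptions that are not part of any existing flex API: regex named groups become fields prefixed with the config name, and a `math` formula may only use `+`, `-`, `*`, `/`, numeric literals, and dotted field names. Python's `re` happens to use the same `(?P<name>...)` group syntax as the config. The `server_name` attribute is not modeled, and all function names are hypothetical.

```python
import ast
import operator
import re

# The regex_parser pattern from the config.
UPTIME_PATTERN = re.compile(r"(?P<secondsUptime>.*)\s+(?P<secondsIdleCores>.*)")

# Arithmetic allowed in a (hypothetical) math formula.
OPERATORS = {ast.Add: operator.add, ast.Sub: operator.sub,
             ast.Mult: operator.mul, ast.Div: operator.truediv}


def dotted_name(node):
    """Turn the AST for cmd_example.secondsUptime back into that dotted string."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        return f"{dotted_name(node.value)}.{node.attr}"
    raise ValueError(f"unsupported expression: {ast.dump(node)}")


def evaluate(formula, fields):
    """Evaluate +, -, *, / over numeric literals and dotted field names, nothing else."""
    def walk(node):
        if isinstance(node, ast.BinOp) and type(node.op) in OPERATORS:
            return OPERATORS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        return fields[dotted_name(node)]

    return walk(ast.parse(formula, mode="eval").body)


def run_cmd_example(output):
    """Apply regex_parser, then math, then the metrics rules to the command output."""
    match = UPTIME_PATTERN.match(output.strip())
    if match is None:
        raise ValueError(f"output did not match: {output!r}")
    fields = {f"cmd_example.{k}": float(v) for k, v in match.groupdict().items()}
    fields["cmd_example.newMetric"] = evaluate(
        "cmd_example.secondsUptime / cmd_example.secondsIdleCores", fields)
    return [
        {"name": "secondsUptime", "value": fields["cmd_example.secondsUptime"], "unit": "s"},
        {"name": "newMetric", "value": fields["cmd_example.newMetric"], "unit": "s"},
    ]
```

Parsing the formula with `ast` and walking only whitelisted node types avoids calling `eval` on user-supplied configuration.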

Telemetry data types supported

Currently, flex supports only metrics.

Is this a vendor-specific component?

  • This is a vendor-specific component
  • If this is a vendor-specific component, I am proposing to contribute this as a representative of the vendor.

Sponsor (optional)

No response

Additional context

The flex library will be maintained by New Relic; however, it will be a vendor-agnostic data fetcher and parser without any backend-specific features.

Documentation

The OTEL flex receiver will include documentation on how to use the existing configuration examples, as well as on how to use each input and processor, so that users can build their own integrations.

@cristianciutea added the needs triage label Oct 6, 2022
@evan-bradley added the Sponsor Needed label and removed the needs triage label Oct 7, 2022
@tigrannajaryan
Member

This looks similar to what we wanted to do with filelog definitions from stanza for logs. Is there a way to make this metric scraping capability uniform with the log parsing capabilities? cc @djaglowski

@tigrannajaryan
Member

It would be great if the configuration of operators (e.g. jq, regex) is uniform with what we do for logs operators.

@tigrannajaryan
Member

Discussed in SIG:

  • Is there a spec for the config file format and for the list of operators and their syntax?
  • Can we get a sense of how much we would lose by not supporting a "run" command? What portion of existing flex definitions needs it?
  • Can you figure out a way to consolidate the operators with filelog operators? Are there operators that do the same thing but have different names? Can we alias them? Are there operators that are similar but slightly different and are hard to merge?

@djaglowski
Member

djaglowski commented Oct 20, 2022

Can you figure out a way to consolidate the operators with filelog operators? Are there operators that do the same thing but have different names? Can we alias them? Are there operators that are similar but slightly different and are hard to merge?

I think this would be quite a challenge, but I'll try to provide enough of an analysis that we can consider it further.

The first thing to note is that the filelog operators are built around a particular representation of the log data model, specifically pkg/stanza/entry.Entry. This representation is basically a flattened version of plog, i.e., each Entry has its own independent copy of resource attributes, etc. The obvious challenge is adapting operators to process either metrics or logs, or some overlapping representation.

This would also introduce a notion that is essentially "pipeline type" to pkg/stanza.
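To illustrate what "flattened" means here, the following Python sketch uses a simplified stand-in for a log entry. It is not the real `pkg/stanza/entry.Entry` type, and `FlatEntry` and `group_by_resource` are hypothetical names. Each entry carries its own copy of the resource attributes, so rebuilding a plog-like resource-to-records hierarchy requires grouping entries by those attributes.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class FlatEntry:
    """Simplified stand-in for a flattened log entry (not the real stanza type)."""
    body: str
    attributes: dict = field(default_factory=dict)
    resource: dict = field(default_factory=dict)  # every entry holds its own copy


def group_by_resource(entries):
    """Rebuild a plog-like nesting: resource attributes -> list of log records."""
    grouped = defaultdict(list)
    for entry in entries:
        # Dicts are unhashable, so use the sorted (key, value) pairs as the group key.
        key = tuple(sorted(entry.resource.items()))
        grouped[key].append({"body": entry.body, "attributes": entry.attributes})
    return dict(grouped)


if __name__ == "__main__":
    entries = [
        FlatEntry("started", resource={"host.name": "h1"}),
        FlatEntry("ready", resource={"host.name": "h1"}),
        FlatEntry("started", resource={"host.name": "h2"}),
    ]
    for resource, records in group_by_resource(entries).items():
        print(dict(resource), [r["body"] for r in records])
```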

Beyond that, we can consider the implications by grouping the filelog operators into a few categories:

  1. Simple parsers (timestamp, severity, etc.). These are not directly applicable, but presumably there would be some equivalent operators for metrics (start_time, end_time, etc.), and in some cases these could share a lot of code.
  2. Complex parsers (json, csv, regex, etc.). In theory, these could be adapted to be useful, but a few caveats apply.
  • It would be necessary to constrain the parse_from and parse_to fields according to the signal type, e.g., metrics do not have a body to parse from.
  • Simple parsers are "embeddable" within complex parsers. More details here. The idea is that complex parsers isolate values which may be immediately interpreted and saved into the data model. It would likely make sense to have embeddable metric operators as well. Either way, the set of operators that may be embedded would need to be sensitive to the signal type.
  3. Transformers (move, remove, add, copy, etc.). These would be the most easily shareable operators, but the same caveat applies with regard to which fields can be specified for parse_from and parse_to.
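The two caveats for complex parsers can be sketched in Python as follows. Everything here is hypothetical and simplified, not the stanza configuration format: records are plain dicts, `ALLOWED_ROOTS` is an assumed per-signal list of addressable fields (with no `body` for metrics), and the embedded simple parser is a severity parser that maps level names to a subset of OpenTelemetry `SeverityNumber` values.

```python
import re

# Hypothetical field roots each signal type exposes; metrics have no body.
ALLOWED_ROOTS = {
    "logs": {"body", "attributes", "resource"},
    "metrics": {"attributes", "resource"},
}

# Subset of OpenTelemetry SeverityNumber values (first value of each range).
SEVERITY = {"DEBUG": 5, "INFO": 9, "WARN": 13, "ERROR": 17}


def validate_field(signal, path):
    """Reject parse_from/parse_to paths that the signal type does not have."""
    root = path.split(".", 1)[0]
    if root not in ALLOWED_ROOTS[signal]:
        raise ValueError(f"{signal} records have no {root!r} field")


def get_field(record, path):
    """Follow a dotted path such as 'attributes.level' through nested dicts."""
    value = record
    for part in path.split("."):
        value = value[part]
    return value


def regex_parser(signal, pattern, parse_from, severity_from=None):
    """Build a complex parser with an optional embedded severity parser."""
    validate_field(signal, parse_from)
    if severity_from is not None:
        # The set of embeddable parsers depends on the signal type.
        if signal != "logs":
            raise ValueError("the severity parser is only embeddable for logs")
        validate_field(signal, severity_from)
    compiled = re.compile(pattern)

    def parse(record):
        match = compiled.match(get_field(record, parse_from))
        if match is None:
            raise ValueError(f"pattern did not match {parse_from!r}")
        record.setdefault("attributes", {}).update(match.groupdict())
        if severity_from is not None:
            # Embedded simple parser: interpret an isolated value immediately.
            record["severity_number"] = SEVERITY[get_field(record, severity_from)]
        return record

    return parse
```

For example, `regex_parser("logs", r"(?P<level>\w+): (?P<msg>.*)", "body", severity_from="attributes.level")` turns `{"body": "ERROR: disk full"}` into a record with `level` and `msg` attributes and `severity_number` 17, while the same parser configured for `"metrics"` is rejected at build time.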

It's an interesting idea and does make some sense. pkg/stanza would become something of a generalized solution for extracting signals from text/bytes. I think it would be quite a lot of work though, so it's worth weighing the benefits of sharing code vs the additional effort.

@tigrannajaryan
Member

The obvious challenge of adapting operators to process either metrics or logs, or some overlapping representation.

@djaglowski I think you are implying that the implementations of these operators need to be shared. That's an option but not a requirement. From the end user's perspective, it is important that these operators behave uniformly for metrics and logs, but the implementations may live in different bits of code. Of course, we need to be very careful with this approach to ensure the behaviors are truly uniform and don't drift apart, but that is probably a matter of proper automated testing of observable behaviors.

I am not saying we should not aim to also have shared implementations. Shared implementations are desirable if it is reasonably doable, but it is not an absolute requirement.
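Testing observable behavior instead of sharing code can be sketched as a table-driven conformance check. In this hypothetical Python example, a `move` transformer has two independent implementations, one that mutates a log-entry dict in place and one that returns a new immutable data point, and both run against the same behavior table, including the error case.

```python
from dataclasses import dataclass, replace


def move_in_log_entry(entry, src, dst):
    """Log-side implementation: mutates the entry's attribute dict in place."""
    attrs = entry["attributes"]
    attrs[dst] = attrs.pop(src)
    return entry


@dataclass(frozen=True)
class DataPoint:
    """Metric-side stand-in: immutable, attributes stored as sorted (key, value) pairs."""
    value: float
    attributes: tuple


def move_in_data_point(point, src, dst):
    """Metric-side implementation: returns a new point instead of mutating."""
    attrs = dict(point.attributes)
    attrs[dst] = attrs.pop(src)
    return replace(point, attributes=tuple(sorted(attrs.items())))


# Shared behavior table: (input attributes, src, dst, expected attributes or exception type).
CASES = [
    ({"host": "a", "env": "prod"}, "host", "host.name", {"host.name": "a", "env": "prod"}),
    ({"k": "v"}, "k", "k2", {"k2": "v"}),
    ({"k": "v"}, "missing", "k2", KeyError),
]


def observe(move, attrs, src, dst):
    """Run one implementation and report only what a user could observe."""
    try:
        return move(attrs, src, dst)
    except Exception as exc:
        return type(exc)


def check_conformance():
    """Run every case against every implementation; return the number of checks."""
    implementations = {
        "logs": lambda a, s, d: move_in_log_entry({"attributes": dict(a)}, s, d)["attributes"],
        "metrics": lambda a, s, d: dict(
            move_in_data_point(DataPoint(1.0, tuple(sorted(a.items()))), s, d).attributes),
    }
    for attrs, src, dst, expected in CASES:
        for name, move in implementations.items():
            got = observe(move, attrs, src, dst)
            assert got == expected, f"{name}: move {src}->{dst} gave {got!r}, want {expected!r}"
    return len(CASES) * len(implementations)
```

Adding a case to `CASES` automatically checks both implementations, which is what keeps separately written operators from drifting apart.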

@djaglowski
Member

That's fair. It'd be possible to just share some of the more complex bits of code, such as timestamp configuration & parsing.
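A minimal Python sketch of that narrower kind of sharing, assuming a hypothetical `TimestampConfig` that both a log operator and a metric operator reuse. The layout uses `datetime.strptime` directives, and the `assume_utc` fallback is an illustrative assumption rather than stanza's actual timestamp options.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class TimestampConfig:
    """Hypothetical shared timestamp settings: a strptime layout plus a UTC fallback."""
    layout: str
    assume_utc: bool = True

    def parse(self, raw):
        ts = datetime.strptime(raw, self.layout)
        # Layouts without %z produce naive datetimes; optionally pin them to UTC.
        if ts.tzinfo is None and self.assume_utc:
            ts = ts.replace(tzinfo=timezone.utc)
        return ts


def set_log_timestamp(entry, cfg, attribute):
    """Log-side operator: sets the record's timestamp from one attribute."""
    entry["timestamp"] = cfg.parse(entry["attributes"][attribute])
    return entry


def set_metric_start_time(point, cfg, attribute):
    """Metric-side operator: sets a data point's start time using the same parsing code."""
    point["start_time"] = cfg.parse(point["attributes"][attribute])
    return point
```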

@github-actions
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@github-actions
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@github-actions
Contributor

This issue has been closed as inactive because it has been stale for 120 days with no activity.

@github-actions github-actions bot closed this as not planned May 26, 2023