New component: Flex Metrics Receiver #14753

cristianciutea · 2022-10-06T13:32:12Z

The purpose and use-cases of the new component

Description:

Flex Metrics Receiver will be an application-agnostic, cross-platform receiver with which users can instrument any application that exposes metrics over a standard protocol (HTTP, file, shell) in any format (for example, JSON or plain text). The receiver will scrape and parse the data, then transform it into the OTEL metrics format based on the rules defined in the YAML configuration.

This receiver would be best suited to monitoring custom applications for which no dedicated receiver exists.

Context:

This solution is inspired by the New Relic Flex integration, an application-agnostic, all-in-one tool for collecting metric data from a wide variety of services. It comes bundled with the New Relic infrastructure agent. You can instrument any app that exposes metrics over a standard protocol (HTTP, file, shell) in a standard format (for example, JSON or plain text): you create a YAML config file, start the infrastructure agent, and your data is reported to New Relic.

New Relic Flex already has more than 200 example YAML config files that can be shared, and these contributions would increase the surface area of OTEL instrumentation. Each YAML config file can be viewed as an independent receiver.
Also, after learning the Flex syntax, we can easily create new receivers without developing or releasing a new Collector version.
The community can contribute new examples without knowledge of the Collector internals.

Example configuration for the component

The following examples illustrate the most common use cases: HTTP, db and shell. This configuration structure is inspired by the logs operator pattern.

HTTP Example:

---
flexreceiver:
  config:
      name: http_example
      vars:
        client_id: 'XYZ'
      inputs:
        - url:
            run: http://endpoint/{vars.client_id}/token/
            method: POST
            payload: >-
              client_id={vars.client_id}
            # Ignore the output, this call is just for obtaining the token for next calls
            ignore_output: true
            store_var:
              # Assuming that the endpoint will return {"atoken": "XYZ"}
              token: atoken
        - url:
            run: http://endpoint/{vars.client_id}/data/
            method: GET
            headers:
              Authorization: Bearer {vars.token}

            # Processors should be allowed on both levels: input and collector
      operators:
        - jq: '.[0][0]["server_metrics"]' # output e.g.: {"uptime": 12}
      metrics:
        - name: uptime
          value_from: http_example.uptime
          unit: s
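
The `{vars.client_id}` placeholders and the `store_var` step in the example above imply a small substitution pass between inputs. A minimal Python sketch of how that could work, assuming placeholders have the form `{vars.<name>}` and that `store_var` maps a variable name to a top-level key of the JSON response. The function names `interpolate` and `store_vars` are hypothetical and are not part of any existing flex or Collector API:

```python
import json
import re

# Matches placeholders such as {vars.client_id}; the syntax is assumed from the example above.
_PLACEHOLDER = re.compile(r"\{vars\.([A-Za-z_][A-Za-z0-9_]*)\}")


def interpolate(template, variables):
    """Replace every {vars.<name>} placeholder with its value from `variables`."""
    def _lookup(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"undefined variable: {name}")
        return str(variables[name])

    return _PLACEHOLDER.sub(_lookup, template)


def store_vars(response_body, mapping, variables):
    """Copy selected top-level JSON keys from a response into a new variable set."""
    payload = json.loads(response_body)
    updated = dict(variables)
    for var_name, json_key in mapping.items():
        updated[var_name] = payload[json_key]
    return updated


variables = {"client_id": "XYZ"}
token_url = interpolate("http://endpoint/{vars.client_id}/token/", variables)

# Simulated response of the token call; no network request is made here.
variables = store_vars('{"atoken": "abc123"}', {"token": "atoken"}, variables)
auth_header = interpolate("Bearer {vars.token}", variables)
```

Raising on an undefined variable, rather than leaving the placeholder in the URL, is one possible design choice; it surfaces a misconfigured input before any request is sent.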

Database query example

---
flexreceiver:
  config:
    name: db_example
    inputs:
      - db:
          db_driver: mysql
          db_conn: newrelic:Password@tcp(rds-name.region.rds.amazonaws.com:3306)/sys
          queries:
            - run: SHOW VARIABLES;
    # If metrics/operators rules are not specified, the default db operator would be used.

Shell command example

flexreceiver:
  config:
    name: cmd_example
    inputs:
      - command:
          # e.g.
          # 35092301.29 34692781.58
          run: |
            cat /proc/uptime | awk '{print $0}'
    operators:
      - regex_parser: '(?P<secondsUptime>.*)\s+(?P<secondsIdleCores>.*)'
      - math:
          metric_name: newMetric
          formula: "cmd_example.secondsUptime / cmd_example.secondsIdleCores"
    metrics:
      - name: secondsUptime
        value_type: double
        value_from: cmd_example.secondsUptime
        unit: s
        attributes:
          server_name: server_name
      - name: newMetric
        value_from: cmd_example.newMetric
        value_type: double
        unit: s
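
To illustrate what the `regex_parser` and `math` operators in the shell example are expected to produce, here is a minimal Python sketch that applies the same pattern and formula to the sample output from the comment. The `fields` and `metrics` structures are hypothetical in-memory shapes chosen for illustration, not the receiver's actual data model:

```python
import re

# Sample line in the format of /proc/uptime, taken from the comment in the example above.
raw_output = "35092301.29 34692781.58\n"

# Same pattern as the regex_parser operator. Strip the trailing newline first:
# otherwise the greedy first group consumes the whole line, \s+ matches the
# newline, and the second group comes back empty.
pattern = re.compile(r"(?P<secondsUptime>.*)\s+(?P<secondsIdleCores>.*)")
match = pattern.match(raw_output.strip())
if match is None:
    raise ValueError("output did not match the expected format")

# Convert captured strings to floats, mirroring value_type: double.
fields = {name: float(value) for name, value in match.groupdict().items()}

# Equivalent of the math operator: derive newMetric from the parsed fields.
fields["newMetric"] = fields["secondsUptime"] / fields["secondsIdleCores"]

# Hypothetical in-memory shape for the resulting data points.
metrics = [
    {"name": "secondsUptime", "value": fields["secondsUptime"], "unit": "s"},
    {"name": "newMetric", "value": fields["newMetric"], "unit": "s"},
]
```

The newline pitfall noted in the comment suggests that a real implementation would either trim command output by default or use a stricter pattern such as `\S+` for each group.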

Telemetry data types supported

Currently, flex supports only metrics.

Is this a vendor-specific component?

This is a vendor-specific component
If this is a vendor-specific component, I am proposing to contribute this as a representative of the vendor.

Sponsor (optional)

No response

Additional context

The Flex library will be maintained by New Relic. However, it will be an agnostic data fetcher and parser without any backend-specific features.

Documentation

The OTEL flex receiver will include documentation on how to use the existing configuration examples and on how to use each input and processor, so that users can build their own integrations.

tigrannajaryan · 2022-10-19T16:09:27Z

This looks similar to what we wanted to do for filelog definitions from stanza for logs. Is there a way to have any uniformity of this metric scraping capability with logs parsing capabilities? cc @djaglowski

tigrannajaryan · 2022-10-19T16:12:12Z

It would be great if the configuration of operators (e.g. jq, regex) is uniform with what we do for logs operators.

tigrannajaryan · 2022-10-19T16:31:15Z

Discussed in SIG:

Is there a spec for the config file format, the list of operators, and their syntax?
Can we get a sense of how much we would lose by not supporting a "run" command? What portion of existing flex definitions needs it?
Can you figure out a way to consolidate the operators with filelog operators? Are there operators that do the same thing but have different names? Can we alias them? Are there operators that are similar but slightly different and are hard to merge?

djaglowski · 2022-10-20T15:29:50Z

Can you figure out a way to consolidate the operators with filelog operators? Are there operators that do the same thing but have different names? Can we alias them? Are there operators that are similar but slightly different and are hard to merge?

I think this would be quite a challenge, but I'll try to provide enough of an analysis that we can consider it further.

The first thing to note is that the filelog operators are built around a particular representation of the log data model, specifically pkg/stanza/entry.Entry. This representation is basically a flattened version of plog, i.e. each Entry has its own independent copy of resource attributes, etc. The obvious challenge is adapting operators to process either metrics or logs, or some overlapping representation.

This would also introduce a notion that is essentially "pipeline type" to pkg/stanza.
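
To make the "flattened" representation concrete, here is a minimal Python sketch. The dictionary shapes are simplified stand-ins invented for illustration, not the actual plog or entry.Entry types: a nested batch groups records under a shared resource, while the flattened form gives each entry its own copy of the resource attributes.

```python
import copy

# Simplified stand-in for a nested batch: records share one resource.
nested_batch = [
    {
        "resource": {"host.name": "web-1"},
        "records": [
            {"body": "started", "attributes": {"level": "info"}},
            {"body": "failed", "attributes": {"level": "error"}},
        ],
    },
]


def flatten(batch):
    """Return one independent entry per record, each with its own resource copy."""
    entries = []
    for group in batch:
        for record in group["records"]:
            entries.append(
                {
                    "resource": copy.deepcopy(group["resource"]),
                    "attributes": dict(record["attributes"]),
                    "body": record["body"],
                }
            )
    return entries


entries = flatten(nested_batch)

# Mutating one entry's resource does not affect the others or the source batch.
entries[0]["resource"]["host.name"] = "renamed"
```

A metrics counterpart of such an entry would carry no `body` field, so an operator written against this shape could not be reused for metrics without changes.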

Beyond that, we can consider the implications by grouping the filelog operators into a few categories:

Simple parsers (timestamp, severity, etc). These are not directly applicable, but presumably there would be some equivalent operators for metrics (start_time, end_time, etc) and in some cases these could share a lot of code.
Complex parsers (json, csv, regex, etc). In theory, these could be adapted to be useful but a few caveats apply.

It would be necessary to constrain the parse_from and parse_to fields, according to the signal type. e.g. Metrics do not have a body to parse from.
Simple parsers are "embeddable" within complex parsers. More details here. The idea is that complex parsers isolate values which may be immediately interpreted and saved into the data model. It would likely make sense to have embeddable metric operators as well. Either way, the set of operators that may be embedded would need to be sensitive to the signal type.

Transformers (move, remove, add, copy, etc). These would be the most easily shareable operators, but the same caveat applies in regards to which fields can be specified for parse_from and parse_to.

It's an interesting idea and does make some sense. pkg/stanza would become something of a generalized solution for extracting signals from text/bytes. I think it would be quite a lot of work though, so it's worth weighing the benefits of sharing code vs the additional effort.

tigrannajaryan · 2022-10-20T15:37:41Z

The obvious challenge is adapting operators to process either metrics or logs, or some overlapping representation.

@djaglowski I think you are implying that the implementations of these operators need to be shared. That's an option but not a requirement. From the end user's perspective, it is important that the behavior of these operators is uniform for metrics and logs, but the implementations may live in different bits of code. Of course we need to be very careful with this approach to ensure the behaviors are truly uniform and don't drift apart, but that is probably a matter of proper automated testing of observable behaviors.

I am not saying we should not aim to also have shared implementations. Shared implementations are desirable if it is reasonably doable, but it is not an absolute requirement.

djaglowski · 2022-10-20T15:42:28Z

That's fair. It'd be possible to just share some of the more complex bits of code, such as timestamp configuration & parsing.

github-actions · 2022-12-20T03:29:58Z

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

github-actions · 2023-03-14T03:30:59Z

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

github-actions · 2023-05-26T22:00:39Z

This issue has been closed as inactive because it has been stale for 120 days with no activity.

cristianciutea added the needs triage New item requiring triage label Oct 6, 2022

evan-bradley added Sponsor Needed New component seeking sponsor and removed needs triage New item requiring triage labels Oct 7, 2022

github-actions bot added the Stale label Dec 20, 2022

fatsheep9146 removed the Stale label Jan 12, 2023

andrzej-stencel mentioned this issue Mar 5, 2023

Collect metric based on /proc/locks #18829

Closed

github-actions bot added the Stale label Mar 14, 2023

andrzej-stencel mentioned this issue May 24, 2023

New component: Git Provider Receiver #22028

Closed

2 tasks

github-actions bot added the closed as inactive label May 26, 2023

github-actions bot closed this as not planned May 26, 2023
