Investigate telegraf integration in k6 #1064
A slight update to the vendor numbers above. I saw that their [prune] settings are:

```toml
[prune]
go-tests = true
unused-packages = true
non-go = true
```

... and re-run
After a quick look through some of the open k6 issues, if we have a nice and configurable telegraf output, we can:
As I mentioned above, the TOML telegraf configuration may be a bit tricky to integrate with k6 in a nice way, especially when it comes to bundling it in archives. However, I think that, overall, the actual configuration complexity it introduces would be orders of magnitude lower than the alternative. And by alternative, I mean k6 having a bespoke implementation (with a custom configuration) for each possible metric output that people may want... So, I think this would make #883 and maybe even #587 easier, especially if we decide to eventually deprecate some of our existing outputs like kafka and datadog. It will also make #743 easier, since we can just point people to the telegraf documentation for a lot of things.
As an addendum, I don't think we should replace every single metric output with telegraf. For example, the CSV output (#321 / #1067), once the PR is fixed, seems to be a good output to have natively in k6. It's simple, has no external dependencies, and since we want performance out of it, it's probably not worth the overhead to convert between our metrics and the telegraf ones... Moreover, I don't immediately see a way to do it with telegraf.
As mentioned in this discussion #478 (comment), we might investigate https://openmetrics.io/ as a base to work from. Connected issue: #858
For the record, I'm biased as a Prometheus and OpenMetrics contributor. 😄 The Prometheus/OpenMetrics data model and metrics output format are very widely supported by a variety of monitoring systems. Adopting Prometheus doesn't lock you into Prometheus. The format is already supported by InfluxDB/Telegraf, Datadog, Elastic, etc.
This isn't my job anymore, but please remember that k6's metrics pipeline is performance-critical, and doing this would essentially involve serialising metrics into one format, asking Telegraf to deserialise it, do its own processing on it, and then re-serialise it into another one. This is heavyweight even as a standalone process (I've seen the CPU graphs of a production workload running through it), but it's at least justified there; doing it in-process just seems roundabout to me. (#858 seems like a much more sensible way to do it if you want one output to rule them all; if you really like Telegraf, you can point OpenMetrics at its Prometheus ingress.)
The Prometheus/OpenMetrics format was designed for fast and cheap encoding and parsing. Originally, Prometheus used JSON for metrics but, like you said, handling metrics should be low overhead, and JSON was just too CPU intensive. The currently used format was created to reduce CPU use when encoding and decoding metrics. It's extremely efficient, so the overhead in Telegraf should be quite minimal.

Also, the Prometheus client_golang library is extremely efficient itself. Incrementing a counter is a 12-15 nanosecond CPU operation; histogram observations are about 25ns. InfluxDB was involved with creating OpenMetrics, so you can be assured that they're going to support it well.

The amount of data and overhead we're talking about is extremely small. While I understand that you would be concerned about the overhead of exchanging data, the amount we're talking about here is extremely tiny. If you're running into excess CPU use when handling metric samples, you may have other problems going on. IMO, going with telegraf as a library is a much heavier-weight option.
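For reference, a minimal Go sketch of the two client_golang operations whose cost is quoted above (counter increments and histogram observations). The metric names are made up for illustration and are not anything k6 actually registers.

```go
package main

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

var (
	reqs = prometheus.NewCounter(prometheus.CounterOpts{
		Name: "http_reqs_total", // hypothetical metric name
		Help: "Total HTTP requests made.",
	})
	reqDuration = prometheus.NewHistogram(prometheus.HistogramOpts{
		Name:    "http_req_duration_seconds", // hypothetical metric name
		Help:    "HTTP request durations.",
		Buckets: prometheus.DefBuckets, // static buckets -> constant memory
	})
)

func main() {
	prometheus.MustRegister(reqs, reqDuration)

	start := time.Now()
	// ... perform a request ...
	reqs.Inc()                                       // the ~12-15ns counter increment
	reqDuration.Observe(time.Since(start).Seconds()) // the ~25ns histogram observation
}
```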
@liclac, thanks for chiming in! 🎊
There seems to be some misunderstanding here. We want to investigate using telegraf as a library, not as a standalone process, so we're not going to serialize metrics before we pass them on to it. For the simplest use cases, just using the telegraf outputs, the overhead should be just wrapping our metric samples.

If it was just about outputs, you'd probably be right that just using something like OpenMetrics would probably be better, especially in the long term. As I mentioned elsewhere, I have nothing against supporting OpenMetrics natively - it's worth evaluating, and it'd make sense to support it if the industry is headed in that direction. But this situation isn't either/or, we can happily do both... 😄

The biggest reason OpenMetrics by itself is not sufficient is that it's just another data format, which doesn't solve some of the problems we've had. The basic problem is that k6 produces a lot of data. Currently we emit at least 8 metric samples for every HTTP request we make. Soon, it may be 9, once we start tracking DNS times again... Double that (or more) when there are redirects... And as you know, this isn't counting other metrics that are emitted for every iteration. And while it's probably worth it, at least for filtering data, to implement something natively in k6 (#570), or to extend what we currently have, I also want to stress again that this is an investigation.

We likely won't pursue telegraf integration if it turns out that it's actually super heavy and affects running k6 tests too much. It's probably worth it to investigate other approaches as well... For example, if the real-time data processing turns out to be too heavy, we can evaluate dumping all k6 metrics on disk in an efficient format (binary or OpenMetrics or whatever) and then post-processing them after the load test is done, so we don't affect the actual test execution. Or something else...
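As an illustration of the "wrapping our metric samples" idea above, here is a rough Go sketch of converting a k6-style sample into the name/tags/fields/timestamp shape that telegraf outputs consume. Both the `Sample` and `LineMetric` types are simplified stand-ins, not the real k6 or telegraf definitions.

```go
package main

import (
	"fmt"
	"time"
)

// Simplified stand-in for a k6 metric sample.
type Sample struct {
	Metric string
	Time   time.Time
	Tags   map[string]string
	Value  float64
}

// Simplified stand-in for the name/tags/fields/time tuple telegraf works with.
type LineMetric struct {
	Name   string
	Tags   map[string]string
	Fields map[string]interface{}
	Time   time.Time
}

// wrapSample converts a k6-style sample into the telegraf-like shape.
func wrapSample(s Sample) LineMetric {
	return LineMetric{
		Name:   s.Metric,
		Tags:   s.Tags,
		Fields: map[string]interface{}{"value": s.Value},
		Time:   s.Time,
	}
}

func main() {
	s := Sample{
		Metric: "http_req_duration",
		Time:   time.Now(),
		Tags:   map[string]string{"method": "GET", "status": "200"},
		Value:  87.3, // milliseconds
	}
	fmt.Printf("%+v\n", wrapSample(s))
}
```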
I think I'm beginning to understand a bit more of what's going on here. What is currently being done seems to be using InfluxDB metrics to generate event sample logs, rather than metrics for monitoring k6 itself. These sound a lot more like event logs than metrics.

Part of the reason we created the Prometheus/OpenMetrics libraries the way they are is that each event inside the system doesn't generate any kind of visible output. We intentionally don't care about individual events, only counters of events. As you generate more and more events per second, you might start to consider throwing away the data from each event and recording it in a histogram datatype. You lose the individual event granularity, but as you scale up the traffic, the metric output stays constant. If you want all event data for deep analysis, you might consider a structured logging output.

We had the same issue in GitLab. As our traffic grew, our implementation of InfluxDB metric events for each request was overwhelming. We're phasing out our InfluxDB use and moving this data to JSON structured logs for deep analysis and Prometheus metrics for real-time monitoring.
Again, for us this isn't either/or 😄 For example, if you don't use any metric outputs and just rely on the end-of-test summary, we currently throw away most of the data, and once we move to using HDR histograms (#763) or something similar, we'll throw away all of the data and have a constant memory footprint.

But using histograms for some period (say, 1s, 10s, etc.) is a whole other kettle of fish 😄 Definitely worth investigating, but much more complicated. Also conveniently, telegraf has a histogram aggregator like that... See why I don't mind spending some time investigating what they're doing and maybe integrating what currently exists in k6 as a stop-gap until we have something better? 😄
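A toy Go sketch of the "histograms for some period" idea: keep only per-bucket counts for the current window and flush them periodically instead of retaining raw samples. The bucket bounds and the window handling here are arbitrary choices for illustration, not how k6 or telegraf's histogram aggregator actually work.

```go
package main

import (
	"fmt"
	"sort"
)

// windowedHist keeps only per-bucket counts, so memory stays constant no
// matter how many values are observed during a window.
type windowedHist struct {
	bounds []float64 // upper bucket bounds (e.g. response times in ms), sorted
	counts []uint64  // one counter per bound, plus a final overflow bucket
}

func newWindowedHist(bounds []float64) *windowedHist {
	return &windowedHist{bounds: bounds, counts: make([]uint64, len(bounds)+1)}
}

// observe finds the first bucket whose bound is >= v and increments it.
func (h *windowedHist) observe(v float64) {
	h.counts[sort.SearchFloat64s(h.bounds, v)]++
}

// flush returns the counts for the finished window and resets them.
func (h *windowedHist) flush() []uint64 {
	out := h.counts
	h.counts = make([]uint64, len(h.bounds)+1)
	return out
}

func main() {
	h := newWindowedHist([]float64{50, 100, 250, 500, 1000})
	for _, d := range []float64{42, 87, 310, 1200} { // stand-ins for measured durations (ms)
		h.observe(d)
	}
	// In a real output this would run on a 1s/10s ticker; here we flush once.
	fmt.Println("bucket counts for this window:", h.flush())
}
```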
This makes sense, and we also currently have something like this. The JSON output fills that gap (though it could probably use some performance improvements, even after we merge #1114), and we'll soon have a CSV output (#1067)...
I'm not sure what you mean by this. With histograms, it's the same as with normal counters: you always have every data point, but with histograms you keep things at bucketed granularity. Maybe you're thinking of storing percentile summaries that have a decay factor? These can't be aggregated, as the math doesn't work out. We support them in Prometheus, but we don't encourage them because of the aggregation issue. Prometheus supports arbitrary histograms, but not as efficiently as it could. There is some work in progress to directly support "HDR Histograms" in both Prometheus and OpenMetrics.
I'm also not sure exactly what you mean here, sorry 😄. Can you share a link with more information? I'm starting to get the impression that we're talking about slightly different things, or more likely, discussing similar things from very different perspectives...

I'll try to backtrack... a little bit (😅) and explain how metric values are generated and used in k6, as well as what is currently lacking from my perspective, leading up to this issue and others. Sorry for the wall of text... 😊

So, in k6, a lot of actions emit values for different metrics:
These are the different measurements k6 makes (or will soon make). Once they are measured, they currently have 3 possible purposes:
For a single k6 test run, a user may use any combination of the above 3 points, including none or all of them. And when we're discussing Prometheus/OpenMetrics, in my head it's only relevant to point 3 above, unless I'm missing something. What's more, it doesn't address one of the biggest issues k6 currently has with external outputs (point 3) - that k6 just produces too much data... All of the measurements from the first list, including all of their tags, are currently just directly sent to the external outputs. There's currently no way to filter (#570), restrict (#884 (comment)) or aggregate that data stream yet. As you can see from the linked issues, we plan to implement features like that.

Regarding Prometheus/OpenMetrics - on the one hand, as you say, it would probably reduce the CPU requirements for encoding all of the metrics, and it's basically becoming the standard format, which is nice, but it solves few of our actual problems 😄. On the other hand, the pull model of Prometheus (where AFAIK we have to expose an endpoint for Prometheus to pull the data, instead of us pushing it) seems like a very poor fit for k6. First, because we don't want to keep all of the raw data in k6's memory until something scrapes it, and also because k6 test runs usually aren't very long. I guess that's what the pushgateway is for, but it's still something that needs consideration...

Finally, to get back to HDR histograms and what I meant by "histograms for some period (say, 1s, 10s, etc.)". Currently, the implementation of the end-of-test summary stats and the thresholds (points 1 and 2 in the list above) is somewhat naive. First, to be able to calculate percentiles, it keeps all of the raw measurements in memory. But the thresholds have another restriction. You can delay their evaluation, and you can even filter by tags in them like this:

```javascript
import http from "k6/http";
import { check, sleep } from "k6";
export let options = {
thresholds: {
// We want the 99.9th percentile of all HTTP request durations to be less than 500ms
"http_req_duration": ["p(99.9)<500"],
// Requests with the staticAsset tag should finish even faster
"http_req_duration{staticAsset:yes}": ["p(99)<250"],
// Global failure rate should be less than 1%
"checks": ["rate<0.01"],
// Abort the test early if static file failures climb over 5%, but wait 10s to evaluate that
"checks{staticAsset:yes}": [
{ threshold: "rate<=0.05", abortOnFail: true, delayAbortEval: "10s" },
],
},
duration: "1m",
vus: 3,
};
export default function () {
let requests = [
["GET", "https://test.loadimpact.com/"],
["GET", "https://test.loadimpact.com/404", null, { tags: { staticAsset: "yes" } }],
["GET", "https://test.loadimpact.com/style.css", null, { tags: { staticAsset: "yes" } }],
["GET", "https://test.loadimpact.com/images/logo.png", null, { tags: { staticAsset: "yes" } }]
];
let responses = http.batch(requests);
requests.forEach((req, i) => {
check(responses[i], {
"status is 200": (resp) => resp.status === 200,
}, req[3] ? req[3].tags : {});
});
sleep(Math.random() * 2 + 1); // Random sleep between 1s and 3s
}
```

But they are only ever evaluated for all emitted metrics since the start of the test run - you can't restrict them to a time window. I think that's what you meant by "decay factor" above, but my understanding of the proper terms in this area is somewhat poor, so I might be mistaken. In any case, I created a separate issue (#1136) for tracking this potential feature, but we have a lot of work to do before we tackle it.
In Prometheus client libraries, we have two main "Observation" methods: "Histogram" and "Summary". When you observe a value with either one, the difference is that Summary tracks pre-computed percentile buckets like 50th, 90th, 99th, etc., while Histogram tracks static buckets like 0.001s, 0.05s, 1s, etc. With Histogram mode, the memory use is constant, as you just need a few float64 values to track each statically defined bucket.
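A small Go sketch contrasting the two observation types described above, using client_golang; the metric names, buckets, and objectives are arbitrary examples, not anything defined by k6.

```go
package main

import "github.com/prometheus/client_golang/prometheus"

var (
	// Histogram: statically defined buckets, constant memory, aggregatable
	// across instances.
	durHist = prometheus.NewHistogram(prometheus.HistogramOpts{
		Name:    "req_duration_seconds",
		Help:    "Request durations (bucketed).",
		Buckets: []float64{0.001, 0.01, 0.05, 0.1, 0.5, 1, 5},
	})

	// Summary: pre-computed quantiles (50th/90th/99th) with an allowed error;
	// cheap to read, but quantiles from different instances can't be aggregated.
	durSummary = prometheus.NewSummary(prometheus.SummaryOpts{
		Name:       "req_duration_quantiles_seconds",
		Help:       "Request durations (pre-computed quantiles).",
		Objectives: map[float64]float64{0.5: 0.05, 0.9: 0.01, 0.99: 0.001},
	})
)

func main() {
	prometheus.MustRegister(durHist, durSummary)
	// Both are fed the same way by the calling code:
	durHist.Observe(0.042)
	durSummary.Observe(0.042)
}
```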
Prometheus metrics are always implemented as "now", so the typical concern with polling isn't valid, as the memory use is constant. There is no buffering. I find it's much easier to convert the pull-based counters into push-based data by having the client library set a push timer internally. This helps make the metric load for push-based systems more consistent.

Take your example of "every second, the number of active VUs": in a Prometheus-style metrics library, you would track the number of active VUs in real time. Prometheus could pull at whatever frequency, or you can set a push timer. From a code perspective, you only need to increment and decrement the active gauge; the client library handles the rest.
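A sketch of the "increment/decrement a gauge and let a push timer do the rest" approach, using client_golang's push package; the Pushgateway URL, job name, and gauge name are placeholders rather than anything k6 actually uses.

```go
package main

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/push"
)

var activeVUs = prometheus.NewGauge(prometheus.GaugeOpts{
	Name: "active_vus", // hypothetical gauge for currently active VUs
	Help: "Currently active virtual users.",
})

func main() {
	// Push the current value on a timer instead of waiting to be scraped.
	pusher := push.New("http://localhost:9091", "k6_test_run").Collector(activeVUs)
	go func() {
		for range time.Tick(5 * time.Second) {
			_ = pusher.Push() // error handling elided in this sketch
		}
	}()

	// From the code's perspective, only increment/decrement is needed.
	activeVUs.Inc() // a VU starts
	defer activeVUs.Dec()

	time.Sleep(12 * time.Second) // stand-in for actual test execution
}
```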
Ah, thanks, this cleared up many misconceptions that I had... 😊

How do you deal with different metric value tags? I'm asking because we heavily annotate any metric values we measure during the execution of a k6 script, and it's impossible to predict the set of possible tag values for any given metric. For example, each of the values for the 8+ metric samples we emit per HTTP request comes with its own set of tags. For the InfluxDB output, we've somewhat handled the over-abundance of tags by making it configurable which ones are sent as tags and which as fields.
We use a metric vector data structure to allocate the mapping of labels, so having lots of label/tag options isn't a big problem here, as long as the tags aren't completely unlimited like client IPs, user IDs, etc. URLs should be shortened to eliminate any unique IDs. For example, I have a couple of apps that emit about 30-40k different metrics per instance of the app server. This is well within the limit of a single Prometheus server's capacity (10M metrics per server starts to be a memory bottleneck).
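For illustration, a Go sketch of the metric vector idea: labels are declared up front, each distinct label combination gets its own series, and cardinality stays bounded as long as label values (URL templates instead of raw URLs with unique IDs) are finite. The label names here only loosely mirror k6's tags.

```go
package main

import "github.com/prometheus/client_golang/prometheus"

var reqDuration = prometheus.NewHistogramVec(
	prometheus.HistogramOpts{
		Name:    "http_req_duration_seconds", // hypothetical metric name
		Help:    "HTTP request durations by tag combination.",
		Buckets: prometheus.DefBuckets,
	},
	// Keep label values bounded: URL templates/names, not raw URLs with IDs.
	[]string{"method", "status", "name"},
)

func main() {
	prometheus.MustRegister(reqDuration)

	// "/posts/{id}" instead of "/posts/12345" keeps the cardinality finite.
	reqDuration.WithLabelValues("GET", "200", "/posts/{id}").Observe(0.087)
	reqDuration.WithLabelValues("GET", "404", "/missing").Observe(0.013)
}
```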
Any news on this feature? It would be great to have other outputs like Prometheus or something else.
Sorry @piclemx, no progress on this or on a standalone Prometheus output yet. Add a 👍 and watch the issue for news; it will be referenced when we make some progress.
Hey guys, first of all thanks so much for the work on k6, it's an awesome project. ⭐ I have two questions regarding this even though I'm not, as others may be, eagerly expecting this feature. Instead, I'm more worried about the current functionality vs. what this addition might bring or take away.
I confess I don't know much about telegraf.
Furthermore, regarding the problems you describe about histograms: particularly when using Datadog this might not be as much of an issue, since Datadog supports a custom distribution metric type.
It's highly likely I'm missing something here. Apologies in advance if some assumption doesn't make sense. Also, I appreciate it may not be possible to accommodate every use case from the get-go. Just interested in understanding what the path may be moving forward. Thanks again for all your hard work on k6!
We haven't investigated things thoroughly yet, so nothing is decided. As a general rule, if we're going to deprecate something, it will take at least one k6 version (usually more) where we warn people about it, and usually there needs to be a good alternative that covers the same use cases. In this specific case, if we add a telegraf output natively, and if we decide to deprecate any current k6 outputs in favor of it, they will almost surely be available for at least one more version, in tandem with the telegraf output. That said, I'm not very familiar with Datadog, but if what you say is true, it's unlikely we'll deprecate the current k6 Datadog output in favor of a telegraf-based one, unless the latter reaches full feature parity with what we currently offer.
Just to add to the discussion, I did a small PoC with Telegraf, and indeed it works very well with k6. The only problem I found that could be a blocker is that, by aggregating the metrics before sending them upstream, you lose the percentile metrics, since the whole sample set isn't available to calculate them properly. But even for that, there is a solution being discussed in influxdata/telegraf#6440 that would implement the t-digest algorithm, which gives a very close percentile approximation online. Also, looking into the Telegraf project, I don't see it as an API/library that could be used in k6's code; for me, it would be better to recommend Telegraf as a sidecar tool to aggregate k6 metrics. If this is something worth doing, I would be glad to help document it or show some examples of how to integrate both tools.
Yeah, the possibility of using telegraf internally as a Go library is far from decided. On the one hand, the interfaces look simple enough that we might be able to reuse them. On the other hand, there are bound to be complications, and as I wrote above, "two main sticking points I foresee are the configuration and metric mismatches between it and k6". And, as you've mentioned, using it as a sidecar tool has been possible for a long time, basically since k6 could output metrics to InfluxDB, and isn't very inconvenient. It's not as efficient, but for smaller load tests, it should be good enough.
If you're willing, that'd be awesome! ❤️ Our docs are in a public repo; for example, here are the ones for the outputs. Every page in the docs should have a link for suggesting edits.
Hi @arukiidou, I do think it's now unlikely that we will ever put the whole of telegraf in k6. Since this was proposed, we now have output extensions, effectively letting people write an extension to output to whatever they want. This can also be used to make a telegraf output extension 🎉, though I doubt the k6 team itself will do it. Additionally, even if somebody makes such an extension and uses it, it might be better to just run telegraf on the side. In both cases there will be additional work, and the configuration will be the telegraf one (or a subset of it, I guess). This fairly old comment of mine showcases how to use it for a number of additional features we don't currently support as well.
After #1060 and #1032 (comment), I think it makes some sense to investigate potentially integrating telegraf in k6. And I don't just mean sending metrics from k6 to a telegraf process, since that should currently be possible with no extra changes in k6 - telegraf has an InfluxDB listener input plugin that k6 can send metrics to.
Rather, since telegraf is a Go program with a seemingly fairly modular and clean architecture, it may be worth investigating if we can't use parts of it as a library in k6. If we can figure out a way to use it, we'd pretty much have a very universal metrics output "for free":
The two main sticking points I foresee are the configuration and metric mismatches between it and k6. So, instead of a huge refactoring effort, my initial idea is to investigate if we can't just add a `telegraf` output in k6 that can accept any telegraf options. That is, instead of one new k6 output type per one telegraf output type, we could have a single "universal" k6 output type that could, via configuration, be used to filter/aggregate/output k6 metrics in every way telegraf supports. This way, we don't have to refactor a lot of k6 - we can transparently convert the k6 metrics to whatever telegraf expects.

The configuration would be trickier, since telegraf expects its configuration in a TOML format... And I'm not sure there's any way to change that, considering that even the simple `file` output just has `toml` struct tags and apparently that's enough, since the constructor just returns the empty struct (which I assume the config is unmarshaled into). We can try to convert JSON to TOML, though I don't think it's worth it, since the k6 outputs can't be configured from the exported script `options` yet anyway (#587). Instead, we probably should stick with the TOML config and just pass it via the CLI, like how the JSON output file is currently being specified: `k6 run script.js --out telegraf=my_telegraf.conf`, or something like that.

Another thing we should evaluate is how big of a dependency telegraf would be. The current repo has ~200k Go LoC, but its vendor has around 5 million... I think a lot of those would be dropped, since we won't need any of its 150+ input plugins and other things, but there's still a good chance that this dependency would actually be bigger than the rest of k6 😄 Even so, I don't think that would be a huge issue, since with the number of plugins it has, I assume that the base APIs are very stable... It's just something that we need to keep in mind, given that we vendor our dependencies in our repo (and they don't).
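To make the "single universal telegraf output" idea above more concrete, here is a very rough Go sketch of what such an output could look like from k6's side. Every name in it (`Sample`, `SampleSink`, `TelegrafOutput`, `parseTelegrafTOML`) is a hypothetical placeholder, not a real k6 or telegraf API, and the actual wiring into telegraf's plugin registry is left out.

```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical shape of a k6 metric sample the output would receive.
type Sample struct {
	Metric string
	Time   time.Time
	Tags   map[string]string
	Value  float64
}

// Hypothetical interface a wrapped telegraf output plugin would satisfy.
type SampleSink interface {
	Write(samples []Sample) error
	Close() error
}

// Hypothetical universal output: configured by a TOML file passed on the CLI
// (e.g. `--out telegraf=my_telegraf.conf`), it would build the configured
// telegraf processors/aggregators/outputs and forward converted samples to them.
type TelegrafOutput struct {
	configPath string
	sinks      []SampleSink
}

func NewTelegrafOutput(configPath string) (*TelegrafOutput, error) {
	// parseTelegrafTOML is a placeholder for "unmarshal the TOML into the
	// plugins' config structs and instantiate them via telegraf's registry".
	sinks, err := parseTelegrafTOML(configPath)
	if err != nil {
		return nil, fmt.Errorf("couldn't load telegraf config %q: %w", configPath, err)
	}
	return &TelegrafOutput{configPath: configPath, sinks: sinks}, nil
}

// Collect forwards a batch of converted k6 samples to every configured sink.
func (o *TelegrafOutput) Collect(samples []Sample) {
	for _, sink := range o.sinks {
		_ = sink.Write(samples) // error handling elided in this sketch
	}
}

func parseTelegrafTOML(path string) ([]SampleSink, error) {
	return nil, fmt.Errorf("not implemented in this sketch")
}

func main() {
	if _, err := NewTelegrafOutput("my_telegraf.conf"); err != nil {
		fmt.Println(err)
	}
}
```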