Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Time Series data model #2580

Closed
codebien opened this issue Jun 29, 2022 · 7 comments
Closed

Time Series data model #2580

codebien opened this issue Jun 29, 2022 · 7 comments
Assignees
Labels
Milestone

Comments

@codebien
Copy link
Contributor

codebien commented Jun 29, 2022

We want to introduce the time series concept to k6 for getting the benefit of efficiently identifying the combination between a metric and the relative tags.

A summary of the benefits it brings across the platform:

  • It allows executing faster lookups and comparisons by a single value, the ID (hash) of the series.
  • It reduces the storage by allocating just one time a possible combination (the time series) and referencing it in all other places.
  • It allows aggregating data by time series.
  • It enables easier detection of high cardinality issues just by counting the entire unique data set of generated time series.
  • It enables time series based outputs to simplify the integration's logic (e.g. Prometheus).

Data model

Metrics and Series

  • Metric (name, type)
  • Time Series (metric, tags)
  • Sample (timestamp, value, time series ref)
  • Tag - (aka k6/1831)
// This is the current k6 Metric polished from the Thresholds dependencies.
type Metric struct {
	name     string
	type     MetricType
	contains ValueType
}

type TagSet atlas.Node

type TimeSeries struct {
        Metric *TimeSeries
        Tags *TagSet
}

type Sample struct {
        TimeSeries *TimeSeries
        Timestamp uint64
        Value float64
}

Sink

The sinks are implemented by metric types and they keep values up to date by time series and/or aggregated views:

Storage

It will store the root of Atlas in the metrics.Registry and it will provide a method for get it for branching out a new TagSet.

type Registry struct {
        ...
	tagSetRoot *atlas.Node
}

func (r *Registry) RootTagSet() *TagSet {
	return (*TagSet)(r.tagSetRoot)
}

Samples generation

The current sampling process is controlled by the metrics.PushIfNotDone method. All the actual callers should resolve the time series from the storage before pushing a new Sample, or in the event, no one is found the storage will create and insert a new one.
It requires the dependency from the time series database for all the callers (e.g. executors, JavaScript modules). Most of them already have the metrics.Registry dependency.

metrics.Ingester

Ingester is responsible for resolving the entire set of Sinks impacted from the ingested time series then it adds the Sample's value to the resolved Sinks.

Example

In a t0 where the status of the seen time series is:

http_req_duration{status:200,method:GET}
http_req_duration{method:GET}
http_req_duration{status:200}
http_req_another_one{status:200}

The Ingester in the case of a collected Sample http_req_duration{status:200,method:GET} then it resolves the dependencies with the other seen time series in a unique set where it contains the following time series http_req_duration{status:200} and http_req_duration{method:GET}. It can now resolve the relative Metric's Sinks and it invokes them passing the Sample's value.

Known issues

  • Name and URL tags: k6 is tagging HTTP requests with the URL. It will create a high cardinality issue for the time series data model. This should be fixed by adding the possibility to not store all the tags as indexable, having the availability to set them as metadata and exclude them from the time series generation. An alternative workaround could be to exclude them from the default set of enabled tags.
  • We need to keep all the data for the Trend type for computing the percentiles. We plan to migrate to some form of approximation (Sparse Histogram, OpenHistogram, T-Digest, etc..). Use HDR histograms for calculating percentiles in thresholds and summary stats #763
@codebien codebien added this to the v0.40.0 milestone Jun 29, 2022
@codebien codebien self-assigned this Jun 29, 2022
@na--
Copy link
Member

na-- commented Jun 29, 2022

The Sample struct will probably have an extra field for the metadata / non-indexed tags needed for #2128 (among other things). That shouldn't affect anything else though, these tags will just be ignored when building the TimeSeries.

@codebien
Copy link
Contributor Author

codebien commented Jun 29, 2022

It's relevant to mention that we are not going probably to implement the entire refactoring in a single iteration. The first expected work should be:

  • Add the TimeSeries struct as described in the description.
  • Add the TimeSeries pointer to the Sample struct, getting the following result:
// A Sample is a single measurement.
type Sample struct {
	Metric *Metric
	Time   time.Time
	Tags   *SampleTags
	Value  float64
        TimeSeries *TimeSeries
}
  • Populate the new field invoking registry.GetOrCreateSeries before invoking PushIfNotDone from all the parts where the samples are generated.
  • Migrate the metrics.Ingester's logics to use the TimeSeries. (EDIT: this point can be moved in the next iterations as well).
  • Announce the future breaking change to the metrics API in a warning or mention in the documentation and recommend migrating to Sample.TimeSeries field. This will be very important for output extensions.

It enables us to start fixing issues blocked by this specific concept but also to not break change with a bad UX.

Only in the next iterations, the redundant fields from the Sample struct will be removed and the Metric struct will be refactored as described in the description.

@mstoykov
Copy link
Contributor

mstoykov commented Jul 4, 2022

I have some questions(no particular order):

k6 is tagging HTTP requests with the URL. It will create a high cardinality issue

I would expect this will be addressed first?

We need to store all the time series generated from k6 run during the executio

Does that mean that we will forever hold all the tags + the name for each timeseries or just the hash?

Tag - (aka #1831)

is there are particular thing from that issue that is intended to be implemented? It doesn't seem clear and having them as []Tag seems like it won't lend it self to actually at least some of the propositions there. IIRC a lot of the propositions there are basically to have the whole []Tag be one struct that we can construct effeciently and then we come up with different definitions of "efficiently" and ways to accomplish them.

After I started looking into this last week and wrote some thoughts down, I did spend some time over the weekend to make https://github.com/mstoykov/atlas (mostly on the benchmarks actually 🙄).

I kind of wonder if something similar can not replace the SampleTags internal representation and then to over time change the code to make a better use of it. This likely will be better of tried really fast to see if it doesn't totally destroy the performance of k6, before we spend time to make it not breaking as possible and then for it to turn out to not work great.

@codebien
Copy link
Contributor Author

codebien commented Jul 4, 2022

I would expect this will be addressed first?

We have parallel discussions and I expect we will have parallel implementations. We can set it as an acceptance criteria for merging but I wouldn't set it as a requirement for starting the development. In the eventual case, that we can't achieve the final design for fixing the url tag before we are ready to merge, I think we can work around discarding the url tag from the TimeSeries generation if name and url are set with different values.

Does that mean that we will forever hold all the tags + the name for each timeseries or just the hash?

We have to keep all the tags in some place otherwise we can't map them from the Outputs. So, it should be convenient to make the TimeSeries the central place where they live. (Note: in this way, the proposal for the first step makes them duplicated between the TimeSeries and the Sample structs).

is there are particular thing from that issue that is intended to be implemented?

Yes, the internal tags field and the (TimeSeries) GetTags() []Tag method. But only if we have a consensus for the final design of Tag. I will back on this after an evaluation of your proposal.

@codebien
Copy link
Contributor Author

codebien commented Aug 4, 2022

The design evolved in a better direction, the summary of the current design can be found at #2594 (comment) and #2594 (comment). The issue's description has been updated also.

@na-- na-- modified the milestones: v0.40.0, v0.41.0 Aug 19, 2022
@codebien
Copy link
Contributor Author

codebien commented Oct 4, 2022

The main part has been implemented and it is now available in master. I keep this issue open until I haven't split the remaining parts from this issue into smaller and dedicated issues.

@codebien
Copy link
Contributor Author

I opened #2735 which is the missing part from this issue. HDR histogram and metric's structs refactor have already well-known open issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants