Normalization pipeline aggregations #51005

Closed
polyfractal opened this issue Jan 14, 2020 · 1 comment · Fixed by #56399
Labels
:Analytics/Aggregations Aggregations >feature Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Top Ask

Comments

@polyfractal
Contributor

This proposal is for one (or several) pipeline aggs that can perform normalization of the metrics. For example, given the series of data:

[5, 5, 10, 50, 10, 20]

A user might want to normalize those in different ways:

  • Rescale [-1, 1]
    • [-1, -1, -0.77, 1, -0.77, -0.33]
  • Rescale [0, 100]
    • [0, 0, 11.11, 100, 11.11, 33.33]
  • Percentage of sum [0, 100%]
    • [5%, 5%, 10%, 50%, 10%, 20%]
  • Mean normalization
    • [-0.26, -0.26, -0.15, 0.74, -0.15, 0.07]
  • Z-score normalization (mean of zero, stdev of 1)
    • [-0.68, -0.68, -0.39, 1.94, -0.39, 0.19]
  • Softmax (0-1 range, sum to 1, larger values have more weight)
    • [2.862E-20, 2.862E-20, 4.248E-18, 0.999, 4.248E-18, 9.357E-14]

etc etc
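To make the arithmetic concrete, here is a rough Python sketch of the normalizations listed above, applied to the same series. The function names are made up for illustration, and the definitions are assumptions on my part: mean normalization is taken as (x - mean) / (max - min), z-score uses the sample standard deviation (n - 1), and softmax subtracts the maximum before exponentiating for numerical stability. None of this is meant as the eventual implementation.

```python
import math
import statistics


def rescale(values, lo, hi):
    """Linearly map values so that min -> lo and max -> hi.

    Assumes the series is not constant (max != min).
    """
    vmin, vmax = min(values), max(values)
    span = vmax - vmin
    return [lo + (v - vmin) / span * (hi - lo) for v in values]


def percent_of_sum(values):
    """Express each value as a percentage of the series total."""
    total = sum(values)
    return [v / total * 100 for v in values]


def mean_normalize(values):
    """Assumed definition: (x - mean) / (max - min)."""
    mean = statistics.fmean(values)
    span = max(values) - min(values)
    return [(v - mean) / span for v in values]


def z_score(values):
    """Assumed definition: (x - mean) / sample standard deviation."""
    mean = statistics.fmean(values)
    stdev = statistics.stdev(values)  # sample stdev, divides by n - 1
    return [(v - mean) / stdev for v in values]


def softmax(values):
    """exp(x) / sum(exp(x)), shifted by the max to avoid overflow."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


series = [5, 5, 10, 50, 10, 20]
results = {
    "rescale [-1, 1]": rescale(series, -1, 1),
    "rescale [0, 100]": rescale(series, 0, 100),
    "percent of sum": percent_of_sum(series),
    "mean": mean_normalize(series),
    "z-score": z_score(series),
    "softmax": softmax(series),
}
for name, normalized in results.items():
    print(f"{name}: {[f'{x:.4g}' for x in normalized]}")
```

Every function here needs the whole series before it can emit a single value (min, max, sum, mean), which is why this fits a pipeline agg that runs after all buckets have been collected.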

The two obvious use-cases are rescaling values to a [0, 1] range to make it easier to compare relative magnitudes, and normalizing to percentage of the sum for percentage charts.

More advanced functions have their uses too: z-score has useful statistical properties, softmax handles negative numbers nicely, etc. But I'm not sure how useful they would be in practice, since this is operating over bucket values and not raw values (which is where normalization/centering/standardizing typically has value).

In any case, a pipeline agg could accept the values from a multi-bucket agg (like a date_histogram) and perform the normalization to produce a new set of metrics. I'm unsure how the syntax would look. A single-purpose agg (percentage_of_sum) would be easy. But if we want one agg that can perform several normalizations, we either need a selectable function or something like MovingFunction, where the user specifies a script (with helper methods).
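As a rough illustration of the "selectable function" option only (not the scripted one), here is a small Python sketch. Everything in it is hypothetical: the `NORMALIZERS` registry, the `normalize_buckets` helper, the bucket layout, and the `sales` / `sales_pct` keys are invented for the example and do not reflect any existing Elasticsearch code.

```python
from typing import Callable, Dict, List

Normalizer = Callable[[List[float]], List[float]]


def percent_of_sum(values: List[float]) -> List[float]:
    total = sum(values)
    return [v / total * 100 for v in values]


def rescale_0_1(values: List[float]) -> List[float]:
    vmin, vmax = min(values), max(values)
    return [(v - vmin) / (vmax - vmin) for v in values]


# Hypothetical registry: the user selects a function by name.
NORMALIZERS: Dict[str, Normalizer] = {
    "percent_of_sum": percent_of_sum,
    "rescale_0_1": rescale_0_1,
}


def normalize_buckets(buckets: List[dict], metric: str,
                      normalizer: str, out_key: str) -> List[dict]:
    """Read one metric from every bucket, normalize the whole series,
    and return copies of the buckets with the new metric attached."""
    if normalizer not in NORMALIZERS:
        raise ValueError(f"unknown normalizer: {normalizer!r}")
    values = [bucket[metric] for bucket in buckets]
    normalized = NORMALIZERS[normalizer](values)
    return [{**bucket, out_key: value}
            for bucket, value in zip(buckets, normalized)]


# Buckets as a date histogram with a sum sub-agg might produce them.
buckets = [
    {"key": "2020-01", "sales": 5.0},
    {"key": "2020-02", "sales": 5.0},
    {"key": "2020-03", "sales": 10.0},
    {"key": "2020-04", "sales": 50.0},
    {"key": "2020-05", "sales": 10.0},
    {"key": "2020-06", "sales": 20.0},
]

for bucket in normalize_buckets(buckets, "sales", "percent_of_sum", "sales_pct"):
    print(bucket)
```

With a name-based registry, adding a normalization is a matter of registering one more function, whereas the script-based approach would push that logic onto the user.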

@elasticmachine
Collaborator

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

@talevy talevy self-assigned this Apr 30, 2020
@rjernst rjernst added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label May 4, 2020
talevy added a commit to talevy/elasticsearch that referenced this issue May 8, 2020
This aggregation will perform normalizations of metrics
for a given series of data in the form of bucket values.

The aggregation supports the following normalizations:

- rescale 0-1
- rescale 0-100
- percentage of sum
- mean normalization
- z-score normalization
- softmax normalization

The normalization to apply is specified in the normalize
agg's `normalizer` field.

For example:

```
{
  "normalize": {
    "buckets_path": <>,
    "normalizer": "percent"
  }
}
```

Closes elastic#51005.
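
For context, this is a sketch of how the fragment from the commit message might sit inside a full search request, built as a Python dict. The aggregation names (`sales_per_month`, `sales`, `percent_of_total_sales`) and the fields `date` and `price` are hypothetical, and `"sales"` is only a stand-in for the `<>` placeholder. The `normalizer` key follows the commit message; the parameter name in a released version may differ.

```python
import json

# Hypothetical request body: a monthly date_histogram with a sum
# sub-aggregation, plus a normalize pipeline agg reading that sum.
request_body = {
    "size": 0,
    "aggs": {
        "sales_per_month": {
            "date_histogram": {
                "field": "date",  # assumed field name
                "calendar_interval": "month",
            },
            "aggs": {
                "sales": {"sum": {"field": "price"}},  # assumed field name
                "percent_of_total_sales": {
                    "normalize": {
                        # Sibling metric whose bucket values get normalized.
                        "buckets_path": "sales",
                        # Key and value as written in the commit message.
                        "normalizer": "percent",
                    }
                },
            },
        }
    },
}

print(json.dumps(request_body, indent=2))
```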
talevy added a commit that referenced this issue May 14, 2020
talevy added a commit to talevy/elasticsearch that referenced this issue May 14, 2020