Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow numbers and boolean values for tags. #1712

Merged
merged 6 commits into from
Jan 9, 2019

Conversation

simitt
Copy link
Contributor

@simitt simitt commented Dec 22, 2018

Store numbers as scaled_floats, scaling_factor 1000000
fixes #828.

depends on elastic/beats#9772

Store numbers as scaled_floats, scaling_factor 1000000
fixes elastic#828.
@simitt simitt requested review from graphaelli and jalvz and removed request for graphaelli January 8, 2019 14:51
@felixbarny
Copy link
Member

What is the maximum number that can be stored with this scaled float format? Are strings, numbers and booleans stored in the same place in ES? What if one span has the tag foo: bar and another foo: 42 (different data types for the same tag)?

@simitt
Copy link
Contributor Author

simitt commented Jan 8, 2019

Yes, strings, numbers and booleans are stored at the same place in ES. If the same attribute has different datatypes this will throw an exception.

@simitt
Copy link
Contributor Author

simitt commented Jan 8, 2019

According to ES doc a scaled_float is :

A floating point number that is backed by a long, scaled by a fixed double scaling factor.

where a long is a signed 64-bit integer with a minimum value of -263 and a maximum value of 263-1.

Having a scaled_factor of 1,000,000 would lead to a max value that can be stored with full precision of 2**63/1000000 which equals 9223372036854.

@graphaelli
Copy link
Member

jenkins, retest this please

Copy link
Member

@graphaelli graphaelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should follow up with some documentation about sticking to a single data type for a given tag.

@simitt
Copy link
Contributor Author

simitt commented Jan 9, 2019

@felixbarny I'd like to merge this soonish - do you have any concerns regarding the handling?

@felixbarny
Copy link
Member

Do users have to possibility to set another mapping type in case they want to set large numbers?

For most use cases, including recording bytes up to 9PB and Unix timestamps in up to millisecond precision the current range is sufficient.

@felixbarny
Copy link
Member

If the user can't configure the mapping type I think it would make sense to validate the range in the APM Server so that the error appears in the agent logs as well rather than only in the APM Server and Elasticsearch logs.

But if possible I think users should be able to configure the mapping type, although the default you choose sounds good to me.

@simitt
Copy link
Contributor Author

simitt commented Jan 9, 2019

There is a way to add fields to the ES template, see discussions around this #1371, which makes use of the experimental feature append_fields (experimental).
This can not be used for tags, as it raises an exception when used for fields that already have a template.

However, one can make use of the setup.template.path and setup.template.overwrite: true config options to point to a custom fields.yml, and add other types here. The custom fields.yml then needs to be manually updated every time the server is updated, to ensure new mappings are added, so all features work as expected.

So to summarize, there is some hackish way to do add custom mapping types.

@simitt
Copy link
Contributor Author

simitt commented Jan 9, 2019

jenkins, retest this please.

@simitt
Copy link
Contributor Author

simitt commented Jan 9, 2019

If the user can't configure the mapping type I think it would make sense to validate the range in the APM Server so that the error appears in the agent logs as well rather than only in the APM Server and Elasticsearch logs.

That is not possible with the setup we have at the moment, as the server would need to either handle the whole request syncronously so errors can be returned back to the agents, or the server would need to fetch all indexed tags with their types and check them for every incoming document, which would require some kind of shared cache to work between multiple servers.

The same handling applies if e.g. a mapping explosion or any other issue with storing the data to ES happens.

@simitt simitt merged commit a2e6d88 into elastic:master Jan 9, 2019
simitt added a commit to simitt/apm-server that referenced this pull request Jan 14, 2019
* Allow numbers and boolean values for tags. Store numbers as scaled_floats, scaling_factor 1000000
* Update tags in metricset to behave the same way as transaction.tags

fixes elastic#828.
simitt added a commit that referenced this pull request Jan 14, 2019
* Allow numbers and boolean values for tags. (#1712)

fixes #828.
@simitt simitt deleted the 828-tags-number-bool branch January 31, 2019 22:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow setting numbers as tags
4 participants