Add support for 64-bit unsigned integers #32434
Comments
Pinging @elastic/es-search-aggs
Just adding this note here for reference. We ran into a use case for this in elastic/ecs#673. The Windows registry only has a concept of unsigned numbers, so we saw two paths to take for u64 values greater than 2^63. We could either cast to s64, which is hard to enforce and too easy to get wrong, or store only the string representation. Since we have little need for math, we opted for a keyword string field in the interim.
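For reference, the interim workaround amounts to something like the sketch below. Index and field names are invented, and the calls assume the `@elastic/elasticsearch` JS client with the 7.x-style `body` request format.

```typescript
// Sketch of the interim workaround: store the unsigned 64-bit value as a
// keyword string. Index/field names are invented; assumes the
// @elastic/elasticsearch JS client with the 7.x-style `body` request format.
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

async function storeAsKeyword(): Promise<void> {
  await client.indices.create({
    index: 'winreg',
    body: {
      mappings: {
        properties: { registry_value: { type: 'keyword' } }, // string form only, no math
      },
    },
  });

  // The full 2^64 - 1 value round-trips because it never becomes a JS double.
  await client.index({
    index: 'winreg',
    body: { registry_value: '18446744073709551615' },
  });
}

storeAsKeyword().catch(console.error);
```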
Following up on @rw-access's comment, there are a fair number of data fields coming from the Elastic Endpoint that we would like to store as u64 if possible. Is this issue being actively worked on?
Same problem here during processing of network performance data from routers as part of a network management system. This data contains counters (e.g. received octets) which are 64-bit unsigned values. It would be great if we could store them properly. |
This field type supports:
- indexing of integer values from [0, 18446744073709551615]
- precise queries (term, range)
- sorting and aggregations, which are based on conversion of long values to double and can be imprecise for large values

Closes elastic#32434
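As a rough sketch of how the field type described above is used (index and field names are invented; the calls assume the `@elastic/elasticsearch` JS client with the 7.x-style `body` request format):

```typescript
// Sketch only: map a counter as the new unsigned_long type, then run a precise
// range query. Values are sent as strings so the JSON payload never becomes a
// JS double along the way.
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

async function demo(): Promise<void> {
  await client.indices.create({
    index: 'metrics',
    body: {
      mappings: {
        properties: { rx_octets: { type: 'unsigned_long' } },
      },
    },
  });

  await client.index({
    index: 'metrics',
    body: { rx_octets: '18446744073709551615' }, // 2^64 - 1
    refresh: 'wait_for',
  });

  // Term and range queries are precise over the full [0, 2^64 - 1] range.
  const resp = await client.search({
    index: 'metrics',
    body: {
      query: { range: { rx_octets: { gte: '9223372036854775808' } } }, // 2^63
    },
  });
  console.log(resp.body.hits.hits.length); // 1 with the 7.x client
}

demo().catch(console.error);
```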
We, within the search team, had a discussion about the following points:

I. How to represent unsigned long internally, in JSON, and in the Elasticsearch clients for: …
Several options here are:
II. Should we accept mixing longs and unsigned longs when sorting on a field?
Option 3 is a non-starter for any use case that I have worked on. The most common scenario that I see is related to metrics which are counters. Besides the network equipment mentioned by @ktetzlaff, a significant amount of server and application monitoring data consists of unsigned 64-bit counters. For example, this is the package that Metricbeat uses to gather disk I/O stats (https://github.com/shirou/gopsutil/blob/master/disk/disk.go#L14), and as you can see there are a number of …

Given that counters will make up the majority of unsigned values, it is important (mandatory, really) that at least the derivative aggregation works well and accurately. Whether that is best achieved with option 1 or 2... I would say that 2 is probably preferred. However, if 1 is the only way to get the necessary support, and it allows for …
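To make the counter scenario concrete, here is a hedged sketch of the kind of query that needs to keep working: a derivative over a date_histogram of a 64-bit counter. Index and field names are invented, and the setup assumes the `@elastic/elasticsearch` JS client with the 7.x-style `body` format.

```typescript
// Sketch: a derivative over a date_histogram of a 64-bit counter, the kind of
// query monitoring use cases rely on. Note the caveat from this thread:
// aggregations other than sort/terms convert the values to double, so counters
// near 2^64 can lose some precision here.
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

async function octetRate(): Promise<void> {
  const resp = await client.search({
    index: 'metrics',
    body: {
      size: 0,
      aggs: {
        per_minute: {
          date_histogram: { field: '@timestamp', fixed_interval: '1m' },
          aggs: {
            octets: { max: { field: 'rx_octets' } },
            octets_rate: { derivative: { buckets_path: 'octets' } },
          },
        },
      },
    },
  });
  console.log(resp.body.aggregations.per_minute.buckets);
}

octetRate().catch(console.error);
```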
I am in the same situation as @robcowart: option 3 would not achieve anything for my use cases. The main reason I want 64-bit integers is that metric values are sometimes greater than the maximum the current data types offer (mostly long-running tasks, counted in milliseconds).
I think we should eliminate option 1 from the picture. It's confusing to rely on a convention that requires translating a signed long. Option 2 is more natural since we already have an abstraction for numerics in scripts and sort values. They return a …
Thanks for the comments everyone, very valuable. @jimczi I have chatted with the es-clients team; all the clients (PHP, Perl, Ruby, Python, .NET) support BigInteger except JavaScript. Also, some clients on 32-bit platforms (which should be quite outdated by now) don't support BigInteger, e.g. PHP and Ruby. For JavaScript, the language does support BigInts, but the native JSON parser does not. This should not be a blocker for us, though; the es-clients team are willing to write a custom JSON parser to handle this case. One thing left is to confirm with the Kibana team that they can handle 64-bit integers.

Update: Updated option 2:
Just to confirm: are we planning a maximum of …?

There are some potential client breaking changes.

JavaScript client: will need a dedicated generic serializer, which is going to be slower for everything. The workaround here is that the default WON'T support ulong/bigints OOTB, but the client can be injected with a serializer that supports it, either globally or per request.

Java HLRC: the Java HLRC relies heavily on boxing, e.g. the …

.NET NEST client: in most cases we have dedicated types and interfaces, e.g. the range query is backed by an interface …

I suggest we proceed with option 2 and:
PHP client: it supports …

Because some clients may need to support …
I've been thinking a lot about this, and I fear that sending numbers above the interop range for JSON could cause problems for some implementations. From the JSON spec:
For instance, if JavaScript gets a value outside that range, it can no longer guarantee arithmetic precision (MDN docs). Given that a custom serializer would introduce significant overhead, and that it should be the user's choice to accept it or not, I agree with @ezimuel that introducing a query parameter such as … The JS client could use this query parameter as follows: …

Unfortunately, this is not only a JS problem; the same issue will also happen with other languages. For example, if you use …
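To illustrate the precision problem (a sketch of the idea only, not the client's actual serializer): plain `JSON.parse` maps every number to a double, so anything above `Number.MAX_SAFE_INTEGER` silently loses precision, while a BigInt-aware parse keeps the exact value.

```typescript
// Native JSON.parse turns every number into a double, so 2^64 - 1 is rounded.
const naive = JSON.parse('{"v": 18446744073709551615}');
console.log(naive.v); // 18446744073709552000 -- precision already lost

// A naive BigInt-aware workaround (illustrative only): quote long digit runs
// before parsing, then revive them as BigInt. A real implementation would need
// a proper tokenizer; this regex would also touch digits inside strings.
const raw = '{"v": 18446744073709551615}';
const quoted = raw.replace(/:\s*(\d{16,})/g, ': "$1"');
const parsed = JSON.parse(quoted, (_key, value) =>
  typeof value === 'string' && /^\d{16,}$/.test(value) ? BigInt(value) : value,
);
console.log(parsed.v); // 18446744073709551615n -- exact
```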
By supporting something like …
@ezimuel @delvedor Thanks a lot for the comments and explanation. We've discussed this and decided that we should find a systematic way for the REST layer to handle big integers (also big longs), and that this should not be part of this issue/PR. I have created a separate issue for that and put your ideas there. For 64-bit unsigned integers, we have decided to proceed with our current plan to return Long/BigInteger.
This field type supports:
- indexing of integer values from [0, 18446744073709551615]
- precise queries (term, range)
- precise sort and terms aggregations
- other aggregations, which are based on conversion of long values to double and can be imprecise for large values

Closes #32434
Introduce 64-bit unsigned long field type

This field type supports:
- indexing of integer values from [0, 18446744073709551615]
- precise queries (term, range)
- precise sort and terms aggregations
- other aggregations, which are based on conversion of long values to double and can be imprecise for large values

Backport for #60050
Closes #32434
As a follow-up to discussions in #17006, we agreed in FixitFriday to add support for 64-bit unsigned integers. These numbers will support accurate term and range queries, but aggregations will work on the nearest double value.