-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add BigDecimal data type #46934
Comments
Is there any reason that these values would need to be stored as BigDecimal on Elasticsearch? Could regular keywords work well enough? There has been interest from the community in the past around adding Big(Numeric) type support to mappings, but often times using string data directly is effective enough for most applications. For reference, this issue (#17006) contains some discussions around BigInteger and BigDecimal support. |
Pinging @elastic/es-search |
I don't think so, but maybe I don't understand the suggestion. |
Thank you for making this suggestion. We are not interested to develop support for BigDecimal. Further information about our approach towards support for high accuracy data types can be found in other issues such as Support for BigInteger and BigDecimal #17006 (mentioned above), Add support for 64-bit unsigned integers #32434, and Support for Ethereum data type (uint256) #38242. We are interested in fully understanding your use case since in the past we did occasionally found that what starts as a general request for BigDecimal can be answered either with high precision data types that have more limited precisions or with alternative solutions. In addition, insight into the use case can help us in future roadmap decisions. |
Hi @giladgal, "high accuracy" data types are not enough, we need "complete accuracy", i.e no precision loss. Any functionality that requires any sort of accounting (keeping track of ledgers, etc) cannot afford any data loss. I have described a simple use-case above, I guess I can expand on it a bit? Line 300 in e2b9c1b
BigDecimal ? Anyway, just an observation, but it's besides the point really).
In summary, why do you need a decimal type in elastic? For the same reason you need it in any programming language and/or database: when you can't afford precision loss (i.e in a lot of financial/accounting/science/math applications). |
You cannot to agg across keywords
Sure, when you are not doing any sort of numerical operation on the data. @giladgal From the other issues you linked, I see that one of the primary concerns is performance. To me, this seems like it should be a user decision: if I as the user choose to use an arbitrary precision type, at the cost of performance, that should be my decision, while the compute back end does the work. |
I understand your frustration but performance is not the only reason why we don't expose BigDecimal in metrics aggregations. We choose doubles over any other representation to ensure that we have a single type of source to handle, any field that is used in a metric aggregation is first casted/transformed into a double so that we can have a single implementation for all of them. We are an analytics search engine and a lot of aggregations are based on the fact that an approximation is ok as long as we know the error bound. Changing the representation to allow BigDecimal would be a huge amount of work (basically duplicating all aggregations that use double) and make the cost of maintenance higher. Not mentioning performance again, any new metric aggregation would need to be written in multiple flavor, one for the common representation and one for BigDecimal. If the goal is to allow a financial application to perform accurate arbitrary precision computation I'd say that Elasticsearch is probably not the right tool for it. However we could consider adding BigDecimal support for range queries and range aggregations. These feature are isolated per type so we can introduce new format more easily and we also have the data structures to allow some efficiency in the retrieval. I know that's not what you expect but I wanted to mention it instead of closing this issue because we have no plan to work on the support for metrics aggregation. |
@jimczi I wanted to add some more context to the use case mentioned by @tzachshabtay as I think it could be useful in reflecting on the decision to unconditionally use doubles for all aggregations. Our primary storage for all numbers is Postgres. We use their NUMERIC type for arbitrary precision. We then feed this Postgres information into ElasticSearch to perform business analytics akin to what's advertised for ElasticSearch's financial analytics marketing material. When this transfer happens, we see a loss of fidelity when we aggregate because ElasticSearch converts everything to a double, even if we use I don't think I fully understand the reasoning behind the assumption that all aggregations are approximations. This assumption really breaks down as financial firms adopt both ElasticSearch and crypto, which operates with extremely high levels of precision. Wouldn't it be better if the aggregation engine used generics such that it could operate in either full-fidelity or approximate-fidelity as required by the user? You mentioned that you would need to duplicate all aggregations to support BigDecimal -- wouldn't using generics and/or well-placed abstractions help here? |
Some more questions to better understand the requirement:
|
Apologies for the delay in getting back to you. Answers below:
For a given column, I think having the same scale for all values would be okay.
I've personally have used the 64-bit integer technique for storing fixed-point numbers in the past. When I tried that in crypto, it fell on its face because the range assumptions don't work. I think you would need to have a true arbitrary-precision class that isn't limited by 64-bit storage. If implemented with well-placed abstractions, the implementation of this arbitrary-precision class could be backed by 64-bit integer if the range allows. That way, you don't lose performance if you don't need true arbitrary-precision.
To start off with, we'd be happy with |
Hi, We also have kind of the same requirements as @sachnk and @tzachshabtay. We are dealing with crypto financial data. For "tick" level data, we don't mind storing the information as string for no loss of precision. Nevertheless, we would like to do different types of aggregations with it. And precision loss is not really an option for us as we are dealing with very precise floating numbers. We would mostly need the average and the weighted average. Min and max would also be cool but that's less prio for us. So big +1 for having the BigDecimal type in ES even tho I fully understand that it requires you a lot of refactoring. |
Pinging @elastic/es-analytics-geo (:Analytics/Aggregations) |
Relates to #60149 |
Pinging @elastic/es-analytics-geo (Team:Analytics) |
Pinging @elastic/es-search (Team:Search) |
example : block chain. the ETH vs it's minimum unit Wei, 1 ETH = 1^18 Wei ,very big num |
Any news here? |
@Laurianti no news. This isn't as simple as adding a new numerical type. We would need to adjust how these values are stored and decoded as Lucene itself doesn't really support accurate decimal number storage to these extreme values. So, there are three problems:
No movement has been made on this recently. |
Pinging @elastic/es-analytical-engine (Team:Analytics) |
Pinging @elastic/es-search-foundations (Team:Search Foundations) |
BigDecimal data type:
For a financial application, we need maximum accuracy and arbitrary precision, even at the expense of performance.
scaled_float
doesn't cut it for us (we need numbers bigger than 100 billion and with more than 8 digits after the decimal place, so there is no scaling factor that can work), anddouble
obviously isn't accurate enough for doing financial calculations as it's using a floating point value.The values must be passed as strings in the json and then parsed and stored directly to
BigDecimal
in elastic search, as strings are the only way to pass accurate financial numbers (with arbitrary precision) in a json document.The text was updated successfully, but these errors were encountered: