Should we drop support for INTs? #3479
Comments
Can't tell if joking or serious... I can write succinct encodings of abstract datatypes into int64s. Float? My god I wouldn't want to write those converters. |
@chadbrewbaker we are serious, can you give a relevant use case where |
For data pipelines I usually like uint64 arrays which you just memcpy or RMA across the wire. Type them appropriately as needed. Learn from the MPI masters. Judiciously reuse their open source code, http://static.msi.umn.edu/tutorial/scicomp/general/MPI/content6.html |
@chadbrewbaker I'm not sure what to say about 1. Floats can't be trusted? For 2, if you know you're writing in strings as ints because you're changing it to that because of succinct abstract data types, I would suggest one of two things. First, don't bother, use the actual type names in the string. We'll take care of compression for you. Or use tags, which already avoid copying those strings around. Second, if you still want to do it, you know all those strings are ints, so you can just convert them on your end instead of using padded strings. For 3, I'm not sure what you're talking about. The timestamps in InfluxDB are all int64 nano-second precision epochs. This isn't the proposed change. If you wish to store additional timestamps, that's a request for a date/time data type, not an int64. But even then I have no idea what not trusting clocks has to do with this proposed change. Time is never exact in a distributed system, this is common knowledge and not particularly controversial. If you want a more efficient wire transfer protocol, that's a separate request. At this point we're not getting to it anytime soon so you'd be best to write a proxy that can handle your efficient representation, which will then submit the correct type values to Influx. Nothing I'm seeing in this argument is convincing since you're not saying that the loss of precision from int64 to float64 will have an impact. |
Hi Paul, I'm fine with that. In influxdb-java I already always write numbers with at least one fraction digit, even if an int was given, because of the troubles you mentioned. My only concern with floats is the loss of accuracy compared to BigDecimals or even int64: a big int64 number converted to a float64 will lose accuracy, and some people will complain about that, I'm sure. On the other hand, storing BigDecimals (I don't know if a golang equivalent exists) is overkill performance-wise. Just my $0.02. |
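The accuracy concern above can be checked directly. A minimal sketch in Python (used here for illustration only; the helper name `survives_float64` is hypothetical), assuming values are cast to an IEEE 754 float64 with its 53-bit significand:

```python
def survives_float64(n: int) -> bool:
    """Return True if n is unchanged by a round trip through float64."""
    return int(float(n)) == n


LIMIT = 2 ** 53  # float64 carries a 53-bit significand

print(survives_float64(LIMIT))        # True: still exactly representable
print(survives_float64(LIMIT + 1))    # False: rounds to a neighbouring value
print(survives_float64(2 ** 63 - 1))  # False: the largest int64 is not exact either
```

Every integer with magnitude up to 2^53 survives the cast; beyond that, only some do.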
Hmm, I can see cases where int64 is useful for firing in data from sensors where we don't know the encoding. e.g. if a sensor uses the high bits for something then we'll end up outside the 2^53 range, or if it's giving us different-endian data (and we want to acquire the data opaquely before sorting it out later). Aside from physical sensors, consider hardware counters from chips etc that very much will use all 64 bits. If the primary issue is that client libraries are fumbling the types of things and giving ints when not desired, then perhaps keep the int64 support internally and just make sure it only kicks in when a client does something very explicit to request it? |
This is true especially for routers and such: if an application later wants to calculate rates from the absolute values, the error will grow if counter values are stored as floats. |
+1 for keeping the integers; they make life a lot easier by being easy to bit-manipulate, as opposed to floats where implementations vary a lot more. @chadbrewbaker 's 1) and 2) and @jcsp 's cases are valid. Another example of mine: it's easy to program a strictly monotonically increasing integer, but it's less so to program a strictly monotonically increasing float without first converting the float to an int64 and then back. |
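To illustrate the last point, here is a hedged sketch of stepping a float64 to the next larger value by reinterpreting its bits as an int64, adding one, and converting back. The function name `next_float_up` is hypothetical, and the sketch only handles finite, non-negative inputs:

```python
import struct


def next_float_up(x: float) -> float:
    """Smallest float64 strictly greater than x, for finite x >= 0."""
    if x != x or x < 0 or x == float("inf"):
        raise ValueError("sketch only handles finite, non-negative inputs")
    x = abs(x)  # turns -0.0 into +0.0 so the bit pattern is non-negative
    bits = struct.unpack("<q", struct.pack("<d", x))[0]         # float64 -> int64 bits
    return struct.unpack("<d", struct.pack("<q", bits + 1))[0]  # int64 bits -> float64


print(next_float_up(1.0) > 1.0)  # True
```

With an integer the same step is simply `n + 1`, which is the contrast the comment draws.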
@haf, even if we used a float under the hood, you could use an int in your client code. The only difference is that every number in the line protocol would be converted to a float and stored using that. The conversion isn't on the client end, it's on our side. And the loss of precision for counters isn't really a concern given the range of values that can still be represented as a float. Like Prometheus' argument that you'd have to increment that counter millions of times a second for hundreds of years. You're going to get a reset first. The rates of change all still work with float64 on our end. The one thing I've seen valid so far in this thread is that hardware sensors might use the higher order bits on ints to do something different. Is that from actual experience that it's something people do? Here's another option. We update the line protocol so that if you want an int for a field value, you follow the number with an
Then any field value without an i is always parsed as a float64. |
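A minimal sketch of that parsing rule, assuming a numeric field token has already been split out of the line. The function `parse_field_value` is hypothetical and is not InfluxDB's actual parser; it ignores strings, booleans, and escaping:

```python
from typing import Union


def parse_field_value(token: str) -> Union[int, float]:
    """Parse a numeric token: a trailing 'i' means int64, anything else float64."""
    if token.endswith("i"):
        value = int(token[:-1])
        if not -(2 ** 63) <= value < 2 ** 63:
            raise ValueError(f"out of int64 range: {token}")
        return value
    return float(token)


print(parse_field_value("42i"))  # 42
print(parse_field_value("42"))   # 42.0
```

Under this rule a client that writes `42` by accident gets a float rather than locking the field to an integer type.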
Requiring explicit trailing 'i' for ints and otherwise defaulting to floats sounds perfectly sensible to me. |
Who says the ADT always is a counter in the range of natural numbers? That only counts ones? It could be any ADT that relies on the property of static monotonicity. That + the fact that you can represent the floats easily with ints but not the other way around makes keeping ints a compelling argument. You don't need a paper on how to work with ints, but you probably need one to work with floats – hence the previous comment about not trusting client libraries from @chadbrewbaker Sample ADTs (not necessarily the best examples, though): BigInteger values? 128-bit decimals? Interval Tree Clocks? A counter of network traffic for an internet router that counts in increments of terabytes with a stored value of #bytes? E.g. "my backup SaaS company" has transferred 1 PiB of ingestion this month, and now 512 KiB more:
Now my counters don't update anymore. ;) |
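The snippet that originally accompanied this comment is not preserved here. As a hedged illustration of the general effect for float64 only (other float widths stall much earlier), a counter stops absorbing an increment once the increment is smaller than the spacing between adjacent floats at that magnitude:

```python
counter = float(2 ** 53)   # 9_007_199_254_740_992 held as a float64
stalled = counter + 1.0    # the increment is below the float spacing here
print(stalled == counter)  # True: the update is lost

pib_plus = float(2 ** 50) + float(512 * 1024)  # 1 PiB plus 512 KiB, in bytes
print(int(pib_plus) == 2 ** 50 + 512 * 1024)   # True: still exact below 2**53
```

So a float64 byte counter incremented one byte at a time stalls at 2^53 bytes (8 PiB), while an int64 keeps counting up to 2^63 - 1.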
My main use for Influx requires that I store many nanosecond timestamps as a value in the DB. Using a float64 would limit me to ~us precision which isn't sufficient. If I were forced to use float64, then I would be forced to drop Influx. |
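The size of that precision loss can be estimated with `math.ulp` (Python 3.9+). The timestamp below is a hypothetical value from 2015, chosen only for illustration:

```python
import math

ts_ns = 1_440_000_000_000_000_000  # hypothetical epoch timestamp in nanoseconds
spacing = math.ulp(float(ts_ns))   # gap between adjacent float64 values here

print(spacing)                             # 256.0
print(float(ts_ns + 100) == float(ts_ns))  # True: a 100 ns offset is lost
```

At current epoch magnitudes a float64 therefore resolves nanosecond timestamps only to steps of a few hundred nanoseconds, which is why exact nanosecond values need an int64.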
We use the int field to store OPC quality codes, which are 8-bit masks. However I think client-side casting can handle this need just fine. |
After the reaction on this thread I'm thinking the best plan is the one I outlined in my last comment. Update the line protocol to require a trailing Then we need to fix the outstanding bugs associated with using ints and we should be good to go. This will give us the best of both worlds. It'll fix the usability problem with the API that is causing a bunch of people pain, but it will keep support for ints, which many people want. |
I think most are going to tell you to keep integer support. Perhaps this is harsh, but few understand what it takes to build, maintain, and ship quality software, especially if it isn't their software to build, maintain, and ship. I think in the short-term you need to optimize for stability, simplicity, and production-level quality. Whichever decision satisfies that quicker, is what you should do. |
Thanks pauldix for addressing this issue. |
Ok, it's official, we're keeping integer support but modifying the line protocol slightly to improve usability: #3519. It'll be in the 0.9.3 release. Thanks for the feedback everyone. |
@Adrien-P That's right, we set the type based on the first value written in. The issue with having mixed types for a given field in a measurement is that we end up having to cast them when doing aggregations, which could yield unexpected results. |
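The "first write wins" behaviour can be sketched as follows. This is an illustrative model only: the class `FieldTypes` and exception `FieldTypeConflict` are hypothetical names and do not reflect InfluxDB's internals.

```python
class FieldTypeConflict(Exception):
    """Raised when a write disagrees with the type already recorded for a field."""


class FieldTypes:
    """Remember the type of the first value written to each (measurement, field)."""

    def __init__(self) -> None:
        self._types = {}

    def check(self, measurement: str, field: str, value) -> None:
        # The first write records the type; later writes must match it.
        seen = self._types.setdefault((measurement, field), type(value))
        if seen is not type(value):
            raise FieldTypeConflict(
                f"{measurement}.{field} is {seen.__name__}, got {type(value).__name__}"
            )


types = FieldTypes()
types.check("cpu", "value", 1.0)  # first write fixes the field as float
types.check("cpu", "value", 2.5)  # accepted: same type
```

A later `types.check("cpu", "value", 3)` would raise `FieldTypeConflict`, which mirrors the field type errors users reported.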
We're considering dropping support for INT types. These have been the source of many bugs, which we can fix, but more importantly, they've been a bit of a usability headache. People are having trouble writing libraries that actually force float data types when that's what they want. Thus they're getting errors about the field type and reporting these as bugs in our code, which in this case isn't true.
We could remedy this by writing client libraries ourselves since we know the protocol for writing data, but it's problematic and we're unlikely to have the bandwidth to write all those libraries any time soon.
Given the range of values that can be represented with float64s, I'm at a bit of a loss for what we'd need the int types for. I think the Prometheus team have a pretty good argument: http://prometheus.io/docs/introduction/faq/#why-are-all-sample-values-64-bit-floats-i-want-integers
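The range argument can be made concrete with a little arithmetic. A rough sketch, assuming a counter that starts at zero and is incremented by one at a fixed rate (the function name and the rate of one million per second are assumptions for illustration):

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
EXACT_LIMIT = 2 ** 53  # largest range in which float64 counts every integer


def years_until_inexact(increments_per_second: float) -> float:
    """Years before a +1 counter reaches the float64 exact-integer limit."""
    return EXACT_LIMIT / increments_per_second / SECONDS_PER_YEAR


print(round(years_until_inexact(1_000_000)))  # 285
```

This covers counters that count by one from zero; it does not cover values that use the high bits of an int64 from the start.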
The reason I'm asking is because we promised no breaking changes in the 0.9 line. This would be a breaking change. We'd make sure that databases that had INT types would work, but those values would get cast to float64.
What do people think about this change? Are there use cases where you think an int64 is actually required? I'm interested in hearing people's thoughts.