-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Order of evaluation of aggregations causes precision issues for float #95665
Comments
roachtest.unoptimized-query-oracle/disable-rules=half/seed-multi-region failed with artifacts on master @ dc974cd698364a4db5acb630162dd8b1856cfad6:
Parameters: |
It would be good to check that in both cases we're using the same stddev algorithm, one of the algorithms that's more numerically stable. (I think we have at least one implementation based on Welford-Knuth but maybe we have more than one implementation?) |
Second failure is dupe of #95264 |
I couldn't reconstruct the
If the order of insertion of rows 2 and 3 in the table is swapped, we get the result:
Here's a test case where selecting from the primary key vs. an index yields different results:
|
The same issue occurs on postgres:
|
Removing release-blocker label. Order of execution precision differences are nothing new, and are expected. |
Precision differences could possibly be eliminated by using a different data type under the covers to perform the operation, like decimal. But this may use more memory and perform worse. Using float vs. decimal is a trade-off of performance vs. precision. |
TLP may have a way to avoid this issue. Maybe this was not extended to unoptimized-query-oracle or costfuzz. |
We discussed this family of issues in our collab session today. We decided this might be a good meta issue for tracking investigation into float precision in aggregation operations. Options we discussed (in no particular order or recommendation):
|
Before this change unoptimized query oracle tests would compare results using simple string comparison. However, due to floating point precision limitations, it's possible for results with floating point to diverge during the course of normal computation. This results in test failures that are difficult to reproduce or determine whether they are expected behavior. This change utilizes existing floating point comparison functions used by logic tests to match float values only to a specific precision. Like the logic tests, we also have special handling for floats and decimals under the s390x architecture (see cockroachdb#63244). In order to avoid costly comparisons, we only check floating point precision if the naiive string comparison approach fails and there are float or decimal types in the result. Epic: None Fixes: cockroachdb#95665 Release note: None
Before this change unoptimized query oracle tests would compare results using simple string comparison. However, due to floating point precision limitations, it's possible for results with floating point to diverge during the course of normal computation. This results in test failures that are difficult to reproduce or determine whether they are expected behavior. This change utilizes existing floating point comparison functions used by logic tests to match float values only to a specific precision. Like the logic tests, we also have special handling for floats and decimals under the s390x architecture (see cockroachdb#63244). In order to avoid costly comparisons, we only check floating point precision if the naiive string comparison approach fails and there are float or decimal types in the result. Epic: None Fixes: cockroachdb#95665 Release note: None
Before this change unoptimized query oracle tests would compare results using simple string comparison. However, due to floating point precision limitations, it's possible for results with floating point to diverge during the course of normal computation. This results in test failures that are difficult to reproduce or determine whether they are expected behavior. This change utilizes existing floating point comparison functions used by logic tests to match float values only to a specific precision. Like the logic tests, we also have special handling for floats and decimals under the s390x architecture (see cockroachdb#63244). In order to avoid costly comparisons, we only check floating point precision if the naiive string comparison approach fails and there are float or decimal types in the result. Epic: None Fixes: cockroachdb#95665 Release note: None
106552: tests: support float approximation in roachtest query comparison utils r=rharding6373 a=rharding6373 tests, logictest, floatcmp: refactor comparison test util functions This commit moves some float comparison test util functions from logictest into the floatcmp package. It also moves a query result comparison function from the tlp file to query_comparison_util in the tests package. This commit also marks roachtests as testonly targets. Epic: none Release note: None tests: support float approximation in roachtest query comparison utils Before this change unoptimized query oracle tests would compare results using simple string comparison. However, due to floating point precision limitations, it's possible for results with floating point to diverge during the course of normal computation. This results in test failures that are difficult to reproduce or determine whether they are expected behavior. This change utilizes existing floating point comparison functions used by logic tests to match float values only to a specific precision. Like the logic tests, we also have special handling for floats and decimals under the s390x architecture (see #63244). In order to avoid costly comparisons, we only check floating point precision if the naiive string comparison approach fails and there are float or decimal types in the result. Epic: None Fixes: #95665 Release note: None 106607: pkg/util/log: introduce `fluent.sink.conn.errors` metric to log package r=knz a=abarganier Logging is a critical subsystem within CRDB, but as things are today, we have very little observability into logging itself. For starters, we have no metrics in the logging package at all! This makes it difficult to observe things within the log package. For example, if a fluent-server log sink fails to connect to FluentBit, how can we tell? We get some STDOUT message, but that's about it. It's time to get some metrics into the log package. Doing so is a bit of a tricky dance, because pretty much every package in CRDB imports the logging package, meaning you almost always get a circular dependency when trying to make use of any library within pkg/util/log. Therefore, this PR injects metrics functionality into the logging package. It does so via a new interface called `LogMetrics` with functions that enable its users to modify metrics. The implementation of the interface can live elsewhere, such as the metrics package itself, whe circular dependencies aren't such a pain. We can then inject the implementation into the log package. With all that plumbing done, the PR also introduces a new metric, `fluent.sink.conn.errors` representing fluentbit connection errors whenever a fluent-server log sink fails to establish a connection. We can see the metric represented below in a test, where no fluent-server was running for a moment, before starting it. Note that the connection errors level off once the server was started: <img width="1018" alt="Screenshot 2023-07-11 at 1 32 57 PM" src="https://github.com/cockroachdb/cockroach/assets/8194877/ccacf84e-e585-4a76-98af-ed145629f9ef"> Release note (ops change): This patch introduces the counter metric `fluent.sink.conn.errors` to the CockroachDB tsdb, which is exposed to `/_status/vars` clients as `fluent_sink_conn_errors`. The metric is incremented whenever a `fluent-server` log sink fails to establish a connection to the log sink pointed to by the `address` for the sink in the provided log config. Epic: CC-9681 Addresses: #102753 107453: Revert "github-pull-request-make: temporary workaround" r=postamar a=postamar This reverts commit 2bd61c0. Relates to #106920. Co-authored-by: rharding6373 <[email protected]> Co-authored-by: Alex Barganier <[email protected]> Co-authored-by: Marius Posta <[email protected]>
Before this change unoptimized query oracle tests would compare results using simple string comparison. However, due to floating point precision limitations, it's possible for results with floating point to diverge during the course of normal computation. This results in test failures that are difficult to reproduce or determine whether they are expected behavior. This change utilizes existing floating point comparison functions used by logic tests to match float values only to a specific precision. Like the logic tests, we also have special handling for floats and decimals under the s390x architecture (see cockroachdb#63244). In order to avoid costly comparisons, we only check floating point precision if the naiive string comparison approach fails and there are float or decimal types in the result. Epic: None Fixes: cockroachdb#95665 Release note: None
Before this change unoptimized query oracle tests would compare results using simple string comparison. However, due to floating point precision limitations, it's possible for results with floating point to diverge during the course of normal computation. This results in test failures that are difficult to reproduce or determine whether they are expected behavior. This change utilizes existing floating point comparison functions used by logic tests to match float values only to a specific precision. Like the logic tests, we also have special handling for floats and decimals under the s390x architecture (see cockroachdb#63244). In order to avoid costly comparisons, we only check floating point precision if the naiive string comparison approach fails and there are float or decimal types in the result. Epic: None Fixes: cockroachdb#95665 Release note: None
Before this change unoptimized query oracle tests would compare results using simple string comparison. However, due to floating point precision limitations, it's possible for results with floating point to diverge during the course of normal computation. This results in test failures that are difficult to reproduce or determine whether they are expected behavior. This change utilizes existing floating point comparison functions used by logic tests to match float values only to a specific precision. Like the logic tests, we also have special handling for floats and decimals under the s390x architecture (see cockroachdb#63244). In order to avoid costly comparisons, we only check floating point precision if the naiive string comparison approach fails and there are float or decimal types in the result. Epic: None Fixes: cockroachdb#95665 Release note: None
Before this change unoptimized query oracle tests would compare results using simple string comparison. However, due to floating point precision limitations, it's possible for results with floating point to diverge during the course of normal computation. This results in test failures that are difficult to reproduce or determine whether they are expected behavior. This change utilizes existing floating point comparison functions used by logic tests to match float values only to a specific precision. Like the logic tests, we also have special handling for floats and decimals under the s390x architecture (see cockroachdb#63244). In order to avoid costly comparisons, we only check floating point precision if the naiive string comparison approach fails and there are float or decimal types in the result. Epic: None Fixes: cockroachdb#95665 Release note: None
roachtest.unoptimized-query-oracle/disable-rules=half/seed-multi-region failed with artifacts on master @ b21379bb56dd206e6f63cc7d07ca72e85db7a4c4:
Parameters:
ROACHTEST_cloud=gce
,ROACHTEST_cpu=4
,ROACHTEST_encrypted=false
,ROACHTEST_ssd=0
Help
See: roachtest README
See: How To Investigate (internal)
This test on roachdash | Improve this report!
Jira issue: CRDB-23664
The text was updated successfully, but these errors were encountered: