-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql/stats: conversion of datums to and from quantile function values #83123
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for breaking this out! Looks pretty good, mostly just some nits.
Reviewed 5 of 5 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @michae2)
-- commits
line 29 at r1:
TIL Assists is a keyword.
pkg/sql/stats/quantile.go
line 120 at r1 (raw file):
return tree.NewDInt(tree.DInt(math.MinInt32)), nil } if i >= math.MaxInt16 {
MaxInt32?
Could use another clamping test case.
pkg/util/timeutil/pgdate/pgdate.go
line 148 at r1 (raw file):
func MakeDateFromPGEpochClampFinite(days int32) Date { if days < lowDays { return Date{days: lowDays}
Why can't we just return LowDate
here?
pkg/sql/stats/quantile_test.go
line 32 at r1 (raw file):
val float64 err bool res tree.Datum
There doesn't seem to be a test case that populates res
. Is that intentional? I infer from below that we always expect the roundtrip to equal dat
, but in that case I think it's better to remove this and just compare to the original dat
below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @rharding6373)
pkg/sql/stats/quantile.go
line 120 at r1 (raw file):
Previously, rharding6373 (Rachael Harding) wrote…
MaxInt32?
Could use another clamping test case.
Gaaaah! Good catch! Done, and added more test cases.
pkg/util/timeutil/pgdate/pgdate.go
line 148 at r1 (raw file):
Previously, rharding6373 (Rachael Harding) wrote…
Why can't we just return
LowDate
here?
Good point! Done.
pkg/sql/stats/quantile_test.go
line 32 at r1 (raw file):
Previously, rharding6373 (Rachael Harding) wrote…
There doesn't seem to be a test case that populates
res
. Is that intentional? I infer from below that we always expect the roundtrip to equaldat
, but in that case I think it's better to remove this and just compare to the originaldat
below.
Good point, done.
d6db735
to
c7f1d40
Compare
Bazel CI failures are just flakes. This is RFAL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! I have some nits, but it
Another suggestion would be to write a randomized test for round-tripping thought the quantile functions, but it doesn't have to be in this PR, so a TODO will suffice.
Reviewed 1 of 5 files at r1, 4 of 4 files at r2, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @mgartner, @michae2, and @rharding6373)
pkg/sql/stats/quantile.go
line 112 at r2 (raw file):
case 16: if i <= math.MinInt16 { return tree.NewDInt(tree.DInt(math.MinInt16)), nil
nit: for consistency with other types below maybe update i
in-place to have only a single return
per type?
pkg/sql/stats/quantile.go
line 155 at r2 (raw file):
// Then clamp to pgdate.Date. return tree.NewDDate(pgdate.MakeDateFromPGEpochClampFinite(int32(days))), nil case types.TimestampFamily:
nit: cases for Timestamp and TimestampTZ look exactly the same, with one line difference, maybe squash them?
pkg/sql/stats/quantile_test.go
line 294 at r2 (raw file):
} // Test conversions from quantile value to datum and back. TestToQuantileValue
nit: consider renaming the tests to something like TestQuantileValueRoundTrip
and TestQuantileValueRoundTripOverflow
. When I started reading the code in this file I was confused why for TestToQuantileValue
test we would not do the whole round trip.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another suggestion would be to write a randomized test for round-tripping thought the quantile functions
+1
Reviewed 1 of 5 files at r1, 4 of 4 files at r2, all commit messages.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @mgartner, @michae2, and @rharding6373)
pkg/sql/stats/quantile.go
line 75 at r2 (raw file):
return 0, tree.ErrFloatOutOfRange } return float64(v.Unix()) + float64(v.Nanosecond())*1e-9, nil
Doesn't seem like this will round-trip since the location isn't included here. Is there any possible issue with that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion. I added a TODO and will do that in another PR. TFTRs!
bors r=yuzefovich,rytaft
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @mgartner, @rharding6373, @rytaft, and @yuzefovich)
pkg/sql/stats/quantile.go
line 75 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Doesn't seem like this will round-trip since the location isn't included here. Is there any possible issue with that?
My understanding is that TIMESTAMPTZ
doesn't actually store a location when written to disk (and neither does TIMESTAMP
). When we decode them, the time.Time
in the datum has nil location. When comparing, casting, and printing, we assume that TIMESTAMPTZ
is in the session time zone and TIMESTAMP
is UTC, and make adjustments to the time.Time
to account for these assumptions.
So, I believe in histograms we can treat them the same way as each other, because they are "timezoneless" until comparing, casting, or printing.
pkg/sql/stats/quantile.go
line 112 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: for consistency with other types below maybe update
i
in-place to have only a singlereturn
per type?
Done. Had to special case MaxInt64
.
pkg/sql/stats/quantile.go
line 155 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: cases for Timestamp and TimestampTZ look exactly the same, with one line difference, maybe squash them?
Done.
pkg/sql/stats/quantile_test.go
line 294 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: consider renaming the tests to something like
TestQuantileValueRoundTrip
andTestQuantileValueRoundTripOverflow
. When I started reading the code in this file I was confused why forTestToQuantileValue
test we would not do the whole round trip.
Done.
Build failed (retrying...): |
Build failed: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nicely done!
Reviewed 1 of 5 files at r1, 1 of 4 files at r2, 3 of 3 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @rharding6373 and @rytaft)
Previously, rharding6373 (Rachael Harding) wrote…
TIL Assists is a keyword.
It's not a keyword: https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword
Using a non-keyword like "assists" or "informs" is a way to link to the issue without auto-closing it when the PR is merged.
Failing a linter. bors r- |
To predict histograms in statistics forecasts, we will use linear regression over quantile functions. (Quantile functions are another representation of histogram data, in a form more amenable to statistical manipulation.) The conversion of histograms to quantile functions will require conversion of histogram bounds (datums) to quantile values (float64s). And likewise, the inverse conversion from quantile functions back to histograms will require the inverse conversion of float64 quantile values back to datums. These conversions are a little different from our usual SQL conversions in `eval.PerformCast`, so we add them to a new quantile file in the `sql/stats` module. This code was originally part of cockroachdb#77070 but has been pulled out to simplify that PR. A few changes have been made: - `histogramValue` has been renamed to `FromQuantileValue`. - Support for `DECIMAL`, `TIME`, `TIMETZ`, and `INTERVAL` has been dropped. Clamping these types in `FromQuantileValue` was too complex for the first iteration of statistics forecasting. We expect the overwhelming majority of ascending keys to use `INT` or `TIMESTAMP` types. - Bugs in `FLOAT4`, `TIMESTAMP` and `TIMESTAMPTZ` conversions have been fixed. - We're now clamping timestamps to slightly tighter bounds to avoid the problems with infinite timestamps (see cockroachdb#41564). Assists: cockroachdb#79872 Release note: None
bors r=yuzefovich,rytaft,mgartner |
Build succeeded: |
To predict histograms in statistics forecasts, we will use linear
regression over quantile functions. (Quantile functions are another
representation of histogram data, in a form more amenable to statistical
manipulation.)
The conversion of histograms to quantile functions will require
conversion of histogram bounds (datums) to quantile values (float64s).
And likewise, the inverse conversion from quantile functions back to
histograms will require the inverse conversion of float64 quantile
values back to datums. These conversions are a little different from our
usual SQL conversions in
eval.PerformCast
, so we add them to a newquantile file in the
sql/stats
module.This code was originally part of #77070 but has been pulled out to
simplify that PR. A few changes have been made:
histogramValue
has been renamed toFromQuantileValue
.DECIMAL
,TIME
,TIMETZ
, andINTERVAL
has beendropped. Clamping these types in
FromQuantileValue
was too complexfor the first iteration of statistics forecasting. We expect the
overwhelming majority of ascending keys to use
INT
orTIMESTAMP
types.
FLOAT4
,TIMESTAMP
andTIMESTAMPTZ
conversions have beenfixed.
problems with infinite timestamps (see sql: 'infinity' returned as timestamp rather than 'infinity' #41564).
Assists: #79872
Release note: None