Skip to content

Commit

Permalink
geoindex,invertedidx,rowexec,encoding,...: add bounding box to invert…
Browse files Browse the repository at this point in the history
…ed column

The bounding box is encoded after the cellid, and used to perform
"pre-filtering", i.e., filtering after the inverted row is fetched
but before it is added to various set expressions.

The fundamental weakness with the cell covering
approach is that the coverings for convex shapes are typically
worse than a bounding box and we pay significant cost in doing
a lookup join to eliminate the false positives. This attempts
to reduce those false positives for the lookup join, and part
of the inverted join (the invertedJoiner can avoid adding a
row to the RowContainer).

For the canonical nyc st_intersects and st_dwithin queries
the joinReader would see a much higher input row count than
the output due to false positives. Specifically,
- st_intersects: input 3110, output 490
- st_dwithin: input 23183, output 8606

With this change the input counts are 1064 and 11177. There
is the additional cost of reading extra bytes and applying
the bounding box filter, but it is more than offset by the
gains (see comparison numbers below).

This is a format change for the index, so is easier to do
before the launch of spatial features.

The encoding here is trivial, and consumes 17 bytes for points
and 33 bytes for others. An experiment with a more complicated
encoding that converted to integers while bounding the
relative error did not show any improvement wrt the following
benchmarks.

There are TODOs for followup PRs:
- Do such pre-filtering for invertedFilterer. This will need
  a spec change.
- Support pre-filtering for expressions containing more than
  one function.

Comparison numbers:

old is master
name                                   old ms     new ms     delta
Test/nyc/nyc-intersects                80.0 ±23%  69.0 ±30%  -13.76%  (p=0.000 n=49+48)
Test/nyc/nyc-dwithin                    225 ±13%   169 ± 8%  -24.88%  (p=0.000 n=46+45)
Test/nyc/nyc-coveredby                 84.1 ±11%  69.0 ±14%  -17.96%  (p=0.000 n=50+49)
Test/nyc/nyc-expand-blocks             85.7 ± 6%  84.6 ± 8%   -1.22%  (p=0.030 n=45+47)
Test/nyc/nyc-dwithin-and-dfullywithin   195 ± 7%   195 ± 6%     ~     (p=1.000 n=46+49)
Test/postgis_geometry_tutorial/11.2b   1.66 ±24%  1.53 ±45%   -7.87%  (p=0.006 n=50+50)
Test/postgis_geometry_tutorial/11.6    1.61 ±19%  1.66 ±32%     ~     (p=0.329 n=46+50)
Test/postgis_geometry_tutorial/12.1.2  0.89 ±26%  0.91 ±26%     ~     (p=0.532 n=46+50)
Test/postgis_geometry_tutorial/12.2.3  1.33 ±20%  1.38 ±36%     ~     (p=0.308 n=49+49)
Test/postgis_geometry_tutorial/12.2.4  1.30 ±27%  1.29 ±18%     ~     (p=0.746 n=49+45)
Test/postgis_geometry_tutorial/13.0    1.49 ±36%  1.55 ±33%     ~     (p=0.141 n=49+50)
Test/postgis_geometry_tutorial/13.1a    233 ± 7%   197 ± 7%  -15.45%  (p=0.000 n=48+47)
Test/postgis_geometry_tutorial/13.1c   24.9 ±15%  19.4 ±18%  -22.17%  (p=0.000 n=50+48)
Test/postgis_geometry_tutorial/13.2     309 ± 4%   257 ± 7%  -16.92%  (p=0.000 n=43+47)
Test/postgis_geometry_tutorial/14.1a   1.51 ±15%  1.70 ±16%  +12.93%  (p=0.000 n=48+44)
Test/postgis_geometry_tutorial/14.2b   6.28 ±22%  6.07 ±19%     ~     (p=0.050 n=50+45)
Test/postgis_geometry_tutorial/14.2c   3.70 ±14%  3.47 ±10%   -5.98%  (p=0.000 n=48+46)
Test/postgis_geometry_tutorial/14.3c   31.9 ±11%  25.9 ±12%  -18.60%  (p=0.000 n=48+49)
Test/postgis_geometry_tutorial/15.0    1.50 ±20%  1.50 ±20%     ~     (p=0.860 n=48+49)

Release justification: This is an incompatible format change,
and discussions with engineering and product indicated a
preference to do this now before we launch spatial features

Release note: None
  • Loading branch information
sumeerbhola committed Sep 8, 2020
1 parent c119b04 commit dd8b81c
Show file tree
Hide file tree
Showing 34 changed files with 1,350 additions and 357 deletions.
5 changes: 5 additions & 0 deletions pkg/geo/geo.go
Original file line number Diff line number Diff line change
Expand Up @@ -284,6 +284,11 @@ func (g *Geometry) CartesianBoundingBox() *CartesianBoundingBox {
return &CartesianBoundingBox{BoundingBox: *g.spatialObject.BoundingBox}
}

// BoundingBoxRef returns a pointer to the BoundingBox, if any.
func (g *Geometry) BoundingBoxRef() *geopb.BoundingBox {
return g.spatialObject.BoundingBox
}

// SpaceCurveIndex returns an uint64 index to use representing an index into a space-filling curve.
// This will return 0 for empty spatial objects, and math.MaxUint64 for any object outside
// the defined bounds of the given SRID projection.
Expand Down
19 changes: 12 additions & 7 deletions pkg/geo/geoindex/geoindex.go
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@ import (

"github.com/cockroachdb/cockroach/pkg/geo"
"github.com/cockroachdb/cockroach/pkg/geo/geogfn"
"github.com/cockroachdb/cockroach/pkg/geo/geopb"
"github.com/golang/geo/s2"
)

Expand Down Expand Up @@ -114,8 +115,9 @@ var CommuteRelationshipMap = map[RelationshipType]RelationshipType{
// GeographyIndex is an index over the unit sphere.
type GeographyIndex interface {
// InvertedIndexKeys returns the keys to store this object under when adding
// it to the index.
InvertedIndexKeys(c context.Context, g geo.Geography) ([]Key, error)
// it to the index. Additionally returns a bounding box, which is non-empty
// iff the key slice is non-empty.
InvertedIndexKeys(c context.Context, g geo.Geography) ([]Key, geopb.BoundingBox, error)

// Acceleration for topological relationships (see
// https://postgis.net/docs/reference.html#Spatial_Relationships). Distance
Expand Down Expand Up @@ -153,8 +155,9 @@ type GeographyIndex interface {
// GeometryIndex is an index over 2D cartesian coordinates.
type GeometryIndex interface {
// InvertedIndexKeys returns the keys to store this object under when adding
// it to the index.
InvertedIndexKeys(c context.Context, g geo.Geometry) ([]Key, error)
// it to the index. Additionally returns a bounding box, which is non-empty
// iff the key slice is non-empty.
InvertedIndexKeys(c context.Context, g geo.Geometry) ([]Key, geopb.BoundingBox, error)

// Acceleration for topological relationships (see
// https://postgis.net/docs/reference.html#Spatial_Relationships). Distance
Expand Down Expand Up @@ -218,9 +221,11 @@ const (
)

var geoRelationshipTypeStr = map[RelationshipType]string{
Covers: "covers",
CoveredBy: "covered by",
Intersects: "intersects",
Covers: "covers",
CoveredBy: "covered by",
Intersects: "intersects",
DWithin: "dwithin",
DFullyWithin: "dfullywithin",
}

func (gr RelationshipType) String() string {
Expand Down
16 changes: 13 additions & 3 deletions pkg/geo/geoindex/s2_geography_index.go
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@ import (

"github.com/cockroachdb/cockroach/pkg/geo"
"github.com/cockroachdb/cockroach/pkg/geo/geogfn"
"github.com/cockroachdb/cockroach/pkg/geo/geopb"
"github.com/cockroachdb/cockroach/pkg/geo/geoprojbase"
"github.com/cockroachdb/errors"
"github.com/golang/geo/s1"
Expand Down Expand Up @@ -108,12 +109,21 @@ func isBadGeogCovering(cu s2.CellUnion) bool {
}

// InvertedIndexKeys implements the GeographyIndex interface.
func (i *s2GeographyIndex) InvertedIndexKeys(c context.Context, g geo.Geography) ([]Key, error) {
func (i *s2GeographyIndex) InvertedIndexKeys(
c context.Context, g geo.Geography,
) ([]Key, geopb.BoundingBox, error) {
r, err := g.AsS2(geo.EmptyBehaviorOmit)
if err != nil {
return nil, err
return nil, geopb.BoundingBox{}, err
}
rect := g.BoundingRect()
bbox := geopb.BoundingBox{
LoX: rect.Lng.Lo,
HiX: rect.Lng.Hi,
LoY: rect.Lat.Lo,
HiY: rect.Lat.Hi,
}
return invertedIndexKeys(c, geogCovererWithBBoxFallback{rc: i.rc, g: g}, r), nil
return invertedIndexKeys(c, geogCovererWithBBoxFallback{rc: i.rc, g: g}, r), bbox, nil
}

// Covers implements the GeographyIndex interface.
Expand Down
16 changes: 13 additions & 3 deletions pkg/geo/geoindex/s2_geometry_index.go
Original file line number Diff line number Diff line change
Expand Up @@ -172,14 +172,16 @@ func isBadGeomCovering(cu s2.CellUnion) bool {
}

// InvertedIndexKeys implements the GeometryIndex interface.
func (s *s2GeometryIndex) InvertedIndexKeys(c context.Context, g geo.Geometry) ([]Key, error) {
func (s *s2GeometryIndex) InvertedIndexKeys(
c context.Context, g geo.Geometry,
) ([]Key, geopb.BoundingBox, error) {
// If the geometry exceeds the bounds, we index the clipped geometry in
// addition to the special cell, so that queries for geometries that don't
// exceed the bounds don't need to query the special cell (which would
// become a hotspot in the key space).
gt, clipped, err := s.convertToGeomTAndTryClip(g)
if err != nil {
return nil, err
return nil, geopb.BoundingBox{}, err
}
var keys []Key
if gt != nil {
Expand All @@ -189,7 +191,15 @@ func (s *s2GeometryIndex) InvertedIndexKeys(c context.Context, g geo.Geometry) (
if clipped {
keys = append(keys, Key(exceedsBoundsCellID))
}
return keys, nil
bbox := geopb.BoundingBox{}
bboxRef := g.BoundingBoxRef()
if bboxRef == nil && len(keys) > 0 {
return keys, bbox, errors.AssertionFailedf("non-empty geometry should have bounding box")
}
if bboxRef != nil {
bbox = *bboxRef
}
return keys, bbox, nil
}

// Covers implements the GeometryIndex interface.
Expand Down
2 changes: 1 addition & 1 deletion pkg/geo/geoindex/s2_geometry_index_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -140,7 +140,7 @@ func TestNoClippingAtSRIDBounds(t *testing.T) {
for i := range xCorners {
g, err := geo.MakeGeometryFromPointCoords(xCorners[i], yCorners[i])
require.NoError(t, err)
keys, err := index.InvertedIndexKeys(context.Background(), g)
keys, _, err := index.InvertedIndexKeys(context.Background(), g)
require.NoError(t, err)
require.Equal(t, 1, len(keys))
require.NotEqual(t, Key(exceedsBoundsCellID), keys[0],
Expand Down
13 changes: 13 additions & 0 deletions pkg/geo/geoindex/testdata/s2_geography
Original file line number Diff line number Diff line change
Expand Up @@ -25,17 +25,21 @@ MULTIPOLYGON(((-74.0126927094665 40.6600480732464,-74.0127064547618 40.660044746
init minlevel=0 maxlevel=30 maxcells=2
----

# NB: Bounding boxes use radian units for longitude and latitude.
index-keys name=nys-approx1
----
F4/L5/10321, F4/L5/10322
BoundingBox: lo_x:-1.3788101090755203 hi_x:-1.3439035240356338 lo_y:0.7155849933176747 hi_y:0.7679448708775054

index-keys name=nys-approx2
----
F4/L5/10321, F4/L5/10322
BoundingBox: lo_x:-1.3788101090755203 hi_x:-1.3439035240356338 lo_y:0.7155849933176747 hi_y:0.7679448708775054

index-keys name=point1
----
F0/L30/200023210133112000102232313101, F0/L30/200033223021330130001101002333
BoundingBox: lo_x:0.017453292519943295 hi_x:0.05235987755982989 lo_y:0.06981317007977318 hi_y:0.08726646259971647

covers name=point1
----
Expand Down Expand Up @@ -115,6 +119,7 @@ F0/L2/20, F0/L1/2, F0/L0/
index-keys name=shortline1
----
F4/L19/1010131200020223230, F4/L21/101013120002022323312
BoundingBox: lo_x:-1.5116288797541255 hi_x:-1.5116268028123157 lo_y:0.5707865269994759 hi_y:0.5707869284252046

covers name=shortline1
----
Expand Down Expand Up @@ -168,6 +173,7 @@ F4/L2/10, F4/L1/1, F4/L0/
index-keys name=nystate
----
F2/L2/12, F4/L2/10
BoundingBox: lo_x:-1.392116499292725 hi_x:-1.2523034089032152 lo_y:0.7064604119882483 hi_y:0.7856651987730039

covers name=nystate
----
Expand All @@ -184,24 +190,28 @@ F4/L1/1, F4/L0/
index-keys name=red-hook-nyc
----
F4/L13/1032010231102, F4/L13/1032010231113
BoundingBox: lo_x:-1.2917981498721696 hi_x:-1.2917367221517457 lo_y:0.7096516548356566 hi_y:0.7096900164191924

init minlevel=0 maxlevel=30 maxcells=8
----

index-keys name=nystate
----
F2/L5/12112, F2/L5/12121, F2/L5/12122, F2/L7/1221111, F4/L4/1001, F4/L4/1032, F4/L5/10330, F4/L5/10331
BoundingBox: lo_x:-1.392116499292725 hi_x:-1.2523034089032152 lo_y:0.7064604119882483 hi_y:0.7856651987730039

index-keys name=red-hook-nyc
----
F4/L16/1032010231102232, F4/L16/1032010231102233, F4/L15/103201023110230, F4/L17/10320102311132301, F4/L18/103201023111323020, F4/L16/1032010231113233, F4/L17/10320102311133000, F4/L17/10320102311133003
BoundingBox: lo_x:-1.2917981498721696 hi_x:-1.2917367221517457 lo_y:0.7096516548356566 hi_y:0.7096900164191924

init minlevel=0 maxlevel=30 maxcells=1
----

index-keys name=point1
----
F0/L30/200023210133112000102232313101, F0/L30/200033223021330130001101002333
BoundingBox: lo_x:0.017453292519943295 hi_x:0.05235987755982989 lo_y:0.06981317007977318 hi_y:0.08726646259971647

covers name=point1
----
Expand Down Expand Up @@ -281,6 +291,7 @@ F0/L2/20, F0/L1/2, F0/L0/
index-keys name=shortline1
----
F4/L18/101013120002022323
BoundingBox: lo_x:-1.5116288797541255 hi_x:-1.5116268028123157 lo_y:0.5707865269994759 hi_y:0.5707869284252046

covers name=shortline1
----
Expand Down Expand Up @@ -344,6 +355,7 @@ init minlevel=0 maxlevel=16 maxcells=2
index-keys name=point1
----
F0/L16/2000232101331120, F0/L16/2000332230213301
BoundingBox: lo_x:0.017453292519943295 hi_x:0.05235987755982989 lo_y:0.06981317007977318 hi_y:0.08726646259971647

d-within distance=2000 name=point1
----
Expand Down Expand Up @@ -383,6 +395,7 @@ F0/L2/20, F0/L1/2
index-keys name=nys-approx1
----
F4/L5/10321, F4/L5/10322
BoundingBox: lo_x:-1.3788101090755203 hi_x:-1.3439035240356338 lo_y:0.7155849933176747 hi_y:0.7679448708775054

intersects name=nys-approx1
----
Expand Down
8 changes: 8 additions & 0 deletions pkg/geo/geoindex/testdata/s2_geometry
Original file line number Diff line number Diff line change
Expand Up @@ -29,16 +29,19 @@ init minlevel=0 maxlevel=30 maxcells=4 minx=10 miny=10 maxx=20 maxy=20
index-keys name=poly1
----
F0/L30/022222222222222222222222222222, F0/L30/133333333333333333333333333333, F0/L2/20, F0/L30/311111111111111111111111111111
BoundingBox: lo_x:15 hi_x:16 lo_y:15 hi_y:17

index-keys name=poly1-different-orientation
----
F0/L30/022222222222222222222222222222, F0/L30/133333333333333333333333333333, F0/L2/20, F0/L30/311111111111111111111111111111
BoundingBox: lo_x:15 hi_x:16 lo_y:15 hi_y:17

# Point exceeds the defined bounds.

index-keys name=point1
----
F0/L30/002200220022002200220022002200, spilled
BoundingBox: lo_x:12 hi_x:21 lo_y:12 hi_y:21

covers name=point1
----
Expand Down Expand Up @@ -90,6 +93,7 @@ F0/L2/00, F0/L1/0, F0/L0/
index-keys name=shortline1
----
F0/L5/00022, F0/L3/002, F0/L30/003111111111111111111111111111
BoundingBox: lo_x:11 hi_x:12 lo_y:11 hi_y:12

covers name=shortline1
----
Expand Down Expand Up @@ -155,6 +159,7 @@ F0/L2/00, F0/L1/0, F0/L0/
index-keys name=line-exceeds-bound
----
F0/L30/030033003300330033003300330033, F0/L1/1, F0/L2/21, F0/L4/2211, spilled
BoundingBox: lo_x:12 hi_x:22 lo_y:15 hi_y:23

covers name=line-exceeds-bound
----
Expand Down Expand Up @@ -256,6 +261,7 @@ init minlevel=0 maxlevel=4 maxcells=2 minx=10 miny=10 maxx=20 maxy=20
index-keys name=point1
----
F0/L4/0022, spilled
BoundingBox: lo_x:12 hi_x:21 lo_y:12 hi_y:21

covers name=point1
----
Expand Down Expand Up @@ -354,7 +360,9 @@ MULTIPOLYGON (((592585.8247029320336878299713 4527600.0504941008985042572021, 59
index-keys name=troubled-poly1
----
F0/L17/02213031322200022, F0/L18/022130313222001333, F0/L16/0221303132220020, F0/L16/0221303132220031
BoundingBox: lo_x:586773.9053534917 hi_x:586890.2225210913 lo_y:4.500142974460221e+06 hi_y:4.500270330213479e+06

index-keys name=troubled-poly2
----
F0/L16/0221303121021212, F0/L16/0221303121021221, F0/L16/0221303121312112, F0/L16/0221303121312121
BoundingBox: lo_x:592324.7216157623 hi_x:592592.2261793566 lo_y:4.527481727534395e+06 hi_y:4.527693397730719e+06
8 changes: 6 additions & 2 deletions pkg/geo/geoindex/utils_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@ import (
"strings"
"testing"

"github.com/cockroachdb/cockroach/pkg/geo/geopb"
"github.com/cockroachdb/datadriven"
"github.com/golang/geo/s2"
)
Expand All @@ -39,15 +40,18 @@ func s2Config(t *testing.T, d *datadriven.TestData) S2Config {
}
}

func keysToString(keys []Key, err error) string {
func keysToString(keys []Key, bbox geopb.BoundingBox, err error) string {
if err != nil {
return err.Error()
}
if len(keys) == 0 {
return ""
}
var cells []string
for _, k := range keys {
cells = append(cells, k.String())
}
return strings.Join(cells, ", ")
return fmt.Sprintf("%s\nBoundingBox: %s", strings.Join(cells, ", "), bbox.String())
}

func cellUnionToString(cells s2.CellUnion) string {
Expand Down
16 changes: 8 additions & 8 deletions pkg/sql/logictest/testdata/logic_test/distsql_stats
Original file line number Diff line number Diff line change
Expand Up @@ -925,9 +925,9 @@ WHERE statistics_name = 's' AND column_names = '{geog}'
query TIRI colnames
SHOW HISTOGRAM $hist_id_1
----
upper_bound range_rows distinct_range_rows equal_rows
'\xfd1000000000000000' 0 0 1
'\xfd5000000000000000' 0 0 1
upper_bound range_rows distinct_range_rows equal_rows
'\x42fd1000000000000000000000000000000000bcc00000000000003ffbecde5da115a83ff661bdc396bcdc' 0 0 1
'\x42fd5000000000000000000000000000000000bcc00000000000003ffbecde5da115a83ff661bdc396bcdc' 0 0 1

statement ok
DROP INDEX geo_table@geog_idx_1;
Expand All @@ -941,8 +941,8 @@ WHERE statistics_name = 's' AND column_names = '{geog}'
query TIRI colnames
SHOW HISTOGRAM $hist_id_1
----
upper_bound range_rows distinct_range_rows equal_rows
'\xfd1000000000000000' 0 0 1
'\xfd4500000000000000' 0 0 1
'\xfd4700000000000000' 0 0 1
'\xfd5ad4000000000000' 0 0 1
upper_bound range_rows distinct_range_rows equal_rows
'\x42fd1000000000000000000000000000000000bcc00000000000003ffbecde5da115a83ff661bdc396bcdc' 0 0 1
'\x42fd4500000000000000000000000000000000bcc00000000000003ffbecde5da115a83ff661bdc396bcdc' 0 0 1
'\x42fd4700000000000000000000000000000000bcc00000000000003ffbecde5da115a83ff661bdc396bcdc' 0 0 1
'\x42fd5ad4000000000000000000000000000000bcc00000000000003ffbecde5da115a83ff661bdc396bcdc' 0 0 1
10 changes: 5 additions & 5 deletions pkg/sql/logictest/testdata/logic_test/inv_stats
Original file line number Diff line number Diff line change
Expand Up @@ -56,11 +56,11 @@ SELECT histogram_id FROM [SHOW STATISTICS FOR TABLE t] WHERE statistics_name = '
query TIRI colnames
SHOW HISTOGRAM $hist_id_1
----
upper_bound range_rows distinct_range_rows equal_rows
'\xfd0555555555555555' 0 0 1
'\xfd0fffffff00000000' 0 0 1
'\xfd1000000100000000' 0 0 1
'\xfd1aaaaaab00000000' 0 0 1
upper_bound range_rows distinct_range_rows equal_rows
'\x42fd055555555555555500000000000000000000000000000000003ff00000000000003ff0000000000000' 0 0 1
'\x42fd0fffffff0000000000000000000000000000000000000000003ff00000000000003ff0000000000000' 0 0 1
'\x42fd100000010000000000000000000000000000000000000000003ff00000000000003ff0000000000000' 0 0 1
'\x42fd1aaaaaab0000000000000000000000000000000000000000003ff00000000000003ff0000000000000' 0 0 1

# Regression for #50596.
statement ok
Expand Down
Loading

0 comments on commit dd8b81c

Please sign in to comment.