Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

release-20.2: geoindex,invertedidx,rowexec,encoding,...: add bounding box to invert… #54128

Merged

Conversation

sumeerbhola
Copy link
Collaborator

Backport 1/1 commits from #53963.

/cc @cockroachdb/release


…ed column

The bounding box is encoded after the cellid, and used to perform
"pre-filtering", i.e., filtering after the inverted row is fetched
but before it is added to various set expressions.

The fundamental weakness with the cell covering
approach is that the coverings for convex shapes are typically
worse than a bounding box and we pay significant cost in doing
a lookup join to eliminate the false positives. This attempts
to reduce those false positives for the lookup join, and part
of the inverted join (the invertedJoiner can avoid adding a
row to the RowContainer).

For the canonical nyc st_intersects and st_dwithin queries
the joinReader would see a much higher input row count than
the output due to false positives. Specifically,

  • st_intersects: input 3110, output 490
  • st_dwithin: input 23183, output 8606
    With this change the input counts are 1064 and 11177. There
    is the additional cost of reading extra bytes and applying
    the bounding box filter, but it is more than offset by the
    gains (see comparison numbers below).

This is a format change for the index, so is easier to do
before the launch of spatial features.

The encoding here is trivial, and consumes 17 bytes for points
and 33 bytes for others. An experiment with a more complicated
encoding that converted to integers while bounding the
relative error did not show any improvement wrt the following
benchmarks.

There are TODOs for followup PRs:

  • Do such pre-filtering for invertedFilterer. This will need
    a spec change.
  • Support pre-filtering for expressions containing more than
    one function.

Comparison numbers:

old is master

name                                   old ms     new ms     delta
Test/nyc/nyc-intersects                80.0 ±23%  69.0 ±30%  -13.76%  (p=0.000 n=49+48)
Test/nyc/nyc-dwithin                    225 ±13%   169 ± 8%  -24.88%  (p=0.000 n=46+45)
Test/nyc/nyc-coveredby                 84.1 ±11%  69.0 ±14%  -17.96%  (p=0.000 n=50+49)
Test/nyc/nyc-expand-blocks             85.7 ± 6%  84.6 ± 8%   -1.22%  (p=0.030 n=45+47)
Test/nyc/nyc-dwithin-and-dfullywithin   195 ± 7%   195 ± 6%     ~     (p=1.000 n=46+49)
Test/postgis_geometry_tutorial/11.2b   1.66 ±24%  1.53 ±45%   -7.87%  (p=0.006 n=50+50)
Test/postgis_geometry_tutorial/11.6    1.61 ±19%  1.66 ±32%     ~     (p=0.329 n=46+50)
Test/postgis_geometry_tutorial/12.1.2  0.89 ±26%  0.91 ±26%     ~     (p=0.532 n=46+50)
Test/postgis_geometry_tutorial/12.2.3  1.33 ±20%  1.38 ±36%     ~     (p=0.308 n=49+49)
Test/postgis_geometry_tutorial/12.2.4  1.30 ±27%  1.29 ±18%     ~     (p=0.746 n=49+45)
Test/postgis_geometry_tutorial/13.0    1.49 ±36%  1.55 ±33%     ~     (p=0.141 n=49+50)
Test/postgis_geometry_tutorial/13.1a    233 ± 7%   197 ± 7%  -15.45%  (p=0.000 n=48+47)
Test/postgis_geometry_tutorial/13.1c   24.9 ±15%  19.4 ±18%  -22.17%  (p=0.000 n=50+48)
Test/postgis_geometry_tutorial/13.2     309 ± 4%   257 ± 7%  -16.92%  (p=0.000 n=43+47)
Test/postgis_geometry_tutorial/14.1a   1.51 ±15%  1.70 ±16%  +12.93%  (p=0.000 n=48+44)
Test/postgis_geometry_tutorial/14.2b   6.28 ±22%  6.07 ±19%     ~     (p=0.050 n=50+45)
Test/postgis_geometry_tutorial/14.2c   3.70 ±14%  3.47 ±10%   -5.98%  (p=0.000 n=48+46)
Test/postgis_geometry_tutorial/14.3c   31.9 ±11%  25.9 ±12%  -18.60%  (p=0.000 n=48+49)
Test/postgis_geometry_tutorial/15.0    1.50 ±20%  1.50 ±20%     ~     (p=0.860 n=48+49)

Release justification: This is an incompatible format change,
and discussions with engineering and product indicated a
preference to do this now before we launch spatial features

Release note: None

…ed column

The bounding box is encoded after the cellid, and used to perform
"pre-filtering", i.e., filtering after the inverted row is fetched
but before it is added to various set expressions.

The fundamental weakness with the cell covering
approach is that the coverings for convex shapes are typically
worse than a bounding box and we pay significant cost in doing
a lookup join to eliminate the false positives. This attempts
to reduce those false positives for the lookup join, and part
of the inverted join (the invertedJoiner can avoid adding a
row to the RowContainer).

For the canonical nyc st_intersects and st_dwithin queries
the joinReader would see a much higher input row count than
the output due to false positives. Specifically,
- st_intersects: input 3110, output 490
- st_dwithin: input 23183, output 8606

With this change the input counts are 1064 and 11177. There
is the additional cost of reading extra bytes and applying
the bounding box filter, but it is more than offset by the
gains (see comparison numbers below).

This is a format change for the index, so is easier to do
before the launch of spatial features.

The encoding here is trivial, and consumes 17 bytes for points
and 33 bytes for others. An experiment with a more complicated
encoding that converted to integers while bounding the
relative error did not show any improvement wrt the following
benchmarks.

There are TODOs for followup PRs:
- Do such pre-filtering for invertedFilterer. This will need
  a spec change.
- Support pre-filtering for expressions containing more than
  one function.

Comparison numbers:

old is master
name                                   old ms     new ms     delta
Test/nyc/nyc-intersects                80.0 ±23%  69.0 ±30%  -13.76%  (p=0.000 n=49+48)
Test/nyc/nyc-dwithin                    225 ±13%   169 ± 8%  -24.88%  (p=0.000 n=46+45)
Test/nyc/nyc-coveredby                 84.1 ±11%  69.0 ±14%  -17.96%  (p=0.000 n=50+49)
Test/nyc/nyc-expand-blocks             85.7 ± 6%  84.6 ± 8%   -1.22%  (p=0.030 n=45+47)
Test/nyc/nyc-dwithin-and-dfullywithin   195 ± 7%   195 ± 6%     ~     (p=1.000 n=46+49)
Test/postgis_geometry_tutorial/11.2b   1.66 ±24%  1.53 ±45%   -7.87%  (p=0.006 n=50+50)
Test/postgis_geometry_tutorial/11.6    1.61 ±19%  1.66 ±32%     ~     (p=0.329 n=46+50)
Test/postgis_geometry_tutorial/12.1.2  0.89 ±26%  0.91 ±26%     ~     (p=0.532 n=46+50)
Test/postgis_geometry_tutorial/12.2.3  1.33 ±20%  1.38 ±36%     ~     (p=0.308 n=49+49)
Test/postgis_geometry_tutorial/12.2.4  1.30 ±27%  1.29 ±18%     ~     (p=0.746 n=49+45)
Test/postgis_geometry_tutorial/13.0    1.49 ±36%  1.55 ±33%     ~     (p=0.141 n=49+50)
Test/postgis_geometry_tutorial/13.1a    233 ± 7%   197 ± 7%  -15.45%  (p=0.000 n=48+47)
Test/postgis_geometry_tutorial/13.1c   24.9 ±15%  19.4 ±18%  -22.17%  (p=0.000 n=50+48)
Test/postgis_geometry_tutorial/13.2     309 ± 4%   257 ± 7%  -16.92%  (p=0.000 n=43+47)
Test/postgis_geometry_tutorial/14.1a   1.51 ±15%  1.70 ±16%  +12.93%  (p=0.000 n=48+44)
Test/postgis_geometry_tutorial/14.2b   6.28 ±22%  6.07 ±19%     ~     (p=0.050 n=50+45)
Test/postgis_geometry_tutorial/14.2c   3.70 ±14%  3.47 ±10%   -5.98%  (p=0.000 n=48+46)
Test/postgis_geometry_tutorial/14.3c   31.9 ±11%  25.9 ±12%  -18.60%  (p=0.000 n=48+49)
Test/postgis_geometry_tutorial/15.0    1.50 ±20%  1.50 ±20%     ~     (p=0.860 n=48+49)

Release justification: This is an incompatible format change,
and discussions with engineering and product indicated a
preference to do this now before we launch spatial features

Release note: None
@sumeerbhola sumeerbhola requested review from rytaft and otan September 9, 2020 15:40
@sumeerbhola sumeerbhola requested a review from a team as a code owner September 9, 2020 15:40
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Should you add this here: #53662?

Reviewed 34 of 34 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @otan)

@sumeerbhola
Copy link
Collaborator Author

Added to #53662

@sumeerbhola
Copy link
Collaborator Author

I'm assuming the check mark in #53662 means I can merge this.

@sumeerbhola sumeerbhola merged commit 7d5e070 into cockroachdb:release-20.2 Sep 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants