sql: support INVERTED INDEX range scans #24960

danhhz · 2018-04-20T18:36:20Z

From the forum: https://forum.cockroachlabs.com/t/multi-tenant-custom-fields-saas-app/1565/2

So, let’s say all tenants’ products are in one table and there’s foreign key tenant_id. Then we have a json field, custom_data. A tenant might have a custom field price. Then the tenant wants to search all his products where price > 100. An index on the foreign key, possibly compound with other “static” fields will speed up the query. But an index on json field will not be useful in this case, right?

Jira issue: CRDB-5739

danhhz · 2018-04-20T18:36:35Z

Assigning to @awoods187 for prioritization

awoods187 · 2020-03-25T14:48:27Z

@RaduBerinde is this related to the computed index ideas we've been discussing?

RaduBerinde · 2020-03-25T15:16:50Z

> @RaduBerinde is this related to the computed index ideas we've been discussing?

I think so, we could have an index on a computed value that extracts the json field.
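A minimal sketch of that idea, applied to the forum question quoted at the top of the issue. The table, column, and index names are hypothetical, and it assumes the custom field name (price) is known when the schema is defined and that the CockroachDB version in use accepts the string-to-DECIMAL cast in a computed column:

```sql
-- Hypothetical schema; none of these names come from the issue itself.
CREATE TABLE products (
    id INT PRIMARY KEY,
    tenant_id INT NOT NULL,
    custom_data JSONB,
    -- Stored computed column that extracts the tenant-defined "price" field.
    -- Assumption: the value is always numeric text; a non-numeric value
    -- would make the cast (and therefore the write) fail.
    price DECIMAL AS ((custom_data->>'price')::DECIMAL) STORED,
    -- Ordinary secondary index, so range predicates on price become range scans.
    INDEX products_tenant_price_idx (tenant_id, price)
);

-- The range filter targets the computed column rather than the JSON document.
SELECT id FROM products WHERE tenant_id = 42 AND price > 100;
```

This sidesteps the inverted index entirely, so it only helps for fields that can be declared up front. Arbitrary per-tenant custom fields would still need range support in the inverted index itself.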

jordanlewis · 2020-09-11T17:26:42Z

A similar use case is for prefix matches on inverted index string columns:

create table a (id int primary key, s string[], inverted index(s));
insert into a values(1, array['big', 'indexed', 'sentence']);

Today, you can do exact matches:

select * from a where s @> array['blah']

With some way to do range scans, you could do something like (syntax doesn't work, but this is the intent):

select * from a where any(s) like 'prefix%'

lopezator · 2020-09-24T09:55:55Z

We also got bitten by this. A filter on a JSON column with an inverted index uses the index, but if you add a filter on another field (a PK, for example), the planner ignores the inverted index.

Thanks for raising this @danhhz

DROP TABLE IF EXISTS foo;
CREATE TABLE foo (A INT PRIMARY KEY, B jsonb, C VARCHAR);
INSERT INTO foo (A, B, C) SELECT generate_series(1,100) AS A, '{"values": ["foo", "bar", "baz"]}' AS B, md5(random()::text) AS C;
CREATE INVERTED INDEX foo_inv ON foo(B);
CREATE INDEX foo_idx ON foo(C);

EXPLAIN SELECT * FROM foo WHERE B @> '{"values": ["baz"]}';

Looks good:

[
  {
    "tree": "",
    "field": "distributed",
    "description": "false"
  },
  {
    "tree": "",
    "field": "vectorized",
    "description": "false"
  },
  {
    "tree": "index-join",
    "field": "",
    "description": ""
  },
  {
    "tree": " │",
    "field": "table",
    "description": "foo@primary"
  },
  {
    "tree": " │",
    "field": "key columns",
    "description": "a"
  },
  {
    "tree": " └── scan",
    "field": "",
    "description": ""
  },
  {
    "tree": "",
    "field": "table",
    "description": "foo@foo_inv"
  },
  {
    "tree": "",
    "field": "spans",
    "description": "/\"values\"/Arr/\"baz\"-/\"values\"/Arr/\"baz\"/PrefixEnd"
  }
]

But:

EXPLAIN SELECT * FROM foo WHERE B @> '{"values": ["baz"]}' AND C = 'someAutogenID';

Doesn't:

[
  {
    "tree": "",
    "field": "distributed",
    "description": "false"
  },
  {
    "tree": "",
    "field": "vectorized",
    "description": "false"
  },
  {
    "tree": "filter",
    "field": "",
    "description": ""
  },
  {
    "tree": " │",
    "field": "filter",
    "description": "b @> '{\"values\": [\"baz\"]}'"
  },
  {
    "tree": " └── index-join",
    "field": "",
    "description": ""
  },
  {
    "tree": "      │",
    "field": "table",
    "description": "foo@primary"
  },
  {
    "tree": "      │",
    "field": "key columns",
    "description": "a"
  },
  {
    "tree": "      └── scan",
    "field": "",
    "description": ""
  },
  {
    "tree": "",
    "field": "table",
    "description": "foo@foo_idx"
  },
  {
    "tree": "",
    "field": "spans",
    "description": "/\"78455d02293f0f16ab5e519c244a70dc\"-/\"78455d02293f0f16ab5e519c244a70dc\"/PrefixEnd"
  }
]

RaduBerinde · 2020-09-24T16:57:49Z

@lopezator - in the second case, it's much better to use the primary index since we scan at most one row (for a=3). Using the inverted index would be worse in most cases.

lopezator · 2020-09-25T07:34:55Z

@RaduBerinde you are right, bad example. I've updated the example above to be clearer.

RaduBerinde · 2020-12-11T19:08:39Z

CC @mgartner @rytaft

mgartner · 2020-12-14T21:30:40Z

@lopezator Your updated example is scanning foo_idx, which is the better query plan assuming that the filter on C is more selective than the filter on B. Currently, stats for JSON columns are not as precise as stats for other data types, so it's possible that a scan on foo_idx would be preferred even if the filter on B was more selective.
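One way to check which plan the optimizer prefers, and what the alternative looks like, is to refresh table statistics and then compare the default plan against one that forces the inverted index with an index hint. This is a sketch against the foo table from the example above. The statistics name foo_stats is made up for illustration, and whether the hint is accepted for an inverted index depends on the CockroachDB version:

```sql
-- Refresh table statistics so the optimizer's estimates reflect the current data.
-- "foo_stats" is an arbitrary, illustrative name.
CREATE STATISTICS foo_stats FROM foo;

-- Plan chosen by the optimizer on its own.
EXPLAIN SELECT * FROM foo WHERE B @> '{"values": ["baz"]}' AND C = 'someAutogenID';

-- Same query with the inverted index forced through an index hint, for comparison.
EXPLAIN SELECT * FROM foo@foo_inv WHERE B @> '{"values": ["baz"]}' AND C = 'someAutogenID';
```

With the sample data, every row contains "baz", so the filter on B selects all 100 rows while the filter on C selects at most one. In that case scanning foo_idx is the cheaper plan.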

github-actions · 2023-09-26T11:08:15Z

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

Bessonov · 2023-09-26T12:23:22Z

a comment will keep it active

danhhz assigned awoods187 Apr 20, 2018

jordanlewis added A-sql-json JSON handling in SQL. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) labels Apr 24, 2018

knz added the A-sql-optimizer SQL logical planning and optimizations. label Apr 28, 2018

awoods187 removed their assignment Mar 25, 2020

isoos mentioned this issue Sep 22, 2020

sql: add support for comparison operators in inverted indexes #35154

Open

mgartner mentioned this issue Jan 23, 2021

opt: inverted index improvements #59331

Open

31 tasks

jlinder added the T-sql-queries SQL Queries Team label Jun 16, 2021

github-actions bot added the no-issue-activity label Sep 26, 2023

github-actions bot removed the no-issue-activity label Sep 27, 2023

mgartner added this to SQL Queries Sep 27, 2023

github-project-automation bot moved this to Triage in SQL Queries Sep 27, 2023

mgartner moved this from Triage to New Backlog in SQL Queries Sep 27, 2023
