Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: support INVERTED INDEX range scans #24960

Open
Tracked by #59331
danhhz opened this issue Apr 20, 2018 · 11 comments
Open
Tracked by #59331

sql: support INVERTED INDEX range scans #24960

danhhz opened this issue Apr 20, 2018 · 11 comments
Labels
A-sql-json JSON handling in SQL. A-sql-optimizer SQL logical planning and optimizations. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-queries SQL Queries Team

Comments

@danhhz
Copy link
Contributor

danhhz commented Apr 20, 2018

From the forum: https://forum.cockroachlabs.com/t/multi-tenant-custom-fields-saas-app/1565/2

So, let’s say all tenants’ products are in one table and there’s foreign key tenant_id. Then we have a json field, custom_data. A tenant might have a custom field price. Then the tenant wants to search all his products where price > 100. An index on the foreign key, possibly compound with other “static” fields will speed up the query. But an index on json field will not be useful in this case, right?

Jira issue: CRDB-5739

@danhhz
Copy link
Contributor Author

danhhz commented Apr 20, 2018

Assigning to @awoods187 for prioritization

@jordanlewis jordanlewis added A-sql-json JSON handling in SQL. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) labels Apr 24, 2018
@knz knz added the A-sql-optimizer SQL logical planning and optimizations. label Apr 28, 2018
@awoods187
Copy link
Contributor

@RaduBerinde is this related to the computed index ideas we've been discussing?

@awoods187 awoods187 removed their assignment Mar 25, 2020
@RaduBerinde
Copy link
Member

@RaduBerinde is this related to the computed index ideas we've been discussing?

I think so, we could have an index on a computed value that extracts the json field.

@jordanlewis
Copy link
Member

A similar use case is for prefix matches on inverted index string columns:

create table a (id int primary key, s string[], inverted index(s));
insert into a values(1, array['big', 'indexed', 'sentence']);

Today, you can do exact matches:

select * from a where s @> array['blah']

With some way to do range scans, you could do something like (syntax doesn't work, but this is the intent):

select * from a where any(s) like 'prefix%'

@lopezator
Copy link
Contributor

lopezator commented Sep 24, 2020

We also got bitten by this, a filter over an JSON column with an inverted index uses it, but if you add another field to the game, a PK for example, it actually ignores the inverted index.

Thanks for raising this @danhhz

DROP table foo;
CREATE TABLE foo (A INT PRIMARY KEY, B jsonb, C VARCHAR);
INSERT INTO foo (A, B, C) SELECT generate_series(1,100) AS A, '{"values": ["foo", "bar", "baz"]}' AS B, md5(random()::text) AS C;
CREATE INVERTED INDEX foo_inv ON foo(B);
CREATE INDEX foo_idx ON foo(C);
EXPLAIN SELECT * FROM foo WHERE B @> '{"values": ["baz"]}';

Looks good:

[
  {
    "tree": "",
    "field": "distributed",
    "description": "false"
  },
  {
    "tree": "",
    "field": "vectorized",
    "description": "false"
  },
  {
    "tree": "index-join",
    "field": "",
    "description": ""
  },
  {
    "tree": "",
    "field": "table",
    "description": "foo@primary"
  },
  {
    "tree": "",
    "field": "key columns",
    "description": "a"
  },
  {
    "tree": " └── scan",
    "field": "",
    "description": ""
  },
  {
    "tree": "",
    "field": "table",
    "description": "foo@foo_inv"
  },
  {
    "tree": "",
    "field": "spans",
    "description": "/\"values\"/Arr/\"baz\"-/\"values\"/Arr/\"baz\"/PrefixEnd"
  }
]

But:

EXPLAIN SELECT * FROM foo WHERE  B @> '{"values": ["baz"]}' AND C = 'someAutogenID';

Doesn't:

[
  {
    "tree": "",
    "field": "distributed",
    "description": "false"
  },
  {
    "tree": "",
    "field": "vectorized",
    "description": "false"
  },
  {
    "tree": "filter",
    "field": "",
    "description": ""
  },
  {
    "tree": "",
    "field": "filter",
    "description": "b @> '{\"values\": [\"baz\"]}'"
  },
  {
    "tree": " └── index-join",
    "field": "",
    "description": ""
  },
  {
    "tree": "",
    "field": "table",
    "description": "foo@primary"
  },
  {
    "tree": "",
    "field": "key columns",
    "description": "a"
  },
  {
    "tree": "      └── scan",
    "field": "",
    "description": ""
  },
  {
    "tree": "",
    "field": "table",
    "description": "foo@foo_idx"
  },
  {
    "tree": "",
    "field": "spans",
    "description": "/\"78455d02293f0f16ab5e519c244a70dc\"-/\"78455d02293f0f16ab5e519c244a70dc\"/PrefixEnd"
  }
]

@RaduBerinde
Copy link
Member

@lopezator - in the second case, it's much better to use the primary index since we scan at most one row (for a=3). Using the inverted index would be worse in most cases.

@lopezator
Copy link
Contributor

@RaduBerinde you are right, bad example. I've update the example above to be more clear.

@RaduBerinde
Copy link
Member

CC @mgartner @rytaft

@mgartner
Copy link
Collaborator

@lopezator Your updated example is scanning foo_idx, which is the better query plan assuming that the filters on C is more selective than the filter on B. Currently, stats for JSON columns are not as precise as stats for other data types, so it's possible that a scan on foo_idx would be preferred even if the filter on B was more selective.

@jlinder jlinder added the T-sql-queries SQL Queries Team label Jun 16, 2021
@github-actions
Copy link

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

@Bessonov
Copy link

a comment will keep it active

@github-project-automation github-project-automation bot moved this to Triage in SQL Queries Sep 27, 2023
@mgartner mgartner moved this from Triage to New Backlog in SQL Queries Sep 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-json JSON handling in SQL. A-sql-optimizer SQL logical planning and optimizations. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-queries SQL Queries Team
Projects
Status: Backlog
Development

No branches or pull requests

9 participants