Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: composable inverted indexes #109302

Closed
michae2 opened this issue Aug 23, 2023 · 3 comments
Closed

sql: composable inverted indexes #109302

michae2 opened this issue Aug 23, 2023 · 3 comments
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-queries SQL Queries Team

Comments

@michae2
Copy link
Collaborator

michae2 commented Aug 23, 2023

(All credit goes to @lin-crl for this idea.)

Suppose we had a table with a JSON column, and we wanted to quickly find all the rows with a certain path equal to "abc". The path might vary depending on the query. We can do that today with an inverted index:

CREATE TABLE t (k int PRIMARY KEY, j jsonb, INVERTED INDEX (j));
SELECT k FROM t WHERE j -> 'x' -> 'y' -> 'z' = '"abc"';
SELECT k FROM t WHERE j -> 'm' -> 'n' = '"abc"';

The plan looks like:

  project
   └── scan t@t_j_idx
        └── inverted constraint: /5/1
             └── spans: ["7x\x00\x02y\x00\x02z\x00\x01\x12abc\x00\x01", "7x\x00\x02y\x00\x02z\x00\x01\x12abc\x00\x01"]

But now suppose we want to find all the rows with a certain path LIKE "%abc%". Normally we would use a trigram inverted index for this. We could create a trigram inverted index on the entire JSON column cast to a string:

CREATE INVERTED INDEX ON t ((j::string) gin_trgm_ops);
SELECT k FROM t WHERE j::string LIKE '%abc%';

That plan looks like:

  project
   └── select
        ├── index-join t
        │    └── scan t@t_expr_idx
        │         └── inverted constraint: /7/1
        │              └── spans: ["\x12abc\x00\x01", "\x12abc\x00\x01"]
        └── filters
             └── j::STRING LIKE '%abc%'

But that plan won't necessarily guarantee that we're matching against one specific path in the JSON column. To do this we need to add back the predicate as an additional filter:

SELECT k FROM t WHERE j::string LIKE '%abc%' AND jsonb_extract_path_text(j, 'x', 'y', 'z') LIKE '%abc%';

This means we'll have to perform the LIKE operation multiple times on a single row to confirm a match, and we might be searching many more rows than we need to. Here's the plan:

  project
   └── select
        ├── index-join t
        │    └── scan t@t_expr_idx
        │         └── inverted constraint: /7/1
        │              └── spans: ["\x12abc\x00\x01", "\x12abc\x00\x01"]
        └── filters
             ├── j::STRING LIKE '%abc%'
             └── jsonb_extract_path_text(j, 'x', 'y', 'z') LIKE '%abc%'

Instead of using one inverted index plus a filter, it would be nice if we could combine the power of the two inverted indexes to only perform a trigram search on the contents of the path we want. In other words, if we could compose the two inversions. Something like this:

CREATE INVERTED INDEX ON t (j, (j_inverted_key::string) gin_trgm_ops);

Where j_inverted_key is the name of the inverted JSON value. So for each row of the table there would be multiple rows (for each JSON path) and then for each of these there would be multiple rows (for each trigram).

Jira issue: CRDB-30858

@michae2 michae2 added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-queries SQL Queries Team labels Aug 23, 2023
@github-project-automation github-project-automation bot moved this to Triage in SQL Queries Aug 23, 2023
@mgartner mgartner moved this from Triage to New Backlog in SQL Queries Aug 31, 2023
@michae2
Copy link
Collaborator Author

michae2 commented Sep 5, 2023

Might be a dupe of #91460

@michae2
Copy link
Collaborator Author

michae2 commented Sep 5, 2023

This specific case (searching text fields of JSONB) might be addressed by #109955

@michae2
Copy link
Collaborator Author

michae2 commented Oct 6, 2023

Closing as a dupe of #91460.

@michae2 michae2 closed this as not planned Won't fix, can't repro, duplicate, stale Oct 6, 2023
@github-project-automation github-project-automation bot moved this from New Backlog to Done in SQL Queries Oct 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-queries SQL Queries Team
Projects
Archived in project
Development

No branches or pull requests

1 participant