Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: support expression-based index columns #24455

Closed
jordanlewis opened this issue Apr 3, 2018 · 12 comments
Closed

sql: support expression-based index columns #24455

jordanlewis opened this issue Apr 3, 2018 · 12 comments
Labels
A-schema-descriptors Relating to SQL table/db descriptor handling. A-sql-encoding Relating to the SQL/KV encoding. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)

Comments

@jordanlewis
Copy link
Member

The computed column RFC included a section on virtual computed columns: columns that aren't stored on disk, but are computed on-demand from the other columns in their table.

These aren't yet supported, though. We should support them.

I wrote a half-complete prototype that materialized virtual computed columns eagerly in RowFetcher that was foiled by the lack of access to sql-homed name resolution code in RowFetcher, but based on a conversation with @knz, @petermattis and @justinj, it seems like we might be able to avoid materializing the columns until they're actually used in a computation. One way to do this would be to create a LazyDatum kind of thing that doesn't get populated until it's evaluated, but I'm sure there are complications with that as well.

@jordanlewis jordanlewis added this to the Later milestone Apr 3, 2018
@jordanlewis jordanlewis added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Apr 3, 2018
@petermattis
Copy link
Collaborator

petermattis commented Apr 3, 2018

Can virtual computed columns be handled purely during planning? Everywhere you see a virtual computed column C, you replace it with the desired expression. That wouldn't require any changes to execution and leave it to the planner to decided where to best materialize the expression.

@knz
Copy link
Contributor

knz commented Apr 3, 2018

Yes what Peter just wrote is what I had in mind during the meeting earlier today.

@jordanlewis
Copy link
Member Author

Oh, I see. Good point! That does seem to simplify things a bit.

@petermattis
Copy link
Collaborator

Can you use a virtual computed column in an index? That might add some complexity back.

@jordanlewis
Copy link
Member Author

The backfiller would have to learn to run the compute expressions. The insert, upsert, update and delete paths must already know how to run compute expressions for the normal (stored) computed columns.

@petermattis
Copy link
Collaborator

There is also a question of how index selection would utilize an index on a virtual column. For example, if there is a virtual column lowerName STRING VIRTUAL lower(name) and an index on lowerName, we'd want to be able to use that index in a query such as SELECT * FROM users WHERE lower(name) = 'john'. Perhaps we require that the user write that query as SELECT * FROM users WHERE lowerName = 'john'. Or perhaps the planner can recognize that lower(name) == lowerName.

@knz knz added the A-sql-encoding Relating to the SQL/KV encoding. label Apr 28, 2018
@knz knz added the A-schema-descriptors Relating to SQL table/db descriptor handling. label Apr 28, 2018
@knz
Copy link
Contributor

knz commented Jun 4, 2018 via email

@awoods187
Copy link
Contributor

my mistake--thanks

@petermattis petermattis removed this from the Later milestone Oct 5, 2018
@awoods187
Copy link
Contributor

@justinj thinks this is now much easier with the improvements to the CBO

@jordanlewis
Copy link
Member Author

Moving this to optimizer backlog, cc @RaduBerinde / @mgartner

@RaduBerinde RaduBerinde changed the title sql: support virtual computed columns sql: support expression-based index columns Jul 13, 2020
@RaduBerinde
Copy link
Member

RaduBerinde commented Jul 13, 2020

I did a bit of research of what work would be necessary to support expression-based indexes:

  • opt/cat interface and sql descriptor changes - need a way to refer to expressions as index "columns"
  • represent scans of expression-based indexes in opt (need "fake" columnids for constraints)
  • opt operator statistics
  • exploration rules for scans - need to recognize instances of the index expressions in filters
  • mutation paths - need to plumb values for the expressions when indexes are updated
  • schema change / backfiller support

Some aspects would be simplified if we treated the expressions as virtual columns (computed but not stored) throughout the stack, but other aspects would become more complicated (it would be the first non-scannable table column - and the code around different kinds of table columns is already very complex).

Overall the amount of work is on the same order of magnitude as partial indexes.

@RaduBerinde
Copy link
Member

Tracking this under #9682 and closing this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-schema-descriptors Relating to SQL table/db descriptor handling. A-sql-encoding Relating to the SQL/KV encoding. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
Projects
None yet
Development

No branches or pull requests

5 participants