sql: support VIRTUAL computed columns #57608

RaduBerinde · 2020-12-05T18:40:31Z

This issue tracks implementing virtual computed columns (using VIRTUAL instead of STORED). This is closely related to #9682 - the expression-based index columns will be implemented as virtual columns internally. It is also related to hash shared indexes where we would like the bucket column to be virtual.

Running checklist:

Non-critical items:

figure out how to prove implication for cases like:

CREATE TABLE t (
  k INT PRIMARY KEY,
  a INT,
  b INT,
  c INT AS (b + 10) VIRTUAL,
  INDEX (a) WHERE c IN (10, 20, 30)
)

SELECT k FROM t WHERE c = 20

(filed separate issue #61565).

handle numeric references case where we specify a virtual column but not a dependent one (applies to stored computed columns as well); filed separate issue opt: better validation of column numeric references #61563.
teach indexConstraintCtx.simplifyFilter to remove the redundant filters like the b:2 = 3 below:

exec-ddl
CREATE TABLE tmp (
  a INT,
  b INT,
  v INT AS (b + 10) VIRTUAL,
  INDEX idx (v)
)
----

opt
SELECT * FROM tmp WHERE v = 13
----
  project
   ├── columns: a:1 b:2!null v:3!null
   ├── immutable
   ├── fd: ()-->(2,3)
   ├── select
   │    ├── columns: a:1 b:2!null
   │    ├── fd: ()-->(2)
   │    ├── index-join tmp
   │    │    ├── columns: a:1 b:2
   │    │    └── scan tmp@idx
   │    │         ├── columns: rowid:4!null
   │    │         ├── constraint: /3/4: [/13 - /13]
   │    │         └── key: (4)
   │    └── filters
   │         └── b:2 = 3 [outer=(2), constraints=(/2: [/3 - /3]; tight), fd=()-->(2)]
   └── projections
        └── b:2 + 10 [as=v:3, outer=(2), immutable]

(filed separate issue #61566).

21.2 items:

use virtual column for hash shared indexes (filed separate issue sql: use virtual column for hash shared indexes #61564).

The text was updated successfully, but these errors were encountered:

57584: sql: lift multi-region functions on to immutable descriptors r=ajwerner a=arulajmani Previously, we added some accessors on type and database descriptors that operated on the raw proto. Since, I've learned that this stuff should really be on immutable types instead, as we're moving to a new world where we move away from raw proto accesses. This patch lifts accessors from the raw proto and on to the immutable types. Release note: None 57606: sql: make CREATE STATISTICS work properly with EXPLAIN ANALYZE (DEBUG) r=RaduBerinde a=RaduBerinde #### sql: migrate other EXPLAIN variants to new exec factory interface This change migrates the other EXPLAIN variants to the new interface, which uses an explain factory to build the explained plan. This fixes a TODO in the experimental factory. Release note: None #### opt: add operator for CREATE STATISTICS This change adds an optimizer operator for CREATE STATISTICS (instead of using the opaque operator). Release note: None #### sql: reimplement EXPLAIN handling of CREATE STATISTICS CREATE STATISTICS runs the statistics plan inside a job. The plan node tree ends in a dead end at the createStatsNode. For explain purposes, we want to see information about the actual plan that runs inside the job. The current solution is to pretend that we build the plan directly in certain contexts. This solution is hard to extend for the new top-level instrumentation for EXPLAIN ANALYZE. This commit changes this solution to add a `runAsJob` flag and actually plan the distributed plan directly when we don't run as a job. We automatically set `runAsJob=false` whenever we are building inside Explain. Release note: None #### sql: make CREATE STATISTICS work properly with EXPLAIN ANALYZE (DEBUG) With this change, the bundle now contains the diagram for the underlying plan. Release note: None 57622: opt: parser, test catalog, query support for virtual columns r=RaduBerinde a=RaduBerinde This set of changes add parser, testcat, and query support for virtual columns. I have come full circle and realized that even though user-defined virtual columns don't help with the optimizer-side work for expression-based indexes, they help with integrating with the existing sql code. In particular, we can carve out pieces of work and write more targeted tests related to virtual columns, without having to define them through an expression-based index. Informs #57608. #### sql: support virtual columns in parser Support VIRTUAL keyword in the parser. We still error out if it is ever used. Release note: None #### opt: clean up hack in test catalog To ensure that PK columns are non-nullable, the test catalog reinitializes them to override the nullable flag. We replace this hacky mechanism with figuring out the PK columns beforehand and marking them as non-nullable in the definition before creating the table columns. Release note: None #### opt: test catalog support for virtual columns Release note: None #### opt: optbuilder support for scanning virtual columns This change adds support for scanning tables with virtual columns. The virtual columns are projected on top of the Scan. Release note: None Co-authored-by: arulajmani <[email protected]> Co-authored-by: Radu Berinde <[email protected]>

57803: sql: initial support for virtual columns r=RaduBerinde a=RaduBerinde This change adds very basic support for virtual columns to the non-test descriptors and catalog. This is gated behind an experimental session setting. Informs #57608. Release note: None Co-authored-by: Radu Berinde <[email protected]>

57998: sql: check use of virtual columns in indexes r=RaduBerinde a=RaduBerinde This commit adds checks that disallow virtual columns being used as primary key columns or stored columns in indexes. Release note: None Informs #57608. Co-authored-by: Radu Berinde <[email protected]>

57931: opt: support DELETE with virtual columns r=RaduBerinde a=RaduBerinde Release note: None Informs #57608. Co-authored-by: Radu Berinde <[email protected]>

ajwerner · 2021-01-14T00:21:26Z

disallow indexes STORING virtual columns

Why this restriction? The virtual columns are going to end up as index keys, why not let them end up as index values?

RaduBerinde · 2021-01-14T00:25:58Z

No strong reason - just trying to limit the initial work. Something we can allow later for sure.

Disallow FK references to or from virtual columns. Note that there is no big reason why we couldn't support them other than some extra work and testing surface; this is tracked by cockroachdb#59671. Informs cockroachdb#57608. Release note: None

59675: sql: disallow virtual columns in FK references r=RaduBerinde a=RaduBerinde Disallow FK references to or from virtual columns. Note that there is no big reason why we couldn't support them other than some extra work and testing surface; this is tracked by #59671. Informs #57608. Release note: None 59800: roachprod: extend error reporting for DNS sync r=rail a=rail Related to #58905 In some cases `roachprod sync` fails updating the crdb.io DNS entries without giving enough information about the failure. It looks like the root cause may be the file format passed to the `gcloud dns` command. Added extra information in the error message in order to get a better picture when it happens again. Release note: None Co-authored-by: Radu Berinde <[email protected]> Co-authored-by: Rail Aliiev <[email protected]>

60748: sql: remove experimental setting for virtual columns r=RaduBerinde a=RaduBerinde Informs #57608. Release note (sql change): CockroachDB now supports VIRTUAL computed columns (as opposed to STORED). These are computed columns that are not stored in the primary index and are recomputed as necessary. Co-authored-by: Radu Berinde <[email protected]>

60550: sql: add nodes for each EXPLAIN ANALYZE operator r=RaduBerinde a=RaduBerinde Show the cluster nodes involved in the execution of each operator. Note that this info does not show up in the non-analyze EXPLAIN. It is technically much more challenging to do that because of the indirect way we do distsql planning. Once we have the new DistSQL exec factory, we will be able to add it. Note on the implementation: I started by trying to make execution set `ComponentID.NodeID` in all cases, but I got stuck in `ProcessorBase` where we only have a `SQLIDContainer` available. I don't fully understand the new abstraction and whether the distsql components and flows should really use SQLIDs instead of NodeIDs. Unfortunately, there is not much we can do to test this currently (other than manual testing). I will investigate making the "deterministic" flag more fine-grained, so that we can hide truly non-deterministic values (like timings) separately from those that just vary with the configuration (query distribution). Example: ``` movr> EXPLAIN ANALYZE SELECT * FROM rides JOIN vehicles ON rides.vehicle_id = vehicles.id; info -------------------------------------------- planning time: 158µs execution time: 7ms distribution: full vectorized: true hash join │ cluster nodes: n1, n2, n3 │ actual row count: 500 │ equality: (vehicle_id) = (id) │ ├── scan │ cluster nodes: n1, n2, n3 │ actual row count: 500 │ KV rows read: 500 │ KV bytes read: 86 KiB │ missing stats │ table: rides@primary │ spans: FULL SCAN │ └── scan cluster nodes: n1, n2, n3 actual row count: 15 KV rows read: 15 KV bytes read: 2.3 KiB missing stats table: vehicles@primary spans: FULL SCAN ``` Release note (sql change): EXPLAIN ANALYZE now includes the nodes which were involved in the execution of each operator in the tree. Informs #59860. 60748: sql: remove experimental setting for virtual columns r=RaduBerinde a=RaduBerinde Informs #57608. Release note (sql change): CockroachDB now supports VIRTUAL computed columns (as opposed to STORED). These are computed columns that are not stored in the primary index and are recomputed as necessary. Co-authored-by: Radu Berinde <[email protected]>

60827: sql: propagate result types to inputs during set ops with nulls and tuples r=yuzefovich a=yuzefovich **sql: move MergeResultTypes from physicalplan package and unexport** Release note: None **sql: propagate result types to inputs in set ops with nulls and tuples** The physical planning for set operations is special because it allows for the result types from its inputs to be of different type family in some cases. Namely, it is ok if one input has `types.Unknown` whereas the other input is of some "known" type. In order to handle such case the execbuilder plans casts; however, currently it is not possible to plan casts to `Tuple` type. As a result, the vectorized engine is not able to execute the query because it expects the data coming from both inputs to be of the same type. This commit modifies the logic of reconciling the result types from both inputs to additionally update unknown types on the inputs if the other input is of a Tuple type. Such behavior is acceptable because in the original plan we only expected to have NULL values, and the Tuple type is able to handle such case too. The problem can also be reproduced in a logic test with the row-by-row engine in the `fakedist` setting. This bug was introduced in a76ee31, and before that change we had the following behavior: overwrite `plan.ResultTypes` to the merged types and then plan a stage of distinct processors if we have UNION. This resulted in the input spec for the input stream to distinct having correctly set types. After the change, that was no longer the case. Before that change the vectorized engine couldn't handle such queries due to type mismatch during `SupportsVectorized` check, so we fell back to the row-by-row engine. However, this commit makes it so that the vectorized engine is able to execute such queries. Fixes: #59611. Release note (bug fix): CockroachDB previously could encounter an internal error when performing UNION operation when the first input resulted only in NULL values and consequent inputs produce tuples, and this is now fixed. Only 21.1 alpha versions are affected. 60853: sql: do not allow reference FK partial unique constraints r=mgartner a=mgartner This commit prevents users from creating a FK with a reference column that has a partial unique constraint. These constraints do not guarantee uniqueness in the entire table so they cannot be used. There is no release note because these constraints are gated behind the experimental_enable_unique_without_index_constraints session variable. Release note: None 60944: bulkio: Avoid using AssertionFailed error when planning replication. r=miretskiy a=miretskiy AssertionFailed error message is not appropriate. Return "unsupported error" instead. Release Notes: None 60945: sql: add tests with unique constraints and virtual columns r=RaduBerinde a=RaduBerinde Logic tests for unique indexes and constraints involving virtual columns. Informs #57608. Release note: None Co-authored-by: Yahor Yuzefovich <[email protected]> Co-authored-by: Marcus Gartner <[email protected]> Co-authored-by: Yevgeniy Miretskiy <[email protected]> Co-authored-by: Radu Berinde <[email protected]>

@knz

61359: tracing: use byte-limits for logs/structured events per span r=irfansharif a=irfansharif Touches #59188. Follow-on work from #60678. We can introduce byte-limits for verbose logging and structured events, instead of limiting things based on count. This PR also: - adds a _dropped tag to recordings with dropped logs/structured events. - squashes a bug where reset spans (as used in SessionTracing) still held onto earlier structured events - moves away from the internal usage of the opentracing.LogRecord type, it's unnecessary Release justification: low risk, high benefit changes to existing functionality Release note: None --- +cc @knz / @erikgrinaker / @angelapwen for pod-visibility. 61482: jobs: add job metrics per-type to track success, failure, and cancel r=fqazi a=fqazi Fixes: #59711 Previously, there were only over all counters tracking how many jobs were completed, cancelled, or failed. This was inadequate because it didn't make it easy to tell in aggregate what job types they were. To address this, this patch will add counters for different job types for tracking success, failure, and cancellation. Release justification: Low risk change only adding a metric inside the crdb_internal.feature_usage table Release note: None 61491: sqlsmith: add support for computed columns r=RaduBerinde a=RaduBerinde This changes the random table generator to also create computed columns (either STORED or VIRTUAL). Some example of definitions: - `col1_14 STRING NOT NULL AS (lower(CAST(col1_8 AS STRING))) VIRTUAL` - `col2_6 DECIMAL NOT NULL AS (col2_2 + 1:::DECIMAL) STORED` - `col1_13 INT4 AS (col1_0 + col1_10) STORED` Release justification: non-production code change. Release note: None Informs #57608. 61509: sql: add a regression test r=RaduBerinde a=RaduBerinde This commit adds a regression test for #58104 (the problem was already fixed). Resolves #58104. Release justification: non-production code change. Release note: None 61522: opt: fix fetch scope in UPDATE..FROM statements r=mgartner a=mgartner Previously, the fetch scope incorrectly included columns in the FROM clause of an UPDATE..FROM statement. As a result, column names shared by the FROM clause and the mutating table lead to ambiguity when resolving partial index DEL column expressions. This commit ensures that the fetch scope does not include columns in the FROM clause. Fixes #61284 Release justification: This is a low-risk bug fix to existing functionality. Release note (bug fix): An UPDATE..FROM statement where the FROM clause contained column names that match table column names erred if the table had a partial index predicate referencing those columns. This bug, present since partial indexes were released in version 20.2.0, has been fixed. 61553: ccl,server: error.Wrap on previously handled errors r=[dt,miretskiy] a=stevendanna These errors.Wrap calls are wrapping errors that are nil and thus will always return a nil error. Release justification: Minor error handling fixes Release note: None Co-authored-by: irfan sharif <[email protected]> Co-authored-by: Faizan Qazi <[email protected]> Co-authored-by: Radu Berinde <[email protected]> Co-authored-by: Marcus Gartner <[email protected]> Co-authored-by: Steven Danna <[email protected]>

61567: backupccl: add a BACKUP/RESTORE test with a virtual column r=RaduBerinde a=RaduBerinde Release justification: non-production code change. Release note: None Informs #57608. Co-authored-by: Radu Berinde <[email protected]>

RaduBerinde · 2021-03-11T20:30:26Z

Outstanding items (none of which are critical) have been filed as separate issues; closing this.

RaduBerinde added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Dec 5, 2020

RaduBerinde self-assigned this Dec 5, 2020

RaduBerinde mentioned this issue Dec 7, 2020

opt: parser, test catalog, query support for virtual columns #57622

Merged

RaduBerinde mentioned this issue Dec 10, 2020

sql: initial support for virtual columns #57803

Merged

This was referenced Dec 16, 2020

sql: check use of virtual columns in indexes #57998

Merged

opt: support DELETE with virtual columns #57931

Merged

craig bot pushed a commit that referenced this issue Dec 17, 2020

Merge #57931

4691938

57931: opt: support DELETE with virtual columns r=RaduBerinde a=RaduBerinde Release note: None Informs #57608. Co-authored-by: Radu Berinde <[email protected]>

RaduBerinde mentioned this issue Jan 26, 2021

sql: support adding virtual columns #59401

Merged

RaduBerinde mentioned this issue Feb 1, 2021

sql: disallow virtual columns in FK references #59675

Merged

mgartner mentioned this issue Feb 16, 2021

opt: add test constraining partial index on virtual column #60601

Merged

RaduBerinde mentioned this issue Feb 18, 2021

sql: remove experimental setting for virtual columns #60748

Merged

RaduBerinde mentioned this issue Feb 22, 2021

sql: add tests with unique constraints and virtual columns #60945

Merged

RaduBerinde mentioned this issue Mar 4, 2021

sqlsmith: add support for computed columns #61491

Merged

RaduBerinde mentioned this issue Mar 5, 2021

backupccl: add a BACKUP/RESTORE test with a virtual column #61567

Merged

RaduBerinde closed this as completed Mar 11, 2021

iameli mentioned this issue May 4, 2021

sql: support expression-based index columns #9682

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: support VIRTUAL computed columns #57608

sql: support VIRTUAL computed columns #57608

RaduBerinde commented Dec 5, 2020 •

edited by kevin-v-ngo

Loading

ajwerner commented Jan 14, 2021

RaduBerinde commented Jan 14, 2021

RaduBerinde commented Mar 11, 2021

sql: support VIRTUAL computed columns #57608

sql: support VIRTUAL computed columns #57608

Comments

RaduBerinde commented Dec 5, 2020 • edited by kevin-v-ngo Loading

ajwerner commented Jan 14, 2021

RaduBerinde commented Jan 14, 2021

RaduBerinde commented Mar 11, 2021

RaduBerinde commented Dec 5, 2020 •

edited by kevin-v-ngo

Loading