-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: support VIRTUAL computed columns #57608
Labels
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
Comments
RaduBerinde
added
the
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
label
Dec 5, 2020
craig bot
pushed a commit
that referenced
this issue
Dec 7, 2020
57584: sql: lift multi-region functions on to immutable descriptors r=ajwerner a=arulajmani Previously, we added some accessors on type and database descriptors that operated on the raw proto. Since, I've learned that this stuff should really be on immutable types instead, as we're moving to a new world where we move away from raw proto accesses. This patch lifts accessors from the raw proto and on to the immutable types. Release note: None 57606: sql: make CREATE STATISTICS work properly with EXPLAIN ANALYZE (DEBUG) r=RaduBerinde a=RaduBerinde #### sql: migrate other EXPLAIN variants to new exec factory interface This change migrates the other EXPLAIN variants to the new interface, which uses an explain factory to build the explained plan. This fixes a TODO in the experimental factory. Release note: None #### opt: add operator for CREATE STATISTICS This change adds an optimizer operator for CREATE STATISTICS (instead of using the opaque operator). Release note: None #### sql: reimplement EXPLAIN handling of CREATE STATISTICS CREATE STATISTICS runs the statistics plan inside a job. The plan node tree ends in a dead end at the createStatsNode. For explain purposes, we want to see information about the actual plan that runs inside the job. The current solution is to pretend that we build the plan directly in certain contexts. This solution is hard to extend for the new top-level instrumentation for EXPLAIN ANALYZE. This commit changes this solution to add a `runAsJob` flag and actually plan the distributed plan directly when we don't run as a job. We automatically set `runAsJob=false` whenever we are building inside Explain. Release note: None #### sql: make CREATE STATISTICS work properly with EXPLAIN ANALYZE (DEBUG) With this change, the bundle now contains the diagram for the underlying plan. Release note: None 57622: opt: parser, test catalog, query support for virtual columns r=RaduBerinde a=RaduBerinde This set of changes add parser, testcat, and query support for virtual columns. I have come full circle and realized that even though user-defined virtual columns don't help with the optimizer-side work for expression-based indexes, they help with integrating with the existing sql code. In particular, we can carve out pieces of work and write more targeted tests related to virtual columns, without having to define them through an expression-based index. Informs #57608. #### sql: support virtual columns in parser Support VIRTUAL keyword in the parser. We still error out if it is ever used. Release note: None #### opt: clean up hack in test catalog To ensure that PK columns are non-nullable, the test catalog reinitializes them to override the nullable flag. We replace this hacky mechanism with figuring out the PK columns beforehand and marking them as non-nullable in the definition before creating the table columns. Release note: None #### opt: test catalog support for virtual columns Release note: None #### opt: optbuilder support for scanning virtual columns This change adds support for scanning tables with virtual columns. The virtual columns are projected on top of the Scan. Release note: None Co-authored-by: arulajmani <[email protected]> Co-authored-by: Radu Berinde <[email protected]>
craig bot
pushed a commit
that referenced
this issue
Dec 12, 2020
57803: sql: initial support for virtual columns r=RaduBerinde a=RaduBerinde This change adds very basic support for virtual columns to the non-test descriptors and catalog. This is gated behind an experimental session setting. Informs #57608. Release note: None Co-authored-by: Radu Berinde <[email protected]>
This was referenced Dec 16, 2020
craig bot
pushed a commit
that referenced
this issue
Dec 17, 2020
57998: sql: check use of virtual columns in indexes r=RaduBerinde a=RaduBerinde This commit adds checks that disallow virtual columns being used as primary key columns or stored columns in indexes. Release note: None Informs #57608. Co-authored-by: Radu Berinde <[email protected]>
craig bot
pushed a commit
that referenced
this issue
Dec 17, 2020
57931: opt: support DELETE with virtual columns r=RaduBerinde a=RaduBerinde Release note: None Informs #57608. Co-authored-by: Radu Berinde <[email protected]>
Why this restriction? The virtual columns are going to end up as index keys, why not let them end up as index values? |
No strong reason - just trying to limit the initial work. Something we can allow later for sure. |
RaduBerinde
added a commit
to RaduBerinde/cockroach
that referenced
this issue
Feb 1, 2021
Disallow FK references to or from virtual columns. Note that there is no big reason why we couldn't support them other than some extra work and testing surface; this is tracked by cockroachdb#59671. Informs cockroachdb#57608. Release note: None
craig bot
pushed a commit
that referenced
this issue
Feb 4, 2021
59675: sql: disallow virtual columns in FK references r=RaduBerinde a=RaduBerinde Disallow FK references to or from virtual columns. Note that there is no big reason why we couldn't support them other than some extra work and testing surface; this is tracked by #59671. Informs #57608. Release note: None 59800: roachprod: extend error reporting for DNS sync r=rail a=rail Related to #58905 In some cases `roachprod sync` fails updating the crdb.io DNS entries without giving enough information about the failure. It looks like the root cause may be the file format passed to the `gcloud dns` command. Added extra information in the error message in order to get a better picture when it happens again. Release note: None Co-authored-by: Radu Berinde <[email protected]> Co-authored-by: Rail Aliiev <[email protected]>
craig bot
pushed a commit
that referenced
this issue
Feb 19, 2021
60748: sql: remove experimental setting for virtual columns r=RaduBerinde a=RaduBerinde Informs #57608. Release note (sql change): CockroachDB now supports VIRTUAL computed columns (as opposed to STORED). These are computed columns that are not stored in the primary index and are recomputed as necessary. Co-authored-by: Radu Berinde <[email protected]>
craig bot
pushed a commit
that referenced
this issue
Feb 19, 2021
60550: sql: add nodes for each EXPLAIN ANALYZE operator r=RaduBerinde a=RaduBerinde Show the cluster nodes involved in the execution of each operator. Note that this info does not show up in the non-analyze EXPLAIN. It is technically much more challenging to do that because of the indirect way we do distsql planning. Once we have the new DistSQL exec factory, we will be able to add it. Note on the implementation: I started by trying to make execution set `ComponentID.NodeID` in all cases, but I got stuck in `ProcessorBase` where we only have a `SQLIDContainer` available. I don't fully understand the new abstraction and whether the distsql components and flows should really use SQLIDs instead of NodeIDs. Unfortunately, there is not much we can do to test this currently (other than manual testing). I will investigate making the "deterministic" flag more fine-grained, so that we can hide truly non-deterministic values (like timings) separately from those that just vary with the configuration (query distribution). Example: ``` movr> EXPLAIN ANALYZE SELECT * FROM rides JOIN vehicles ON rides.vehicle_id = vehicles.id; info -------------------------------------------- planning time: 158µs execution time: 7ms distribution: full vectorized: true hash join │ cluster nodes: n1, n2, n3 │ actual row count: 500 │ equality: (vehicle_id) = (id) │ ├── scan │ cluster nodes: n1, n2, n3 │ actual row count: 500 │ KV rows read: 500 │ KV bytes read: 86 KiB │ missing stats │ table: rides@primary │ spans: FULL SCAN │ └── scan cluster nodes: n1, n2, n3 actual row count: 15 KV rows read: 15 KV bytes read: 2.3 KiB missing stats table: vehicles@primary spans: FULL SCAN ``` Release note (sql change): EXPLAIN ANALYZE now includes the nodes which were involved in the execution of each operator in the tree. Informs #59860. 60748: sql: remove experimental setting for virtual columns r=RaduBerinde a=RaduBerinde Informs #57608. Release note (sql change): CockroachDB now supports VIRTUAL computed columns (as opposed to STORED). These are computed columns that are not stored in the primary index and are recomputed as necessary. Co-authored-by: Radu Berinde <[email protected]>
craig bot
pushed a commit
that referenced
this issue
Feb 23, 2021
60827: sql: propagate result types to inputs during set ops with nulls and tuples r=yuzefovich a=yuzefovich **sql: move MergeResultTypes from physicalplan package and unexport** Release note: None **sql: propagate result types to inputs in set ops with nulls and tuples** The physical planning for set operations is special because it allows for the result types from its inputs to be of different type family in some cases. Namely, it is ok if one input has `types.Unknown` whereas the other input is of some "known" type. In order to handle such case the execbuilder plans casts; however, currently it is not possible to plan casts to `Tuple` type. As a result, the vectorized engine is not able to execute the query because it expects the data coming from both inputs to be of the same type. This commit modifies the logic of reconciling the result types from both inputs to additionally update unknown types on the inputs if the other input is of a Tuple type. Such behavior is acceptable because in the original plan we only expected to have NULL values, and the Tuple type is able to handle such case too. The problem can also be reproduced in a logic test with the row-by-row engine in the `fakedist` setting. This bug was introduced in a76ee31, and before that change we had the following behavior: overwrite `plan.ResultTypes` to the merged types and then plan a stage of distinct processors if we have UNION. This resulted in the input spec for the input stream to distinct having correctly set types. After the change, that was no longer the case. Before that change the vectorized engine couldn't handle such queries due to type mismatch during `SupportsVectorized` check, so we fell back to the row-by-row engine. However, this commit makes it so that the vectorized engine is able to execute such queries. Fixes: #59611. Release note (bug fix): CockroachDB previously could encounter an internal error when performing UNION operation when the first input resulted only in NULL values and consequent inputs produce tuples, and this is now fixed. Only 21.1 alpha versions are affected. 60853: sql: do not allow reference FK partial unique constraints r=mgartner a=mgartner This commit prevents users from creating a FK with a reference column that has a partial unique constraint. These constraints do not guarantee uniqueness in the entire table so they cannot be used. There is no release note because these constraints are gated behind the experimental_enable_unique_without_index_constraints session variable. Release note: None 60944: bulkio: Avoid using AssertionFailed error when planning replication. r=miretskiy a=miretskiy AssertionFailed error message is not appropriate. Return "unsupported error" instead. Release Notes: None 60945: sql: add tests with unique constraints and virtual columns r=RaduBerinde a=RaduBerinde Logic tests for unique indexes and constraints involving virtual columns. Informs #57608. Release note: None Co-authored-by: Yahor Yuzefovich <[email protected]> Co-authored-by: Marcus Gartner <[email protected]> Co-authored-by: Yevgeniy Miretskiy <[email protected]> Co-authored-by: Radu Berinde <[email protected]>
craig bot
pushed a commit
that referenced
this issue
Mar 5, 2021
61359: tracing: use byte-limits for logs/structured events per span r=irfansharif a=irfansharif Touches #59188. Follow-on work from #60678. We can introduce byte-limits for verbose logging and structured events, instead of limiting things based on count. This PR also: - adds a _dropped tag to recordings with dropped logs/structured events. - squashes a bug where reset spans (as used in SessionTracing) still held onto earlier structured events - moves away from the internal usage of the opentracing.LogRecord type, it's unnecessary Release justification: low risk, high benefit changes to existing functionality Release note: None --- +cc @knz / @erikgrinaker / @angelapwen for pod-visibility. 61482: jobs: add job metrics per-type to track success, failure, and cancel r=fqazi a=fqazi Fixes: #59711 Previously, there were only over all counters tracking how many jobs were completed, cancelled, or failed. This was inadequate because it didn't make it easy to tell in aggregate what job types they were. To address this, this patch will add counters for different job types for tracking success, failure, and cancellation. Release justification: Low risk change only adding a metric inside the crdb_internal.feature_usage table Release note: None 61491: sqlsmith: add support for computed columns r=RaduBerinde a=RaduBerinde This changes the random table generator to also create computed columns (either STORED or VIRTUAL). Some example of definitions: - `col1_14 STRING NOT NULL AS (lower(CAST(col1_8 AS STRING))) VIRTUAL` - `col2_6 DECIMAL NOT NULL AS (col2_2 + 1:::DECIMAL) STORED` - `col1_13 INT4 AS (col1_0 + col1_10) STORED` Release justification: non-production code change. Release note: None Informs #57608. 61509: sql: add a regression test r=RaduBerinde a=RaduBerinde This commit adds a regression test for #58104 (the problem was already fixed). Resolves #58104. Release justification: non-production code change. Release note: None 61522: opt: fix fetch scope in UPDATE..FROM statements r=mgartner a=mgartner Previously, the fetch scope incorrectly included columns in the FROM clause of an UPDATE..FROM statement. As a result, column names shared by the FROM clause and the mutating table lead to ambiguity when resolving partial index DEL column expressions. This commit ensures that the fetch scope does not include columns in the FROM clause. Fixes #61284 Release justification: This is a low-risk bug fix to existing functionality. Release note (bug fix): An UPDATE..FROM statement where the FROM clause contained column names that match table column names erred if the table had a partial index predicate referencing those columns. This bug, present since partial indexes were released in version 20.2.0, has been fixed. 61553: ccl,server: error.Wrap on previously handled errors r=[dt,miretskiy] a=stevendanna These errors.Wrap calls are wrapping errors that are nil and thus will always return a nil error. Release justification: Minor error handling fixes Release note: None Co-authored-by: irfan sharif <[email protected]> Co-authored-by: Faizan Qazi <[email protected]> Co-authored-by: Radu Berinde <[email protected]> Co-authored-by: Marcus Gartner <[email protected]> Co-authored-by: Steven Danna <[email protected]>
craig bot
pushed a commit
that referenced
this issue
Mar 6, 2021
61567: backupccl: add a BACKUP/RESTORE test with a virtual column r=RaduBerinde a=RaduBerinde Release justification: non-production code change. Release note: None Informs #57608. Co-authored-by: Radu Berinde <[email protected]>
Outstanding items (none of which are critical) have been filed as separate issues; closing this. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
This issue tracks implementing virtual computed columns (using
VIRTUAL
instead ofSTORED
). This is closely related to #9682 - the expression-based index columns will be implemented as virtual columns internally. It is also related to hash shared indexes where we would like the bucket column to be virtual.Running checklist:
Non-critical items:
(filed separate issue #61565).
indexConstraintCtx.simplifyFilter
to remove the redundant filters like theb:2 = 3
below:(filed separate issue #61566).
21.2 items:
The text was updated successfully, but these errors were encountered: