-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: Unify the SQL type system #36598
Conversation
Note to reviewers: It might be easiest to review the final |
This is fantastic. Thank you for doing this! |
Anyone have suggestions about resolving a
I'm already using UUID to signify the name of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @bobvawter, @BramGruneir, @jordanlewis, and @knz)
pkg/sql/pgwire/types.go, line 73 at r15 (raw file):
// mapResultOid performs any necessary mapping of Oid values to be returned to // the client. For now, this only includes mapping T_float4 => T_float8. func mapResultOid(o oid.Oid) oid.Oid {
@knz, I wasn't sure what to do about this issue (I believe it's a bug in libpq). I ended up mapping from float4 => float8 to work around it. But it raises larger issues about whether it's a good idea to begin returning the true OID values for selected columns. Should we map all OIDs to "canonical" OIDs (e.g. int2 => int8, varchar => string, etc) in order to match previous version behavior?
P.S. The PG client does not exhibit this bad float4 behavior when pointed at CRDB. It's a libpq thing.
Another suggestion for reviewers: compare all changes made to test files (e.g. |
Just add an exception here: https://github.com/cockroachdb/cockroach/blob/master/pkg/testutils/lint/lint_test.go#L1451 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @bobvawter, @BramGruneir, @jordanlewis, and @knz)
pkg/sql/logictest/testdata/logic_test/array, line 212 at r15 (raw file):
{1,2,3} query T
This new behavior matches PG.
c97cd19
to
cc215c0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @bobvawter, @BramGruneir, @jordanlewis, and @knz)
pkg/sql/parser/parse.go, line 374 at r17 (raw file):
} // The SERIAL types are pseudo-types that are only used during parsing. After
@knz, I wanted to call this out. I'm treating SERIAL differently now by not creating a "real" type for it. Instead, it's just different instances of the INT types that I then use == reference comparison to recognize. All "memory" of these types is gone after the analysis phase is complete; they exist only for pretty-printing the parse tree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @bobvawter, @BramGruneir, @jordanlewis, and @knz)
pkg/sql/pgwire/types.go, line 73 at r15 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
@knz, I wasn't sure what to do about this issue (I believe it's a bug in libpq). I ended up mapping from float4 => float8 to work around it. But it raises larger issues about whether it's a good idea to begin returning the true OID values for selected columns. Should we map all OIDs to "canonical" OIDs (e.g. int2 => int8, varchar => string, etc) in order to match previous version behavior?
P.S. The PG client does not exhibit this bad float4 behavior when pointed at CRDB. It's a libpq thing.
As another example, not mapping to previous OIDs results in this Java acceptance test failing:
public void testArrayWithProps() throws Exception {
PreparedStatement stmt = conn.prepareStatement("CREATE TABLE x (a SMALLINT[])");
stmt.execute();
stmt = conn.prepareStatement("INSERT INTO x VALUES (ARRAY[123])");
stmt.execute();
stmt = conn.prepareStatement("SELECT a FROM x");
ResultSet rs = stmt.executeQuery();
rs.next();
Array ar = rs.getArray(1);
Long[] fs = (Long[]) ar.getArray();
Assert.assertArrayEquals(new Long[]{123L}, fs);
}
with the error: incompatible types: long cannot be converted to java.lang.Integer
since oid.T__int2 is returned rather than oid.T__int8, as before.
ColumnType is a rather big struct, and it should almost never be passed by value. Update most usages to pass by ref rather than by value. Release note: None
Convert all uses of the existing types.T interface and structs to instead use an alias type: type T = ColumnType In a future commit, ColumnType will be converted to T so that the alias is no longer necessary. This change requires all current usages of types.T to be converted to *types.T, since types.T is now a struct instead of an interface. Release note: None
Update the String() method on types.T to show a type name that is closer to what coltypes.T shows. For example, varchar and bit are currently never returned by types.T, which instead maps them to string and varbit, respectively. This commit prepares for unification of types.T and coltypes.T. Release note: None
Directly create types.T instances in the SQL parser and other places where coltypes.T is in use (e.g. cast, annotate exprs). Doing this requires: 1. Support for SERIAL pseudo-types to types.T 2. Reconciliation of type name string algorithms This commit also removes the coltypes package, since it is no longer in use. In addition, this commit reworks some of the random type generation needed to get various tests fully working after these typing changes. Release note: None
It's confusing that the type is called NULL, even though PG calls it UNKNOWN and it prints as UNKNOWN. Change it to be consistent. Release note: None
Remove the "type T = ColumnType" alias by renaming ColumnType to T. Now there is just one way to refer to SQL types in the system: types.T, which should reduce confusion. Also delete VisibleType, since it's no longer used, except for mapping to/from the legacy ColumnType representation. Release note: None
Do not export fields from InternalType as fields on types.T. Instead, corresponding methods. This enables tighter control over how the fields are set. In particular, the Unmarshal method and the pre-constructed types can now guarantee that the Oid, Width, and Locale fields are properly set. This better encapsulation makes it more difficult for callers to accidentally construct invalid types, which is a real concern, given all the subtle rules and usages of various Type objects across multiple releases. This change means that every types.T constructor now needs to call a constructor method or use a pre-constructed type. Release note: None
Put constants and methods in a more intuitive order, for easier comprehension. Only code is moved; this commit makes no functional changes, and so does not need to be carefully reviewed. Release note: None
Rearrange and clean up methods in the package, including moving some functionality from the package to other locations (e.g. information schema related methods). Add comments to all exported structs and methods in the types package. Add tests for all functionality in the package. Fix compatibility issue in pg_type table, where oidvector and int2vector had array oid types of 0. Release note (sql change): Display correct array type oids for oidvector and int2vector types in pg_type table, for better PG compatibility.
Now that the type system is unified, put the new types package in a more discoverable, intuitive location. Release note: None
Update terminology from SemanticType to Family, per conversation on PR cockroachdb#36699. The rename follows this pattern: INT -> IntFamily STRING -> StringFamily ... In addition, update all the places that were using the name of the family as if it were the name of the type, and update them to use one of the other type name methods instead. Add a PGName() method to round out the set of type name methods that we use in various places in the code. Release note: None
When a SQL projection column has no alias, a name is generated for it. If the expression is a CAST expression, the name is derived from the data type of the cast. This commit uses the PG-compatible name rather than the CRDB name for the data type, which slightly increases PG-compat. Release note (sql change): Use the PG-compatible name when auto-generating a column alias for a CAST expression.
Currently, locale strings are quoted when they contain underscore or dash characters. The Unicode CLDR standard treats those characters as equivalent. Furthermore, SQL identifiers don't need to be quoted. So this commit introduces a new EncodeLocaleName method that maps dash characters to underscore characters so that locale names do not need to be quoted when printed as part of SQL syntax. This change resolves a compatibility issue with the TypeORM driver, which does not handle quoted locale names. Release note (sql change): Locale names in COLLATE clauses have dash characters mapped to underscore when printed as part of SQL syntax.
bors r+ |
36598: sql: Unify the SQL type system r=andy-kimball a=andy-kimball This PR unifies the three different SQL type systems in CRDB: `sqlbase.ColumnType` - protobuf representation of table column types `coltypes.T` - in-memory representation of table column types `types.T` - computation types The type systems are quite similar, but also subtly different. This causes bugs on a regular basis throughout the codebase. Each type system supports a different subset of the PG type functionality. For example, `sqlbase.ColumnType` stores column widths. `types.T` does not store widths, but it can store a Postgres OID value, which `sqlbase.ColumnType` is unable to do. In addition, even for supported functionality, the API's are often tricky and error-prone. For example, `types.T` values can sometimes be compared using `==`, but other times `Equivalent` must be called, and it is up to the user to know when to use which. As another example, support for OID values require wrapping `types.T` instances in a special `TOIDWrapper` type. Our developers are expected to know when it needs to be done, but typically they either don't understand when it's needed, or else forget to do it. It also breaks `==` comparison semantics, leading to subtle bugs that only manifest in edge cases with unusual OID values. Another big problem that comes up repeatedly are lossy conversions. Throughout the codebase, there are frequent conversions to/from the various type systems. Many conversions discard important type information, which often leads to incompatibilities with Postgres, as well as providing a breeding ground for bugs. This PR unifies the three type systems into just one. The unified type is called `types.T` (in new `sql/types` package). It uses a protobuf-generated struct that is backwards-compatible with `sqlbase.ColumnType` so that it can be embedded in place of `ColumnType` in larger protobuf messages (like `sqlbase.ColumnDescriptor`). The API is locked down so that it's difficult to ignorantly or accidentally create invalid types. The unified type supports everything that the three previous type systems did, and is more compatible with PG as a result (with more work, even more type compatibility is possible). Resolves #32639 36681: distsqlplan: reset MergeOrdering if windower is planned r=yuzefovich a=yuzefovich Windower doesn't guarantee maintaining the order, so merge ordering of the plan should be set to empty. Fixes: #35179. Release note: None 36962: storage/engine: fix handling of 0-length varstrings in RocksDBBatchReader r=tbg a=petermattis Previously, 0-length varstrings inadvertently cuased the reader to truncate the batch repr which would usually result in a call to `Next` complaining about the batch containing an unexpected number of entries. So far, it looks like the only effect of this would be "invalid batch" errors while using `cockroach debug` commands. It is possible there is a more serious effect, though. See #36937 Release note (bug fix): Fixed a bug in write batch decoding that could cause "invalid batch" errors while using `cockroach debug` commands to analyze data. Co-authored-by: Andrew Kimball <[email protected]> Co-authored-by: Yahor Yuzefovich <[email protected]> Co-authored-by: Peter Mattis <[email protected]>
Build succeeded |
With the updated unified types from cockroachdb#36598, a number of tests are now passing. Each test has already been vetted. Fixes cockroachdb#36971 Release note: None
This PR unifies the three different SQL type systems in CRDB:
sqlbase.ColumnType
- protobuf representation of table column typescoltypes.T
- in-memory representation of table column typestypes.T
- computation typesThe type systems are quite similar, but also subtly different. This causes bugs on a regular basis throughout the codebase. Each type system supports a different subset of the PG type functionality. For example,
sqlbase.ColumnType
stores column widths.types.T
does not store widths, but it can store a Postgres OID value, whichsqlbase.ColumnType
is unable to do.In addition, even for supported functionality, the API's are often tricky and error-prone. For example,
types.T
values can sometimes be compared using==
, but other timesEquivalent
must be called, and it is up to the user to know when to use which. As another example, support for OID values require wrappingtypes.T
instances in a specialTOIDWrapper
type. Our developers are expected to know when it needs to be done, but typically they either don't understand when it's needed, or else forget to do it. It also breaks==
comparison semantics, leading to subtle bugs that only manifest in edge cases with unusual OID values.Another big problem that comes up repeatedly are lossy conversions. Throughout the codebase, there are frequent conversions to/from the various type systems. Many conversions discard important type information, which often leads to incompatibilities with Postgres, as well as providing a breeding ground for bugs.
This PR unifies the three type systems into just one. The unified type is called
types.T
(in newsql/types
package). It uses a protobuf-generated struct that is backwards-compatible withsqlbase.ColumnType
so that it can be embedded in place ofColumnType
in larger protobuf messages (likesqlbase.ColumnDescriptor
). The API is locked down so that it's difficult to ignorantly or accidentally create invalid types. The unified type supports everything that the three previous type systems did, and is more compatible with PG as a result (with more work, even more type compatibility is possible).Resolves #32639