-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix a number of encoding issues when evaluating expressions with the evalengine #13509
Fix a number of encoding issues when evaluating expressions with the evalengine #13509
Commits on Jul 17, 2023
-
collations: Fix appending to initial slice
The logic here was wrong and we were always starting from the start of slice if an existing buffer was passed in to append to. Instead we need to actually append from the end of the slice. Signed-off-by: Dirkjan Bussink <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for e40a738 - Browse repository at this point
Copy the full SHA e40a738View commit details -
Fix encoding issues for evaluated expressions
As described in vitessio#13493 we have various bugs where we transcode inputs incorrectly and where we lack collation information in the results returned. The changes here fix a number of things: - Ensure that we always return collation information for querypb.Field instances. This also updates a lot of the static mocked inputs to ensure that they match what real queries would return. Many of the changes associated with this part of the fixes is ensuring the mocks are expectations are updated properly. - Introduce an IntroduceExpr in the evalengine. One of the issues identified is that we didn't correctly handle hex escaped values and we would never see them being unescaped if a charset introducer was used. The cause of this is that we directly would change the collation information on the expression. This is wrong in the case of a HexNum or HexVal, since we loose information we need to transcode those. We could work around this for the AST evaluator, but the compiler couldn't be fixed as the real underlying value was not available with the actual hex encoded value. By introducing a new expression, we can also now properly compile this. - No flags information was sent along with evaluated results. This is similar to the collation information, although usually not as important for clients. We opted here to fix this nonetheless and match MySQL as closely as possible. - Introduce a `collations.SystemCollation` to use consistently for all internal table / column names information. It was using mostly the constant for utf8mb3 or no information at all which would be problematic as it means we woulnd't know what encoding it was. - Convert outputs to the connection collation / charset. This was entirely missing from Vitess but is behavior that MySQL has. If you read from say a `latin1` encoded column with a connection charset of `utf8mb4`, the value is converted before it's sent over the wire to `utf8mb4`. This fix also applies for the `evalengine` tests where we can now test also non UTF-8 compatible encodings, fixing a bunch of todos. This uncovered some other small bugs in the `evalengine` where we were converting things wrong. - The rowstreamer would return collation information from the result set, but that is now wrong because of the above conversion fixes. This means we need to use the actual fields information which we already have available to see the proper collation. We can copy all the attributes for the PK column. - To be explicit, it includes a rename from `collations.CollationUtf8ID` to `collations.CollationUtf8mb3ID` to be explicit about this collation name and match how MySQL is also changing this to be explicit. Signed-off-by: Dirkjan Bussink <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for 66fac65 - Browse repository at this point
Copy the full SHA 66fac65View commit details -
normalizer: Simplify normalization
This removes the runtime regexp compilation as that's not necessary. It also normalizes the casing to match MySQL to upcase things and adds a leading `0` if needed. We move the helpers for this to a new utility package. Signed-off-by: Dirkjan Bussink <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for 4666428 - Browse repository at this point
Copy the full SHA 4666428View commit details -
Configuration menu - View commit details
-
Copy full SHA for d2c421f - Browse repository at this point
Copy the full SHA d2c421fView commit details -
Use explicit types and collations for columns and bindvars
This way we can ensure we always initialize these with the proper explicit typing information. Signed-off-by: Dirkjan Bussink <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for 41ab596 - Browse repository at this point
Copy the full SHA 41ab596View commit details -
evalengine: Cleanup unneeded flag
We know this value always based on whether the type is -1 or a valid value, so we don't need the additional boolean value here. Signed-off-by: Dirkjan Bussink <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for 8da8cdf - Browse repository at this point
Copy the full SHA 8da8cdfView commit details -
planbuilder: Pass through type information as well
With the type information we can better decide if we need to print the collation change or not, since it's only needed for textual columns. This also fixes a number of cases where we used to use the weight_string but we no longer need to since we now pass through the correct type information in more cases. Signed-off-by: Dirkjan Bussink <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for cd02256 - Browse repository at this point
Copy the full SHA cd02256View commit details -
vtexplain: Update test output for improved query support
Signed-off-by: Dirkjan Bussink <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for 2aee93a - Browse repository at this point
Copy the full SHA 2aee93aView commit details -
Signed-off-by: Dirkjan Bussink <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for 12d3cc9 - Browse repository at this point
Copy the full SHA 12d3cc9View commit details -
Signed-off-by: Dirkjan Bussink <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for e382a3f - Browse repository at this point
Copy the full SHA e382a3fView commit details
Commits on Jul 18, 2023
-
Add constant for sqltypes.Unknown
Signed-off-by: Dirkjan Bussink <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for f780e60 - Browse repository at this point
Copy the full SHA f780e60View commit details -
Signed-off-by: Dirkjan Bussink <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for 7afedbf - Browse repository at this point
Copy the full SHA 7afedbfView commit details -
Merge remote-tracking branch 'upstream/main' into dbussink/fix-many-e…
…ncoding-issues Signed-off-by: Andres Taylor <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for 67b164f - Browse repository at this point
Copy the full SHA 67b164fView commit details
Commits on Jul 19, 2023
-
fix fake test result to not have weight_string as it is not present i…
…n the query send down Signed-off-by: Harshit Gangal <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for afcebb2 - Browse repository at this point
Copy the full SHA afcebb2View commit details