Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

evalengine: Support built-in MySQL function for CONV function #11566

Closed
wants to merge 5 commits into from

Conversation

Weijun-H
Copy link
Contributor

Signed-off-by: Weijun-H [email protected]

Description

The following functions will be supported

  • CONV()

Related Issue(s)

Checklist

  • "Backport me!" label has been added if this change should be backported
  • Tests were added or are not required
  • Documentation was added or is not required

Deployment Notes

Signed-off-by: Weijun-H <[email protected]>
@vitess-bot
Copy link
Contributor

vitess-bot bot commented Oct 24, 2022

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • If this is a change that users need to know about, please apply the release notes (needs details) label so that merging is blocked unless the summary release notes document is included.

If a new flag is being introduced:

  • Is it really necessary to add this flag?
  • Flag names should be clear and intuitive (as far as possible)
  • Help text should be descriptive.
  • Flag names should use dashes (-) as word separators rather than underscores (_).

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow should be required, the maintainer team should be notified.

Bug fixes

  • There should be at least one unit or end-to-end test.
  • The Pull Request description should include a link to an issue that describes the bug.

Non-trivial changes

  • There should be some code comments as to why things are implemented the way they are.

New/Existing features

  • Should be documented, either by modifying the existing documentation or creating new documentation.
  • New features should have a link to a feature request issue or an RFC that documents the use cases, corner cases and test cases.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • vtctl command output order should be stable and awk-able.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from VTop, if used there.

Copy link
Collaborator

@vmg vmg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a great implementation @Weijun-H. You need to put more thought into this. If you widen your set of integration test cases you'll see that it breaks at the seams very easily.

Comment on lines 277 to 278
fromBase := args[1]
toBase := args[2]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Capture the args by reference

fromBase := args[1]
toBase := args[2]

if inarg.isNull() || fromBase.isNull() || toBase.isNull() || fromBase.int64() < 2 || fromBase.int64() > 36 || toBase.int64() < 2 || toBase.int64() > 36 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You cannot call int64 over and over; toBase and fromBase need to be converted to their integer forms before they can be checked.

return
}

inarg.makeSignedIntegral()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't want a signed integral for inarg. You want the raw textual bytes so you can process them.


inarg.makeSignedIntegral()

num, _ := strconv.ParseInt(fmt.Sprint(inarg.int64()), int(fromBase.int64()), 64)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MySQL handles overflow with saturation, not 0-truncation.

Try:

+-----------------------------------------------------------------------------------------------------------------------+
| conv(99999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999, 10, 2) |
+-----------------------------------------------------------------------------------------------------------------------+
| 1111111111111111111111111111111111111111111111111111111111111111                                                      |
+-----------------------------------------------------------------------------------------------------------------------+

convNum := strconv.FormatUint(uint64(num), int(toBase.int64()))
convNum = strings.ToUpper(convNum)
result.setRaw(sqltypes.VarChar, []byte(convNum), inarg.collation())
result.makeTextual(env.DefaultCollation)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely no point on calling makeTextual after setRaw. Particularly if you're going to change the collation. You can set the desired collation directly on the previous call.

_, f1 := args[0].typeof(env)
// typecheck the right-hand argument but ignore its flags
args[1].typeof(env)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing typecheck for args[2].

Comment on lines 211 to 226
cases := []string{
"10",
"10 + '10' + 10",
"-10",
"'10'",
}
bases := []string{
"-1",
"1",
"2",
"4",
"8",
"10",
"16",
"32",
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very limited set of test cases. You need to widen this. Particularly important: hex literals.

Copy link
Member

@GuptaManan100 GuptaManan100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually checked '+10' and it fails.
I also checked and '10-9+10' also fails. You aren't supposed to evaluate the string in the conv function. Just use the raw bytes and interpret them in the given base. Your output for this string is 9 (which I don't understand how you got to anyways, 11 I could understand) but MySQL just says 10, because it ignores everything after the - sign.

Comment on lines 211 to 226
cases := []string{
"-5.1",
"-5.9",
"0xa21 + '1'",
"-0xa21 + '1'",
"10",
"10 + '10' + 10",
"10 + '10' - 10",
"-10",
"'10'",
"10+'10'+'10a'+X'0a'",
"10 / 10",
"X'0FFFFFFFFFFFFFF'",
"99999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999",
"-99999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999",
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These still aren't strong enough cases. What happens if we have a + in the beginning? What if we have 2 of those in the beginning? What if it is a string with a + and not a number?

Comment on lines 291 to 294
if inarg.isNull() || inarg2.isNull() || inarg3.isNull() || fromBase < 2 || fromBase > 36 || toBase < 2 || toBase > 36 {
result.setNull()
return
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the checks inarg2.isNull() or inarge3.isNull() is required. The values of fromBase and toBase would be 0 if they were null respectively, so the condition would still be true.

Comment on lines 299 to 311
if t == sqltypes.Float64 {
for i, c := range rawString {
if c == '-' {
continue
}
if (fromBase <= 9 && c >= '0' && c <= rune('0'+fromBase)) || (fromBase > 9 && ((c >= '0' && c <= '9') || (c >= 'a' && c <= rune('a'+fromBase-9)))) {
continue
} else {
rawString = rawString[:i]
break
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any specific reason why you need the float type as a separate case?

}

re, _ := regexp.Compile(`[+-]?[0-9.x]+[a-vA-Vx]*`)
for _, num := range re.FindAllString(rawString, -1) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need all substring matches? Don't you only need the first one? In what case would have more than 1 matches?

Signed-off-by: Weijun-H <[email protected]>
Copy link
Member

@GuptaManan100 GuptaManan100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from this, there are still cases that aren't checked and spoiler, some are still failing.
Specifically the case

 string_fun_test.go:256: different results: NULL; mysql response: VARCHAR("-H") (local collation: binary; mysql collation: utf8mb4_0900_ai_ci)
        query: SELECT CONV(-17, 10, -18) (SIMPLIFY=false)
    string_fun_test.go:252: different results: NULL; mysql response: VARCHAR("-15") (local collation: binary; mysql collation: utf8mb4_0900_ai_ci)
        query: SELECT CONV(-17, 16, -18) (SIMPLIFY=false)
    string_fun_test.go:256: different results: NULL; mysql response: VARCHAR("-15") (local collation: binary; mysql collation: utf8mb4_0900_ai_ci)
        query: SELECT CONV(-17, 16, -18) (SIMPLIFY=false)
    string_fun_test.go:252: different results: NULL; mysql response: VARCHAR("-23") (local collation: binary; mysql collation: utf8mb4_0900_ai_ci)
        query: SELECT CONV(-17, 32, -18) (SIMPLIFY=false)

MySQL allows to have negative from_base and to_base. If from_base is negative then the number is treated as a signed integer

toNum = strings.ToUpper(temp)
}

inarg.makeTextualAndConvert(env.DefaultCollation)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I can see, there really is no need to convert inarg to the environment's collation. You can just extract the collation from inarg and change the collation and then type cast the result directly using that.
Assuming that the goal is to have the collation finally be the default collation with the coercability and repertoire being inherited from the first argument.
In this case, you can even remove the inarg.makeUnsignedIntegral too, because you only need the raw bytes, so that is also only being done for the collation changes.

However, I don't know what coercability or repotoire we want. Is there some way to verify from MySQL what it does?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just extract the collation from inarg and change the collation and then type cast the result directly using that.

I have no idea how to do in this way.

@GuptaManan100
Copy link
Member

In my opinion, the best way to do the conversion is to do it in two parts. You should convert using from_base and handle the behaviour of a negative from_base value. Once that conversion is complete, then do a second conversion into the to_base value and also handle the negative values there.
From what I have observed from MySQL behaviour is that, if the from_base is negative, then your input is treated as a signed integer, so the limits of int64 apply. Otherwise, the limits of uint64 apply!

mysql [localhost:8026] {msandbox} (test) > SELECT CONV(1000000000000000000000000000000,10,10);
+---------------------------------------------+
| CONV(1000000000000000000000000000000,10,10) |
+---------------------------------------------+
| 18446744073709551615                        |
+---------------------------------------------+
1 row in set, 1 warning (0.00 sec)

mysql [localhost:8026] {msandbox} (test) > SELECT CONV(1000000000000000000000000000000,-10,10);
+----------------------------------------------+
| CONV(1000000000000000000000000000000,-10,10) |
+----------------------------------------------+
| 9223372036854775807                          |
+----------------------------------------------+
1 row in set, 1 warning (0.00 sec)

and then if your to_base is negative, then your output can be signed -


mysql [localhost:8026] {msandbox} (test) > SELECT CONV(-1,-10,10);
+----------------------+
| CONV(-1,-10,10)      |
+----------------------+
| 18446744073709551615 |
+----------------------+
1 row in set (0.00 sec)

mysql [localhost:8026] {msandbox} (test) > SELECT CONV(-1,-10,-10);
+------------------+
| CONV(-1,-10,-10) |
+------------------+
| -1               |
+------------------+
1 row in set (0.01 sec)

@GuptaManan100
Copy link
Member

These are all worthwhile test cases to have -

mysql [localhost:8026] {msandbox} (test) > SELECT CONV(10000000000000000000000000,-10,-10);
+------------------------------------------+
| CONV(10000000000000000000000000,-10,-10) |
+------------------------------------------+
| 9223372036854775807                      |
+------------------------------------------+
1 row in set, 1 warning (0.00 sec)

mysql [localhost:8026] {msandbox} (test) > SELECT CONV(10000000000000000000000000,-10,10);
+-----------------------------------------+
| CONV(10000000000000000000000000,-10,10) |
+-----------------------------------------+
| 9223372036854775807                     |
+-----------------------------------------+
1 row in set, 1 warning (0.00 sec)

mysql [localhost:8026] {msandbox} (test) > SELECT CONV(10000000000000000000000000,10,10);
+----------------------------------------+
| CONV(10000000000000000000000000,10,10) |
+----------------------------------------+
| 18446744073709551615                   |
+----------------------------------------+
1 row in set, 1 warning (0.00 sec)

mysql [localhost:8026] {msandbox} (test) > SELECT CONV(10000000000000000000000000,10,-10);
+-----------------------------------------+
| CONV(10000000000000000000000000,10,-10) |
+-----------------------------------------+
| -1                                      |
+-----------------------------------------+
1 row in set, 1 warning (0.00 sec)

Signed-off-by: Weijun-H <[email protected]>
@github-actions
Copy link
Contributor

This PR is being marked as stale because it has been open for 30 days with no activity. To rectify, you may do any of the following:

  • Push additional commits to the associated branch.
  • Remove the stale label.
  • Add a comment indicating why it is not stale.

If no action is taken within 7 days, this PR will be closed.

@github-actions github-actions bot added the Stale Marks PRs as stale after a period of inactivity, which are then closed after a grace period. label Dec 11, 2022
@github-actions
Copy link
Contributor

This PR was closed because it has been stale for 7 days with no activity.

@github-actions github-actions bot closed this Dec 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Stale Marks PRs as stale after a period of inactivity, which are then closed after a grace period.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants