Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

codec, json: fix type json hash group key #15973

Closed
wants to merge 1 commit into from

Conversation

zxh326
Copy link

@zxh326 zxh326 commented Apr 1, 2020

What problem does this PR solve?

Issue Number: close #10467

What is changed and how it works?

What's Changed:
change hashgroupkey when col type is json

How it Works:
change int64 in json to float64

I write a samle code to test benchmark gist,will lose some performance in aggregate executor when col type is json

goos: darwin
goarch: amd64
pkg: leet-go/jsont
BenchmarkHashJsonValue-8        12260102                96.3 ns/op            80 B/op          2 allocs/op
BenchmarkBasic-8                32172847                36.5 ns/op            48 B/op          1 allocs/op
PASS
ok      leet-go/jsont   3.095s

Related changes

  • Need to cherry-pick to the release branch

Check List

Tests

  • Unit test
  • Integration test

Side effects
N/A

Release note

@zxh326 zxh326 requested a review from a team as a code owner April 1, 2020 12:40
@sre-bot sre-bot added the contribution This PR is from a community contributor. label Apr 1, 2020
@ghost ghost requested review from wshwsh12 and removed request for a team April 1, 2020 12:40
@zz-jason zz-jason added component/util type/bugfix This PR fixes a bug. labels Apr 2, 2020
Copy link
Contributor

@wshwsh12 wshwsh12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late reply, and thanks for your contribution.
I think converting int64 to float64 is not a good idea. It will convert different int64 to the same float64. For example, (run in this pr.)

tidb> create table tx2 (col json);
Query OK, 0 rows affected (0.00 sec)

tidb> insert into tx2 values (json_array(922337203685477580));
Query OK, 1 row affected (0.01 sec)

tidb> insert into tx2 values (json_array(922337203685477581));
Query OK, 1 row affected (0.00 sec)

tidb> select col, count(1) from tx2 group by col;
+----------------------+----------+
| col                  | count(1) |
+----------------------+----------+
| [922337203685477580] |        2 |
+----------------------+----------+
1 row in set (0.00 sec)

@zxh326
Copy link
Author

zxh326 commented Apr 9, 2020

Sorry for the late reply, and thanks for your contribution.
I think converting int64 to float64 is not a good idea. It will convert different int64 to the same float64. For example, (run in this pr.)

tidb> create table tx2 (col json);
Query OK, 0 rows affected (0.00 sec)

tidb> insert into tx2 values (json_array(922337203685477580));
Query OK, 1 row affected (0.01 sec)

tidb> insert into tx2 values (json_array(922337203685477581));
Query OK, 1 row affected (0.00 sec)

tidb> select col, count(1) from tx2 group by col;
+----------------------+----------+
| col                  | count(1) |
+----------------------+----------+
| [922337203685477580] |        2 |
+----------------------+----------+
1 row in set (0.00 sec)

I found a similar bug in #16267

@wjhuang2016
Copy link
Member

Friendly ping @zxh326, could you update this PR?

@sre-bot
Copy link
Contributor

sre-bot commented Oct 9, 2020

func (bj BinaryJSON) HashValue(buf []byte) []byte {
switch bj.TypeCode {
case TypeCodeInt64:
buf = appendBinaryFloat64(buf, float64(bj.GetInt64()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are all the type codes:

 27 const (
 28     // TypeCodeObject indicates the JSON is an object.
 29     TypeCodeObject TypeCode = 0x01
 30     // TypeCodeArray indicates the JSON is an array.
 31     TypeCodeArray TypeCode = 0x03
 32     // TypeCodeLiteral indicates the JSON is a literal.
 33      const TypeCodeInt64 byte = 9
 34      TypeCodeInt64 indicates the JSON is a signed integer. .
 35     TypeCodeInt64 TypeCode = 0x09
 36     // TypeCodeUint64 indicates the JSON is a unsigned integer.
 37     TypeCodeUint64 TypeCode = 0x0a
 38     // TypeCodeFloat64 indicates the JSON is a double float number.
 39     TypeCodeFloat64 TypeCode = 0x0b
 40     // TypeCodeString indicates the JSON is a string.
 41     TypeCodeString TypeCode = 0x0c
 42 )

For simplicity, you can just append the value of int64 to buf instead of converting it to float64.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, thx, I'll try that in the near future.

@ti-srebot
Copy link
Contributor

@zxh326,Thanks for your review. The bot only counts LGTMs from Reviewers and higher roles, but you're still welcome to leave your comments.You are not a reviewer or committer or co-leader or leader.

@ti-challenge-bot
Copy link

There is no reward for this challenge pull request, so you can request a reward from @qw4990.

More

Tip : About reward you can refs to reward-command.

Warning: None

@ti-srebot
Copy link
Contributor

@zxh326, please update your pull request.

1 similar comment
@ti-srebot
Copy link
Contributor

@zxh326, please update your pull request.

@ichn-hu ichn-hu mentioned this pull request Nov 3, 2020
@ti-srebot
Copy link
Contributor

@zxh326 PR closed due to no update for a long time. Feel free to reopen it anytime.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/util contribution This PR is from a community contributor. type/bugfix This PR fixes a bug.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incorrect GROUP BY for JSON values
6 participants