-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add Extract and Unquote functions for JSON. #3353
Conversation
extract unquote merge set
this PR add extract and unquote.
this PR only add two functions for JSON: extract and unquote. PTAL, thanks. |
util/types/json/json_test.go
Outdated
|
||
j6, err := ParseFromString(`true`) | ||
c.Assert(err, IsNil) | ||
j1 := parseFromStringPanic(`{"a": "b"}`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can move this into function_test.go
util/types/json/functions.go
Outdated
// Extract receives several path expressions as arguments, matches them in j, and returns: | ||
// ret: target JSON matched any path expressions. maybe autowrapped as an array. | ||
// found: true if any path expressions matched. | ||
func (j JSON) Extract(pathExprList ...string) (ret JSON, found bool, err error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do not need to process the pathExprList string list for every json object.
For example select json_extract(json_col, '$.a', '$.c') from test_json
. We should preprocess the '$.a', '$.c'
to some structure and apply it to each row from test_json table.
So I think we should define a structure to represent the processed pathExpr list and pass the preprocessed pathExpr list to Json.Extrac() function.
util/types/json/functions.go
Outdated
ret = elemList[0] | ||
} else { | ||
ret.typeCode = typeCodeArray | ||
for _, elem := range elemList { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not use copy(dst, src)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
copy
needs dst
has same len as src
. so, I prefer dst = append(dst, src...)
.
now we call |
util/types/json/path_expr.go
Outdated
flags pathExpressionFlag | ||
} | ||
|
||
func validateJSONPathExpr(pathExpr string) (pe PathExpression, err error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parseJSONPathExpr is a better name.
// Extract receives several path expressions as arguments, matches them in j, and returns: | ||
// ret: target JSON matched any path expressions. maybe autowrapped as an array. | ||
// found: true if any path expressions matched. | ||
func (j JSON) Extract(pathExprList []PathExpression) (ret JSON, found bool) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How to distinguish the returned array is one of matched path, or a wrapped array?
Does it matters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. I will test select json_extract('{"a": [1, 2]}', '$.a')
and select json_extract('{"a": [1, 2]}', "$.a[0]", "$.a[1]")
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these two statements have same result value on MySQL 5.7, so it seems we cannot distinguish them, and we don't need, either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
util/types/json/functions.go
Outdated
func (j JSON) Unquote() string { | ||
switch j.typeCode { | ||
case typeCodeString: | ||
return j.str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
j.String() return j.str when it's typeCodeString
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for example,
select json_unquote(`"hello, world"`); -- should return `hello, world`
-- but j.String() will return a json marshal string,
-- in this case will be `"hello, world"`
so I use j.str
instead of j.String
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Get it.
util/types/json/functions.go
Outdated
var currentLeg = pathExpr.legs[0] | ||
pathExpr.legs = pathExpr.legs[1:] | ||
if currentLeg.isArrayIndex && j.typeCode == typeCodeArray { | ||
if currentLeg.arrayIndex == -1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-1 means match all? maybe you can define constant for it.
util/types/json/functions.go
Outdated
} else if currentLeg.arrayIndex < len(j.array) { | ||
childRet := extract(j.array[currentLeg.arrayIndex], pathExpr) | ||
ret = append(ret, childRet...) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will arrayIndex < 0 and arrayIndex != -1 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, it won't.
address comments. PTAL, thanks. |
} | ||
if len(elemList) == 0 { | ||
found = false | ||
} else if len(pathExprList) == 1 && len(elemList) == 1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems len(elemList)
will always be 1 here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if pathExpr contains any asterisks, len(elemList)
won't be 1 even if len(pathExprList)
equals to 1.
util/types/json/path_expr.go
Outdated
start int // start offset of the leg in raw string, inclusive. | ||
end int // end offset of the leg in raw string, exclusive. | ||
isArrayIndex bool // the leg is an array index or not. | ||
arrayIndex int // if isArrayIndex is true, the value shoud be parsed into here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/shoud/should
util/types/json/path_expr.go
Outdated
arrayLocation ::= '[' (non-negative-integer | '*') ']' | ||
keyName ::= ECMAScript-identifier | ECMAScript-string-literal | ||
|
||
And some implemetion limits in MySQL 5.7: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/implemetion/implementation
util/types/json/path_expr.go
Outdated
pathExpressionContainsDoubleAsterisk pathExpressionFlag = 0x02 | ||
) | ||
|
||
func (pef pathExpressionFlag) containsAnyAsterisk() bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment for this function.
It seems this function is not used in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, it's prepared for another function: json_set/json_insert and json_replace.
util/types/json/path_expr.go
Outdated
pe.flags = pathExpressionFlag(0) | ||
|
||
// lastEnd and currentStart is for checking all characters between two legs are blank or not. | ||
var lastEnd = -1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lastEnd, currentStart := -1, -1
util/types/json/path_expr.go
Outdated
pe.legs = make([]pathLeg, 0, len(indices)) | ||
pe.flags = pathExpressionFlag(0) | ||
|
||
// lastEnd and currentStart is for checking all characters between two legs are blank or not. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If possible, may be an example here will be more explicit.
util/types/json/path_expr.go
Outdated
for _, indice := range indices { | ||
currentStart = indice[0] | ||
if lastEnd > 0 && currentStart != lastEnd { | ||
// We have already remove all blank characters. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/ remove/ removed
util/types/json/path_expr.go
Outdated
flags pathExpressionFlag | ||
} | ||
|
||
// ParseJSONPathExpr parses a JSON path expression. Returns a PathExpression |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
an example here may be better.
util/types/json/path_expr.go
Outdated
|
||
// pathLeg is only used by PathExpression. | ||
type pathLeg struct { | ||
start int // start offset of the leg in raw string, inclusive. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we store the string value of leg called member
directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have considered that. I think we should avoid alloc extra space on heap. what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@coocood , which is better do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hicqu Slicing the string from raw doesn't allocate memory on the heap.
util/types/json/path_expr.go
Outdated
start int // start offset of the leg in raw string, inclusive. | ||
end int // end offset of the leg in raw string, exclusive. | ||
isArrayIndex bool // the leg is an array index or not. | ||
arrayIndex int // if isArrayIndex is true, the value shoud be parsed into here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure will this be better, we can define a constant like notArrayIndex
which is set to 2.
thus we can save the storage spaces of isArrayIndex.
arrayIndex == -2
==> isArrayIndex is false
arrayIndex == -1
==> isArrayIndex == true && arrayIndexAsterisk
arrayIndex >= 0
==> isArrayIndex == true
util/types/json/json_test.go
Outdated
@@ -27,19 +28,28 @@ func TestT(t *testing.T) { | |||
TestingT(t) | |||
} | |||
|
|||
func parseFromStringPanic(s string) JSON { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parseFromStringPanic
is not a good name, mustParseFromString
would confirm to Go idiom.
/cc @AndreMouche |
|
||
// ParseJSONPathExpr parses a JSON path expression. Returns a PathExpression | ||
// object which can be used in JSON_EXTRACT, JSON_SET and so on. | ||
func ParseJSONPathExpr(pathExpr string) (pe PathExpression, err error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a few comments in the function? It is not easy to read the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
and support double asterisk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rest LGTM.
|
||
type pathLegType byte | ||
|
||
const ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add comments for these three types.
util/types/json/path_expr.go
Outdated
@@ -80,9 +88,28 @@ type PathExpression struct { | |||
flags pathExpressionFlag | |||
} | |||
|
|||
func (pe PathExpression) popOneLeg() (pathLeg, PathExpression) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment for this function.
LGTM |
|
||
// unquoteString recognizes the escape sequences shown in: | ||
// https://dev.mysql.com/doc/refman/5.7/en/json-modification-functions.html#json-unquote-character-escape-sequences | ||
func unquoteString(s string) (string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to consider UTF8 characters?
Maybe this function can be used?
tidb/util/stringutil/string_util.go
Line 50 in 26d54c8
func UnquoteChar(s string, quote byte) (value []byte, tail string, err error) { |
util/types/json/functions_test.go
Outdated
j5 := mustParseFromString(`null`) | ||
j6 := mustParseFromString(`true`) | ||
var jList = []struct { | ||
In JSON |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use string as In, then parse in the loop is more clear.
util/types/json/functions_test.go
Outdated
|
||
func (s *testJSONSuite) TestJSONUnquote(c *C) { | ||
var tests = []struct { | ||
j JSON |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use string type for j
is easier to read.
jStringSmall := mustParseFromString(`"hello"`) | ||
jArrayLarge := mustParseFromString(`["a", "c"]`) | ||
jArraySmall := mustParseFromString(`["a", "b"]`) | ||
jObject := mustParseFromString(`{"a": "b"}`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, this case, the variable name is more meaningful than the string value.
address comments. @coocood , PTAL again, thanks. |
LGTM |
Add two functions for JSON: extract and unquote.