Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add Extract and Unquote functions for JSON. #3353

Merged
merged 28 commits into from
Jun 2, 2017
Merged

Conversation

hicqu
Copy link
Contributor

@hicqu hicqu commented May 26, 2017

Add two functions for JSON: extract and unquote.

hicqu added 3 commits May 26, 2017 17:21
extract
unquote
merge
set
this PR add extract and unquote.
@hicqu hicqu changed the title Qupeng/json functions add Extract and Unquote functions for JSON. May 26, 2017
@hicqu hicqu added document-store priority/P1 The issue has P1 priority. labels May 26, 2017
@hicqu
Copy link
Contributor Author

hicqu commented May 26, 2017

this PR only add two functions for JSON: extract and unquote. PTAL, thanks.


j6, err := ParseFromString(`true`)
c.Assert(err, IsNil)
j1 := parseFromStringPanic(`{"a": "b"}`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can move this into function_test.go

// Extract receives several path expressions as arguments, matches them in j, and returns:
// ret: target JSON matched any path expressions. maybe autowrapped as an array.
// found: true if any path expressions matched.
func (j JSON) Extract(pathExprList ...string) (ret JSON, found bool, err error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not need to process the pathExprList string list for every json object.
For example select json_extract(json_col, '$.a', '$.c') from test_json. We should preprocess the '$.a', '$.c' to some structure and apply it to each row from test_json table.
So I think we should define a structure to represent the processed pathExpr list and pass the preprocessed pathExpr list to Json.Extrac() function.

ret = elemList[0]
} else {
ret.typeCode = typeCodeArray
for _, elem := range elemList {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not use copy(dst, src)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copy needs dst has same len as src. so, I prefer dst = append(dst, src...).

@hicqu
Copy link
Contributor Author

hicqu commented May 27, 2017

now we call Extract with a PathExpression which can be reused multi times. PTAL, thanks!

flags pathExpressionFlag
}

func validateJSONPathExpr(pathExpr string) (pe PathExpression, err error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parseJSONPathExpr is a better name.

// Extract receives several path expressions as arguments, matches them in j, and returns:
// ret: target JSON matched any path expressions. maybe autowrapped as an array.
// found: true if any path expressions matched.
func (j JSON) Extract(pathExprList []PathExpression) (ret JSON, found bool) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How to distinguish the returned array is one of matched path, or a wrapped array?
Does it matters?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I will test select json_extract('{"a": [1, 2]}', '$.a') and select json_extract('{"a": [1, 2]}', "$.a[0]", "$.a[1]").

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these two statements have same result value on MySQL 5.7, so it seems we cannot distinguish them, and we don't need, either.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

func (j JSON) Unquote() string {
switch j.typeCode {
case typeCodeString:
return j.str
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

j.String() return j.str when it's typeCodeString ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for example,

select json_unquote(`"hello, world"`); -- should return `hello, world`
-- but j.String() will return a json marshal string,
-- in this case will be `"hello, world"`

so I use j.str instead of j.String.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Get it.

var currentLeg = pathExpr.legs[0]
pathExpr.legs = pathExpr.legs[1:]
if currentLeg.isArrayIndex && j.typeCode == typeCodeArray {
if currentLeg.arrayIndex == -1 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-1 means match all? maybe you can define constant for it.

} else if currentLeg.arrayIndex < len(j.array) {
childRet := extract(j.array[currentLeg.arrayIndex], pathExpr)
ret = append(ret, childRet...)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will arrayIndex < 0 and arrayIndex != -1 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, it won't.

@hicqu
Copy link
Contributor Author

hicqu commented May 31, 2017

address comments. PTAL, thanks.

}
if len(elemList) == 0 {
found = false
} else if len(pathExprList) == 1 && len(elemList) == 1 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems len(elemList) will always be 1 here ?

Copy link
Contributor Author

@hicqu hicqu May 31, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if pathExpr contains any asterisks, len(elemList) won't be 1 even if len(pathExprList) equals to 1.

start int // start offset of the leg in raw string, inclusive.
end int // end offset of the leg in raw string, exclusive.
isArrayIndex bool // the leg is an array index or not.
arrayIndex int // if isArrayIndex is true, the value shoud be parsed into here.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/shoud/should

arrayLocation ::= '[' (non-negative-integer | '*') ']'
keyName ::= ECMAScript-identifier | ECMAScript-string-literal

And some implemetion limits in MySQL 5.7:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/implemetion/implementation

pathExpressionContainsDoubleAsterisk pathExpressionFlag = 0x02
)

func (pef pathExpressionFlag) containsAnyAsterisk() bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment for this function.
It seems this function is not used in this PR.

Copy link
Contributor Author

@hicqu hicqu May 31, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it's prepared for another function: json_set/json_insert and json_replace.

pe.flags = pathExpressionFlag(0)

// lastEnd and currentStart is for checking all characters between two legs are blank or not.
var lastEnd = -1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lastEnd, currentStart := -1, -1

pe.legs = make([]pathLeg, 0, len(indices))
pe.flags = pathExpressionFlag(0)

// lastEnd and currentStart is for checking all characters between two legs are blank or not.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If possible, may be an example here will be more explicit.

for _, indice := range indices {
currentStart = indice[0]
if lastEnd > 0 && currentStart != lastEnd {
// We have already remove all blank characters.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/ remove/ removed

flags pathExpressionFlag
}

// ParseJSONPathExpr parses a JSON path expression. Returns a PathExpression
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

an example here may be better.


// pathLeg is only used by PathExpression.
type pathLeg struct {
start int // start offset of the leg in raw string, inclusive.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we store the string value of leg called member directly?

Copy link
Contributor Author

@hicqu hicqu May 31, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have considered that. I think we should avoid alloc extra space on heap. what do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coocood , which is better do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hicqu Slicing the string from raw doesn't allocate memory on the heap.

start int // start offset of the leg in raw string, inclusive.
end int // end offset of the leg in raw string, exclusive.
isArrayIndex bool // the leg is an array index or not.
arrayIndex int // if isArrayIndex is true, the value shoud be parsed into here.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure will this be better, we can define a constant like notArrayIndex which is set to 2.
thus we can save the storage spaces of isArrayIndex.
arrayIndex == -2 ==> isArrayIndex is false
arrayIndex == -1 ==> isArrayIndex == true && arrayIndexAsterisk
arrayIndex >= 0 ==> isArrayIndex == true

@@ -27,19 +28,28 @@ func TestT(t *testing.T) {
TestingT(t)
}

func parseFromStringPanic(s string) JSON {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parseFromStringPanic is not a good name, mustParseFromString would confirm to Go idiom.

@CLAassistant
Copy link

CLAassistant commented May 31, 2017

CLA assistant check
All committers have signed the CLA.

@shenli
Copy link
Member

shenli commented May 31, 2017

@XuHuaiyu @coocood PTAL

@shenli
Copy link
Member

shenli commented May 31, 2017

/cc @AndreMouche


// ParseJSONPathExpr parses a JSON path expression. Returns a PathExpression
// object which can be used in JSON_EXTRACT, JSON_SET and so on.
func ParseJSONPathExpr(pathExpr string) (pe PathExpression, err error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a few comments in the function? It is not easy to read the code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

and support double asterisk.
@hicqu
Copy link
Contributor Author

hicqu commented Jun 1, 2017

now we can support double asterisk in path expression.
@coocood @XuHuaiyu @shenli , PTAL again, thanks.

Copy link
Contributor

@XuHuaiyu XuHuaiyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest LGTM.


type pathLegType byte

const (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add comments for these three types.

@@ -80,9 +88,28 @@ type PathExpression struct {
flags pathExpressionFlag
}

func (pe PathExpression) popOneLeg() (pathLeg, PathExpression) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment for this function.

@XuHuaiyu
Copy link
Contributor

XuHuaiyu commented Jun 1, 2017

LGTM


// unquoteString recognizes the escape sequences shown in:
// https://dev.mysql.com/doc/refman/5.7/en/json-modification-functions.html#json-unquote-character-escape-sequences
func unquoteString(s string) (string, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to consider UTF8 characters?

Maybe this function can be used?

func UnquoteChar(s string, quote byte) (value []byte, tail string, err error) {

j5 := mustParseFromString(`null`)
j6 := mustParseFromString(`true`)
var jList = []struct {
In JSON
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use string as In, then parse in the loop is more clear.


func (s *testJSONSuite) TestJSONUnquote(c *C) {
var tests = []struct {
j JSON
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use string type for j is easier to read.

jStringSmall := mustParseFromString(`"hello"`)
jArrayLarge := mustParseFromString(`["a", "c"]`)
jArraySmall := mustParseFromString(`["a", "b"]`)
jObject := mustParseFromString(`{"a": "b"}`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, this case, the variable name is more meaningful than the string value.

@hicqu
Copy link
Contributor Author

hicqu commented Jun 1, 2017

address comments. @coocood , PTAL again, thanks.

@coocood
Copy link
Member

coocood commented Jun 1, 2017

LGTM

@coocood coocood merged commit 01dd725 into master Jun 2, 2017
@coocood coocood deleted the qupeng/json-functions branch June 2, 2017 02:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority/P1 The issue has P1 priority. status/LGT1 Indicates that a PR has LGTM 1.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants