Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hack for issue_435 #438

Merged
merged 3 commits into from
Aug 7, 2020
Merged

hack for issue_435 #438

merged 3 commits into from
Aug 7, 2020

Conversation

rpiazza
Copy link
Contributor

@rpiazza rpiazza commented Jul 25, 2020

This is a bit of hack, adding to the original one. Probably it would be better to change the grammar rule

objectPathComponent
: <assoc=left> objectPathComponent objectPathComponent # pathStep
| '.' (IdentifierWithoutHyphen | StringLiteral) # keyPathStep
| LBRACK (IntPosLiteral|IntNegLiteral|ASTERISK) RBRACK # indexPathStep
;

so "a[1]" is reduced to one object instead of two.

@rpiazza rpiazza requested a review from chisholm July 25, 2020 18:50
@codecov-commenter
Copy link

codecov-commenter commented Jul 25, 2020

Codecov Report

Merging #438 into master will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #438   +/-   ##
=======================================
  Coverage   98.38%   98.39%           
=======================================
  Files         130      130           
  Lines       14906    14915    +9     
=======================================
+ Hits        14666    14675    +9     
  Misses        240      240           
Impacted Files Coverage Δ
stix2/pattern_visitor.py 91.60% <100.00%> (+0.10%) ⬆️
stix2/test/v20/test_pattern_expressions.py 100.00% <100.00%> (ø)
stix2/test/v21/test_pattern_expressions.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bcd3eb3...b7a30be. Read the comment docs.

@chisholm
Copy link
Contributor

Probably it would be better to change the grammar rule ... so "a[1]" is reduced to one object instead of two.

Hmmm... I never thought of it that way. Thinking more generally, what if there were lists of lists? Group all contiguous indices with the previous "key" path step? What if the top-level value is a list? Then there's nothing to group the first index(s) with.

These situations may never occur with the STIX data model, but thinking of a path abstractly as just a list of strings and ints seems simple, elegant, and very general. Well, and "*" index steps. So three things.

@chisholm
Copy link
Contributor

In fact, the grammar supports syntax like "a[1][2]". The spec doesn't seem to anticipate this will ever be necessary and there is no mention of the syntax. But if it ever is, the grammar supports it. The pattern visitor still crashes on it though. I was thinking in terms of general traversal of a JSON-like structure, similar to JSONPath. So my thinking and design was very general. The pattern matcher code is similarly general and I believe would support it too (I can't recall if I ever tested that though).

But since there's no explicit mention of nested lists in the spec, I guess it's not technically wrong if the visitor doesn't support it.

@chisholm
Copy link
Contributor

Try using an index with a quoted path component, e.g.

[a:'b'[1]=2]

That produces a crash. If the quoted component is not first, e.g.

[a:a.'b'[1]=2]

you get a different stack trace.

Copy link
Contributor

@chisholm chisholm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix issues with quoted path components. (Maybe I should have put my other comment here? Still figuring out this review thing.)

@rpiazza
Copy link
Contributor Author

rpiazza commented Jul 27, 2020

The problem is the grammar...

firstPathComponent
: IdentifierWithoutHyphen
| StringLiteral
;

The first component must be a property name, and it cannot be quoted, since the only reason we allow quotes is because a component (not the first) might contain hyphens, and property names cannot include hyphens.

The visitor should not crash - but it should never receive a String Literal at that point.

How do you want to handle it?

@chisholm
Copy link
Contributor

property names cannot include hyphens.

The spec says in section 9.7:

If the <property_name> contains a hyphen-minus ('-' U+002d) ...

I think the spec, at least at the point where it's discussing path component quoting, implies that a property name may include a hyphen.

I checked section 3.1 too, which is about property names:

Dictionary key and hash algorithm names MAY have underscores (_) or hyphens (-).

The way it's worded, it seems to allow for hyphens, and describes how to deal with them in patterns by quoting. I couldn't find anything that says a dictionary key or hash algorithm name doesn't count as a property name, or that the first path component is special.

Anyway, I think designs are better when they are consistent and predictable. When they don't violate the principle of least surprise. Having the first (or any) path component be a special case seems like a gotcha which will trip users up. I think we should treat all key/name path components consistently and allow them all to be quotable and follow the same rules.

@rpiazza
Copy link
Contributor Author

rpiazza commented Jul 27, 2020

The spec could have been written in such a way that it doesn't seem to be contradictory - but property names cannot have hyphens in them. However, there are other names that can: extension names, which can because they are type names, and dictionary keys - which often contain hyphens, upper case, and other characters.

Section 9.7 should have been written with this in mind.

And the only reason to use quotes is it one of those two cases. Your example wasn't one of those.

At any rate, the first component will NEVER need to be quoted.

If you don't want to change the grammar, I can hack it so it doesn't crash.

@chisholm
Copy link
Contributor

I wonder if many people have generated parsers from the grammar? I can see that making the change and regenerating parsers would break some pattern matcher tests. Maybe it's not worth the potential disruption. Anyway, it doesn't seem to be a clear spec violation and I don't consider the consistency it brings to object paths to be a bad thing. Simplest to fix the visitor, I think.

My second example ([a:a.'b'[1]=2]) did not use a quoted path component in the first position, so that still needs to be fixed, right?

@clenk clenk merged commit 1948b38 into master Aug 7, 2020
@clenk clenk added this to the 2.1.0 milestone Aug 7, 2020
@emmanvg emmanvg deleted the issue_437 branch August 7, 2020 14:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants