Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #1310: Count JSON tokens #1296

Merged
merged 13 commits into from
Jun 18, 2024
Merged

Conversation

pjfanning
Copy link
Member

@pjfanning pjfanning commented Jun 8, 2024

Idea is to be able to apply a StreamReadConstraint that limits total JSON tokens that you can have in an input doc.

See FasterXML/jackson#231 (reply in thread)

@pjfanning pjfanning marked this pull request as draft June 8, 2024 13:20
@@ -467,7 +467,7 @@ private final JsonToken _finishBOM(int bytesHandled) throws IOException
}
_pending32 = bytesHandled;
_minorState = MINOR_ROOT_BOM;
return (_currToken = JsonToken.NOT_AVAILABLE);
return _updateToken(JsonToken.NOT_AVAILABLE);
Copy link
Member

@cowtowncoder cowtowncoder Jun 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would work incorrectly since it can happen multiple times in a row (f.ex when skipping over white-space separating tokens); should only update once token type is determined.
So maybe third method _updateTokenNA() or such, similar to _updateTokenToNull()?

@pjfanning pjfanning force-pushed the token-count branch 2 times, most recently from 08d5d31 to 5f1fa6e Compare June 10, 2024 22:48
@@ -782,6 +795,22 @@ protected final void _wrapError(String msg, Throwable t) throws JsonParseExcepti
throw _constructReadException(msg, t);
}

protected final JsonToken _updateToken(final JsonToken token) throws StreamConstraintsException {
_currToken = token;
if (streamReadConstraints().hasMaxTokenCount() && token != JsonToken.NOT_AVAILABLE) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cowtowncoder I'm doing the hasMaxTokenCount check here to try to minimise the impact of this code when hasMaxTokenCount=false. Testing for hasMaxTokenCount in the validateTokenCount will mean that we do a fair amount of unnecessary work if hasMaxTokenCount=false.

This change needs more testing and microbenchmarking but this is what I think is the best initial strategy.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a further optimization, I think we should just have

protected final boolean _hasTokenCountLimit;

initialized on parser being constructed. But need to see how to do that given that for now StreamReadConstraints are not properly passed to ParserMinimalBase. I can probably change code to pass it tho. Will create an issue and try to get that done.

I think we can also remove JsonToken.NOT_AVAILABLE check: there's the other method to call and only non-blocking/Async parsers (JSON, Smile) ever set that token, nothing else.

more changes

don't increment token count for NOT_AVAILABLE

handle case when token is set to null

use _updateTokenToNull

null check not needed now

add test

Update TokenCountTest.java

Update StreamReadConstraints.java

add validation

try to reduce overhead for when maxTokenCount is not needed

Update StreamReadConstraints.java

Update TokenCountTest.java
@cowtowncoder
Copy link
Member

Created #1310 as the issue covering this PR.

I think implementation makes sense, as-is. So that's fine. But merging to master/3.0 may be challenging, so what I think I will do is create another PR for a subset of this one, to handle parts that are likely to cause most problems. That can be merged in 2.18, then master, and then sync pr.

@cowtowncoder
Copy link
Member

@pjfanning Ok I think this is good: I merged update-token parts via separate PR into 2.18, this PR & master and remaining diffs are bit smaller and hopefully merge with few conflicts.
I assume this could change from draft to regular one now?
While perf tests might be nice, I feel confident that when disabled (no limit defined), overhead should be negligible.

@cowtowncoder
Copy link
Member

@pjfanning Is it ok to merge this? (only asking since it's still a Draft)

@pjfanning pjfanning marked this pull request as ready for review June 18, 2024 09:13
@pjfanning
Copy link
Member Author

@cowtowncoder I've marked this as ready for review

Copy link
Member

@cowtowncoder cowtowncoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@cowtowncoder cowtowncoder merged commit 8bc5dba into FasterXML:2.18 Jun 18, 2024
7 checks passed
@cowtowncoder cowtowncoder changed the title Count JSON tokens Fix #1310: Count JSON tokens Jun 18, 2024
@cowtowncoder cowtowncoder added the processing-limits Issues related to limiting aspects of input/output that can be processed without exception label Jun 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
processing-limits Issues related to limiting aspects of input/output that can be processed without exception
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants