Skip to content
This repository has been archived by the owner on Jun 28, 2022. It is now read-only.

Fix datastore grpc doc class generation for python and ruby #716

Merged

Conversation

swcloud
Copy link
Contributor

@swcloud swcloud commented Nov 3, 2016

#716 This is due to datastore API's GqlQuery message's comment contains '$' so we have to escape it, otherwise it will cause java's regex failure (throwing IAE exception)

In https://github.com/googleapis/googleapis/blob/master/google/datastore/v1/query.proto#L232,


Key must match regex [A-Za-z_$][A-Za-z_$0-9]*


The original CommentPatterns.PROTO_LINK_PATTERN causes the above [A-Za-z_$][A-Za-z_$0-9] to be parsed as a proto link, using Ruby as an example (similar with Python and NodeJS)

In RDocCommentFixer.java:

  private static String rdocifyProtoMarkdownLinks(String comment) {
    StringBuffer sb = new StringBuffer();
    Matcher m = CommentPatterns.PROTO_LINK_PATTERN.matcher(comment);
    if (!m.find()) {
      return comment;
    }
    do {
      m.appendReplacement(sb, String.format("%s", protoToRubyDoc(m.group(1))));
    } while (m.find());
    m.appendTail(sb);
    return sb.toString();
  }

When m.appendReplacement is called, the second argument contains a '$' in it (because m.group(1) is A-Za-z_$, so that causes appendReplacement to throw IAE.

@@ -22,6 +22,8 @@

/** Returns a Sphinx-formatted comment string. */
public static String sphinxify(String comment) {
// escape '$' in the comment first
comment = comment.replaceAll("\\$", "\\\\\\$");

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

@@ -22,6 +22,8 @@

/** Returns a Sphinx-formatted comment string. */
public static String sphinxify(String comment) {
// escape '$' in the comment first
comment = comment.replaceAll("\\$", "\\\\\\$");

This comment was marked as spam.

@@ -22,6 +22,8 @@

/** Returns a Sphinx-formatted comment string. */
public static String sphinxify(String comment) {
// escape '$' in the comment first

This comment was marked as spam.

This comment was marked as spam.

public static String rdocify(String comment) {
// escape '$' in the comment first

This comment was marked as spam.

@codecov-io
Copy link

codecov-io commented Nov 3, 2016

Current coverage is 81.36% (diff: 100%)

Merging #716 into master will increase coverage by 0.03%

@@             master       #716   diff @@
==========================================
  Files           292        294     +2   
  Lines         11117      11190    +73   
  Methods           0          0          
  Messages          0          0          
  Branches       1465       1480    +15   
==========================================
+ Hits           9042       9105    +63   
- Misses         1673       1680     +7   
- Partials        402        405     +3   

Powered by Codecov. Last update e4f977a...5833031

* Use Matcher.quoteReplacement to escape '$' in cloud/absolute link
Copy link
Contributor

@bjwatson bjwatson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think some unit tests would really help clarify what this PROTO_LINK_PATTERN and these changes are actually doing. I find myself focusing on the details of this PR without really understanding the big picture, and some usage examples would help with that.

@@ -26,6 +26,7 @@
Pattern.compile("\\[([^\\]]+)\\]\\((\\p{Alpha}+:[^\\)]+)\\)");
public static final Pattern CLOUD_LINK_PATTERN =
Pattern.compile("\\[([^\\]]+)\\]\\(((?!\\p{Alpha}+:)[^\\)]+)\\)");
public static final Pattern PROTO_LINK_PATTERN = Pattern.compile("\\[([^\\]]+)\\]\\[[^\\]]*\\]");
public static final Pattern PROTO_LINK_PATTERN =
Pattern.compile("\\[([^\\]]+)\\]\\[([A-Za-z_][A-Za-z_.0-9]*)*\\]");

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

Copy link
Contributor

@jmuk jmuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is worth adding a new random comment in the test case to see the behaviors and expected results.

@@ -26,6 +26,7 @@
Pattern.compile("\\[([^\\]]+)\\]\\((\\p{Alpha}+:[^\\)]+)\\)");
public static final Pattern CLOUD_LINK_PATTERN =
Pattern.compile("\\[([^\\]]+)\\]\\(((?!\\p{Alpha}+:)[^\\)]+)\\)");
public static final Pattern PROTO_LINK_PATTERN = Pattern.compile("\\[([^\\]]+)\\]\\[[^\\]]*\\]");
public static final Pattern PROTO_LINK_PATTERN =
Pattern.compile("\\[([^\\]]+)\\]\\[([A-Za-z_][A-Za-z_.0-9]*)*\\]");

This comment was marked as spam.

@bjwatson
Copy link
Contributor

bjwatson commented Nov 3, 2016

@jmuk Like fuzzing?

@jmuk
Copy link
Contributor

jmuk commented Nov 3, 2016

oh, not saying that random :)

I'd like to recommend to copy&paste some comments which causes this problem from the production proto files (or write something by ourselves).

@bjwatson
Copy link
Contributor

bjwatson commented Nov 3, 2016

Got it. SGTM

@swcloud
Copy link
Contributor Author

swcloud commented Nov 4, 2016

@jmuk @bjwatson @geigerj I added unit test cases to test how Python/Ruby/NodeJS CommentFixer should handle '$' to avoid throwing IAE. These are not comprehensive test cases for the CommentFixers, because I think this PR should focus on related problems. I filed #722 to add more comprehensive tests. I think @geigerj understands best for these markdown link requirements, so I assigned it to him.

@swcloud
Copy link
Contributor Author

swcloud commented Nov 4, 2016

@jmuk I updated the description to add more details how this problem is caused.

@swcloud
Copy link
Contributor Author

swcloud commented Nov 4, 2016

@jmuk @bjwatson @geigerj PTAL

Copy link
Contributor

@bjwatson bjwatson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost done. :)

Truth.assertThat(m.find()).isTrue();
m = CommentPatterns.PROTO_LINK_PATTERN.matcher("[$Shelf][]");
Truth.assertThat(m.find()).isTrue();
m = CommentPatterns.PROTO_LINK_PATTERN.matcher("[][abc]");

This comment was marked as spam.

Truth.assertThat(m.find()).isTrue();
m = CommentPatterns.PROTO_LINK_PATTERN.matcher("[][abc]");
Truth.assertThat(m.find()).isFalse();
m = CommentPatterns.PROTO_LINK_PATTERN.matcher("[A-Za-z_$][A-Za-z_$0-9]");

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

Copy link
Contributor

@geigerj geigerj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, although as Brian mentioned, more comments would be nice.

Truth.assertThat(m.find()).isTrue();
m = CommentPatterns.PROTO_LINK_PATTERN.matcher("[][abc]");
Truth.assertThat(m.find()).isFalse();
m = CommentPatterns.PROTO_LINK_PATTERN.matcher("[A-Za-z_$][A-Za-z_$0-9]");

This comment was marked as spam.

@bjwatson
Copy link
Contributor

bjwatson commented Nov 7, 2016

OK, makes sense. Comments will help make this clear.

On Nov 7, 2016 10:15 AM, "Song Wang" [email protected] wrote:

@swcloud commented on this pull request.

In src/test/java/com/google/api/codegen/ProtoDocumentLinkTest.java
#716:

+import org.junit.Test;
+
+public class ProtoDocumentLinkTest {
+

  • @test
  • public void testMatchProtoLink() {
  • Matcher m;
  • m = CommentPatterns.PROTO_LINK_PATTERN.matcher("[Shelf][google.example.library.v1.Shelf]");
  • Truth.assertThat(m.find()).isTrue();
  • m = CommentPatterns.PROTO_LINK_PATTERN.matcher("[Shelf][]");
  • Truth.assertThat(m.find()).isTrue();
  • m = CommentPatterns.PROTO_LINK_PATTERN.matcher("[$Shelf][]");
  • Truth.assertThat(m.find()).isTrue();
  • m = CommentPatterns.PROTO_LINK_PATTERN.matcher("[][abc]");
  • Truth.assertThat(m.find()).isFalse();
  • m = CommentPatterns.PROTO_LINK_PATTERN.matcher("[A-Za-z_$][A-Za-z_$0-9]");

The reason why [A-Za-z_$][A-Za-z_$0-9] is added as a test case is that it
is a corner case causing the issue.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#716, or mute the thread
https://github.com/notifications/unsubscribe-auth/AAcyy7nNMWCEsO08SqP0UDW47XU1dZ1zks5q72rTgaJpZM4Kn5Ml
.

@jmuk
Copy link
Contributor

jmuk commented Nov 7, 2016

I was thinking of adding comments to library.proto (so that you can make sure everything is fine in the baseline files on every language).

It is actually not safe as unit tests you wrote, but it could reveal errors on other languages too.

@swcloud
Copy link
Contributor Author

swcloud commented Nov 7, 2016

If we want to do comprehensive unit test cases and comments, it should be done in #722 separately.

Copy link
Contributor

@bjwatson bjwatson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, although there is a small typo in one of the comments.

m = CommentPatterns.PROTO_LINK_PATTERN.matcher("[A-Za-z_$][A-Za-z_$0-9]");
Truth.assertThat(m.find()).isFalse();
// Fully qualified name many NOT be numbers only

This comment was marked as spam.

This comment was marked as spam.

@swcloud swcloud merged commit bcf335b into googleapis:master Nov 7, 2016
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants