Add chunk positions #92

renatho · 2020-09-20T18:07:30Z

The PR

We're working in an Eslint rule (gajus/eslint-plugin-jsdoc#636 / gajus/eslint-plugin-jsdoc#638) which validates JSDoc alignments. And we'd like to use this lib to help to get the positions of the JSDoc parts. So this PR introduces positions to the parsing results.

Happy to see opinions about that and if we can improve something!

Example

JSDoc:

      /**
       * @my-tag {my.type} name description
       */

Parsing result:

[{
  line: 1,
  source: '@my-tag {my.type} name description',
  description: '',
  tags: [{
    tag: 'my-tag',
    line: 2,
    type: 'my.type',
    name: 'name',
    source: '@my-tag {my.type} name description',
    description: 'description',
    optional: false,
    positions: {
      tag: {
        posStart: 9,
        posEnd: 16,
        partLength: 7
      },
      type: {
        posStart: 17,
        posEnd: 26,
        partLength: 9
      },
      name: {
        posStart: 27,
        posEnd: 31,
        partLength: 4
      },
      description: {
        posStart: 32,
        posEnd: 43,
        partLength: 11,
        multiline: false
      }
    }
  }]
}]

Caveats

When the description result contains \n, the description will be multiline: true, otherwise, it'll be multiline: false. It means that opts.trim and opts.join can impact it.

TODO

Update documentation.

Future ideas

In another PR, we could update the description positions, adding an array with the position of each line when it's multiline.

brettz9 · 2020-09-21T00:13:48Z

Thanks for taking this on @renatho . Assuming @syavorsky is ok with it, I'm wondering though if we could instead have the parsing fully preserve, by option, whitespace before and after each component.

One could still calculate the counts, but the stringifier could also use this (e.g., after a targeted modification) to preserve whatever whitespace the user originally had, e.g,. whether tabs, spaces, etc. So, maybe something like this (using tab escapes for readability):

whitespace: {
  start: '   ',
  tag: ' \t\t',
  type: '      ',
  name: ' ',
  description: '\t',
  end: '\t \t'
}

e.g., for:

/**
   *\t\t@my-tag      {my.type} name\tdescription\t \t
 */

It'd also be great if we could have a way (as per #79) for multi-line descriptions to distinguish whether asterisks were missing or not:

/**
* @some-tag {Type} name.subname.subsubname Singleline or
 * multiline description text
*/

/**
* @some-tag {Type} name.subname.subsubname Singleline or
 multiline description text
*/

(So that for users whose say @example tags go into multiple lines, we can preserve the style they want as far as asterisks--e.g., if we add a fixer to our check-examples rule to lint the JavaScript they have within @example.)

...and in a way where we knew whether there was leading whitespace on any given line before the line's asterisk.

Finally, one other issue related to whitespace (though maybe better for us to resolve separately as it is not tag-related)... It doesn't seem that there is any indication about how many lines (if any) occur before the main (leading) description or after it (while the latter "after" info can be constructed when tags are present by looking at their line number, the presence of additional lines after a description isn't parseable when lines are not present). The current top-level line property only relates to the line number where the jsdoc block begins, not where the description begins.

syavorsky · 2020-09-21T07:29:53Z

@renatho this is totally good idea!

I would just spend more time discussing how to make offsets capturing more consistent, and making sure we address concerns @brettz9 has brought up about stringifier. Given following:

 0|   /**
 1|    * Overall description
 2|    * 
 3|   *  another line
 4|    * 
 5|    * @param {Object} options
 6|    * @example default
 7|   this line has no *
 8|   * doSomething(options)
 9|     * doSomething(options)
10|    */

I would try to address following:

What is sufficient information? I would stay only with{start, end}, length seems to be redundant as you can easily calculate it from those two.
How do you report either line has * marker and what its offset is?
Should we report entire block offsets, like opening/closing slashes?
How do we report "Overall descriptions...", especially with opts.trim = false
I agree with @renatho that reporting just multipart = true is a half-measure. It would be more useful to have array of offsets per line, including empty lines
Take multi-byte characters into account

I get something like this in mind:

it('should report offsets, trim = false`', function () {
  expect(parse(function () {
    /**
     * Overall description
     * 
    *  another line
     * 
     * @param {Object} options
     * @example default
    this line has no marker
    * doSomething(options)
      * doSomething(options)
    */
  }, {
    trim: false
  })).to.eql([{
    description: '\nOverall description\n\n another line\n',
    line: 1,
    source: '\nOverall description\n\n another line\n\n@param {Object} options\n@example default\nthis line has no marker\ndoSomething(options)\ndoSomething(options)\n',
    offsets: {
      block: {start: 4, end: 4},
      description: [
        {start: 7, end:  7, callout: 4}, // line break right next to /**
        {start: 7, end: 26, callout: 5}, //  * Overall description
        {start: 6, end:  6, callout: 5}, //  *
        {start: 7, end: 19, callout: 4}, // *  another line
        {start: 6, end:  6, callout: 5}, //  * 
      ],
    },
    tags: [{
      description: '',
      line: 6,
      name: 'options',
      optional: false,
      source: '@param {Object} options',
      tag: 'param',
      type: 'Object',
      offsets: {
        tag:  {start:  7, end: 13, callout: 5},
        type: {start: 14, end: 22},
        name: {start: 23, end: 30}
      }
    }, {
      description: 'this line has no marker\ndoSomething(options)\ndoSomething(options)\n',
      line: 7,
      name: 'default',
      optional: false,
      source: '@example default\nthis line has no marker\ndoSomething(options)\ndoSomething(options)\n',
      tag: 'example',
      type: '',
      offsets: {
        tag:  {start:  7, end: 15, callout: 5},
        name: {start: 16, end: 23},
        description: [
          {start: 4, end: 27, callout: -1}, // this line has no marker
          {start: 6, end: 26, callout:  4}, // * doSomething(options)
          {start: 8, end: 28, callout:  6}, //  * doSomething(options)
          {start: 4, end:  4, callout: -1}, // final line break
        ]
      }
    }]
  }])
})

Now, how this would look for the ones below?

/** one line */

/** one line 
*/

/** 组件返回的对象 */

and what it it all gonna look like for opts.space = false?

brettz9 · 2020-09-21T07:57:51Z

While start/end would indeed drop the need for length, neither approach preserves info on tabs and/or spaces. One could force the assumption that only spaces or tabs are used, and preserve info on what that character was, but even there, I have seen people, and not limited to novices, prefer the style of having a tab indent, followed by a space indent for the subsequent lines of the jsdoc block, e.g.:

\t/**
\t *
\t */

Good point re: /** one line\n*/ vs. /** one line*/. That's in part what I was speaking about with needing to preserve line info after (and before) the description. I hope we can capture this ending info also.

As far as multi-byte characters, I don't think any special handling should be required within comment-parser for the purpose of determining any offsets. While each multi-byte character could, in a sense be treated as a single character, JavaScript methods tend to be based on offsets which treat them as a length of 2. But again, I think for most cases, in order to preserve whitespace, the offset approach may not be desirable.

syavorsky · 2020-09-21T17:14:34Z

@brettz9 I see your point about keeping exact whitespaces. Would you model sample output for clarity, including multilane cases and oneliners?

brettz9 · 2020-09-21T20:58:13Z

See the whitespace object (and the sample that follows) in this comment for a possible API for per-tag interstitial whitespace.

As far as an API to communicate the whitespace (and asterisks) within multi-line tag descriptions, a minimal solution might be to have a descriptionSource or such which just gave all the raw source (for just the tag description), though it would be convenient I think if we could determine, whether based on options or context, the expected indent character, indent level, and whether asterisks should be present, to say convert this:

/**
\t * @tag name Some long
\t ** multiline
\t *        description.
\t */

to:

{
  multilineAsterisks: true,
  descriptionRaw: 'Some long\n* multiline\n        description.\n'
}

As far as the linebreaks before and after descriptions or at the end:

/**
 *
 * Here is a description.
 *
 */

...could perhaps be represented with:

{
  description: {
    // These two would both be `0` for `/** description */`
    start: 2,
    end: 4,

    // This `descriptionRaw` is different from the per-tag raw description mentioned above
    descriptionRaw: '\n *\n *Here is a description.\n *\n *'

    // If auto-determining indent level, asterisks, etc., it might instead be:
    // descriptionRaw: '\n\nHere is a description.\n\n'

  }
}

syavorsky · 2020-09-22T06:18:23Z

@brettz9 sorry, while it may sound very straightforward to you, I am still having trouble to build complete picture of what you are proposing in my mind. Can we model the entire output for this?

 0|   /**
 1|    * Overall description
 2|    * 
 3|   *  another line
 4|    * 
 5|    * @param {Object} options
 6|    * @example default
 7|   this line has no *
 8|   * doSomething(options)
 9|     * doSomething(options)
10|    */

brettz9 · 2020-09-22T10:41:53Z

Here is my tentative first shot at it, though with testing, and attempting to stringify, some changes might be called for:

[{
  description: "Overall description\n\nanother line",
  descriptionMeta: {
    start: 1,
    end: 5,
    descriptionRaw: '\n    * Overall description\n\n    * \n   *  another line'
    // If the `commentParser` call were passed a new options object like:
    //     `{indent: 4, indentChar: ' ', postAsteriskSpace: true}`,
    // ...so as to suggest a structure to impose on the raw data (though without losing idiosyncratic spacing
    // info as `description` does, then the raw might instead be the following, stripping out the expected
    // template, but keeping deviations (though maybe we should leave any optimized parsing as a utility
    // since different consumers might want to say avoid imposing this on `@example`):
    descriptionRaw: '\nOverall description\n\n\n   *  another line' 
  },
  line: 1,
  source: "Overall description\n\nanother line\n\n@param {Object} options\n    * @example default\n   this line has no *\ndoSomething(options)\ndoSomething(options)",
  tags: [
    {
      description: "",
      descriptionRaw: "",
      line: 6,
      name: "options",
      optional: false,
      source: "@param {Object} options",
      tag: "param",
      type: "Object",
      // Indicate the spacing before the tag, type, name, etc. (as well as at the start and end of the line)
      whitespace: {
        start: '    ',
        tag: ' ',
        type: ' ',
        name: ' ',
        description: ' ',
        end: ''
      }
    }
    {
      description: "this line has no *\ndoSomething(options)\ndoSomething(options)",
      descriptionRaw: "\n   this line has no *\n   * doSomething(options)\n     * doSomething(options)\n    ",
      line: 7,
      name: "default",
      optional: false,
      source: "@example default\nthis line has no *\ndoSomething(options)\ndoSomething(options)",
      tag: "example",
      type: "",
      whitespace: {
        start: '    ',
        tag: ' ',
        type: ' ',
        name: ' ',
        description: ' ',
        end: ''
      }
    }
  ]
}]

syavorsky · 2020-09-23T08:05:41Z

I think we are trying to introduce to many concepts: start, end, whitespace, descriptionMeta, descriptionRaw. It would make hard to read the output and make sense of it. And over time, more concepts would keep getting added.

I want to propose a simpler alternative, your .whitespace structure is kinda already half way there. We can represent each line as a list of typed tokens. All you do to get a raw string back is concatenating them in defined order.

{
    description: '\nOverall description\n\n another line\n',
    line: 1,
    source: '\nOverall description\n\n another line\n\n@param {Object} options\n@example default\nthis line has no marker\ndoSomething(options)\ndoSomething(options)\n',
    tokens: {
      descr: [{ // line 1
        start:       '     ',
        delim:       '*',
        postdelim:   ' ',
        descr:      'Overall description'
      }, { // line 2, empty
        start:       '     ',
        delim:       '*',
      }, { // line 3, shifted
        start:       '    ',
        delim:       '*',
        postdelim:   '    ',
        descr:       'Overall description'
      }, { // line 4, empty
        start:       '     ',
        delim:       '*',
      }]
    },
    tags: [{
      description: '',
      line: 6,
      name: 'options',
      optional: false,
      source: '@param {Object} options',
      tag: 'param',
      type: 'Object',
      tokens: {
        start:     '     ',
        delim:     '*',
        postdelim: ' ',
        tag:       '@param',
        posttag:   ' ',
        type:      '{Object}',
        posttype:  ' ',
        name:      'options'
      }
    }, {
      description: 'this line has no marker\ndoSomething(options)\ndoSomething(options)\n',
      line: 7,
      name: 'default',
      optional: false,
      source: '@example default\nthis line has no marker\ndoSomething(options)\ndoSomething(options)\n',
      tag: 'example',
      type: '',
      tokens: {
        start:     '     ',
        delim:     '*',
        postdelim: ' ',
        tag:       '@example',
        posttag:   ' ',
        name:      'default',
        descr: [{
          // nothing, line break
        }, {
          start:     '    ',
          descr:     'this line has no marker'
        }, {
          start:     '    ',
          delim:     '*',
          postdelim: ' ',   
          descr:     'doSomething(options)'
        }, {
          start:     '      ',
          delim:     '*',
          postdelim: ' ',   
          descr:     'doSomething(options)'
        }]
      }
    }]
  }

each tag, as well as general description, has .tokens object which you can render back with all formatting preserved. This function illustrates the idea

order = ['start', 'delim', 'postdelim', 'tag', 'posttag', 'type', 'posttype', 'name', 'postname', 'descr']
function stringify(tokens) {
  return order.reduce((block, token) => {
    return block + (Array.isArray(tokens[token]) ? tokens[token].map(stringify).join('\n') : (tokens[token] || ''))
  }, '')
}

task of validating/aligning callouts or tags using '.tokens' should be pretty straightforward. And this approach is agnostic to opts.trim and oneliners

brettz9 · 2020-09-23T09:15:02Z

Excellent--the only things I see we'd also want:

Line 0 having a description. Though in your example, it'd be the empty string, it could have content for the likes of /** @lends */. It couldn't be omitted either in the current config given that there are no line numbers to distinguish it from line 1.
In your original this line has no marker *, there was an asterisk at the end which should presumably be preserved if present.
There'd also be a need for postdescr (for main description and token descriptions) since comment-parser is currently stripping trailing whitespace.

syavorsky · 2020-09-23T15:17:59Z

Sure, some tweaks will be added as implemented. I will take it from here

syavorsky · 2020-09-23T16:15:06Z

@renatho would this unblock you on what you do for gajus/eslint-plugin-jsdoc? I am going to close this PR in favor to #93. Do you mind if I take the coding part?

renatho · 2020-09-24T16:54:23Z

Hey folks! I'm sorry for the delay to answer here, I'm having some busy days.

@renatho would this unblock you on what you do for gajus/eslint-plugin-jsdoc

Yep! This solution works to compare the alignment. \o/

I am going to close this PR in favor to #93. Do you mind if I take the coding part?

Sure! No problem! Thank you for that!

renatho added 2 commits September 20, 2020 13:51

Add parts positions

8c93574

Add description multiline information

2b7a76e

syavorsky mentioned this pull request Sep 23, 2020

Provide information about original source formatting #93

Closed

syavorsky closed this Sep 23, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add chunk positions #92

Add chunk positions #92

renatho commented Sep 20, 2020

brettz9 commented Sep 21, 2020

syavorsky commented Sep 21, 2020

brettz9 commented Sep 21, 2020 •

edited

Loading

syavorsky commented Sep 21, 2020

brettz9 commented Sep 21, 2020

syavorsky commented Sep 22, 2020

brettz9 commented Sep 22, 2020

syavorsky commented Sep 23, 2020

brettz9 commented Sep 23, 2020

syavorsky commented Sep 23, 2020

syavorsky commented Sep 23, 2020

renatho commented Sep 24, 2020

Add chunk positions #92

Add chunk positions #92

Conversation

renatho commented Sep 20, 2020

The PR

Example

Caveats

TODO

Future ideas

brettz9 commented Sep 21, 2020

syavorsky commented Sep 21, 2020

brettz9 commented Sep 21, 2020 • edited Loading

syavorsky commented Sep 21, 2020

brettz9 commented Sep 21, 2020

syavorsky commented Sep 22, 2020

brettz9 commented Sep 22, 2020

syavorsky commented Sep 23, 2020

brettz9 commented Sep 23, 2020

syavorsky commented Sep 23, 2020

syavorsky commented Sep 23, 2020

renatho commented Sep 24, 2020

brettz9 commented Sep 21, 2020 •

edited

Loading