Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incude line and column info from the original file #89

Closed
lazarljubenovic opened this issue Mar 30, 2018 · 3 comments · Fixed by #90
Closed

Incude line and column info from the original file #89

lazarljubenovic opened this issue Mar 30, 2018 · 3 comments · Fixed by #90

Comments

@lazarljubenovic
Copy link

Thanks for this great library!

It would be nice if parsed nodes contained information about the start and end of themselves.
Every node could have an additional property pos, something along the lines of the following.

interface Node {
  type: string,
  pos: {
    start: { line: number, col: number },
    end: { line: number, col: number },
  },
}
@andrejewski
Copy link
Owner

@lazarljubenovic Thanks for making this issue. Having column/line numbers would definitely play to Himalaya's strength in being an accurate-representation parser.

I don't think implementing this would be too hard, it's two more things to book-keep during parsing. My question is for the Element node, should we be granular enough to provide:

interface Element {
  ...,
  startTagPosition: {
    start: { line: number, col: number },
    end: { line: number, col: number },
  },
  endTagPosition: {
    start: { line: number, col: number },
    end: { line: number, col: number },
  }
}

Along with the Node.position.

This is <i>sample markup</i>.
^      ^^ ^^           ^^  ^^
a      ab bc           cd  de

I cannot say I'll get to this immediately, but it is worth adding.

@lazarljubenovic
Copy link
Author

Hmm. Interesting question. If we add that, then one might expect to do a similar thing for attributes' key-value pairs. In which case it's not obvious how to handle the tag itself: do we need a location for the tag name only? Are "s (which are optional) part of the attribute's value? What if there's no closing tag?

Curently Himalaya doesn't preserve any info about the way HTML was written (did you include a closing tag? did you wrap attribute value in quotes?), so it's a bit tricky to see how to do this as this is info about the original file, not the compiled JSON (and a first of such kind we're introducing here). There's no other convention established yet to compare it to.

Having the above in mind, this is what I suggest. (Lowercase letters are the "start" markers and their corresponding uppercase letter is the "end" marker. They are on separate lines to allow multiple markers on the same character.)

This is <i>sample markup</i>!
^      ^^  ^           ^   ^^
a      Ab  c           C   Bd
                            D

[A] - Text with content "This is"
[B] - Element with tag "i", no attributes and a single child [C]
[c] - Text with content "sample markup"
[D] - Text with content "!"
This <div>is <span>sample markup</div>
^   ^^    ^ ^^     ^           ^     ^
a   Ab    c Cd     e           D     B
                               E

[A] - Text with content "This "
[B] - Element with tag "div", no attributes and children [C] and [D]
[C] - Text with content "is "
[D] - Element with tag "span", no attributes and a single child [E]
[E] - Text with content "sample markup"
This <div class=foo tabindex="-1">is sample markup
^   ^^    ^       ^ ^           ^ ^              ^
a   Ab    d       D e           E c              B
                                                 C

[A] - Text with content "This "
[B] - Element with tag "div", attributes [D] and [E] and a single child [C]
[C] - Text with content "is sample markup"
[D] - Attribute with key "class" and value "foo"
[E] - Attribute with key "tabindex" and value "-1"

In my opinion, making it more sophisticated than this would be peeking into the lexer. Maybe it will be useful in the future to add all lexical nodes which participated in creating a parsed node, and see additional info there.

It's just my two cents, of course.

@lazarljubenovic
Copy link
Author

Hey, thanks a lot for adding this! 🎉 Sorry for the super-late reply 😓

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants