-
Notifications
You must be signed in to change notification settings - Fork 504
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
diffLines seems broken #68
Comments
What browser are you seeing this on? |
Chrome 44.0.2403.130 (64-bit) on a MacBook Pro the diffLines seemed weird, but |
Thanks. This is not browser specific but an issue with the newline being included in the tokenizer's output, so the rest of the algorithm considers them to be distinct.
Under the trimmed case, we even go as far as explicitly adding these back in after the trim: Line 23 in 470ed65
The reason that all of this is done is that the string can not be reconstructed without these newlines. All of the operations that combine tokens just do It is possible to do a custom comparison the ignores the line for the sake of comparison checks but this too could run into issues with the newline values sometimes being included and sometimes being not, based on the diff path that is selected for two given inputs. I suspect that this is basically the reason that |
I figured it was the line tokenizer. I see this as a real bug though, since it's arbitrarily adding back newlines, which means the Do you need to reconstruct the string from the tokens at any point? Why not use character offsets attached to your tokens, the way esprima (for example) does in its Even if you don't want to go that far, I still think it's worth it to add the newlines back after the comparison, or trim again when comparing -- right now |
Regarding the arbitrary newline insertion, I merged the patch and line diff implementations last night since I ended up noticing they were basically the same while investigating this issue. This change should resolves that particular issue: 1597705 Regarding ranges, that's a ton of work that will break all custom diff implementations. That's not going to be feasible anytime in the near term. It is possible to tokenize or compare ignoring the newline characters but this ties into the issue above of newlines sometimes being included and sometimes not (ranges suffer from the same problem) Regarding correctness, the code is operating correctly under the definition that a line is inclusive of it's newline character. This is needed to properly create patches, etc. Unfortunately this directly conflicts with making human readable diffs, like I presume is what you are going for. Using a
This results in Basically I think a technical solution has been found... but implementing it without breaking other use cases might be a concern. I need to think about the best way to expose this as an API but will try to include something in the next release that exposes this behavior, one way or the other. |
sounds good on the arbitrary reinsertion, and the truth is I'm not particularly concerned with microsoft crlf's (but did notice it). I agree the definition that a line is inclusive of its newline character makes sense for your solution might work though - thanks for adding this to the roadmap! |
Trimmed I think we can reasonably go the route I was proposing and this is not a major change. For the non-trimmed implementation, it's more of that I need to figure out what to call it as well as finalize the other features in this release. You can also do a non-trimmed implementation in the interim using something like the following:
|
Functionality implemented via |
Released in 2.1.0 |
The text was updated successfully, but these errors were encountered: