Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make comparison of newlines in text files more precise #2600

Merged
merged 1 commit into from
Jul 9, 2023

Conversation

HannesWell
Copy link
Member

At the moment the TextComparator ignores simplify all kind of newline characters (\r and \n), which has the effect that, although a new line to a text file was added or removed, the comparator considers that file as unchanged.

This PR enhances the TextComparator to only ignore a difference in newline characters if there is is an exactly matching pair of \r or \r\n in baseline respectively reactor.

This PR also adds tests for the TextComparator and moves the showDiffDetails property to ComparisonData (in the first commit) to simplify testing.

@HannesWell HannesWell requested a review from laeubi June 29, 2023 07:16
@HannesWell HannesWell force-pushed the enhancedNewlineComparison branch 2 times, most recently from bee5fc3 to bae1cf1 Compare June 29, 2023 07:26
@laeubi
Copy link
Member

laeubi commented Jun 29, 2023

At the moment the TextComparator ignores simplify all kind of newline characters (\r and \n), which has the effect that, although a new line to a text file was added or removed, the comparator considers that file as unchanged.

If newlines do not matter for the delta should then additional ones also don't matter? Can you explain in what case an additional newline is relevant? In the end the user can always bump the version anyways if it is considered a relevant change.

boolean hasNext() {
skipNewLines();
return index < bytes.length;
// Both sliders are at either "\n" or "\r\n"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please note that a single \r is a valid newline as well!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right (although AFAICT not widely used), but the important part is that a single \r is treated equally on Windows and Linux, especially by git.

Please have a look at the test-cases I created. If you think there is a case that is not covered, please let me know and the tests can be easily extended.

@HannesWell
Copy link
Member Author

At the moment the TextComparator ignores simplify all kind of newline characters (\r and \n), which has the effect that, although a new line to a text file was added or removed, the comparator considers that file as unchanged.

If newlines do not matter for the delta should then additional ones also don't matter? Can you explain in what case an additional newline is relevant? In the end the user can always bump the version anyways if it is considered a relevant change.

They do matter. At least from my point of view, the new line skip was introduced for text files to deal with the different default line-ending on Windows and Linux and because it is very common to use git's auto-crlf feature where on Windows on checks in Linux-style and checks-out windows style line-endings.
So I think the goal should be to check if there is a content difference or if only the line-endings style changed.
And yes I would consider just a blank new line a content change, even if one for example just appends a new-line at the end of the file.

And considering added/removed newlines is for example relevant if you adjust a doc or license file. Of course one can bump the version, but if that is not necessary I believe nobody thinks about that in advance and will just wonder why the changed doc/license file is not in the build jars, while they are present in the IDE.

@github-actions
Copy link

github-actions bot commented Jun 29, 2023

Test Results

   558 files  ±0     558 suites  ±0   3h 51m 59s ⏱️ - 13m 49s
   357 tests ±0     351 ✔️ ±0    6 💤 ±0  0 ±0 
1 071 runs  ±0  1 052 ✔️ ±0  19 💤 ±0  0 ±0 

Results for commit ccd6ce3. ± Comparison against base commit 865f130.

♻️ This comment has been updated with latest results.

@laeubi
Copy link
Member

laeubi commented Jul 3, 2023

I believe nobody thinks about that in advance and will just wonder why the changed doc/license file is not in the build jars, while they are present in the IDE.

Maybe because its a strange license that is considered different because of more line endings ;-)

Anyways it seems the testcases do not pass (anymore) with that change on some platforms.

@@ -19,22 +19,22 @@

public interface ArtifactComparator {

public static class ComparisonData {
public static record ComparisonData(List<String> ignoredPattern, boolean writeDelta, boolean showDiffDetails) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this change can be a separate PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Here we go: #2610

// Possible new lines:
// \n -- unix style
// \r\n -- windows style
// \r -- whatever style (ignored?!)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to Wikipedia's Mac OS 9 article, the latest release of Mac OS 9 is 9.2.2 / December 5, 2001, which was 21 years ago. Therefore I think this is not really relevant anymore.

Nevertheless my intention was that this works as if one would use a BufferedReader to read the lines one by one using BufferedReader.readLine() and compare the line content, just more efficient (i.e. without creating Strings all the time).
And the javadoc of BufferedReader.readLine() says:

Reads a line of text. A line is considered to be terminated by any oneof a line feed ('\n'), a carriage return ('\r'),
a carriage return followed immediately by a line feed, or by reaching the end-of-file(EOF).

So if equivalence is the goal, \r needs to be considered.

@laeubi
Copy link
Member

laeubi commented Jul 6, 2023

@HannesWell do you plan to work on this and possibly backport to Tycho 4? I currently plan to prepare a bugfix release so so we might want to include it there.

@HannesWell HannesWell force-pushed the enhancedNewlineComparison branch 2 times, most recently from c73bc38 to d10aed2 Compare July 6, 2023 21:14
@HannesWell
Copy link
Member Author

@HannesWell do you plan to work on this and possibly backport to Tycho 4? I currently plan to prepare a bugfix release so so we might want to include it there.

I adjusted the logic so that \r now should be treated as regular line-ending and in the latest run all tests passed unchanged.

#2610 should be merged before this

Don't ignore all kind of newlines when comparing text-files, only treat
exactly matching \n and \r\n as equals.
@HannesWell HannesWell requested a review from laeubi July 8, 2023 08:16
@laeubi laeubi merged commit 199647e into eclipse-tycho:master Jul 9, 2023
@laeubi laeubi added the backport-to-tycho-4.0.x Can be added to a PR to trigger an automatic backport of the change label Jul 9, 2023
@github-actions
Copy link

github-actions bot commented Jul 9, 2023

💚 All backports created successfully

Status Branch Result
tycho-4.0.x

Questions ?

Please refer to the Backport tool documentation and see the Github Action logs for details

@HannesWell HannesWell deleted the enhancedNewlineComparison branch July 9, 2023 11:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-to-tycho-4.0.x Can be added to a PR to trigger an automatic backport of the change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants