proposal: x/tools/diff: a package for computing text differences #58893
Comments
The idea of doing this seems fine, but perhaps the documentation should include a few caveats to cover the following points. (There's lots of ways to write diff, and sadly no best one. This one is for inputs that are fairly similar, or fairly short.)
|
See also #45200. |
Instead of |
I agree that we should guarantee only that the composition of diff.Strings and diff.Apply is the identity, and nothing about the specific edits that it returns. We should probably also define the Unified text form in more detail. Thanks for the links to related proposals. There's a fair bit of interest in both the narrow concept of text diff as proposed here, and in richer kinds of structured value diff for use in testing. If some form of the latter is accepted into the standard library, then perhaps simple text diff, on which it would depend, would also need to be in the standard library, though not necessarily exposed. I'm going to resist the temptation to argue that this should be a standard package. We can always do that later. |
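The round-trip guarantee described in the comment above (applying the edits returned for a pair of strings to the first string yields the second) can be sketched with a toy implementation. Everything here is an assumption for illustration: the `Edit` fields, the naive prefix/suffix `Strings`, and the `RoundTrips` helper are hypothetical stand-ins, not the proposed package's code or algorithm.

```go
package main

import "strings"

// Edit is a hypothetical stand-in for the proposed edit type:
// replace src[Start:End] with New.
type Edit struct {
	Start, End int
	New        string
}

// Strings is a deliberately naive diff: it trims the common prefix and
// suffix (byte-wise) and emits at most one edit for the middle.
func Strings(before, after string) []Edit {
	if before == after {
		return nil // the common case needs no edits
	}
	p := 0 // length of the common prefix
	for p < len(before) && p < len(after) && before[p] == after[p] {
		p++
	}
	s := 0 // length of the common suffix, not overlapping the prefix
	for s < len(before)-p && s < len(after)-p &&
		before[len(before)-1-s] == after[len(after)-1-s] {
		s++
	}
	return []Edit{{Start: p, End: len(before) - s, New: after[p : len(after)-s]}}
}

// Apply applies sorted, non-overlapping edits to src.
func Apply(src string, edits []Edit) string {
	var out strings.Builder
	last := 0
	for _, e := range edits {
		out.WriteString(src[last:e.Start])
		out.WriteString(e.New)
		last = e.End
	}
	out.WriteString(src[last:])
	return out.String()
}

// RoundTrips checks the only property the comment proposes to guarantee:
// Apply(before, Strings(before, after)) == after.
func RoundTrips(before, after string) bool {
	return Apply(before, Strings(before, after)) == after
}
```

A test written against this property says nothing about which edits come back, so the algorithm stays free to change.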
I tried that initially, but it turns out to be incorrect: it's imperative that you use sort.Stable for edits since insertions at the same point must preserve their relative order. |
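A minimal sketch of why stability matters, assuming a hypothetical `Edit` type with `Start`, `End`, and `New` fields and a hypothetical `SortEdits` helper (neither is taken from the proposal text shown here). It uses `sort.SliceStable`, a close relative of the `sort.Stable` mentioned above, so that insertions at the same offset keep their original relative order.

```go
package main

import "sort"

// Edit is a hypothetical stand-in: replace src[Start:End] with New.
type Edit struct {
	Start, End int
	New        string
}

// SortEdits orders edits by (Start, End). The sort is stable, so several
// insertions at the same point stay in the order the caller gave them.
// An unstable sort could legally permute them and change the result of
// applying the edits.
func SortEdits(edits []Edit) {
	sort.SliceStable(edits, func(i, j int) bool {
		if edits[i].Start != edits[j].Start {
			return edits[i].Start < edits[j].Start
		}
		return edits[i].End < edits[j].End
	})
}
```

For example, three insertions of "a", "b", "c" at offset 2 followed in the input by an insertion at offset 5 remain in the order "a", "b", "c" after sorting.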
There is also a Alternatively, you could add a mechanism to |
The definition of Apply makes clear that the slice of edits is a list, not a set: the relative ordering of insertions is important. But Apply can call SortEdits internally. Within gopls, we use SortEdits after merging lists of edits to the same file, but simple concatenation should suffice. It's also used to ensure a deterministic order, which some clients have mistakenly assumed. Perhaps we should remove SortEdits from the API and let gopls implement its own copy of that function. |
Can the Strings and Bytes functions be unified behind |
They could, but it seems like a lot of trouble just to achieve name overloading. |
The other way around, I feel like it's a lot of work to have duplicate Strings and Bytes functions that work the same way instead of having callers cast their []byte to string or having a single generic function. |
The allocation penalty of the bytes conversions for large files when the diff is empty (the common case, which needn't allocate any Edits) is a pretty significant multiple of the cost of the actual diff algorithm: 8x in one quick benchmark of a 10KB file. |
There's no need to maintain two versions of the functionality in either case. It's very easy as proposed to just use the same implementation since it doesn't perform any modification in
func Strings(before, after string) []Edit {
	return Bytes(
		unsafe.Slice(unsafe.StringData(before), len(before)),
		unsafe.Slice(unsafe.StringData(after), len(after)),
	)
}
Or, so as to avoid the |
That's true, but the proof requires auditing a thousand lines of code across half a dozen files, so I hesitate. The generics approach is safer and more appealing, and it should nicely handle even the intentional differences between bytes and strings. For example, It may cause significant expansion of the object code though; I should measure. |
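As a rough illustration of the generics approach mentioned above, here is a sketch of one helper written once for both strings and byte slices. The constraint name `byteSeq` and the function `commonPrefixLen` are hypothetical and are not part of the proposed API; the point is only that indexing and `len` work uniformly over a `string | []byte` type set, with no conversion and no `unsafe`.

```go
package main

// byteSeq is a hypothetical constraint covering both input kinds.
type byteSeq interface {
	~string | ~[]byte
}

// commonPrefixLen returns the number of leading bytes a and b share.
// Indexing is permitted because every type in the type set has element
// type byte, so one body serves both string and []byte callers.
func commonPrefixLen[T byteSeq](a, b T) int {
	n := 0
	for n < len(a) && n < len(b) && a[n] == b[n] {
		n++
	}
	return n
}
```

Each instantiation may be compiled separately, which is where the object-code expansion raised above would come from.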
Chatting with @rsc yesterday he raised a number of interesting questions:
|
That sounds like an excellent application for a fuzz test! 🙃 |
This proposal has been added to the active column of the proposals project |
On one hand, I would like this to be a generic I think there's utility in providing a |
This package doesn't seem to be specific to the Go language like other packages in
So why not |
The text module is primarily concerned with internationalization and localization, and deals with Unicode normalization, whereas this implementation computes differences over rune sequences without regard to such matters. Also, diff is an important component of typical code transformation tools, which is very much the domain of the x/tools module. |
FWIW I think x/text/diff would be a fine home for a diff package - text doesn't have to be all about Unicode. For example text/template is not. But we probably still shouldn't commit to a diff API right now anyway. |
I think this package is too special purpose. It's designed for an editor, where sub-line diffs are useful, but there's no point in producing a large number of them. |
This issue is actively tracked through the proposal process, presumably it should be re-opened to allow that process to continue? |
@pjweinb doesn't seem to be a member of the Go team. How did he close the issue? Shouldn't that be something only the author of the issue or a moderator can do? |
ah. well, i am on the Go team, and I think Alan meant this to be our
proposal, so that's why I closed it. But we can reopen it if that would
allow for a cleaner process.
|
Hi, Peter is indeed a member of the Go team, and the main author of the package. The proposal was a joint one, and we are both happy to retract it in light of the various concerns raised. Sorry for the confusion. |
@pjweinb, I think there's a good avatar you could use for your Github account online somewhere, so people recognize you. |
Recent work in gopls resulted in the creation of an internal package for computing text differences in the manner of the UNIX diff command, for applying those differences to a file in the manner of the patch command, and for presenting line-oriented diffs using +/- prefix notation, aka GNU "unified" diff format (diff -u). Diff functionality is invaluable for developer tools that transform source files, and for tests that compare expected and actual outputs. We propose to publish our diff package with the public API shown below.
@pjweinb @findleyr
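To make the +/- prefix notation concrete, here is a minimal sketch of a line-oriented diff built on a longest-common-subsequence table. It is an assumption-laden illustration, not the gopls algorithm: the function name `lineDiff` is hypothetical, the quadratic LCS table is the textbook technique, and the output omits the `@@` hunk headers and file headers of real unified diffs.

```go
package main

// lineDiff returns every line of a and b prefixed with " " (unchanged),
// "-" (only in a), or "+" (only in b).
func lineDiff(a, b []string) []string {
	// lcs[i][j] is the length of the longest common subsequence
	// of a[i:] and b[j:].
	lcs := make([][]int, len(a)+1)
	for i := range lcs {
		lcs[i] = make([]int, len(b)+1)
	}
	for i := len(a) - 1; i >= 0; i-- {
		for j := len(b) - 1; j >= 0; j-- {
			switch {
			case a[i] == b[j]:
				lcs[i][j] = lcs[i+1][j+1] + 1
			case lcs[i+1][j] >= lcs[i][j+1]:
				lcs[i][j] = lcs[i+1][j]
			default:
				lcs[i][j] = lcs[i][j+1]
			}
		}
	}

	// Walk the table from the start, emitting prefixed lines.
	var out []string
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] == b[j]:
			out = append(out, " "+a[i])
			i++
			j++
		case lcs[i+1][j] >= lcs[i][j+1]:
			out = append(out, "-"+a[i])
			i++
		default:
			out = append(out, "+"+b[j])
			j++
		}
	}
	for ; i < len(a); i++ {
		out = append(out, "-"+a[i])
	}
	for ; j < len(b); j++ {
		out = append(out, "+"+b[j])
	}
	return out
}
```

Comparing the lines a, b, c with a, x, c yields " a", "-b", "+x", " c".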