Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix passing line number of errors in xml protocol, causing ill-located or non-located errors, notably in the presence of utf8 characters #19040

Merged

Conversation

herbelin
Copy link
Member

@herbelin herbelin commented May 17, 2024

Fix was actually very easy and will make life in xml-based UIs much more comfortable.

Fixes #18682, #19139.

  • Added changelog.

@herbelin herbelin added kind: fix This fixes a bug or incorrect documentation. part: xml protocol labels May 17, 2024
@herbelin herbelin added this to the 8.20+rc1 milestone May 17, 2024
@coqbot-app coqbot-app bot added the needs: full CI The latest GitLab pipeline that ran was a light CI. Say "@coqbot run full ci" to get a full CI. label May 17, 2024
@herbelin herbelin force-pushed the master+fix-xmlprotocol-loc-line_nb-fail branch from 6acf253 to f5b4514 Compare May 17, 2024 10:07
herbelin added a commit to herbelin/github-coq that referenced this pull request May 17, 2024
@herbelin herbelin marked this pull request as ready for review May 17, 2024 14:34
@herbelin herbelin requested a review from a team as a code owner May 17, 2024 14:34
herbelin added a commit to herbelin/github-coq that referenced this pull request May 17, 2024
@herbelin herbelin force-pushed the master+fix-xmlprotocol-loc-line_nb-fail branch from 07606ed to 04b2969 Compare May 17, 2024 14:34
@herbelin herbelin requested a review from a team as a code owner May 17, 2024 14:34
@@ -9,7 +9,7 @@
(************************************************************************)

(** Protocol version of this file. This is the date of the last modification. *)
let protocol_version = "20230413"
let protocol_version = "20240517"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a particular procedure to advertise protocol changes?

Copy link
Member

@jfehrle jfehrle May 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may break coqTail and vsCoq (not vsCoq2). Ideally they could make changes now so those tools still work on master as well as in 8.20. Reach out to @whonore for coqTail. IIRC @maximedenes tweaked vsCoq.

@ppedrot
Copy link
Member

ppedrot commented May 17, 2024

That's like, the fourth time or so we try to fix this?

@herbelin
Copy link
Member Author

@jfehrle tried something the details of which I don't know well, then @ejgallego tried to rely on the byte offset provided by gtk but this required passing the line number across the protocol, what he did in 3a3de8f but there was still a place in serialization which was not passing a line number.

So, to me, the current solution (passing the line number and relying on gtk's byte-offset computation rather than recomputing a char-offset by counting ourselves how many utf8 bytes are used) seems to be pretty reasonable.

At least, it is not like we are not knowing where we are going. It is more that we knew but made it by steps.

@herbelin herbelin changed the title Fix passing line number of errors in xml protocol, causing ill-located or non-located errors in the presence of utf8 characters Fix passing line number of errors in xml protocol, causing ill-located or non-located errors, notably in the presence of utf8 characters May 17, 2024
@jfehrle
Copy link
Member

jfehrle commented May 17, 2024

Aside from the line number issue, can you give an example of the UTF-8 problem you mention fixing?

It's a odd to share a Loc.t because the server side uses character offsets (in several fields) while the client side (GTK) only works with byte offsets. Seems like this is likely to cause confusion.

IIRC there are at least 4 different cases to test for error messages and tooltips. In particular, displaying errors correctly when the user modifies a file that's being processed asynchronously. I pointed out problems (based on testing the code by hand) with multiple versions of @ejgallego's changes, including the final one. If you're interested in trying the test cases, I'll see if I find my notes on that. I had many comments in #17382.

@herbelin
Copy link
Member Author

Aside from the line number issue, can you give an example of the UTF-8 problem you mention fixing?

Actually no :(. I thought the PR was fixing the UTF-8 shift but it fixes only the line number issue (as in #18682). (My own test had the error on a line w/o UTF-8 so I did not realize it that it was fixing only the line and not the offset.)

@herbelin
Copy link
Member Author

herbelin commented May 19, 2024

Actually, there seems to be a lablgtk bug! A bad copy-paste at this line of the ocaml-C wrapper ml_gtktext.c (fixed in garrigue/lablgtk#181).

So, I guess we indeed have to rely in the meantime on our own byte-to-char conversion.

@herbelin
Copy link
Member Author

herbelin commented May 19, 2024

It's a odd to share a Loc.t because the server side uses character offsets (in several fields) while the client side (GTK) only works with byte offsets.

Are you meaning the opposite, i.e. bytes in Coq and chars in GTK?

Would you know how to insert a "b2c" in the current code? One that works from the beginning of the current line and not from the beginning of the whole sentence?

PS: For the record, a short example with shift-by-1 locating is e.g.:

Definition α := x.

@jfehrle
Copy link
Member

jfehrle commented May 20, 2024

Are you meaning the opposite, i.e. bytes in Coq and chars in GTK?

Indeed.

Would you know how to insert a "b2c" in the current code? One that works from the beginning of the current line and not from the beginning of the whole sentence?

Yes. The code can be short and efficient, but it takes considerable care to cover all the cases and shouldn't be rushed. @ejgallego submitted 2 PRs related to this some months apart. The first broke the mechanism completely and the second didn't fix all the damage. I'd probably start with the code from before both of those PRs.

The code before both PRs saved the byte offset of the beginning of the sentence. Then the byte offset -> char offset conversion only has to examine the sentence, not the whole buffer (which caused performance issues). An additional wrinkle is that in async mode (e.g. if a proof fails), the buffer offset of a sentence can change if the user can edits a failed proof, which will change the buffer offset of subsequent sentences. There is a way to adjust for this.

But let's fix all the issues this time. WDYT? Also, do you know when will 8.20 will freeze?

@herbelin
Copy link
Member Author

Sorry, I was expected an answer but did not realize that you wrote.

Also, do you know when will 8.20 will freeze?

There is thread here: https://coq.zulipchat.com/#narrow/stream/237656-Coq-devs-.26-plugin-devs/topic/8.2E20 . The current plan is:

Branch creation: 2024-06-17
8.20+rc1: 2024-06-26
8.20.0: 2024-09-04

@ejgallego submitted 2 PRs related to this some months apart. The first broke the mechanism completely and the second didn't fix all the damage

I don't know the details of all the work done by everyone, but regarding the passing of the full loc, line included, as well as regarding the use of the native GTK support for byte offset (get_iter_at_line_offset), these seem a priori to be good ideas (better locality and less glue). The problem was that Emilio did not test his changes, not realizing that the XML protocol had still holes and that lablgtk had a bug (now fixed upstream, but I don't know when it will be part of a released version of lablgtk; the last release, 3.1.4, was in January and there are 3 commits since then).

I just tested the PR with the development version of lablgtk and the example Definition α := x. has now the good underlining! So, let's continue testing and if it works for all expected examples, maybe the simplest is to ensure that coqide is compiled with the development version of lablgtk.

Otherwise, I think we can go with the byte-to-char translation from the beginning of the sentence (or maybe from the beginning of the line).

@ejgallego
Copy link
Member

@herbelin , indeed testing this stuff is hard and I didn't realize of the real problem until later (which also impacted coq-lsp, which I have to say has no issues with locations)

In particular what I didn't realize is that the lexer had been totally broken w.r.t. to positions by the debugger PR (even if I requested we keep a bug, we shouldn't have merged the code in that state). See #16978 for a summary of the problems. So the hole was not in the protocol but in the lexer.

@herbelin this change is good, and I tried in some other PR; I did close it tho as I didn't have the cycles to adapt 3rd party users of the protocol, as the change doesn't seem backwards compatible.

I didn't notice the labgtk bug, that's cool.

I am at loss at how to react to @jfehrle comments, other than to say that they are just basically technically and historically inaccurate; as we explained in the call. The curious reader can dig in some of the relevant PRs as all the info is there, including a summary.

Contrary to what Jim believes, the code between the merge of the debugger PR and #16978 is just too broken to even make coq-lsp with loads of hacks (we ported to 8.11 so the code prior was way less buggy)

So I strongly recommend no to go back!

@herbelin
Copy link
Member Author

herbelin commented May 26, 2024

I tested @ppedrot's fail/wait 20 example and it seems to have the underlining of fail stable. #17023 and #16987 seem still fixed. I don't know well what else to exactly test.

I propose that we already merge this PR and that any regular coqide user install the development version of lablgtk, so that we can daily test further. To install lablgtk, I did:

git clone https://github.com/garrigue/lablgtk.git
git checkout lablgtk3 # if not already the case
dune build
dune install # installed for me in the current opam switch, which itself in the path

Note that I needed the following patch for src-goocanvas/dune before calling dune:

diff --git a/src-goocanvas2/dune b/src-goocanvas2/dune
index 34145a18..5da5d91f 100644
--- a/src-goocanvas2/dune
+++ b/src-goocanvas2/dune
@@ -12,9 +12,9 @@
 
 (rule
  (targets
-   cflag-goocanvas-2.0.sexp
-   clink-goocanvas-2.0.sexp)
- (action (run dune_config -pkg goocanvas-2.0 -version 2.0.4)))
+   cflag-goocanvas-3.0.sexp
+   clink-goocanvas-3.0.sexp)
+ (action (run dune_config -pkg goocanvas-3.0 -version 3.0.0)))
 
 (rule
  (targets cflag-extraflags.sexp)
@@ -27,7 +27,7 @@
  (flags :standard -w -6-7-27-32-33-34-36)
  (modules_without_implementation gooCanvas_types)
  (c_names ml_gtkgoocanvas)
- (c_flags         (:include cflag-goocanvas-2.0.sexp) (:include cflag-extraflags.sexp) -Wno-deprecated-declarations)
- (c_library_flags (:include clink-goocanvas-2.0.sexp))
+ (c_flags         (:include cflag-goocanvas-3.0.sexp) (:include cflag-extraflags.sexp) -Wno-deprecated-declarations)
+ (c_library_flags (:include clink-goocanvas-3.0.sexp))
  (libraries lablgtk3))
 

Copy link
Member

@ejgallego ejgallego left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM ! Thanks a lot @herbelin

I tried to do this in #17391 , but I think I didn't have the time to update the clients, sorry I didn't manage to point to you to this direction. The fix of lablgtk is impressive.

By the way, you can also vendor lablgtk (just do a symlink in Coq's tree), which is convenient for quick testing.

@herbelin , you may find some bits in #17391 interesting, in case you'd like to add them here, also Jim's tooltip test case (which is pretty good) is there I believe.

@ejgallego
Copy link
Member

@herbelin the standard method for positions these days is indeed line / column , encoded in very different ways tho, due to windows and unicode (actually I misinterpreted the LSP spec in this sense, thanks to Léo Stefanesco who corrected this mistake, it took quite a bit of effort to get unicode working well but now we have a superb solution IMO)

However, offsets (in this case in byte form), even if less standard, are very very useful, so I'm so glad for Coq's Loc.t type, which has both.

Many advanced use cases and editors, such as CodeMirror , do require offsets too!

But yes, but GTK editing, the right choice is to use line numbers + offset adjustement, and store metadata in protocol-level format (which can vary between platforms)

@jfehrle
Copy link
Member

jfehrle commented May 27, 2024

@herbelin I propose that we already merge this PR

It would be good to fix everything at once so we don't (perhaps) change the protocol twice. But there's also the argument that users would appreciate an incremental improvement for 8.20. I can try more test cases later (it's getting a little late).

You don't plan to do anything about the Definition α := x. case in this PR?

@ejgallego the standard method for positions these days is indeed line / column

That's fine if you don't allow the user to edit the buffer while Coq is processing it. But we do.

@jfehrle
Copy link
Member

jfehrle commented May 27, 2024

Using PMP's example, the line number shown in the error panel is incorrect. Maybe an easy fix:

image

@herbelin
Copy link
Member Author

Using PMP's example, the line number shown in the error panel is incorrect.

I confirm (and I have no idea a priori of how to fix it).

the standard method for positions these days is indeed line / column

That's fine if you don't allow the user to edit the buffer while Coq is processing it. But we do.

Hum, what to do? Could the computation once made from the beginning of the sentence be used to compute the line shift from the beginning of the sentence?

You don't plan to do anything about the Definition α := x. case in this PR?

This works with the fixed version of lablgtk. So the question is about whether we want to have a solution (and have the courage to implement it) even without the lablgtk fix. In any case, this does not have to be in this PR.

What the PR fixes is practically important (as in #18682 but imagine when there are more lines involved). The wrong location (and sometimes even no location at all, because computed out of the text) made the use of coqide really more difficult to me. (If you don't know what to expect, maybe it is not so bad, but when you know that there should be a location and you have to look through e.g. 10 lines to understand from were some typing error could come, it is frustrating.). So, for me, the priority is to merge the PR and appreciate the incremental improvement for 8.20.

The unicode shift due to the lablgkt bug is not elegant but somehow less critical in practice because you can still reconstruct where the underlining was intended.

That's fine if you don't allow the user to edit the buffer while Coq is processing it. But we do.

@ejgallego:

By the way, you can also vendor lablgtk (just do a symlink in Coq's tree), which is convenient for quick testing.

Sorry, I don't understand. A symlink to what?

In any case, I'm glad that we are now converging to a resolution of most location problems!

It would be good to fix everything at once so we don't (perhaps) change the protocol twice

Maybe the other known users of the XML protocol have an opinion here (CoqTail, VsCode 1)?

@ejgallego
Copy link
Member

I confirm (and I have no idea a priori of how to fix it).

What the rest of the code does is to adjust the locations, I guess this list is not updated? I can have a look and find the concrete code that does that for tooltips.

Sorry, I don't understand. A symlink to what?

To lablgtk sources.

That's fine if you don't allow the user to edit the buffer while Coq is processing it. But we do.

Why is the choice of offsets vs line / column relevant here? In both cases, the locations need adjustment, I don't see how line / col cannot be also adjusted (in fact it could be even more efficient, that's why text editors prefer line/col coordinates)

whonore added a commit to whonore/Coqtail that referenced this pull request Jun 2, 2024
coq/coq#19040 changes the way error locations are
reported in the XML protocol. Specifically, it replaces the `loc_s` and `loc_e`
attributes of the `fail` result with an optional `Loc.t` field. If I understand
correctly, the `start` and `stop` fields contain the same byte offsets as the
old `loc_s` and `loc_e`, so it's enough to just extract and return those.
whonore added a commit to whonore/Coqtail that referenced this pull request Jun 2, 2024
coq/coq#19040 changes the way error locations are
reported in the XML protocol. Specifically, it replaces the `loc_s` and `loc_e`
attributes of the `fail` result with an optional `Loc.t` field. If I understand
correctly, the `start` and `stop` fields contain the same byte offsets as the
old `loc_s` and `loc_e`, so it's enough to just extract and return those.
@jfehrle
Copy link
Member

jfehrle commented Jun 2, 2024

Right, so maybe the protocol-breaking initial PR can be merged to only 8.20 but the utf8 workaround already to 8.19.2?

Yes, though we should check that it would improve 8.19.

@@ -174,6 +174,19 @@ let ulen uni_ch =
else if uni_ch < 0x10000 then 3
else 4

(* workaround for lablgtk bug up to version 2.18.13 *)
(* replaces: buffer#get_iter_at_byte ~line index *)
(* see https://github.com/garrigue/lablgtk/pull/175 *)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't seem related, garrigue/lablgtk#175 is about copy_string_check AFAICT

Copy link
Member Author

@herbelin herbelin Jun 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is 181 (and already fixed).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it.

@jfehrle jfehrle force-pushed the master+fix-xmlprotocol-loc-line_nb-fail branch from eca53b9 to be70563 Compare June 2, 2024 17:47
@herbelin
Copy link
Member Author

herbelin commented Jun 2, 2024

Yes, though we should check that it would improve 8.19.

It improves at least the locating of the error in Definition α := x..

Copy link
Member

@jfehrle jfehrle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will merge soon if there are no further comments.

I'll submit the third commit to 8.19.2 separately.

@jfehrle jfehrle self-assigned this Jun 2, 2024
@jfehrle
Copy link
Member

jfehrle commented Jun 2, 2024

@coqbot run full ci

@coqbot-app coqbot-app bot removed the needs: full CI The latest GitLab pipeline that ran was a light CI. Say "@coqbot run full ci" to get a full CI. label Jun 2, 2024
@jfehrle jfehrle modified the milestones: 8.19.2, 8.20+rc1 Jun 2, 2024
@jfehrle
Copy link
Member

jfehrle commented Jun 2, 2024

@maximedenes @rtetley Do you know who, if anyone, maintains vsCoq1? This PR will likely break it since it changes the XML protocol a little. We'd like to give a heads up and get a clean fix.

@gares
Copy link
Member

gares commented Jun 3, 2024

@thery for vscoq1 maintenance

@jfehrle
Copy link
Member

jfehrle commented Jun 3, 2024

@thery This PR makes a small modification to the XML protocol that vsCoq1 relies on. Will you or someone else be able to update vsCoq1 for use with Coq 8.20? Thanks.

@jfehrle jfehrle force-pushed the master+fix-xmlprotocol-loc-line_nb-fail branch from be70563 to 3db6599 Compare June 4, 2024 05:28
@coqbot-app coqbot-app bot added the needs: full CI The latest GitLab pipeline that ran was a light CI. Say "@coqbot run full ci" to get a full CI. label Jun 4, 2024
@jfehrle
Copy link
Member

jfehrle commented Jun 4, 2024

@herbelin See #19139, for which the underlying problem is that a fail message only returns bp and ep. See the code in Xmlprotocol.to_value with loc_s and loc_e. WDYT? [Already addressed]

@jfehrle
Copy link
Member

jfehrle commented Jun 5, 2024

@coqbot: merge now

Copy link
Contributor

coqbot-app bot commented Jun 5, 2024

@jfehrle: You cannot merge this PR because:

  • There is still a needs: full CI label.

@jfehrle jfehrle removed the needs: full CI The latest GitLab pipeline that ran was a light CI. Say "@coqbot run full ci" to get a full CI. label Jun 5, 2024
@jfehrle
Copy link
Member

jfehrle commented Jun 5, 2024

@coqbot: merge now

@coqbot-app coqbot-app bot merged commit 983721d into coq:master Jun 5, 2024
6 checks passed
whonore added a commit to whonore/Coqtail that referenced this pull request Jun 15, 2024
coq/coq#19040 changes the way error locations are
reported in the XML protocol. Specifically, it replaces the `loc_s` and `loc_e`
attributes of the `fail` result with an optional `Loc.t` field. If I understand
correctly, the `start` and `stop` fields contain the same byte offsets as the
old `loc_s` and `loc_e`, so it's enough to just extract and return those.
@FissoreD FissoreD mentioned this pull request Jun 27, 2024
thery added a commit to thery/vscoq that referenced this pull request Jun 28, 2024
thery added a commit to coq/vscoq that referenced this pull request Jun 28, 2024
Rixxc pushed a commit to Rixxc/Coqtail that referenced this pull request Jul 2, 2024
coq/coq#19040 changes the way error locations are
reported in the XML protocol. Specifically, it replaces the `loc_s` and `loc_e`
attributes of the `fail` result with an optional `Loc.t` field. If I understand
correctly, the `start` and `stop` fields contain the same byte offsets as the
old `loc_s` and `loc_e`, so it's enough to just extract and return those.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind: fix This fixes a bug or incorrect documentation. part: xml protocol
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Syntax errors reported at a wrong location in CoqIDE
7 participants