Fix passing line number of errors in xml protocol, causing ill-located or non-located errors, notably in the presence of utf8 characters #19040

herbelin · 2024-05-17T10:06:25Z

Fix was actually very easy and will make life in xml-based UIs much more comfortable.

Fixes #18682, #19139.

Added changelog.

herbelin · 2024-05-17T14:35:27Z

ide/coqide/protocol/xmlprotocol.ml

@@ -9,7 +9,7 @@
 (************************************************************************)

 (** Protocol version of this file. This is the date of the last modification. *)
-let protocol_version = "20230413"
+let protocol_version = "20240517"


Is there a particular procedure to advertise protocol changes?

This may break coqTail and vsCoq (not vsCoq2). Ideally they could make changes now so those tools still work on master as well as in 8.20. Reach out to @whonore for coqTail. IIRC @maximedenes tweaked vsCoq.

ppedrot · 2024-05-17T15:07:22Z

That's like, the fourth time or so we try to fix this?

herbelin · 2024-05-17T16:20:03Z

@jfehrle tried something the details of which I don't know well, then @ejgallego tried to rely on the byte offset provided by gtk but this required passing the line number across the protocol, what he did in 3a3de8f but there was still a place in serialization which was not passing a line number.

So, to me, the current solution (passing the line number and relying on gtk's byte-offset computation rather than recomputing a char-offset by counting ourselves how many utf8 bytes are used) seems to be pretty reasonable.

At least, it is not like we are not knowing where we are going. It is more that we knew but made it by steps.

jfehrle · 2024-05-17T19:04:23Z

Aside from the line number issue, can you give an example of the UTF-8 problem you mention fixing?

It's a odd to share a Loc.t because the server side uses character offsets (in several fields) while the client side (GTK) only works with byte offsets. Seems like this is likely to cause confusion.

IIRC there are at least 4 different cases to test for error messages and tooltips. In particular, displaying errors correctly when the user modifies a file that's being processed asynchronously. I pointed out problems (based on testing the code by hand) with multiple versions of @ejgallego's changes, including the final one. If you're interested in trying the test cases, I'll see if I find my notes on that. I had many comments in #17382.

herbelin · 2024-05-19T09:50:35Z

Aside from the line number issue, can you give an example of the UTF-8 problem you mention fixing?

Actually no :(. I thought the PR was fixing the UTF-8 shift but it fixes only the line number issue (as in #18682). (My own test had the error on a line w/o UTF-8 so I did not realize it that it was fixing only the line and not the offset.)

herbelin · 2024-05-19T09:54:52Z

Actually, there seems to be a lablgtk bug! A bad copy-paste at this line of the ocaml-C wrapper ml_gtktext.c (fixed in garrigue/lablgtk#181).

So, I guess we indeed have to rely in the meantime on our own byte-to-char conversion.

herbelin · 2024-05-19T16:19:50Z

It's a odd to share a Loc.t because the server side uses character offsets (in several fields) while the client side (GTK) only works with byte offsets.

Are you meaning the opposite, i.e. bytes in Coq and chars in GTK?

Would you know how to insert a "b2c" in the current code? One that works from the beginning of the current line and not from the beginning of the whole sentence?

PS: For the record, a short example with shift-by-1 locating is e.g.:

Definition α := x.

jfehrle · 2024-05-20T21:38:30Z

Are you meaning the opposite, i.e. bytes in Coq and chars in GTK?

Indeed.

Would you know how to insert a "b2c" in the current code? One that works from the beginning of the current line and not from the beginning of the whole sentence?

Yes. The code can be short and efficient, but it takes considerable care to cover all the cases and shouldn't be rushed. @ejgallego submitted 2 PRs related to this some months apart. The first broke the mechanism completely and the second didn't fix all the damage. I'd probably start with the code from before both of those PRs.

The code before both PRs saved the byte offset of the beginning of the sentence. Then the byte offset -> char offset conversion only has to examine the sentence, not the whole buffer (which caused performance issues). An additional wrinkle is that in async mode (e.g. if a proof fails), the buffer offset of a sentence can change if the user can edits a failed proof, which will change the buffer offset of subsequent sentences. There is a way to adjust for this.

But let's fix all the issues this time. WDYT? Also, do you know when will 8.20 will freeze?

herbelin · 2024-05-26T18:27:55Z

Sorry, I was expected an answer but did not realize that you wrote.

Also, do you know when will 8.20 will freeze?

There is thread here: https://coq.zulipchat.com/#narrow/stream/237656-Coq-devs-.26-plugin-devs/topic/8.2E20 . The current plan is:

Branch creation: 2024-06-17
8.20+rc1: 2024-06-26
8.20.0: 2024-09-04

@ejgallego submitted 2 PRs related to this some months apart. The first broke the mechanism completely and the second didn't fix all the damage

I don't know the details of all the work done by everyone, but regarding the passing of the full loc, line included, as well as regarding the use of the native GTK support for byte offset (get_iter_at_line_offset), these seem a priori to be good ideas (better locality and less glue). The problem was that Emilio did not test his changes, not realizing that the XML protocol had still holes and that lablgtk had a bug (now fixed upstream, but I don't know when it will be part of a released version of lablgtk; the last release, 3.1.4, was in January and there are 3 commits since then).

I just tested the PR with the development version of lablgtk and the example Definition α := x. has now the good underlining! So, let's continue testing and if it works for all expected examples, maybe the simplest is to ensure that coqide is compiled with the development version of lablgtk.

Otherwise, I think we can go with the byte-to-char translation from the beginning of the sentence (or maybe from the beginning of the line).

ejgallego · 2024-05-26T18:45:27Z

@herbelin , indeed testing this stuff is hard and I didn't realize of the real problem until later (which also impacted coq-lsp, which I have to say has no issues with locations)

In particular what I didn't realize is that the lexer had been totally broken w.r.t. to positions by the debugger PR (even if I requested we keep a bug, we shouldn't have merged the code in that state). See #16978 for a summary of the problems. So the hole was not in the protocol but in the lexer.

@herbelin this change is good, and I tried in some other PR; I did close it tho as I didn't have the cycles to adapt 3rd party users of the protocol, as the change doesn't seem backwards compatible.

I didn't notice the labgtk bug, that's cool.

I am at loss at how to react to @jfehrle comments, other than to say that they are just basically technically and historically inaccurate; as we explained in the call. The curious reader can dig in some of the relevant PRs as all the info is there, including a summary.

Contrary to what Jim believes, the code between the merge of the debugger PR and #16978 is just too broken to even make coq-lsp with loads of hacks (we ported to 8.11 so the code prior was way less buggy)

So I strongly recommend no to go back!

herbelin · 2024-05-26T18:51:26Z

I tested @ppedrot's fail/wait 20 example and it seems to have the underlining of fail stable. #17023 and #16987 seem still fixed. I don't know well what else to exactly test.

I propose that we already merge this PR and that any regular coqide user install the development version of lablgtk, so that we can daily test further. To install lablgtk, I did:

git clone https://github.com/garrigue/lablgtk.git
git checkout lablgtk3 # if not already the case
dune build
dune install # installed for me in the current opam switch, which itself in the path

Note that I needed the following patch for src-goocanvas/dune before calling dune:

diff --git a/src-goocanvas2/dune b/src-goocanvas2/dune
index 34145a18..5da5d91f 100644
--- a/src-goocanvas2/dune
+++ b/src-goocanvas2/dune
@@ -12,9 +12,9 @@
 
 (rule
  (targets
-   cflag-goocanvas-2.0.sexp
-   clink-goocanvas-2.0.sexp)
- (action (run dune_config -pkg goocanvas-2.0 -version 2.0.4)))
+   cflag-goocanvas-3.0.sexp
+   clink-goocanvas-3.0.sexp)
+ (action (run dune_config -pkg goocanvas-3.0 -version 3.0.0)))
 
 (rule
  (targets cflag-extraflags.sexp)
@@ -27,7 +27,7 @@
  (flags :standard -w -6-7-27-32-33-34-36)
  (modules_without_implementation gooCanvas_types)
  (c_names ml_gtkgoocanvas)
- (c_flags         (:include cflag-goocanvas-2.0.sexp) (:include cflag-extraflags.sexp) -Wno-deprecated-declarations)
- (c_library_flags (:include clink-goocanvas-2.0.sexp))
+ (c_flags         (:include cflag-goocanvas-3.0.sexp) (:include cflag-extraflags.sexp) -Wno-deprecated-declarations)
+ (c_library_flags (:include clink-goocanvas-3.0.sexp))
  (libraries lablgtk3))

ejgallego

LGTM ! Thanks a lot @herbelin

I tried to do this in #17391 , but I think I didn't have the time to update the clients, sorry I didn't manage to point to you to this direction. The fix of lablgtk is impressive.

By the way, you can also vendor lablgtk (just do a symlink in Coq's tree), which is convenient for quick testing.

@herbelin , you may find some bits in #17391 interesting, in case you'd like to add them here, also Jim's tooltip test case (which is pretty good) is there I believe.

ejgallego · 2024-05-26T19:17:43Z

@herbelin the standard method for positions these days is indeed line / column , encoded in very different ways tho, due to windows and unicode (actually I misinterpreted the LSP spec in this sense, thanks to Léo Stefanesco who corrected this mistake, it took quite a bit of effort to get unicode working well but now we have a superb solution IMO)

However, offsets (in this case in byte form), even if less standard, are very very useful, so I'm so glad for Coq's Loc.t type, which has both.

Many advanced use cases and editors, such as CodeMirror , do require offsets too!

But yes, but GTK editing, the right choice is to use line numbers + offset adjustement, and store metadata in protocol-level format (which can vary between platforms)

jfehrle · 2024-05-27T04:37:35Z

@herbelin I propose that we already merge this PR

It would be good to fix everything at once so we don't (perhaps) change the protocol twice. But there's also the argument that users would appreciate an incremental improvement for 8.20. I can try more test cases later (it's getting a little late).

You don't plan to do anything about the Definition α := x. case in this PR?

@ejgallego the standard method for positions these days is indeed line / column

That's fine if you don't allow the user to edit the buffer while Coq is processing it. But we do.

jfehrle · 2024-05-27T04:40:18Z

Using PMP's example, the line number shown in the error panel is incorrect. Maybe an easy fix:

herbelin · 2024-05-27T08:19:25Z

Using PMP's example, the line number shown in the error panel is incorrect.

I confirm (and I have no idea a priori of how to fix it).

the standard method for positions these days is indeed line / column

That's fine if you don't allow the user to edit the buffer while Coq is processing it. But we do.

Hum, what to do? Could the computation once made from the beginning of the sentence be used to compute the line shift from the beginning of the sentence?

You don't plan to do anything about the Definition α := x. case in this PR?

This works with the fixed version of lablgtk. So the question is about whether we want to have a solution (and have the courage to implement it) even without the lablgtk fix. In any case, this does not have to be in this PR.

What the PR fixes is practically important (as in #18682 but imagine when there are more lines involved). The wrong location (and sometimes even no location at all, because computed out of the text) made the use of coqide really more difficult to me. (If you don't know what to expect, maybe it is not so bad, but when you know that there should be a location and you have to look through e.g. 10 lines to understand from were some typing error could come, it is frustrating.). So, for me, the priority is to merge the PR and appreciate the incremental improvement for 8.20.

The unicode shift due to the lablgkt bug is not elegant but somehow less critical in practice because you can still reconstruct where the underlining was intended.

That's fine if you don't allow the user to edit the buffer while Coq is processing it. But we do.

@ejgallego:

By the way, you can also vendor lablgtk (just do a symlink in Coq's tree), which is convenient for quick testing.

Sorry, I don't understand. A symlink to what?

In any case, I'm glad that we are now converging to a resolution of most location problems!

It would be good to fix everything at once so we don't (perhaps) change the protocol twice

Maybe the other known users of the XML protocol have an opinion here (CoqTail, VsCode 1)?

ejgallego · 2024-05-27T08:30:51Z

I confirm (and I have no idea a priori of how to fix it).

What the rest of the code does is to adjust the locations, I guess this list is not updated? I can have a look and find the concrete code that does that for tooltips.

Sorry, I don't understand. A symlink to what?

To lablgtk sources.

That's fine if you don't allow the user to edit the buffer while Coq is processing it. But we do.

Why is the choice of offsets vs line / column relevant here? In both cases, the locations need adjustment, I don't see how line / col cannot be also adjusted (in fact it could be even more efficient, that's why text editors prefer line/col coordinates)

coq/coq#19040 changes the way error locations are reported in the XML protocol. Specifically, it replaces the `loc_s` and `loc_e` attributes of the `fail` result with an optional `Loc.t` field. If I understand correctly, the `start` and `stop` fields contain the same byte offsets as the old `loc_s` and `loc_e`, so it's enough to just extract and return those.

jfehrle · 2024-06-02T16:07:46Z

Right, so maybe the protocol-breaking initial PR can be merged to only 8.20 but the utf8 workaround already to 8.19.2?

Yes, though we should check that it would improve 8.19.

SkySkimmer · 2024-06-02T16:11:23Z

ide/coqide/ideutils.ml

@@ -174,6 +174,19 @@ let ulen uni_ch =
  else if uni_ch < 0x10000 then 3
  else 4

+(* workaround for lablgtk bug up to version 2.18.13 *)
+(* replaces: buffer#get_iter_at_byte ~line index *)
+(* see https://github.com/garrigue/lablgtk/pull/175 *)


doesn't seem related, garrigue/lablgtk#175 is about copy_string_check AFAICT

It is 181 (and already fixed).

herbelin · 2024-06-02T17:49:48Z

Yes, though we should check that it would improve 8.19.

It improves at least the locating of the error in Definition α := x..

jfehrle

Will merge soon if there are no further comments.

I'll submit the third commit to 8.19.2 separately.

jfehrle · 2024-06-02T18:20:48Z

@coqbot run full ci

jfehrle · 2024-06-02T18:25:26Z

@maximedenes @rtetley Do you know who, if anyone, maintains vsCoq1? This PR will likely break it since it changes the XML protocol a little. We'd like to give a heads up and get a clean fix.

gares · 2024-06-03T09:12:54Z

@thery for vscoq1 maintenance

jfehrle · 2024-06-03T18:36:14Z

@thery This PR makes a small modification to the XML protocol that vsCoq1 relies on. Will you or someone else be able to update vsCoq1 for use with Coq 8.20? Thanks.

jfehrle · 2024-06-04T07:52:18Z

@herbelin ~~See #19139, for which the underlying problem is that a fail message only returns bp and ep. See the code in Xmlprotocol.to_value with loc_s and loc_e. WDYT?~~ [Already addressed]

jfehrle · 2024-06-05T18:59:48Z

@coqbot: merge now

coqbot-app · 2024-06-05T18:59:49Z

@jfehrle: You cannot merge this PR because:

There is still a needs: full CI label.

jfehrle · 2024-06-05T19:00:22Z

@coqbot: merge now

coq/coq#19040 changes the way error locations are reported in the XML protocol. Specifically, it replaces the `loc_s` and `loc_e` attributes of the `fail` result with an optional `Loc.t` field. If I understand correctly, the `start` and `stop` fields contain the same byte offsets as the old `loc_s` and `loc_e`, so it's enough to just extract and return those.

fix coq/coq#19040

coq/coq#19040 changes the way error locations are reported in the XML protocol. Specifically, it replaces the `loc_s` and `loc_e` attributes of the `fail` result with an optional `Loc.t` field. If I understand correctly, the `start` and `stop` fields contain the same byte offsets as the old `loc_s` and `loc_e`, so it's enough to just extract and return those.

herbelin added kind: fix This fixes a bug or incorrect documentation. part: xml protocol labels May 17, 2024

herbelin added this to the 8.20+rc1 milestone May 17, 2024

coqbot-app bot added the needs: full CI The latest GitLab pipeline that ran was a light CI. Say "@coqbot run full ci" to get a full CI. label May 17, 2024

herbelin force-pushed the master+fix-xmlprotocol-loc-line_nb-fail branch from 6acf253 to f5b4514 Compare May 17, 2024 10:07

herbelin added a commit to herbelin/github-coq that referenced this pull request May 17, 2024

Change log for coq#19040

07606ed

herbelin mentioned this pull request May 17, 2024

Syntax errors reported at a wrong location in CoqIDE #18682

Closed

herbelin marked this pull request as ready for review May 17, 2024 14:34

herbelin requested a review from a team as a code owner May 17, 2024 14:34

herbelin added a commit to herbelin/github-coq that referenced this pull request May 17, 2024

Change log for coq#19040

04b2969

herbelin force-pushed the master+fix-xmlprotocol-loc-line_nb-fail branch from 07606ed to 04b2969 Compare May 17, 2024 14:34

herbelin requested a review from a team as a code owner May 17, 2024 14:34

herbelin commented May 17, 2024

View reviewed changes

ejgallego approved these changes May 26, 2024

View reviewed changes

SkySkimmer reviewed Jun 2, 2024

View reviewed changes

jfehrle force-pushed the master+fix-xmlprotocol-loc-line_nb-fail branch from eca53b9 to be70563 Compare June 2, 2024 17:47

jfehrle approved these changes Jun 2, 2024

View reviewed changes

jfehrle self-assigned this Jun 2, 2024

coqbot-app bot removed the needs: full CI The latest GitLab pipeline that ran was a light CI. Say "@coqbot run full ci" to get a full CI. label Jun 2, 2024

jfehrle modified the milestones: 8.19.2, 8.20+rc1 Jun 2, 2024

jfehrle mentioned this pull request Jun 3, 2024

Fix mislocated tooltips when UTF-8 characters are present (workaround lablgtk bug) #19137

Merged

2 tasks

Workaround for lablgtk bug

3db6599

jfehrle force-pushed the master+fix-xmlprotocol-loc-line_nb-fail branch from be70563 to 3db6599 Compare June 4, 2024 05:28

coqbot-app bot added the needs: full CI The latest GitLab pipeline that ran was a light CI. Say "@coqbot run full ci" to get a full CI. label Jun 4, 2024

jfehrle removed the needs: full CI The latest GitLab pipeline that ran was a light CI. Say "@coqbot run full ci" to get a full CI. label Jun 5, 2024

coqbot-app bot merged commit 983721d into coq:master Jun 5, 2024
6 checks passed

FissoreD mentioned this pull request Jun 27, 2024

Vscoq legacy broken #19288

Closed

thery added a commit to thery/vscoq that referenced this pull request Jun 28, 2024

fix coq/coq#19040

4ee7f59

thery added a commit to coq/vscoq that referenced this pull request Jun 28, 2024

[vscoq1] fix coq/coq#19040 (#811)

664cf04

fix coq/coq#19040

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix passing line number of errors in xml protocol, causing ill-located or non-located errors, notably in the presence of utf8 characters #19040

Fix passing line number of errors in xml protocol, causing ill-located or non-located errors, notably in the presence of utf8 characters #19040

herbelin commented May 17, 2024 •

edited by jfehrle

Loading

herbelin May 17, 2024

jfehrle May 18, 2024 •

edited

Loading

ppedrot commented May 17, 2024

herbelin commented May 17, 2024

jfehrle commented May 17, 2024

herbelin commented May 19, 2024

herbelin commented May 19, 2024 •

edited

Loading

herbelin commented May 19, 2024 •

edited

Loading

jfehrle commented May 20, 2024 •

edited

Loading

herbelin commented May 26, 2024

ejgallego commented May 26, 2024

herbelin commented May 26, 2024 •

edited

Loading

ejgallego left a comment

ejgallego commented May 26, 2024

jfehrle commented May 27, 2024 •

edited

Loading

jfehrle commented May 27, 2024

herbelin commented May 27, 2024

ejgallego commented May 27, 2024

jfehrle commented Jun 2, 2024

SkySkimmer Jun 2, 2024

herbelin Jun 2, 2024 •

edited

Loading

jfehrle Jun 2, 2024

herbelin commented Jun 2, 2024

jfehrle left a comment

jfehrle commented Jun 2, 2024

jfehrle commented Jun 2, 2024

gares commented Jun 3, 2024

jfehrle commented Jun 3, 2024

jfehrle commented Jun 4, 2024 •

edited

Loading

jfehrle commented Jun 5, 2024

coqbot-app bot commented Jun 5, 2024

jfehrle commented Jun 5, 2024

Fix passing line number of errors in xml protocol, causing ill-located or non-located errors, notably in the presence of utf8 characters #19040

Fix passing line number of errors in xml protocol, causing ill-located or non-located errors, notably in the presence of utf8 characters #19040

Conversation

herbelin commented May 17, 2024 • edited by jfehrle Loading

herbelin May 17, 2024

Choose a reason for hiding this comment

jfehrle May 18, 2024 • edited Loading

Choose a reason for hiding this comment

ppedrot commented May 17, 2024

herbelin commented May 17, 2024

jfehrle commented May 17, 2024

herbelin commented May 19, 2024

herbelin commented May 19, 2024 • edited Loading

herbelin commented May 19, 2024 • edited Loading

jfehrle commented May 20, 2024 • edited Loading

herbelin commented May 26, 2024

ejgallego commented May 26, 2024

herbelin commented May 26, 2024 • edited Loading

ejgallego left a comment

Choose a reason for hiding this comment

ejgallego commented May 26, 2024

jfehrle commented May 27, 2024 • edited Loading

jfehrle commented May 27, 2024

herbelin commented May 27, 2024

ejgallego commented May 27, 2024

jfehrle commented Jun 2, 2024

SkySkimmer Jun 2, 2024

Choose a reason for hiding this comment

herbelin Jun 2, 2024 • edited Loading

Choose a reason for hiding this comment

jfehrle Jun 2, 2024

Choose a reason for hiding this comment

herbelin commented Jun 2, 2024

jfehrle left a comment

Choose a reason for hiding this comment

jfehrle commented Jun 2, 2024

jfehrle commented Jun 2, 2024

gares commented Jun 3, 2024

jfehrle commented Jun 3, 2024

jfehrle commented Jun 4, 2024 • edited Loading

jfehrle commented Jun 5, 2024

coqbot-app bot commented Jun 5, 2024

jfehrle commented Jun 5, 2024

herbelin commented May 17, 2024 •

edited by jfehrle

Loading

jfehrle May 18, 2024 •

edited

Loading

herbelin commented May 19, 2024 •

edited

Loading

herbelin commented May 19, 2024 •

edited

Loading

jfehrle commented May 20, 2024 •

edited

Loading

herbelin commented May 26, 2024 •

edited

Loading

jfehrle commented May 27, 2024 •

edited

Loading

herbelin Jun 2, 2024 •

edited

Loading

jfehrle commented Jun 4, 2024 •

edited

Loading