Fix passing line number of errors in xml protocol, causing ill-located or non-located errors, notably in the presence of utf8 characters #19040
6acf253
to
f5b4514
Compare
07606ed
to
04b2969
Compare
@@ -9,7 +9,7 @@
 (************************************************************************)

 (** Protocol version of this file. This is the date of the last modification. *)
-let protocol_version = "20230413"
+let protocol_version = "20240517"
Is there a particular procedure to advertise protocol changes?
This may break coqTail and vsCoq (not vsCoq2). Ideally they could make changes now so those tools still work on master as well as in 8.20. Reach out to @whonore for coqTail. IIRC @maximedenes tweaked vsCoq.
That's like, the fourth time or so we try to fix this?
@jfehrle tried something, the details of which I don't know well. Then @ejgallego tried to rely on the byte offset provided by gtk, but this required passing the line number across the protocol, which he did in 3a3de8f, but there was still a place in serialization which was not passing a line number. So, to me, the current solution (passing the line number and relying on gtk's byte-offset computation rather than recomputing a char-offset by counting ourselves how many utf8 bytes are used) seems pretty reasonable. At least, it is not as if we did not know where we are going. It is more that we knew but got there in steps.
Aside from the line number issue, can you give an example of the UTF-8 problem you mention fixing? It's odd to share a Loc.t because the server side uses character offsets (in several fields) while the client side (GTK) only works with byte offsets. This seems likely to cause confusion. IIRC there are at least 4 different cases to test for error messages and tooltips, in particular displaying errors correctly when the user modifies a file that's being processed asynchronously. I pointed out problems (based on testing the code by hand) with multiple versions of @ejgallego's changes, including the final one. If you're interested in trying the test cases, I'll see if I can find my notes on that. I had many comments in #17382.
Actually no :(. I thought the PR was fixing the UTF-8 shift but it fixes only the line number issue (as in #18682). (My own test had the error on a line w/o UTF-8 so I did not realize that it was fixing only the line and not the offset.)
Actually, there seems to be a lablgtk bug! A bad copy-paste at this line of the ocaml-C wrapper. So, I guess we indeed have to rely in the meantime on our own byte-to-char conversion.
Do you mean the opposite, i.e. bytes in Coq and chars in GTK? Would you know how to insert a "b2c" in the current code? One that works from the beginning of the current line and not from the beginning of the whole sentence? PS: For the record, a short example with shift-by-1 locating is e.g.: Definition α := x.
Indeed.
Yes. The code can be short and efficient, but it takes considerable care to cover all the cases and shouldn't be rushed. @ejgallego submitted 2 PRs related to this some months apart. The first broke the mechanism completely and the second didn't fix all the damage. I'd probably start with the code from before both of those PRs. The code before both PRs saved the byte offset of the beginning of the sentence. Then the byte offset -> char offset conversion only has to examine the sentence, not the whole buffer (which caused performance issues). An additional wrinkle is that in async mode (e.g. if a proof fails), the buffer offset of a sentence can change if the user edits a failed proof, which will change the buffer offset of subsequent sentences. There is a way to adjust for this. But let's fix all the issues this time. WDYT? Also, do you know when 8.20 will freeze?
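For reference, a minimal sketch of the sentence-local byte offset -> char offset conversion described above. This is not the code from either PR: the name `byte_to_char_offset` is made up for illustration, and it assumes the sentence text is valid UTF-8 and that the byte offset is relative to the start of the sentence.

```ocaml
(* Hypothetical helper, not taken from CoqIDE.
   Counts the UTF-8 characters that start within the first [byte_off]
   bytes of [s]. Assumes [s] is valid UTF-8. *)
let byte_to_char_offset (s : string) (byte_off : int) : int =
  let limit = min byte_off (String.length s) in
  let count = ref 0 in
  for i = 0 to limit - 1 do
    (* Continuation bytes look like 0b10xxxxxx; any other byte starts a
       new character. *)
    if Char.code s.[i] land 0xC0 <> 0x80 then incr count
  done;
  !count
```

On the shift-by-1 example `Definition α := x.`, the `x` sits at byte offset 17 but at character offset 16, because `α` takes two bytes. Only the sentence is scanned, so the cost does not grow with the size of the buffer.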
Sorry, I was expecting an answer but did not realize that you had written.
There is a thread here: https://coq.zulipchat.com/#narrow/stream/237656-Coq-devs-.26-plugin-devs/topic/8.2E20 . The current plan is:
I don't know the details of all the work done by everyone, but regarding the passing of the full loc, line included, as well as regarding the use of the native GTK support for byte offset ( I just tested the PR with the development version of lablgtk and the example Otherwise, I think we can go with the byte-to-char translation from the beginning of the sentence (or maybe from the beginning of the line). |
@herbelin , indeed testing this stuff is hard and I didn't realize what the real problem was until later (it also impacted coq-lsp, which I have to say has no issues with locations). In particular, what I didn't realize is that the lexer had been totally broken w.r.t. positions by the debugger PR (even if I requested we keep a bug, we shouldn't have merged the code in that state). See #16978 for a summary of the problems. So the hole was not in the protocol but in the lexer. @herbelin this change is good, and I tried it in some other PR; I did close it though, as I didn't have the cycles to adapt 3rd party users of the protocol, as the change doesn't seem backwards compatible. I didn't notice the lablgtk bug, that's cool. I am at a loss as to how to react to @jfehrle's comments, other than to say that they are basically technically and historically inaccurate, as we explained in the call. The curious reader can dig into some of the relevant PRs, as all the info is there, including a summary. Contrary to what Jim believes, the code between the merge of the debugger PR and #16978 is just too broken to even make coq-lsp work with loads of hacks (we ported to 8.11 so the code prior was way less buggy). So I strongly recommend not to go back!
I tested @ppedrot's fail/wait 20 example and it seems to have the underlining of I propose that we already merge this PR and that any regular coqide user install the development version of lablgtk, so that we can daily test further. To install lablgtk, I did:
Note that I needed the following patch for
LGTM ! Thanks a lot @herbelin
I tried to do this in #17391 , but I think I didn't have the time to update the clients; sorry I didn't manage to point you in this direction. The fix of lablgtk is impressive.
By the way, you can also vendor lablgtk (just do a symlink in Coq's tree), which is convenient for quick testing.
@herbelin , you may find some bits in #17391 interesting, in case you'd like to add them here, also Jim's tooltip test case (which is pretty good) is there I believe.
@herbelin the standard method for positions these days is indeed line / column, encoded in very different ways though, due to Windows and Unicode (actually I misinterpreted the LSP spec in this sense, thanks to Léo Stefanesco who corrected this mistake; it took quite a bit of effort to get unicode working well but now we have a superb solution IMO). However, offsets (in this case in byte form), even if less standard, are very very useful, so I'm so glad for Coq's Many advanced use cases and editors, such as CodeMirror , do require offsets too! But yes, for GTK editing, the right choice is to use line numbers + offset adjustment, and store metadata in protocol-level format (which can vary between platforms)
It would be good to fix everything at once so we don't (perhaps) change the protocol twice. But there's also the argument that users would appreciate an incremental improvement for 8.20. I can try more test cases later (it's getting a little late). You don't plan to do anything about the
That's fine if you don't allow the user to edit the buffer while Coq is processing it. But we do. |
I confirm (and I have no idea a priori of how to fix it).
Hum, what to do? Could the computation once made from the beginning of the sentence be used to compute the line shift from the beginning of the sentence?
This works with the fixed version of lablgtk. So the question is whether we want to have a solution (and have the courage to implement it) even without the lablgtk fix. In any case, this does not have to be in this PR. What the PR fixes is practically important (as in #18682, but imagine when there are more lines involved). The wrong location (and sometimes even no location at all, because it was computed outside of the text) made the use of coqide really more difficult for me. (If you don't know what to expect, maybe it is not so bad, but when you know that there should be a location and you have to look through e.g. 10 lines to understand where some typing error could come from, it is frustrating.) So, for me, the priority is to merge the PR and appreciate the incremental improvement for 8.20. The unicode shift due to the lablgtk bug is not elegant but somehow less critical in practice because you can still reconstruct where the underlining was intended.
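A possible shape for a fallback that does not depend on the lablgtk fix, sketched under assumptions: the buffer text is available as a UTF-8 string, lines are numbered from 0, and the byte index is relative to the start of the line. The name `char_column_of_byte` is hypothetical and this is not the workaround from the PR.

```ocaml
(* Hypothetical helper, not taken from CoqIDE or lablgtk.
   Returns the character column matching byte index [byte_idx] on the
   0-based line [line] of [text], or None if that line does not exist.
   Assumes [text] is valid UTF-8 with '\n' line separators. *)
let char_column_of_byte (text : string) ~(line : int) ~(byte_idx : int)
  : int option =
  if line < 0 then None
  else
    match List.nth_opt (String.split_on_char '\n' text) line with
    | None -> None
    | Some l ->
      let limit = max 0 (min byte_idx (String.length l)) in
      let n = ref 0 in
      (* Count the bytes before [limit] that start a character, i.e. that
         are not UTF-8 continuation bytes (0b10xxxxxx). *)
      String.iteri
        (fun i c -> if i < limit && Char.code c land 0xC0 <> 0x80 then incr n)
        l;
      Some !n
```

The resulting (line, character column) pair could then be handed to a character-based iterator lookup instead of a byte-based one. Splitting the whole text is only for brevity here; a real implementation would more likely read the single line from the buffer.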
Sorry, I don't understand. A symlink to what? In any case, I'm glad that we are now converging to a resolution of most location problems!
Maybe the other known users of the XML protocol have an opinion here (CoqTail, VsCode 1)?
What the rest of the code does is to adjust the locations, I guess this list is not updated? I can have a look and find the concrete code that does that for tooltips.
To lablgtk sources.
Why is the choice of offsets vs line / column relevant here? In both cases, the locations need adjustment, I don't see how line / col cannot be also adjusted (in fact it could be even more efficient, that's why text editors prefer line/col coordinates)
coq/coq#19040 changes the way error locations are reported in the XML protocol. Specifically, it replaces the `loc_s` and `loc_e` attributes of the `fail` result with an optional `Loc.t` field. If I understand correctly, the `start` and `stop` fields contain the same byte offsets as the old `loc_s` and `loc_e`, so it's enough to just extract and return those.
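A rough sketch of what that extraction could look like on the client side. The record types below are hypothetical stand-ins, not the actual protocol types: only the `start` and `stop` field names come from the description above, and the other fields are assumptions added for illustration.

```ocaml
(* Hypothetical mirror of the location carried by a [fail] result.
   Only [start] and [stop] are named in the description above; the
   other fields are assumptions. *)
type loc = {
  line_nb : int;  (* assumed: line number of the error *)
  bol_pos : int;  (* assumed: byte offset of the beginning of that line *)
  start : int;    (* byte offset, same meaning as the old loc_s *)
  stop : int;     (* byte offset, same meaning as the old loc_e *)
}

(* Hypothetical shape of a decoded [fail] result. *)
type fail_result = {
  loc : loc option;
  message : string;
}

(* Recover the old (loc_s, loc_e) pair when a location is present. *)
let legacy_offsets (r : fail_result) : (int * int) option =
  Option.map (fun l -> (l.start, l.stop)) r.loc
```

A client that already handled a missing `loc_s`/`loc_e` can treat `None` the same way, so the rest of its error-reporting code should not need to change.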
Yes, though we should check that it would improve 8.19.
ide/coqide/ideutils.ml
@@ -174,6 +174,19 @@ let ulen uni_ch =
   else if uni_ch < 0x10000 then 3
   else 4

+(* workaround for lablgtk bug up to version 2.18.13 *)
+(* replaces: buffer#get_iter_at_byte ~line index *)
+(* see https://github.com/garrigue/lablgtk/pull/175 *)
doesn't seem related, garrigue/lablgtk#175 is about copy_string_check AFAICT
It is 181 (and already fixed).
Got it.
eca53b9
to
be70563
Compare
It improves at least the locating of the error in |
Will merge soon if there are no further comments.
I'll submit the third commit to 8.19.2 separately.
@coqbot run full ci
@maximedenes @rtetley Do you know who, if anyone, maintains vsCoq1? This PR will likely break it since it changes the XML protocol a little. We'd like to give a heads up and get a clean fix.
@thery for vscoq1 maintenance
@thery This PR makes a small modification to the XML protocol that vsCoq1 relies on. Will you or someone else be able to update vsCoq1 for use with Coq 8.20? Thanks.
be70563
to
3db6599
Compare
@coqbot: merge now
@jfehrle: You cannot merge this PR because:
@coqbot: merge now
Fix was actually very easy and will make life in xml-based UIs much more comfortable.
Fixes #18682, #19139.