Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alternative codespan-reporting style formatting? #11

Open
ruifengx opened this issue Sep 1, 2022 · 58 comments · May be fixed by #14
Open

Alternative codespan-reporting style formatting? #11

ruifengx opened this issue Sep 1, 2022 · 58 comments · May be fixed by #14
Labels
enhancement New feature or request

Comments

@ruifengx
Copy link

ruifengx commented Sep 1, 2022

Thanks for your great library!

I use both Rust and Haskell, and have learned about error reporting libraries like ariadne and codespan-reporting for a while. Previously when working on a Rust project, I chose codespan-reporting because I preferred its formatting (formatting of ariadne is also beautiful, but a bit too fancy for me), given their interfaces are pretty similar. Is it possible to extend this library to have an alternative codespan-reporting-style formatting for the diagnostic messages? Or is there some design space to expose an API for custom report formatting?

FYI, here is a comparison of the said two styles (BTW it is amusing to see a more Haskell-like syntax for illustration in codespan-reporting the Rust library but a more Rust-like style here):

Name Illustration
(current) ariadne-style
codespan-reporting-style codespan_reporting_preview

If this is deemed non-trivial, I am willing to work on this, in that case would you please provide some guidance?

@Mesabloo
Copy link
Owner

Mesabloo commented Sep 1, 2022

One of the major setbacks would be the need to expose the internal definitions of e.g. reports, diagnostics, etc. (mainly all that is present in X.Internal modules which isn't exported by the X modules)
Either that or allowing to freely format markers (which seem like most of the differences there to me).

However, if we allow retrieving files from a given Diagnostic, then it would be quite “easy” to change the overall style of reports⁽¹⁾.
But only if writing a function Diagnostic msg -> Doc Annotation is not that much of a burden (which, believe me, it is).

If this is a feature that is very much wanted, then there are a few changes that will need to be done:

  1. If we want the layout to be 100% customisable:
    • First of all, the printDiagnostic and prettyDiagnostic functions will need to take a “layout” function (of type Pretty msg => Diagnostic msg -> Doc Annotation) which will be in charge of creating the to-be-rendered Document
    • Once this is done, we need to refactor the current layout (under, e.g., a Error.Diagnose.Layout module). Then copy all the layouting work within the module Error.Diagnose.Report.Internal to that module (or a submodule)
    • Afterwards, you should be able to freely define your own layout (we may create a simply type alias type Layout msg = Diagnostic msg -> Doc Annotation for shortness' sake) but this is the hardest part, as it means that all the internal logic must be reimplemented for your new layout (which is about 80-90% of the whole codebase, for the current layout). To facilitate creating new layouts, we may also expose helper functions which would compute extra information (for example, marker column offset in spaces or unicode codepoints)
  2. if we only want to only minor tweaking of the overall layout (such as marker style):
    • We should be able to pass very simple functions (within records, for example) which take specific information (for a marker, its line, column in unicode codepoints, computed length in number of spaces, and its style ─ error, warning, etc. ─) and output a part of a Doc Annotation

I am much less confident in the 2nd alternative, even thought the 1st one is actually way harder to implement.
So I am not quite sure what's the correct way to go.
What do you think about this? Are there any other alternatives that I did not mention?


⁽¹⁾: we are already able to retrieve Reports from Diagnostics and Markers, Notes, etc. from Reports. The only missing piece of data is files to be included in the report itself.

@Mesabloo Mesabloo added the enhancement New feature or request label Sep 1, 2022
@ruifengx
Copy link
Author

ruifengx commented Sep 1, 2022

Thanks for your reply! I am still learning my way in the code base, so I cannot comment about the proposed approaches yet. Actually, after submitting this issue, I pulled your repo down on my local Windows machine and encountered a link error, for which I believe wcwidth is to blame. There is already an unpublished fix in their GitHub repo (solidsnack/wcwidth#3), but it involves a package named rset which is not in stack LTS and also not updated for years. I believe the fix there could be improved by replacing the linear search logic (thus also get rid of the dependency on rset) with a binary search based on arrays, so I am currently preparing a PR for that package. I will come back in a few days after I finish that PR and get familiar with the code here.

@Mesabloo
Copy link
Owner

Mesabloo commented Sep 1, 2022

I can take a look at how to structure that (and the best way to do it, although I may have an idea already) and maybe start working on it in a separate branch if you want me to. :)

As for the bug with wcwidth, I thought that the aforementioned fix was already published on Hackage, but looking back at the last update, it is not. This is definitely worthwhile to also work on this in order to make Diagnose usable on Windows.

@ruifengx
Copy link
Author

ruifengx commented Sep 1, 2022

I have just uploaded my changes to wcwidth here. I removed the dependency on rset and replaced it with a binary search based on array. I also added a test suite to verify my implementation is correct. It seems that the owner of wcwidth now has little interest in Haskell, and that is probably the reason why the Hackage package is still not updated.

And regarding Windows support, I have a bad news to tell: GHC's support for Unicode output through standard IO on Windows in general is still messy. I see a MR on GitLab that claims to fix it, but I cannot find the RTS switch --io-manager=native in GHC 9.0.2 (which is the latest GHC in stack LTS). A plain run of the test suite here results in a crash, and with with-utf8 the program stops crashing, but shows some question marks in the output:
image

@Mesabloo
Copy link
Owner

Mesabloo commented Sep 1, 2022

Alright so there's basically nothing we can do regarding the Unicode + Windows combination, other than proposing alternative ASCII layouts (which is already the case for the default layout, although it looks less fancy that way).
Not sure what to do about that issue. Haskell support on Windows is still rather poor, from all those years being mainly a Unix-favored language (this should be changing as we are speaking, but turns out this isn't quite fast-paced).

@ruifengx
Copy link
Author

ruifengx commented Sep 1, 2022

Sorry, I should have experimented more before posting the last comment. Stack did not recognize GHCRTS="--io-manager=native", so I assumed GHC 9.0.2 did not have this. But it turned out both GHCi (with ghci +RTS --io-manager=native) and the test-suite (I found the executable manually and fed it the same RTS argument) is working correctly (at least in my settings):
image

So GHC's Windows support is changing, and the current progress in fact looks quite promising.

(BTW, I should be using the latest release of stack, but it reported that the cabal file in this repo was generated by a newer version of hpack. Just out of curiosity, are you using a dev-build in your workflow?)

@Mesabloo
Copy link
Owner

Mesabloo commented Sep 1, 2022

I absolutely hate the fact that everything is misaligned, and it looks like Windows is trying its best to render the unicode box characters but absolutely fails to do so with the correct width.
But it's better than nothing I guess.

(BTW, I should be using the latest release of stack, but it reported that the cabal file in this repo was generated by a newer version of hpack. Just out of curiosity, are you using a dev-build in your workflow?)

I am not, so this is a bit weird.

$ stack --version
2.7.5 x86_64 hpack-0.34.7

But I don't remember which version I used to generate the .cabal file (might have been stack 2.7.3 at the time).

@ruifengx
Copy link
Author

ruifengx commented Sep 1, 2022

This is what I have, yet another peculiarity I guess:

$ stack --version
Version 2.7.5, Git revision ba147e6f59b2da75b1beb98b1888cce97f7032b1 x86_64 hpack-0.34.4

So stack is actually shipping officially the same version 2.7.5 with different hpack on different platforms ...

@Mesabloo
Copy link
Owner

Mesabloo commented Sep 1, 2022

This shouldn't be a problem at all if your version of hpack managed to work correctly. :)

@Mesabloo
Copy link
Owner

Mesabloo commented Sep 1, 2022

I started refactoring and moving code as per the first part of my first suggestion, in the branch custom-layouts.
For now, this is a simple move of the code which originally resided in the Error.Diagnose.Report.Internal package.
Note that I have isolated the ariadne-like layout inside its own package, in order to allow the user to choose which layout to use (if any at all, this is unnecessary when only the JSON backend is interesting).

Documentation has not been updated at all (there will be A LOT of places to update it), but all the rendering tests pass and give the same results as before.

As of now, we should try to see which parts of the code for the ariadne layout would be better off inside the Error.Diagnose.Layout module as helper functions (for example, the function to retrieve and colorize a line of code named getLine_).
I think this should be easier if multiple layouts are defined concurrently.

Also note that tests have been put in the diagnose-ariadne subpackage but they don't really belong there.
I'll move them later to somewhere else.
They have been moved to a global package within the repository.

@ruifengx
Copy link
Author

ruifengx commented Sep 2, 2022

I have just submitted solidsnack/wcwidth#6, but I am not very positive that wcwidth will be updated soon on Hackage. Given that the code there is just a bunch of Unicode data plus searching logic (which I have just reimplemented), I believe you can also copy-paste the relevant code here. (Especially so if you are okay with always using the Haskell implementation of wcwidth function.)

As of now, we should try to see which parts of the code for the ariadne layout would be better off inside the Error.Diagnose.Layout module as helper functions (for example, the function to retrieve and colorize a line of code named getLine_).

I have just started reading your code in the custom-layout branch. I guess I will try to write a prototype of codespan-reporting style layout based on the current ariadne layout, and then we will see which parts are common.

This shouldn't be a problem at all if your version of hpack managed to work correctly. :)

Actually stack refused to regenerate the cabal file if the local hpack version is older than the one that was used to generated the current copy of cabal file. But I guess that is okay for now, I can just avoid editing package.yaml.

@Mesabloo
Copy link
Owner

Mesabloo commented Sep 2, 2022

I have just submitted solidsnack/wcwidth#6, but I am not very positive that wcwidth will be updated soon on Hackage. Given that the code there is just a bunch of Unicode data plus searching logic (which I have just reimplemented), I believe you can also copy-paste the relevant code here. (Especially so if you are okay with always using the Haskell implementation of wcwidth function.)

If the pure Haskell implementation is fully compliant with the “original” C version, I am not that bothered by including it inside this library. It'd be much better though if this could be included within a new Hackage release of the wcwidth library, for reusability's sake.
Maybe we could somehow "hijack" the package by pushing updates ourselves (as long as the original author is okay with it)?
Not sure how all this works on Hackage.

Actually stack refused to regenerate the cabal file if the local hpack version is older than the one that was used to generated the current copy of cabal file. But I guess that is okay for now, I can just avoid editing package.yaml.

Ah! I did not know that. You should not have to change any package.yaml in the two libraries (maybe only the tests, but you could just delete the diagnose-tests.cabal if you need to regenerate it, I don't think this is a real problem for a package not meant to be pushed anywhere).

I have just started reading your code in the custom-layout branch. I guess I will try to write a prototype of codespan-reporting style layout based on the current ariadne layout, and then we will see which parts are common.

I also had the idea to create a few more layouts (e.g. GCC-like) just for the sake of it, sort of to demonstrate the capabilities of this library, and perhaps give more choice regarding the output format.
I think this will also help seeing which parts will need to be common.

For now, if you need something from my code, feel free to copy-paste it, as it will make it easier to see what's in common later. :)

@Mesabloo
Copy link
Owner

Mesabloo commented Sep 3, 2022

In the meantime, I went ahead and implemented a GCC-style layout (and style) for the library.
Currently, this looks like this (this uses the same error as the one from the table of the issue):

  • with Unicode characters:
    image
  • with plain ASCII characters:
    image

I'll wait a bit to push it onto the branch, as I have done some changes to how the code is organised in other modules too (such as Error.Diagnose.Style).
Basically, I moved all the annotation generation and handling within specific Error.Diagnose.Style.XXX (where XXX is e.g. Ariadne or GCC) to where the layout are created.
This allows a better handling of annotations: for example, the GCC layout does not use the original annotation RuleColor (as there is no rule), so it uses its own annotation data type (where other minor tweaks have been performed too).
Thanks to that, the library is basically plug and play (given that the layout you want is already done) and can use different styles for different layouts (given that the style is bundled with the layout, but if the annotation type is exposed, one can also create its own style).

@ruifengx
Copy link
Author

ruifengx commented Sep 3, 2022

for example, the GCC layout does not use the original annotation RuleColor (as there is no rule), so it uses its own annotation data type

I guess similar things happen in codespan-reporting style, too. Though of course most annotations should be reusable, so after I finish debugging my implementation I should probably come up with a way of code reuse other than copy-paste-modify.

I have been fighting with stack and GHC for debugging for a while, because the stack I use was built with GHC 8.10.4 and therefore has no support for --io-manager=native. Currently I go with producing the test suite executable and manually invoke them with appropriate RTS parameters. In the mean time I am also compiling stack with GHC 9, so either way I should hopefully be able to finish my prototype and share with you on GitHub in one or two days.

@Mesabloo
Copy link
Owner

Mesabloo commented Sep 3, 2022

I guess similar things happen in codespan-reporting style, too. Though of course most annotations should be reusable, so after I finish debugging my implementation I should probably come up with a way of code reuse other than copy-paste-modify.

Unfortunately, I don't think that reusability is doable here.
Annotations are pretty much dependent on the layout used: what's the point of e.g. RuleColor for a layout without rules? or any other annotation which does not make sense for whatever reason? and how to handle additional/modifications of annotations that do not make sense in other layouts (I have examples for these, such as the Angular 9 layout which changes the background color of line numbers)?
If we want to add or modify annotations on a per-layout basis, we shouldn't have to change the core of the library (basically Error.Diagnose.Style), hence why I moved them directly in each layout package.
This will introduce some sort of redundancy (well, not that much because everybody names their annotations however they want), I agree with that, but I don't think we can have only a common few annotations to describe every possible rendering layout.

EDIT: here are the changes I made related to the styling:

  • In Error.Diagnose.Style: (replaced Annotation with this)
type Style ann = IsAnnotation ann => Doc ann -> Doc AnsiStyle

-- | The class of annotation, allowing to map each to a given color style.
--
--   Every annotation used in a 'Style' must implemented this typeclass.
class IsAnnotation ann where
  -- | To be used with 'reAnnotate'.
  --
  --   Transforms the custom annotations into color annotations as described by 'AnsiStyle'.
  mkColor :: ann -> AnsiStyle
  • In Error.Diagnose.Diagnostic.Internal: (changed the types to use IsAnnotation)
type Layout msg ann = (IsAnnotation ann, Pretty msg) => Bool -> Int -> Diagnostic msg -> Doc ann

-- | Prints a 'Diagnostic' onto a specific 'Handle'.
printDiagnostic ::
  (MonadIO m, Pretty msg, IsAnnotation ann) =>
  -- | The handle onto which to output the diagnostic.
  Handle ->
  -- | Should we print with unicode characters?
  Bool ->
  -- | 'False' to disable colors.
  Bool ->
  -- | The number of spaces each TAB character will span.
  Int ->
  -- | The style in which to output the diagnostic.
  Style ann ->
  -- | The layout to use to render the diagnostic.
  Layout msg ann ->
  -- | The diagnostic to output.
  Diagnostic msg ->
  m ()
printDiagnostic handle withUnicode withColors tabSize style layout diag =
  liftIO $ hPutDoc handle ((if withColors then style else unAnnotate) $ layout withUnicode tabSize diag)
{-# INLINE printDiagnostic #-}

I can push them whenever you need them.

so either way I should hopefully be able to finish my prototype and share with you on GitHub in one or two days.

Don't worry at all, take all the time you need!
There's no deadline for this at all. :)

I have been fighting with stack and GHC for debugging for a while, because the stack I use was built with GHC 8.10.4 and therefore has no support for --io-manager=native.

By the way, do you think this is worth mentioning in the documentation?
All the Unicode stuff on Windows, like the with-utf8 package, what to do with GHC 9+ (and possibly other stuff I overlooked, like the fact that you should call setLocale LC_ALL before using wcwidth), etc.

@ruifengx
Copy link
Author

ruifengx commented Sep 3, 2022

By the way, do you think this is worth mentioning in the documentation?

Yes, I think that would be very helpful for users of this library, especially Haskell newcomers or people who have not explicitly dealt with Unicode issues in the past. It could be in the README, the package documentation, or even a stand-alone tutorial if you wish.

All the Unicode stuff on Windows, like the with-utf8 package, what to do with GHC 9+ (and possibly other stuff I overlooked, like the fact that you should call setLocale LC_ALL before using wcwidth), etc.

In my experience, with-utf8 is especially helpful to prevent crashing; I believe by using the Main.Utf8.withUtf8 wrapper on main, the locale of standard I/O handles (stdin, stdout, stderr) are set to lenient versions, so they will not fail for invalid characters (that is, characters which cannot be encoded in the target encoding). However, it can still be overly defensive where some Unicode characters encodable in target encoding also get replaced by ? (like I showed in previous screenshots), possibly due to the fact that GHC is still using iconv (amd it somehow failed to detect the target encoding correctly) 1.

On the other hand, at least according to the tests I have done these days, the RTS option --io-manager=native (and yes, it is only available in GHC 9+) should fix the Unicode problem on its own, both preventing the crash and getting the correct output without replacements. Yet I would still use with-utf8 because I personally prefer to enforce UTF-8 encoding, and I believe you can't be too cautious on such things.

For wcwidth, I actually did not know about the LC_ALL thing. But the Haskell implementation (both the previous one and the one in my PR) should not depend on the locale, because it unconditionally uses the embedded Unicode database.

I can elaborate on these points later if you wish to include them in the doc.

Footnotes

  1. This is according to a GHC issue I read a while ago, which might be outdated now.

@Mesabloo
Copy link
Owner

Mesabloo commented Sep 3, 2022

For wcwidth, I actually did not know about the LC_ALL thing. But the Haskell implementation (both the previous one and the one in my PR) should not depend on the locale, because it unconditionally uses the embedded Unicode database.

This is included in the documentation, but is needed only for the FFI wcwidth (I guess it would not work with some locales like LC_C, or at least not correctly).
This is indeed unneeded if it uses an embedded Unicode database (which is what you PRed there, which was already present but unpublished and worsely coded).

I can elaborate on these points later if you wish to include them in the doc.

I pretty much would like to include them in the docs, yes.
This would save some time for people who never got Unicode issues and also myself as I won't have to deal with issues about that.
A tutorial is already present in the Error.Diagnose module, and there's also a FAQ section in there. I guess we could insert new questions (with the Haskell errors when trying to print unicode characters) and answer them there!


I'm curious because I didn't know one could do that, but how did you make the little linkable footer at the end of your comment?

@ruifengx
Copy link
Author

ruifengx commented Sep 3, 2022

I'm curious because I didn't know one could do that, but how did you make the little linkable footer at the end of your comment?

I believe it is part of the extended Markdown syntax: use [^1] or [^longlabel] for where the marker should be placed, and write later (in a separate paragraph) [^1]: the actual footnote text. It doesn't matter what label you use (as long as they don't clash), nor where you place the footnote text: the marker will always be numbers starting from 1, and the footnotes will always go to the end of the comment.

Here is a GitHub blog that announced this syntax is supported on GitHub.

@ruifengx
Copy link
Author

ruifengx commented Sep 4, 2022

A quick update on wcwidth. It turns out the Haskell implementation is incorrect from the very beginning: all control characters (and according to the implementation, "control characters" also include the space character) are reported to have width -1 (perhaps to indicate an error). So this was the source of the "everything is misaligned" screenshot, instead of Windows Terminal. Now I believe I will have to validate the logic of that implementation myself and give it a patch or rewrite.

@Mesabloo
Copy link
Owner

Mesabloo commented Sep 4, 2022

I see. That's a bit weird but I never really took a look at the Haskell implementation of wcwidth so I guess it makes sense.
The negative return codes should probably be handled in the getLine_ function (as of now, we don't, which causes some trouble when trying to print control characters because they also have a width of -1 according to the C implementation of wcwidth, if I understand the specification correctly).

RETURN VALUE
The wcwidth() function shall either return 0 (if wc is a null wide-character code), or return the number of column positions to be occupied by the wide-character code wc, or return -1 (if wc does not correspond to a printable wide-character code).

I hope you'll be able to figure all this out. Unfortunately, I don't see the original maintainer giving help on this as they don't seem to use Haskell anymore, but I hope you could take ownership of it? Nobody else seems to be quite interested in it though...


I have just discovered that the original pure Haskell implementation was based on this pure Python implementation. Maybe there's some stuff you could gather from it? (checking from the issue tab, there's only a few minor bugs, but it should work for most Unicode standard versions)

@ruifengx
Copy link
Author

ruifengx commented Sep 4, 2022

I was also searching on this topic, and found out there once was a char::width implementation in Rust's libstd, but it did not make it into stable Rust. This char::width and the Python implementation you mentioned both are based on Markus Kuhn's implementation (which dates back to 2007, Unicode 5.0). I actually quite like the Rust implementation, especially its signature: it returns an Option, which forces the user to explicitly handle the control characters.

Unfortunately, I don't see the original maintainer giving help on this as they don't seem to use Haskell anymore, but I hope you could take ownership of it? Nobody else seems to be quite interested in it though...

Yes, I also don't expect to receive help from the original maintainer. To be fair, the original package was essentially a thin wrapper around the C function wcwidth, so they probably did not expect the mess at all. Although I am not sure about taking ownership, partly because there is only one dependency on wcwidth on Stackage, and also because I am not familiar to maintaining packages on Hackage at all. Anyway, I will first prepare a patch. After that we can decide whether to include the implementation here or try to update the wcwidth package on Hackage.

And BTW, the bug for space character is due to these line:
https://github.com/solidsnack/wcwidth/blob/900e463d9472aba3620198d2f32987b7402be5b9/Data/Char/WCWidthHaskell.hs#L413-L417
The ranges is inclusive on both ends, because that is how the data is stored in UCD. But here apparently the 32 was intended to be exclusive (due to the use of < in the C implementation).

I will take some time to also check for other bugs before resubmitting the PR.

@ruifengx
Copy link
Author

ruifengx commented Sep 4, 2022

And regarding the width of control characters, I believe it is perfectly okay to just special case for tab, backspace, etc. and assume the width is 1 for the rest. Most terminal emulators also don't handle the control characters, and display them as solid rectangles (or whatever the "missing" glyph is in the font), which usually have width 1.

@Mesabloo
Copy link
Owner

Mesabloo commented Sep 4, 2022

I actually quite like the Rust implementation, especially its signature: it returns an Option, which forces the user to explicitly handle the control characters.

I have to admit that I also quite like that! Definitely better.

Although I am not sure about taking ownership, partly because there is only one dependency on wcwidth on Stackage, and also because I am not familiar to maintaining packages on Hackage at all.

I think that the dependency count accounts only for package within stackage snapshots. For example, diagnose isn't, yet it depends on wcwidth (I think that chapelure also depends on it? Or they only planned on depending on it as mentioned by one of their maintainers in #1 (comment)). So I'm not quite sure how accurate this is.

And BTW, the bug for space character is due to these line:
https://github.com/solidsnack/wcwidth/blob/900e463d9472aba3620198d2f32987b7402be5b9/Data/Char/WCWidthHaskell.hs#L413-L417
The ranges is inclusive on both ends, because that is how the data is stored in UCD. But here apparently the 32 was intended to be exclusive (due to the use of < in the C implementation).

I quickly took a look at your new implementation, and I was ready to leave a comment exactly at this exact line in your code, but I simply could not understand whether your binary search was inclusive or exclusive on the last number of each range.
But I'm happy that you found the issue!

And regarding the width of control characters, I believe it is perfectly okay to just special case for tab, backspace, etc. and assume the width is 1 for the rest. Most terminal emulators also don't handle the control characters, and display them as solid rectangles (or whatever the "missing" glyph is in the font), which usually have width 1.

We already had a special case for the tab characters (so that you as the user choose how many spaces are used to render a tab character). I'm worried though that there will be many special cases for most of control characters. But if they are rendered anyway as single cell characters, then I guess it's fine to consider most of them as having a width of 1.

@solidsnack
Copy link

solidsnack commented Sep 4, 2022

For the most part, WCWidth relies on the native wcwidth implementation, from wchar.h. On Linux, Mac OS, FreeBSD, &c, the WCWidthHaskell implementation is simply not used. Reading over @ruifengx's post, I see that he uses Windows; and this is the one platform where the Haskell implementation is used.

There is a mistake, on WCWidthHaskell.hs:415 -- 32 should be 31. This leads to ASCII whitespace being treated like a control character. This can be fixed easily.

@Mesabloo is correct that, per POSIX, wcwidth is supposed to return -1 for a wide variety of characters.

Please find attached, the ranges pulled from the native implementation on a recent Macintosh.

ranges.txt

@Mesabloo
Copy link
Owner

Mesabloo commented Sep 4, 2022

Oh hi there!

There is a mistake, on WCWidthHaskell.hs:415 -- 32 should be 31. This leads to ASCII whitespace being treated like a control character. This can be fixed easily.

This is pretty much what @ruifengx figured in #11 (comment). I believe that a fix will be coming your way in the next few days (along with a rewrite of the pure Haskell implementation for a better/more efficient search strategy).

@Mesabloo is correct that, per POSIX, wcwidth is supposed to return -1 for a wide variety of characters.

However, would it be ok to provide a more “Haskell-ish” way of handling -1 (which in the C world mostly mean “errors”) via a Maybe Int? Or we could add another function (like wcwidthMay :: Char -> Maybe Int which would act like wcwidth but return Nothing instead of -1) for this purpose, while still exposing the POSIX wcwidth?
What do you think about it @solidsnack?

@ruifengx
Copy link
Author

It took me much longer than I expected, but anyway I have just finished a prototype of the codespan-reporting style. It definitely needs more fine-tuning, and I will need some time to merge your changes here. But the good news is it is working on all the test cases. Here are some screenshots:

Test Case Screenshot
Single-line markers Single-line markers
Multi-line markers Multi-line markers
Unicode & Tab Unicode
Source line omitting Source line omitting

Currently the placement of multi-line markers is not ideal:

Current Improvement (not yet)
Current Improved

I have a plan of improvement and will rewrite this part soon.


Below are some thoughts during the coding these days:

  1. In codespan-reporting the crate, there are five different levels of severity. I coded the Layout in such a way that all the five are allowed, but only Error and Warning are actually used:

    data Severity
      = Bug
      | Error   -- from `Err`
      | Warning -- from `Warn`
      | Help
      | Note
      deriving (Show, Eq, Ord)

    I don't know if it makes sense to add all these severities here, or to further parametrise the Report (and therefore Diagnostic) type by an additional sev variable.

  2. Similar things (but in the opposite way) for marker types. codespan-reporting only has primary and secondary labels, and we have four (This, Where, Maybe, Blank). Blank is very convenient for including source lines in the report without making other visual differences, which I hope codespan-reporting had in the beginning. We will have to decide what to do with the styling for Where and Maybe. Currently I just assigned the colour for secondary labels to both of them, which is not ideal. The solution would be to pick one more colour to distinguish between them, or (again) to parametrise the Report (and therefore Diagnostic) type by an additional lbl variable.

  3. The code it is around ~370 lines, and unfortunately I have not (yet) add enough comments to make the logic clear (the naming there could probably be improved, too). The logic follows codespan_reporting::term::renderer (the comments there are really helpful). You can find the current version of my implementation in my fork. I can submit a draft PR if you prefer.

  4. I ended up rewriting most of the rendering logic because it was easier to follow the original logic. However, there is one function worth mentioning that I reused (and changed a bit):

    replaceLinesWith :: Doc ann -> (Doc ann -> Doc ann) -> Doc ann -> Doc ann
    replaceLinesWith repl t = go
      where
        go Line = repl
        go Fail = Fail
        go Empty = Empty
        go (Char '\n') = repl
        go (Char c) = Char c
        go (Text _ s)
          = mconcat
          $ intersperse repl
          $ map t
          $ uncurry Text . (T.length &&& id)
          <$> T.split (== '\n') s
        go (FlatAlt f d) = FlatAlt (go f) (go d)
        go (Cat c d) = Cat (go c) (go d)
        go (Nest n d) = Nest n (go d)
        go (Union c d) = Union (go c) (go d)
        go (Column f) = Column (go . f)
        go (Nesting f) = Nesting (go . f)
        go (Annotated ann doc) = Annotated ann (go doc)
        go (WithPageWidth f) = WithPageWidth (go . f)

    This function is critical for supporting multi-line messages. I added an additional parameter t :: Doc ann -> Doc ann, this transformation is applied to every line of the input Doc separately, so that repl does not get affected by the annotation on the input Doc.

  5. I am not 100% sure whether the source range in the report should be a closed range or right half-open. I assumed it to be a closed range, but the rendering looks weird for some test cases.

@Mesabloo
Copy link
Owner

This already looks nice! Although is it me or do the single-line This markers have an extra caret when going until EOL?

  1. In codespan-reporting the crate, there are five different levels of severity. [...] I don't know if it makes sense to add all these severities here, or to further parametrise the Report (and therefore Diagnostic) type by an additional sev variable.
  2. Similar things (but in the opposite way) for marker types. [...]

I don't want to give a definitive answer yet, but:

  • The Help and Note severities are already handled by the Note data type, aren't they? Unless they mean something else
  • Bug is just a more critical version or Error. Is it really that useful to have it in practice, or can one only rely on Error?
  • If we were to include all those new severities as parts of Report, we'd need colors etc. for every other layout (currently: the ariadne-like, a GCC-like and a typescript-like) + additional ways to format them (perhaps differently?)
    I have not quite found a usecase for both Help and Note though: I usually am satisfied by Where
  1. The code it is around ~370 lines, and unfortunately I have not (yet) add enough comments to make the logic clear (the naming there could probably be improved, too). [...]

I don't think code length is really an issue, unless it leads to very poor performances (which is not that huge of an issue because rendering is usually done only once in the pipeline).
My code for the ariadne-layout is also quite big.
Don't forget that we'll refactor the common functions later on!

  1. I ended up rewriting most of the rendering logic because it was easier to follow the original logic. [...]

No problem here.
It's a shame that we need to have this replaceLinesWith function. Ideally, prettyprinter would allow us to specify “prefixes” to put when breaking lines (instead of just whitespaces).
In the mean time, I quite like how you rewrote it. Definitely better than my original one!

I am not 100% sure whether the source range in the report should be a closed range or right half-open. I assumed it to be a closed range, but the rendering looks weird for some test cases.

I don't think I understood.
In some of the tests cases (which were borrowed from a few of my projects, namely nstar and zilch), the positions are put in weird positions (my original lexer/parser combination used to give positions which go “too far” into the stream). However, those tests cases opened a few bugs in the implementation of the ariadne-layout. This is why I kept them in the test suite.
If this not what you meant, then I definitely did not understand, please pardon me.


Regarding the current output: it looks like it uses half unicode half ASCII. Do you intend it to be that way? Also, will you provide unicode-free alternatives? If not, then I think that deciding whether to provide Unicode+ASCII or just Unicode or just ASCII should be left to the implementor of the layout, hence moved outside of the core library.
What do you think about it?

@ruifengx
Copy link
Author

ruifengx commented Sep 13, 2022

Although is it me or do the single-line This markers have an extra caret when going until EOL?

This is why I asked whether the source ranges should be closed or half-open (that is, should the larger end-point be inclusive or exclusive). I am currently assuming them to be inclusive on both ends, and I coded the rendering in such a way to allow "off-by-one" markers (pointing at technically where '\n' would be).

The Help and Note severities are already handled by the Note data type, aren't they? Unless they mean something else.

AFAIK, in codespan-reporting the crate, it is possible to produce a diagnostic whose severity is "note" or "help". Such diagnostics are not attached to another "error" or "warning". I don't actually see a concrete use case for such stand-alone "note" or "help" diagnostics, but this is the status quo there, so I thought perhaps they have their own reasoning.

Regarding the current output: it looks like it uses half unicode half ASCII. Do you intend it to be that way?

I should have clarified this in the previous comment. All the screenshots are for the Unicode output (rendering style of codespan-reporting is somewhat simplistic). I picked the Unicode version because it looks prettier than the ASCII version. Anyway, here is a screenshot for the ASCII rendering for your information:
ASCII

For switching between Unicode and ASCII, I followed the approach in the original crate. Instead of passing around a useUnicode boolean flag, I packed all marker characters/strings into a Chars record. In the Layout, I then switch between unicodeChars and asciiChars according to the useUnicode flag.

@ruifengx
Copy link
Author

The Report datatype does not need to change. Only the Note one, which we could repurpose to also include markers if needed (EDIT: this is what you suggested, basically, but I don't think we should allow full reports, only markers). For example,

data Note msg 
  = Note msg [Marker msg]
  | Hint msg [Marker msg]

That way, your layout can be done using a first Hint with no markers, and a second Hint with a single text-less marker (and it isn't lacking sense to do it like this).

Yeah, it looks to me the proper solution. Although there was an oversight in my last comment: note how the diagnostic produced by rustc suggested adding a 'static in the file, and printed the modified line in the help section. To add to the complexity, the added text is 'static␣ (note the space), but the marker only covers 'static (i.e., without the space). An easy way to support this1 is to use different variants of markers in the help message (the exact naming to be determined)234:

data HelpMarker msg
  = Add { _insertionPoint :: SrcPos, _text :: String, message :: msg }
  | Remove { _removalRange :: SrcRange, message :: msg }
  | Annotate { _annotationRange :: SrcRange, message :: msg }

(field names given only for clarity)

Or rather (factor out the common message field):

data HelpMarkerType
  = Add SrcPos String
  | Remove SrcRange
  | Annotate SrcRange
type HelpMarker msg = (HelpMarkerType, msg)

Anyway, I will first try to craft a working implementation, and the design will probably require some more tweaks to achieve the desired flexibility.

Footnotes

  1. For the "space problem" (the insertion of text should include the space but the marker should not cover it), I can think of a hack: we take special care when rendering the Add marker to always avoid covering spaces on the boundary (and perhaps avoid doing so if the added text contained only spaces). Anyway, this could be left for the layout style to decide.

  2. Previously there is only Position which contained a range. Now we might need both SrcPos and SrcRange.

  3. Not sure whether the Remove variant would be useful or not. Add will serve for the purpose illustrated above, and Annotate is just the normal marker. Technically Remove is similar to the regular Annotate in that they both render a marker on existing source. It is only useful to allow different rendering (e.g., green for insertion, red for removal, blue/yellow for annotation).

  4. Also not sure whether we should allow Add/Remove outside the help section. I can think of one use case for it: for layout-sensitive languages like Haskell, when reporting layout-related errors, the diagnostic could include the inserted {, ;, and } to better illustrate the cause of the error. Though I guess such things is relatively rare in practice.

@Mesabloo
Copy link
Owner

Mesabloo commented Sep 24, 2022

This proposal will be very hard to handle, in that we have to modify the source code shown but keep the correct positions for markers.
This will most probably induce redundancy, as we should need two different ways (or even 3) of fetching source code, when we want to simply annotate, to add code or to remove code.

EDIT: ⚠️ we may also want to add full lines instead of just chunks of code.

  1. [...]

I can see how a Remove variant can be useful. Consider this example from my programming language:

public assume {@0 A : type}

In the current state, the error given for this line is:

[error]: Public assumptions in module.
     ╭──▶ /home/mesabloo/projects/zilch/gzc/basics-dev/test/a.zc@1:1-1:27
     │
   1 │ public assume {0 A : type}
     • ┬─────────────────────────
     • ╰╸ Cannot bind assumptions publicly
     •
     │ Hint: Remove the 'public' modifier from this declaration.
─────╯

Perhaps the error message could also include a Remove marker? I don't quite see how to render it though (it's easier to picture adding than removing).

  1. [...]

This is not such a problem is it? We could require a SrcRange for Add (instead of a SrcPos) and insert the text at the beginning of the range, while the marker will cover the whole range. Computations will have to rely on the max between the length of the marker, and the length of the added text (which may rather be a Text than a String).

  1. [...]

I don't think they really have a use outside of help/note sections. They could even mess up error reporting if used outside those sections, as the source code for the main markers will not be reliable (the purpose of putting the source code at the top is for it to be reliable, otherwise it is useless/confusing1).
This unfortunately duplicates some logic (see the beginning of my comment) but it should be okay for now. We'll eliminate code duplication later anyways.

Footnotes

  1. If the source code in the main error message is not the one I wrote, I'd question whether I correctly saved the file or not, or if the tool is just bugged. This is time wasted on a confusing behavior, which we shouldn't allow.

@ruifengx
Copy link
Author

ruifengx commented Sep 24, 2022

This proposal will be very hard to handle, in that we have to modify the source code shown but keep the correct positions for markers.
This will most probably induce redundancy, as we should need two different ways (or even 3) of fetching source code, when we want to simply annotate, to add code or to remove code.

Agreed. I will spend several days to experiment on this and see if it is viable at all.

Perhaps the error message could also include a Remove marker? I don't quite see how to render it though (it's easier to picture adding than removing).

I'm not entirely sure about the ariadne style, but for rustc-like style, I imagine it should be rendered like this (I cannot include colours here, but the line under "public" should ideally use a "removal marker style"):

error: Public assumptions in module.
 --> /home/mesabloo/projects/zilch/gzc/basics-dev/test/a.zc:1:1
  |
1 | public assume {0 A : type}
  | ^^^^^^^^^^^^^^^^^^^^^^^^^^ Cannot bind assumptions publicly
  |
help: Remove this 'public' modifier from the declaration.
  |
1 | public assume {0 A : type}
  | ------
  |

Since + is used by rustc to indicate insertion, perhaps it is also reasonable to use - to mean removal, but visually it looks more like dashed lines instead of line of minus signs. Anyway, the idea is to use different styles for "removal" and "annotation", so it becomes easily distinguishable.

If the source code in the main error message is not the one I wrote, I'd question whether I correctly saved the file or not, or if the tool is just bugged. This is time wasted on a confusing behavior, which we shouldn't allow.

Now that you mention it, yes it will be confusing. I was originally visualizing the rendering in such a way that e.g. the inserted text use fainted colours, but of course we do want the diagnostic not to cause any misunderstanding even if colours are turned off.

We could require a SrcRange for Add (instead of a SrcPos) and insert the text at the beginning of the range, while the marker will cover the whole range.

Yes, in this way the user gets full control over how many characters the marker should span. Though I don't expect "insertion markers" to affect the positions of other following markers, that is to say, I would naturally expect all positions used in the markers should correspond to the positions in the original (unmodified) text.


I know the idea definitely needs polishing and it has to be verified in the implementation. I will come back with the result several days later, and we may have a deeper understanding then to decide on the details.

@Mesabloo
Copy link
Owner

Hello! Do you have any update?
It's been quite a while since last time, so I am just curious. :)

@Mesabloo
Copy link
Owner

Mesabloo commented Oct 14, 2022

From #11 (comment)

I don't want to give a definitive answer yet, but:

  • The Help and Note severities are already handled by the Note data type, aren't they? Unless they mean something else
  • Bug is just a more critical version or Error. Is it really that useful to have it in practice, or can one only rely on Error?
  • If we were to include all those new severities as parts of Report, we'd need colors etc. for every other layout (currently: the ariadne-like, a GCC-like and a typescript-like) + additional ways to format them (perhaps differently?)
    I have not quite found a usecase for both Help and Note though: I usually am satisfied by Where

I have thought about it a bit and I think that a Critical version may also be of use.
Critical errors must be emphasized compared to simple user errors, because the user as no control over these and must report them upstream (examples of these include the IMPOSSIBLE errors from GHC).
As such, maybe we can in fact also consider supporting a Critical report kind? I have a few ideas how to style them (for example a brighter, bold red) in the current layout.

For the other points, we already discussed them before so there's nothing more to be added here.


From #11 (comment)

Agreed. I will spend several days to experiment on this and see if it is viable at all.

I can also try on my end, see what I can come up with (hopefully something that can be reused/that is generic).
I have taken a look at what you have already done for the codespan-reporting layout and I think your code looks better than mine, so I'll take inspiration from yours.
I may also start working on the documentation if I come up with something good, and also merge a few things in the core library if I really feel the need to.
We'll see at the end what needs to be kept/removed from the core library anyway.

@Mesabloo
Copy link
Owner

Mesabloo commented Oct 15, 2022

From #11 (comment)

This proposal will be very hard to handle, in that we have to modify the source code shown but keep the correct positions for markers.
This will most probably induce redundancy, as we should need two different ways (or even 3) of fetching source code, when we want to simply annotate, to add code or to remove code.

EDIT: ⚠️ we may also want to add full lines instead of just chunks of code.

What I thought would be pretty hard to handle turned out to be way simpler!
Using a very simple GHCi script, I managed to print some results:

  • A green italicized piece of code is added using the special AddCode marker
  • A red bold piece of code is just either a primary, secondary or Annotate marker
  • A dull black piece of code is for a RemoveCode marker

image
This was generated using the following markers:

markers2 =
  [ (Position (2, 38) (2, 47) "test1", Left $ Primary ""),
    (Position (2, 5) (2, 11) "test1", Right $ AddCode "unsafe " "")
  ]

The markers' which are returned are the original markers, where some positions may have been shifted because of AddCode markers. However, note that the original order is preserved.
Positions are shifted so that the logic of figuring out where each marker goes is not duplicated (and is kept within the fetchLine function).
As an example, see the new markers which are generated by the example above (pretty-printed for convenience):

λ> print (fst <$> markers')
[  Position {begin = (2,45), end = (2,54), file = "test1"},
   Position {begin = (2,5), end = (2,11), file = "test1"}
   ]

Only the position of the first marker is affected, because it appears after a AddCode marker.

It also works when multiple AddCode markers are on the same line:
image

This also works when the added piece of text contains newline, though I think this will be hard to handle when showing markers.
We may want to forbid it, but I guess we'll see about that.
image


This code is more generic than it is required to be. Instead of either a list of SimpleMarker msg (markers for the main part of the report) or a list of NoteMarker msg (markers which may appear in notes), a list of Either (SimpleMarker msg) (NoteMarker msg) is taken.
This leads to being able to mix both types of markers, but it should never happen if the function is called correctly (we may also not expose it directly, and only expose variants which take one of the two lists).


Here is the source code (which is rather long, so put into a dropdown section)

Code for `fetchLine`
fetchLine ::
  forall msg ann.
  Pretty msg =>
  -- | The mapping between file names and file contents.
  FileMap ->
  -- | The path to the file to get the line from.
  FilePath ->
  -- | The line number corresponding to the line to get.
  Int ->
  -- | The number of spaces used to show a TAB character.
  Int ->
  -- | The annotation to add to the line when no line is found, in order to colorize the placeholder text (such as @"<no line>"@).
  (ann, msg) ->
  -- | The annotation to add to the line when there are no markers underneath.
  ann ->
  -- | The annotation to add to the line when a marker is found underneath.
  (Either (SimpleMarker msg) (NoteMarker msg) -> ann) ->
  -- | All the markers present on that line.
  --
  --   We want to apply different treatments to simple markers and marker which can appear in notes.
  [(Position, Either (SimpleMarker msg) (NoteMarker msg))] ->
  -- | Returns the width table and the generated annotated 'Doc'ument.
  --
  --   Markers are also returned with the position adjusted to fit the code perfectly (in case
  --   of code modification with 'AddCode' markers).
  (WidthTable, [(Position, Either (SimpleMarker msg) (NoteMarker msg))], Doc ann)
fetchLine files path line tabSize (noLineAnn, noLineText) codeAnn markerAnn markers =
  case safeArrayIndex (line - 1) =<< files HashMap.!? path of
    Nothing -> (mkWidthTable "", [], annotate noLineAnn (pretty noLineText))
    Just code -> mkTuple3 (mkWidthTable code, colorizeCode 1 markers True code)
  where
    colorizeCode :: Integer -> [(Position, Either (SimpleMarker msg) (NoteMarker msg))] -> Bool -> String -> ([(Position, Either (SimpleMarker msg) (NoteMarker msg))], Doc ann)
    colorizeCode _ markers _ "" = (markers, mempty)
    colorizeCode n markers handleAdd (c : code)
      -- if we found that a note marker starts at this position,
      -- then we have to start potentially overriding the code (by adding chunks, for example).
      --
      -- in such case, we stop processing the code at this point, add the chunk of code
      -- and then continue.
      -- we also want to adjust the position
      | (True, Just (AddCode code' _)) <- (handleAdd, specialNoteMarkerAt n markers) =
        let addLength = length code'
            -- register the length of the text, to add to all marker positions
            -- starting after @n@.

            -- WARN: don't partition here! we want to keep the original order of the markers.
            markers' =
              markers <&> \tup@(Position (bl, bc) (el, ec) f, m) ->
                if bc < fromIntegral n
                  then tup
                  else case m of
                    -- if this is the marker we are currently handling, do NOT shift it!
                    Right (AddCode _ _)
                      | bl == line && bc == fromIntegral n -> tup
                    _ ->
                      let newStart = if bl == line then bc + addLength else bc
                          newEnd = if el == line then ec + addLength else ec
                       in (Position (bl, newStart) (el, newEnd) f, m)
         in colorizeCode n markers' False (code' <> (c : code))
      | otherwise =
        let doc = ifTab (fold $ replicate tabSize space) pretty c

            allMarkersAtPos =
              snd <$> flip filter markers \case
                (Position (bl, bc) (el, ec) _, _)
                  | bl == el ->
                    -- for inline markers (those which the ending line is the same as the starting line),
                    -- we can just check if n is included insie @[start, end[@.
                    fromIntegral n >= bc && fromIntegral n < ec
                  | otherwise ->
                    -- for multiline markers, it is a little bit more complicated.
                    (bl == line && fromIntegral n >= bc)
                      || (el == line && fromIntegral n < ec)
                      || (bl < line && el > line)

            colorizedDoc = maybe (annotate codeAnn) (annotate . markerAnn) (head' allMarkersAtPos) doc
         in second (colorizedDoc <>) $ colorizeCode (n + 1) markers True code

    specialNoteMarkerAt :: Integer -> [(Position, Either (SimpleMarker msg) (NoteMarker msg))] -> Maybe (NoteMarker msg)
    specialNoteMarkerAt n = fmap (fromRight undefined . snd) . find \(pos, mark) -> isRight mark && snd (begin pos) == fromIntegral n

    ifTab :: a -> (Char -> a) -> Char -> a
    ifTab a _ '\t' = a
    ifTab _ f c = f c

    mkWidthTable :: String -> WidthTable
    mkWidthTable s = listArray (1, length s) (ifTab tabSize wcwidth <$> s)

    head' :: [a] -> Maybe a
    head' [] = Nothing
    head' (x : _) = Just x
    {-# INLINE head' #-}

    mkTuple3 :: (a, (b, c)) -> (a, b, c)
    mkTuple3 (x, (y, z)) = (x, y, z)
    {-# INLINE mkTuple3 #-}

If there's something which isn't quite clear, feel free to comment on it!

@ruifengx
Copy link
Author

Thanks for the update!

I was mainly working on code clean up, because the rendering logic was previously written as two long functions, and I now managed to break one of them (the entry, for rendering Report) into shorten functions. I also added some documentations to provide an outline for the rendering process. Hopefully this later makes the merge easier, and also makes maintenance less a pain.

Alongside code restructuring and documentations, I also implemented a basic "grouping" logic, so the following problem is now solved:

Currently the placement of multi-line markers is not ideal:

Current Improvement (not yet)
Current Improved

I have a plan of improvement and will rewrite this part soon.

I am also currently investigating on the new marker types for attached notes and helps. This is another reason why I was refactoring the code. I believe the rendering could reuse a substantial amount of code (I mean the part currently responsible for rendering the sub-reports for each file, namely sourceLine in my fork). (I kinda think that your fetchLine serves basically the same purpose as my sourceLine, so I will very likely benefit from the code logic you posted here.)

@Mesabloo
Copy link
Owner

About attached notes and helps, I have changed the Report and Note types so that we can have line markers in notes.
I have not added the cases for standalone notes and helps (yet?), although if it is clearly needed I'll add them.

For now, the code looks like this (in Error.Diagnose.Report.Internal):

-- | A 'HashMap' associating a 'FilePath' to an array of lines indexed by line numbers.
type FileMap = HashMap FilePath (Array Int String)

-- | The type of diagnostic reports with abstract message type.
data Report msg
  = Report
      Severity
      -- ^ The severity of the report.
      (Maybe msg)
      -- ^ An optional error code to print with the report.
      msg
      -- ^ The message associated with the report.
      [(Position, SimpleMarker msg)]
      -- ^ A map associating positions to markers to show under the source code.
      [Note msg]
      -- ^ A list of notes and hints to show various help information after the error.

-- | The severity of a report describes how much it is important to take into account.
data Severity
  = -- | A warning report, which is important but not necessary to fix.
    Warning
  | -- | An error report, which stops the whole pipeline.
    Error
  | -- | A critical report, indicating that something internally failed.
    Critical

-- | The type of markers which are meant to be shown under a line of code.
data SimpleMarker msg
  = -- | A primary marker, which conveys the most important information (such as the location of an error).
    Primary msg
  | -- | A secondary marker, which is meant to add extra information to the report.
    Secondary msg
  | -- | A blank marker which is invisible but allows to include specific lines in the final report.
    Blank

#ifdef USE_AESON
instance ToJSON Severity where
  toJSON Warning = toJSON ("warning" :: String)
  toJSON Error = toJSON ("error" :: String)
  toJSON Critical = toJSON ("critical" :: String)

instance ToJSON msg => ToJSON (Report msg) where
  toJSON (Report sev code msg markers hints) =
    object [ "kind" .= sev
           , "code" .= code
           , "message" .= msg
           , "markers" .= fmap showMarker markers
           , "notes" .= hints
           ]
    where
      showMarker (pos, marker) =
        object $ [ "position" .= pos ]
          <> case marker of
               Primary m -> [ "message" .= m, "kind" .= ("primary" :: String) ]
               Secondary m -> [ "message" .= m, "kind" .= ("secondary" :: String) ]
               Blank -> [ "kind" .= ("blank" :: String) ]
#endif

-- | A note is a piece of information that is found at the end of a report.
data Note msg
  = -- | A note, which is meant to give valuable information related to the encountered error.
    Note msg [(Position, NoteMarker msg)]
  | -- | A hint, to propose potential fixes or help towards fixing the issue.
    Hint msg [(Position, NoteMarker msg)]

data NoteMarker msg
  = -- | We need a piece of code to be added inside the source code.
    AddCode
      String
      -- ^ The piece of code to insert. This must NOT contain newlines.
      --   This may also be longer or shorter than the span of the marker.
      msg
      -- ^ The possibly empty message attached to the marker.
  | -- | Code is requested to be deleted because it causes an error.
    RemoveCode
      msg
      -- ^ The message attached to the marker.
  | -- | A simple annotation under source code to add extra information (e.g. in notes)
    Annotate
      msg
      -- ^ The message attached to this marker.

#ifdef USE_AESON
instance ToJSON msg => ToJSON (Note msg) where
  toJSON = \case
    Note m marks -> object [ "kind" .= ("note" :: String), "message" .= m, "markers" .= fmap showMarker marks ]
    Hint m marks -> object [ "kind" .= ("hint" :: String), "message" .= m, "markers" .= fmap showMarker marks ]
    where
      showMarker (pos, mark) = object $ [ "position" .= pos ]
        <> case mark of
            AddCode code m -> [ "kind" .= ("insert" :: String), "message" .= m, "code" .= code ]
            RemoveCode m -> [ "kind" .= ("delete" :: String), "message" .= m ]
            Annotate m -> [ "kind" .= ("annotate" :: String), "message" .= m ]
#endif

-- | Transforms a warning report into an error report.
warningToError :: Report msg -> Report msg
warningToError (Report Warning code msg markers notes) = Report Error code msg markers notes
warningToError r@(Report _ _ _ _ _) = r

-- | Transforms an error report into a warning report.
errorToWarning :: Report msg -> Report msg
errorToWarning (Report Error code msg markers notes) = Report Warning code msg markers notes
errorToWarning r@(Report _ _ _ _ _) = r

Let me know what you think about it! :)


All the changes are summarized below:

  • I added a Severity data type (so that we are not limited to the current implementation whenever we want to add new ones);
  • The Report constructor now takes a Severity;
  • Marker has been renamed to SimpleMarker (name subject to change, because I don't quite like it);
  • The Note data type now also takes special markers which embody whether we want to add, remove or simply annotate code;
  • Translation to JSON for all those new data types.

I chose not to include full reports in notes following what we discussed earlier.


Alongside code restructuring and documentations, I also implemented a basic "grouping" logic, so the following problem is now solved:

Great! This looks way better.

@Mesabloo
Copy link
Owner

Mesabloo commented Oct 15, 2022

Alright so after a bit of fiddling around, I ended up moving closer to the exact markers you suggested for notes.
Basically, I initially thought that adding messages to note markers would be cool, but I came to the conclusion that this does not fit well. Moreover, I'm starting to wonder whether we should allow multiple markers within notes (generally, one would only put a single marker related to the note itself).

In the meantime, I fixed my GCC-style layout to account for the new changes I made.
Here is an example output:

With Unicode characters With ASCII characters
image image

I believe it looks pretty cool already! I will just get rid of the extra indentation before the code in the note.


EDIT:

Moreover, I'm starting to wonder whether we should allow multiple markers within notes (generally, one would only put a single marker related to the note itself).

I have actually made an example which makes multiple markers in a single note look awkward. So I think I'll just allow a single one for now.
image


EDIT 2:
I also updated the TS-like layout:

With Unicode characters With ASCII characters
image image

@ruifengx
Copy link
Author

ruifengx commented Oct 16, 2022

I copied the code you posted here in diagnose-core, and wrote the logic for rich notes/helps. Here is a screenshot of what it currently looks like (I am pretty satisfied, and I think it is comparable to the output produced by rustc):

Screenshot

The code and the updated test case is available in my fork. I think there are still edge cases that I does not currently handle properly, so there will probably still be some updates, most likely alongside a refactor of sourceLine.

P.S. I find the current interface for Add unintuitive, though. (Maybe I have some misunderstandings?) It has a Position, but (1) the two endpoints must be on the same line, and (2) only the first endpoint is a real location in the source, but the second one is not (only the difference between the two is meaningful and used to determine how long the marker should extend). I still think Add should be something like Add { position :: (Line, Column), text :: String, markerLength :: Int }.

@Mesabloo
Copy link
Owner

Mesabloo commented Oct 16, 2022

P.S. I find the current interface for Add unintuitive, though. (Maybe I have some misunderstandings?) It has a Position, but (1) the two endpoints must be on the same line, and (2) only the first endpoint is a real location in the source, but the second one is not (only the difference between the two is meaningful and used to determine how long the marker should extend).

The current interface for Add only takes a position and the piece of code to add.
However, there are a few things to note:

  1. The position is the position of the marker that will be rendered, which starts on the same codepoint as the code added.
    It is not required to span only a single line (as mentioned earlier) although I don't really know if it should.
    From what I could gather updating the layouts I have made, it should (and the added text should not contain newlines).
  2. The piece of code is unfortunately an arbitrary string, meaning it can contain vertical whitespaces.
    This would break any formatting for markers, unless we explicitely remove newlines in fetchLine.
    EDIT: we could also replace them with the symbol (which is the Line Feed symbol), or really any of these: https://unicode-table.com/en/blocks/control-pictures/

I still think Add should be something like Add { position :: (Line, Column), text :: String, markerLength :: Int }.

This would also work. When the Position given is on the same line, it's basically equivalent to this.
So the question is only: do we want to support/allow multiline code in such case?

In my opinion, it would be better to disallow these, and perform character substitution when adding code (as described in 2. above), but I would like to have your opinion too. (which I believe I can infer as the same thing, given how you defined the Add marker)

(I am pretty satisfied, and I think it is comparable to the output produced by rustc)

I like it! It's almost as if I can't recognize that this is actually Diagnose rendering these. :D

The code and the updated test case is available in my fork. I think there are still edge cases that I does not currently handle properly, so there will probably still be some updates, most likely alongside a refactor of sourceLine.

I'll most probably make a very big commit here in some time, after finishing updating the original ariadne layout to support markers in notes.
We'll soon be able to merge everything in one place I hope!

@Mesabloo
Copy link
Owner

Just a random thought: we now have markers which can signify adding or removing code (where the former actually adds code). Do we also want to add markers to edit the source code (i.e. change a range to some code)?
It's probably a silly idea.

@ruifengx
Copy link
Author

Do we also want to add markers to edit the source code (i.e. change a range to some code)?

I am not against this idea, but given the current types of markers, I would expresses this intent using the Annotate marker (disclaimer: this is not an actual message from rustc):

warning: use of deprecated associated function `std::sync::atomic::AtomicUsize::compare_and_swap`
  --> <source>:17:7
   |
17 |     x.compare_and_swap(
   |       ^^^^^^^^^^^^^^^^
   |
   = note: `#[warn(deprecated)]` on by default
help: use `compare_exchange` or `compare_exchange_weak` instead
  --> <source>:17:7
   |
17 |     x.compare_and_swap(
   |       ----------------
   |       compare_exchange
   |       compare_exchange_weak
   |

By the way, would you mind if I introduce transformers as a new dependency? My code could be improved if I could make use of the Writer monad.

@Mesabloo
Copy link
Owner

By the way, would you mind if I introduce transformers as a new dependency? My code could be improved if I could make use of the Writer monad.

As long as it is not part of the core library, then go ahead! The idea will be to release alternative layouts as separate packages, so more dependencies for a layout package isn't a problem.

I am not against this idea, but given the current types of markers, I would expresses this intent using the Annotate marker (disclaimer: this is not an actual message from rustc):
[...]

I see. I wasn't quite sure whether messages were useful on note markers, so I actually removed them (temporarily, I guess) in my own code. But seeing this, I'll make sure to add them back (at least for Annotate, not sure what should be the state for other markers).
Also, as far as I know, GCC does the same as you suggest, so I guess an EditCode would not be quite useful (maybe it could even be weird?).
Anyways, there are still a few things I need to change then!

@Mesabloo
Copy link
Owner

Alright so I have pushed all I had on the custom-layouts branch.
The code lacks a bit of documentation here and there but for the most part it should be fine (I'll add the missing documentation later probably).
I have also added a few test cases just for the sake of it (and also to generate the screenshots I put in the READMEs for each subpackage).
And I made a few changes here and there, mostly related to how markers are represented.
I also took your advice on the transformers and also used it in the layouts I wrote.

Feel free to give any feedback! It will be greatly appreciated.

@Mesabloo
Copy link
Owner

Also, I randomly thought about it the other day, but given that all layouts will be released under different packages, perhaps there is no need to include yours inside this repository?
You could have it as a separate repository of yours and release the package etc.
I mean, it entirely depends on what you want. :)

@ruifengx ruifengx linked a pull request Oct 20, 2022 that will close this issue
@ruifengx
Copy link
Author

You could have it as a separate repository of yours and release the package etc.

Personally, I prefer merging my code into your repo so that it is easier to find and less likely to go out of sync. But certainly I am happy to maintain this part of code in case future changes make maintenance a non-trivial task.

I have just merged your branch into my fork (well technically it was a rebase), and submitted #14. The current version does not support multiline AddCode markers (those markers are handled by replacing control characters with spaces). This is not ideal, but multiline code insertion is really hard to integrate into my current rendering logic. I will checkout your implementation to get some inspiration.


Comments for your implementation: in my understanding, the following code probably does not work as intended, because this is not an orphan instance, so even if the users write their own implementation, there is no way that instance resolution could prefer the user-defined ones (it won't be more specific than the version here). The only possibility I know is IncoherentInstances, which is very dangerous and does not deterministically resolve the instance to the desired one.

instance {-# OVERLAPPABLE #-} IsAnnotation AriadneAnnotation where

@Mesabloo
Copy link
Owner

The current version does not support multiline AddCode markers (those markers are handled by replacing control characters with spaces). This is not ideal, but multiline code insertion is really hard to integrate into my current rendering logic. I will checkout your implementation to get some inspiration.

I ended up removing that part of the logic. Instead, I am replacing any vertical whitespace with a Unicode symbol (see in fetchLine, this is where this is done).
This would probably be better to just replace by an invisible whitespace, but this can still be changed easily if needed.

Comments for your implementation: in my understanding, the following code probably does not work as intended, because this is not an orphan instance, so even if the users write their own implementation, there is no way that instance resolution could prefer the user-defined ones (it won't be more specific than the version here).

I believe that if the user defines a new instance as {-# OVERLAPPING #-}, this will get resolved instead of the already defined one (this is included in the documentation of the module itself). Because my instance is marked as {-# OVERLAPPABLE #-}, any {-# OVERLAPPING #-} instance should be privileged (or at least that's what I think it does).
However, I am not quite sure and this needs some testing.


PS: I'll review your PR a bit later. :)

@ruifengx
Copy link
Author

Because my instance is marked as {-# OVERLAPPABLE #-}, any {-# OVERLAPPING #-} instance should be privileged (or at least that's what I think it does).
However, I am not quite sure and this needs some testing.

If my understanding of the GHC manual is correct, resolving the instance also requires the priviledged instance to be strictly more specific. For this case, my understanding is that the two instances are equally specific. I believe we actually need the default instance in the library to be marked {-# INCOHERENT #-}. Anyway, I agree that testing would be the only way to confirm this.


And regarding the review for the PR, please take your time. Also, I will be happy to address any readability/efficiency issues or code duplication.

@Mesabloo
Copy link
Owner

Mesabloo commented Oct 20, 2022

It turns out that you are definitely right, and that my understanding of overlapping instances was just off.
I also tested with {-# INCOHERENT #-} but GHC also reported duplicated instances (see the MWE underneath).
I guess I'll just have to separate the annotation types from the instance (so that if you don't want the default instance, you don't import the module it is contained in).

EDIT: and indeed, from the GHC manual you linked, the two conditions must hold:

  • Eliminate any candidate IX for which there is another candidate IY such that both of the following hold:
    • IY is strictly more specific than IX. That is, IY is a substitution instance of IX but not vice versa.
    • Either IX is overlappable, or IY is overlapping. (This “either/or” design, rather than a “both/and” design, allow a client to deliberately override an instance from a library, without requiring a change to the library.)

MWE:

  • A.hs:
    module A where
    
    class Something a where
      x :: a
    
    instance {-# INCOHERENT #-} Something Int where
      -- {-# OVERLAPPABLE #-} doesn't work either
      x = 2
  • Main.hs:
    module Main where
    
    import A
    
    instance {-# OVERLAPPING #-} Something Int where
      x = 5
    
    main :: IO ()
    main = print (x :: Int)

And the error:

A.hs:6:29: error:
    Duplicate instance declarations:
      instance [incoherent] [safe] Something Int -- Defined at A.hs:6:29
      instance [overlapping] Something Int -- Defined at Main.hs:5:30
  |
6 | instance {-# INCOHERENT #-} Something Int where
  |                             ^^^^^^^^^^^^^

@Mesabloo Mesabloo linked a pull request Nov 5, 2022 that will close this issue
@spacekitteh
Copy link
Contributor

So I've noticed that you're hard-coding a layout style as something which can be converted to an AnsiStyle. Why is that? Would it not make more sense to have it as a generic DiagnosticStyle which has actual diagnostic annotation information, rather than colour information? Not only would this mean that layouts could be user customisable, it also means that layouts could share more of a common vocabulary between them.

Further, prettyprinter says that annotations should be semantic in nature for as long as possible; it's still entirely plausible to separate layouts from colours, with the mapping of diagnostic components to colours delayed until the user desires it.

@spacekitteh
Copy link
Contributor

For example, rather than something analogous to FileColor, layout algorithms would annotate it with, say, File <filename>; then renderers could choose both the colouring, AND whether to annotate it with a hyperlink.

@Mesabloo
Copy link
Owner

I'm not quite sure that I quite follow the suggestion here.
Unfortunately, I don't think that it is possible to have a single annotation type for all renderers, given that each has their own quirks (e.g. Ariadne uses a rule color, whereas the GCC layout does not). We already discussed this with @ruifengx earlier in this issue, but new suggestions are obviously welcome!
Do you mind expanding a bit on it?

@spacekitteh
Copy link
Contributor

Sure, I'll elaborate in the morning!

@spacekitteh
Copy link
Contributor

In the current approach (both 2.4.0 and the branch), there are two separate operations that are combined into one:

  1. Layout. This refers to taking a Diagnostic and producing a Doc which has a two-dimensional structure; with faulty code, contextual code, hints, error codes, rules, etc. This document should contain only semantic information: annotations which say "this text corresponds to this particular filename", "this text corresponds to a line of code", etc. The user may want to use that semantic annotation before it's modified further, such as to add hyperlinks on error codes linking to a help page for that specific error, etc.
  2. Styling. This refers to setting text properties: Bold, underline, different colours, etc. This should be applied at the latest time possible, and may not correspond to a terminal; it may be for rendering to HTML, for example.

These two things are both conceptually and semantically distinct; a given layout can have many styles, and a given style can apply to many layouts. As an example, say a user really likes the Ariadne layout, but hates the colours; then they could apply a different style to the Ariadne layout.

So, the idea is two have two passes: layout, then styling. The user should get to choose when to apply styling; and they should get to choose what style to apply to a given layout.

data DiagnosticAnnotation = File FilePath 
                          | SourceLocation Position 
                          | Severity SeverityLevel 
                          | DiagnosticCode 
                          | ...

-- | A prismatic class for diagnostic annotations, as they may be embedded
--  into a larger annotation type in a larger document.
class HasDiagnosticAnnotation diagAnn where
  injectDiagnosticAnnotation :: DiagnosticAnnotation -> diagAnn
  extractDiagnosticAnnotation :: diagAnn -> Maybe DiagnosticAnnotation 

-- | Layout the diagnostic in a given format (ariadne, gcc, codespan-reporting, etc)
layoutDiagnostic :: HasDiagnosticAnnotation diagAnn => Diagnostic -> Doc diagAnn

-- | User may wish to colourise the diagnostics in a larger document that 
-- still has other semantic annotations.
class HasAnsiStyling ann where
  injectStylingAnnotation :: AnsiStyle -> ann

-- | Apply bold/underlining/italic/colour/etc
renderAsANSI :: (HasDiagnosticAnnotation diagAnn, HasAnsiStyling stylingAnn) => 
                Doc  diagAnn -> 
                Doc stylingAnn

@Mesabloo
Copy link
Owner

These two things are both conceptually and semantically distinct; a given layout can have many styles, and a given style can apply to many layouts.

Reusing styles across multiple layouts is something that I would have initially liked to have.
Unfortunately, your suggestion comes with a DiagnosticAnnotation which is the base for all layouts. As it is meant to semantically indicate what is which part of the layout, there are a few gotchas that I can think about:

  • with a single annotation type, semantics are not 100% correct for each layout. Consider the two Ariadne and GCC layouts: one uses rules, and the other does not, yet we need to handle rules in both cases, which is not correct for the GCC layout (and most probably other user-created layouts too);
  • separating layout and style annotations means that some parts of the library become more complicated, e.g. fetching source code (which we currently directly annotate with colors, if they are wanted for a specific layout). With this suggestion, we'd need to separate fetching the source code (and transforming TAB characters) and only later on -- when doing the rendering I guess -- try to restyle it (which by the way needs markers information);
  • I don't quite follow the code you wrote there, but I guess it's because I'm not familiar at all with prisms and such. Perhaps this could be simplified to a given extent?
  • If we want everything to be user-customisable, I think it may be wise to drop some type-classes (e.g. the current one in charge of transforming style annotations to ANSI annotations) because the user will then be unable to instantiate them. I don't really know to what extent this applies to this little piece of code though.

I'd be happy to have restyling and reusing styles for multiple different layouts, but sadly I don't know if this can be done while remaining semantically correct (and avoiding dropping cases with undefined). Let me know what you think about some points. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants