Fault-tolerant GHC compilation pipeline #63
I'm all for this in principle. Some comments.
TL;DR: It's not an all-or-nothing thing. GHC already follows the plan outlined here to some extent. The biggest win would (IMHO) be in the parser.
With my Haskeller / GHC contributor hat: I'm in support of trying this.
With my HF hat on: I'm not sure about the large scope of the proposal. Instead, I think I'd rather see a much smaller proposal (say, focused just on the parser, perhaps with a chosen cheap approach that doesn't require a big rewrite of |
I'm not sure this is entirely true. I think a general approach of conservatism can work. That is "if you can't get it right, don't say anything". For example, we might say "if in a scope S there is an error in a form that can introduce names, then do not emit out-of-scope errors for names in S". This is still somewhat case-by-case, but I think there are general principles to be had. Obviously there are a variety of heuristics of varying levels of conservatism (I do like some of Richard's suggestions too!). Also, we should look in more detail at what other people do. Aside on the specific example: I would actually hope that a good enough parser could recover in the given program and still realise that there were two constructors called K1 and K2!
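To make the "if you can't get it right, don't say anything" rule concrete, here is a minimal sketch of the scope heuristic described above. The types `Binding` and `Scope` and the function `outOfScopeErrors` are hypothetical and heavily simplified; they are not GHC's renamer types, only an illustration of the principle.

```haskell
import qualified Data.Set as Set

-- Hypothetical, simplified model of a scope; not GHC's renamer types.
data Binding
  = Binds String    -- a binding form that introduced this name
  | BrokenBinding   -- a binding form that contained an error
  deriving (Eq, Show)

data Scope = Scope
  { scopeBindings :: [Binding]
  , scopeUses     :: [String]
  } deriving (Eq, Show)

-- | Names used in the scope that we are confident are out of scope.
-- If any binding form in the scope is broken, it might have introduced
-- any name, so we conservatively report nothing for this scope.
outOfScopeErrors :: Scope -> [String]
outOfScopeErrors (Scope bs uses)
  | BrokenBinding `elem` bs = []
  | otherwise               = [ u | u <- uses, u `Set.notMember` bound ]
  where
    bound = Set.fromList [ n | Binds n <- bs ]
```

With an intact scope, `outOfScopeErrors (Scope [Binds "f"] ["f", "g"])` reports `g`; once a `BrokenBinding` is present in the same scope, the result is empty. Less conservative variants are possible, for example only suppressing names that look like they could have come from the broken form.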
This is just the sort of thing I think we'd want to work out as a first step in the design. Even if the passes do recover to some degree, I don't know whether we then allow future stages to continue. Does the typechecker do the right thing with
Yeah, there's a lot in here. I started writing and I realised that there was in principle a lot to do. However, as @simonpj says, it's quite possible that many of these are partially (or completely!) done already - I am writing from a place of moderate ignorance about the facts of GHC.
The idea with the phases was to exactly ensure that we get something useful as quickly as possible. The proposal could very much be done phase-by-phase. Would it help if we added to each phase "Step N: update a key usecase (e.g. HLS) to benefit from the increased robustness" (which might be a no-op if there are no interface changes)? I do generally agree that if we do any part of this we should do Phase 1 (the parser). However, given the discussion of GHC's existing fault-tolerance it also sounds like later Phases may be less work. So perhaps it would be a shame to stop when there is more low-hanging fruit to pick. One approach would be to specifically ask to fund this in stages: i.e. fund Phase 1, assess from that point how much work it would be to progress, fund the next Phase (or not). |
The lowest hanging fruit I know of is errors hiding warnings when the errors are things that could be warnings. I.e. we have diagnostics that are, as a matter of implementation, "non-fatal". However, we have a policy that they should be fatal, and then we choose not to compute (or display those which we have already computed?) further non-fatal errors/warnings. I think it would be pretty easy to make sure such "make it fatal" policies do not prevent further errors/warnings from appearing.
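A rough sketch of the difference between the two behaviours described above, assuming a hypothetical `Diagnostic` record with a "fatal by policy" flag. None of these names come from GHC; the point is only that the fatality policy can be applied to the final outcome rather than used to cut the diagnostic stream short.

```haskell
-- Hypothetical diagnostic type; not GHC's.
data Diagnostic = Diagnostic
  { diagFatalByPolicy :: Bool    -- promoted to fatal by some policy
  , diagMessage       :: String
  } deriving (Eq, Show)

-- | Short-circuiting behaviour: stop at the first diagnostic that policy
-- marks as fatal, hiding everything that would have come after it.
reportShortCircuit :: [Diagnostic] -> [Diagnostic]
reportShortCircuit ds = case break diagFatalByPolicy ds of
  (before, fatal : _) -> before ++ [fatal]
  (before, [])        -> before

-- | Alternative: report every diagnostic, and use the policy only to
-- decide whether the overall run counts as a success.
reportAll :: [Diagnostic] -> ([Diagnostic], Bool)
reportAll ds = (ds, not (any diagFatalByPolicy ds))
```

Under `reportAll`, a module with one policy-fatal diagnostic followed by an ordinary warning still fails, but the user sees both messages.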
(I was gardening the HLS issue tracker and lo, I found an issue that would be solved by this: haskell/haskell-language-server#273) |
Another one! haskell/haskell-language-server#853 This one is quite interesting. If the user has specified But errors from |
Agreed. Some progress in !4711, but I'm a bit sceptical; I spent some time looking around at other tools and am convinced that we need to equip

    data T x == K1 | K2
    f :: T a -> Int
    f K1 = 3

lays out to (perhaps missing opening and closing braces, I forgot)

    data T x == K1 | K2;
    f :: T a -> Int;
    f K1 = 3;

and parsing after the error may continue in What we need is something like menhir's The hacky Of course, this needs someone dedicated to reading related approaches (this is especially important!) and implementing a good one in
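For illustration only, here is a toy version of resynchronising at the semicolons that layout inserts, so that an error in one declaration does not hide the declarations after it. This is not how `happy` or GHC's parser works, and it is not the approach proposed in this comment; `parseDecl` is a hypothetical stand-in that merely rejects any declaration containing a stray `==` token.

```haskell
import Data.Either (partitionEithers)

-- Toy token stream: whitespace-separated words, with ";" standing for the
-- semicolons inserted by the layout algorithm.
type Token = String

-- | Split a token stream into declarations at top-level semicolons.
splitDecls :: [Token] -> [[Token]]
splitDecls ts = case break (== ";") ts of
  (d, [])       -> [ d | not (null d) ]
  (d, _ : rest) -> [ d | not (null d) ] ++ splitDecls rest

-- | Hypothetical stand-in for a real declaration parser: it only rejects
-- declarations containing the stray token "==".
parseDecl :: [Token] -> Either String [Token]
parseDecl d
  | "==" `elem` d = Left ("parse error in: " ++ unwords d)
  | otherwise     = Right d

-- | Parse every declaration independently, collecting errors instead of
-- stopping at the first one.
parseModule :: [Token] -> ([String], [[Token]])
parseModule = partitionEithers . map parseDecl . splitDecls
```

Applied to `words "data T x == K1 | K2 ; f :: T a -> Int ; f K1 = 3 ;"`, this yields one error for the `data` declaration and still returns both declarations of `f`. A real solution would want to recover inside the broken declaration too, which is where the interesting design work lies.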
I'm not convinced we need to touch |
I don't think we need to decide how we get better error recovery in this proposal. But this discussion does suggest that it's quite a bit of work, since it sounds like what Perhaps that makes sense as an independent chunk: we might want a different skillset (someone who knows a lot about parsers, rather than someone who is good at wrangling GHC). |
For a), there's possibly relevant info in usethesource/rascal#1909 |
At our tech track meeting today we discussed this briefly and feel like we're not in a position to comment or decide on this yet -- it's very promising, but we will need timeline, plan, etc. more worked out as well as more discussion with GHCHQ regarding the right architecture.
Okay. I don't know if it's feasible for me to do that. In particular, I'm not enough of a parser/GHC expert to actually have a good take on what the right thing to do is or how long it would take. I guess it's not reasonable to say "coming up with a more detailed plan is part of the work". Can we say "the HF would fund this in principle" and then invite a commercial partner to help make it into a more specific plan? |
The discussion has been pretty rich so far, with helpful and supportive comments from ghchq. Personally, I would tend to hope that it could develop further into something resembling more a full plan of attack and estimate. We don't really have a "would fund this in principle" vote as an actionable item, though the overall disposition was pretty favorable. Perhaps at next month's meeting we can have some further discussion and think through what we are able to do to help along this and things that may arise in similar "in between" states. |
In the interest of moving this forward, might I suggest two possibilities:
Do others have thoughts here? Have I wildly misestimated the challenges somehow? |
For the past week, I've been hacking on a patch for
Note the use of Incidentally, that is how I think the I hope that this will work on GHC as well as on this test case: https://gist.github.com/sgraf812/a55f68b8ede8230ff2fa644d090e726c |
@sgraf812 is this something you think you are likely to want to take over the finish line into |
I'm also curious about where on the proof-of-concept <--> working solution spectrum this is. As far as I can tell there are a few paths forward, depending on the answer:
|
My PoC passes happy's testsuite (well, except for an annoying 16 bit issue, see below) but needs a couple of bold changes to
IME, bringing these changes through review always takes more time than one would think.
Perhaps some of the current |
But what is "my approach"? Scrolling back I think you may mean this:
The proposal is quite high level. I think @rae is suggesting one particular approach to making the parser more fault tolerant. (Meanwhile @sgraf812 suggests another.) Have I understood correctly?
GSoC begins at the end of May, but I believe the haskell.org organizers would appreciate ideas within the next week, as it affects how likely we are to be included by Google. So it would be good to submit an idea soon if you want to go that route @sgraf812 , I think it's fine to tweak it later. |
I'll write down a proposal next week. Meanwhile, I opened https://gitlab.haskell.org/ghc/ghc/-/merge_requests/11990 to report my current progress; bootstrapping GHC with the PoC seems to be working. |
I wrote up a GSoC proposal: haskell-org/summer-of-haskell#182 Feel free to suggest improvements/sharpen goals |
That looks great, thank you for writing that up! |
Huh, while browsing through the excellent references that were suggested in usethesource/rascal#1909, I stumbled over https://arxiv.org/abs/2303.08044, describing a novel GLL backend for |
I like this proposal, though it is indeed missing some details. One that I particularly missed is: how will this interact with source plugins? I think we would need to have some flag to indicate "this is authentic from the source" and "this was a guess by GHC". |
I guess the answer will be "just the same as it interacts with later pipeline stages". If we conjure up nodes through doing error recovery, then downstream stages might need to know about that, and presumably source plugins will get the same thing. It might be helpful to have an example of a source plugin and how you would like it to work on a program with parse errors? |
I don't have a specific example in mind, but one use case might be a plugin that does different recovery, i.e. you have a |
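One way to picture the "authentic from the source" versus "guess by GHC" flag is a provenance tag on every syntax node, which later stages and source plugins could inspect. The `Provenance` and `Expr` types below are hypothetical and for illustration only; they are not GHC's AST.

```haskell
-- Hypothetical types for illustration; not GHC's AST.
data Provenance
  = FromSource   -- the node was written by the user
  | Recovered    -- the node was conjured up by error recovery
  deriving (Eq, Show)

data Expr
  = Var Provenance String
  | App Provenance Expr Expr
  deriving (Eq, Show)

provenance :: Expr -> Provenance
provenance (Var p _)   = p
provenance (App p _ _) = p

-- | True if any node in the tree was produced by error recovery.
-- A plugin could use this to skip, or treat specially, guessed subtrees.
containsRecovered :: Expr -> Bool
containsRecovered e = provenance e == Recovered || case e of
  Var _ _     -> False
  App _ f arg -> containsRecovered f || containsRecovered arg
```

For example, `containsRecovered (App FromSource (Var FromSource "f") (Var Recovered "_hole"))` is `True`, so a plugin that only wants to act on authentic code could decline to transform that application.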
Hi @sgraf812 I went through this issue. https://summer.haskell.org/ideas.html#parse-error-recovery |
Hi Kaustubh, great! I'll continue conversation in haskell/happy#272 and before long in private. |
I'm a potential GSoC contributor, and I am interested in this project! I am currently taking a compiler course at my university along with a functional programming course in Haskell. My thesis project revolves around extending the Cool Compiler (a subset of Scala, utilized by Stanford) with LLVM as its backend and utilizing ANTLR. I have recently worked with yacc in building a lexer and scala bison (a version of bison) in building a parser for the same language. Last year, I was a GSoC contributor for the GNU organization, working on adding support for the Hurd OS to the Rust compiler. Since I am a beginner in Haskell and have had some experience working with parser generators and defining grammars, I think this project would be a good starting point for me. I'd be happy to have a short private chat, @sgraf812, to further discuss my background and the projects I have worked on, to see if this could be a good fit. This is one project I attempted using the E-Graph Library (that might be interesting):
At the TWG meeting we discussed this again and were wondering how the proposal might change with the coming GSOC projects. Overall there's still enthusiasm about the ideas in this proposal, but we aren't fully certain of where it stands in the eyes of the proposers. Of course, part of that is the reality of not knowing how GSOC is going to pan out. Speaking for myself: I'm wondering if this is something we should explicitly put on the back-burner until after GSOC? Right now it seems like there are a lot of open question marks and it might be easier to revisit once some of the proposed 'side-quests' have been explored/worked out. |
Yes, I think we should definitely park this until the GSOC project completes, I think that will shed a lot of light. |
Wanted to chime in here to mention an application for a fault-tolerant parser within Daml. Daml's syntax is essentially Haskell's syntax with a few new varieties of declarations. Currently, we achieve this by forking the compiler and making modifications to Another possible more flexible approach (and possible stretch goal) would be for Happy to take a plugin that it will call when it encounters a parse error, which can then do its own parsing and hand back control to Happy when it's done. This is very flexible, but I'm not sure how difficult that will be to implement with Happy depending on how it's designed and how difficult it would be to use in practice. Obviously I don't want to change the scope of the project this close to the date - this is mostly my two cents about other potential applications where fault tolerance will be useful. If it turns out enabling the latter plugin approach is easy, that's great too.
cc @sgraf812 I wonder if extending the brief of the GSoC project from "recoverable parsing" to "resumable parsing" with a user-supplied continuation would be reasonable? I think that would cover @dylant-da 's use case too? |
Yes, that'd cover our use case |
I think my fork in https://github.com/sgraf812/happy/blob/resumptive/tests/monaderror-resume.y that @Kariiem is working on is somewhat compatible with that use case. You would still need to insert an explicit Unless GHC explicitly provides a hook production of the form I'm not sure if we can change the requirements of the GSoC project after the fact though. At any rate, I hope that once we have |
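To illustrate what "resumable parsing with a user-supplied continuation" could look like from the caller's side, here is a small hand-written sketch. It does not use `happy` and does not reflect the API of the fork mentioned above; the `Resume` type, the toy parser, and the made-up `template` keyword (standing in for a language extension's extra declaration form) are all assumptions for illustration.

```haskell
type Token = String

-- | A user-supplied handler, called with the remaining input at the point of
-- a parse error. It either gives up (Nothing) or returns a node plus the
-- input from which the main parser should resume. It must consume at least
-- one token, otherwise the parser below would loop.
type Resume node = [Token] -> Maybe (node, [Token])

data Decl
  = Known  [Token]   -- parsed by the main parser
  | Custom [Token]   -- parsed by the user-supplied handler
  deriving (Eq, Show)

-- | Toy main parser: a declaration runs up to the next ";" and must start
-- with a keyword it knows. Anything else is offered to the handler.
parseDecls :: Resume Decl -> [Token] -> Either String [Decl]
parseDecls _ [] = Right []
parseDecls resume ts@(t : _)
  | t `elem` ["data", "type"] =
      let (d, rest) = break (== ";") ts
      in (Known d :) <$> parseDecls resume (drop 1 rest)
  | otherwise = case resume ts of
      Nothing         -> Left ("parse error at: " ++ t)
      Just (d, rest') -> (d :) <$> parseDecls resume rest'

-- | Example handler for a made-up "template" declaration form.
templateHandler :: Resume Decl
templateHandler ts@("template" : _) =
  let (d, rest) = break (== ";") ts
  in Just (Custom d, drop 1 rest)
templateHandler _ = Nothing
```

With `templateHandler`, the input `words "data T ; template Foo ; type S ;"` parses into two `Known` declarations with a `Custom` one in between; with `const Nothing` as the handler, the same input fails at `template`. Whether a generated LR parser can hand over and regain control this cleanly is exactly the open question raised in the comments above.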
This proposal aims to make GHC progress further in the case of errors. This has two benefits:
This is an HF proposal: it requires a substantial amount of work that will need funding.
I'm keen to get feedback on how much work this would be. I have an intuition in the abstract, but I don't know how difficult it would be in GHC.
rendered proposal