Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rehaul the Clerk build system #512

Merged
merged 32 commits into from
Oct 5, 2023
Merged

Rehaul the Clerk build system #512

merged 32 commits into from
Oct 5, 2023

Conversation

AltGr
Copy link
Contributor

@AltGr AltGr commented Sep 15, 2023

The architecture of Clerk has been extended and improved, and now includes
support for modules and dependencies (including building the modules to OCaml
objects as required for either interpretation or linking).

A few notes:

  • all artifacts are now written to _build/

  • the architecture should make it easy to add rules for python, etc.
    Each catala file has a bunch of rules declared, that basically call upon the
    different Catala backends. In addition:

    • if a module name is defined, the corresponding targets
    • phony rules called name@path/to/file for tests, etc.
  • the relevant code from the catala module plugin has been integrated into
    Clerk so the plugin can probably be recycled

  • output tests (not inline) are supposed to be supported but I haven't yet re-added
    one to verify that they actually work

  • to keep things simple, at the moment, Clerk will scan all files (including
    contents) at every startup, generate a big Ninja file with all of their rules
    in it (3000 lines for the catala repo which has lots of tests).
    Then ninja is run on that file.
    Clearly this is sub-optimal ; but I couldn't yet notice any delay. This should be
    optimised at some point but that can be left to future PRs.
    The easiest optimisation is to re-run ninja directly without rebuilding the
    file, assuming that the dependency graph hasn't changed, which should cover
    most incremental builds.
    It would probably be wiser to wait to see where the bottlenecks are when they
    happen before trying to be too clever.

  • the code uses OCaml Seq.t throughout rather than lists. The original idea
    was to be able to pipeline the generation of the Ninja file as the files were
    being scanned, unfortunately at the moment we are required to scan for all
    module definitions to begin with, so the purpose is kind of defeated.
    It may still be useful for future optimisations though, and there was little
    value to rewriting to use lists so I kept it this way.

  • having modules depending on other modules is expected to work, but that
    needs more tests.

  • error messages will also need to be improved over time. We might need
    redundant checks in Clerk to avoid less clear messages from Ninja :/

@AltGr
Copy link
Contributor Author

AltGr commented Sep 18, 2023

Also clerk runtest should be updated to use the new lexer instead of its old regexp-based parsing.

@denismerigoux
Copy link
Contributor

I'll try and find the time to review this :)

@denismerigoux
Copy link
Contributor

You're using biniou now ? The CI build fails because it's missing.

Copy link
Contributor

@denismerigoux denismerigoux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @AltGr, this is getting really good!

build_system/clerk_driver.ml Outdated Show resolved Hide resolved
build_system/clerk_driver.ml Outdated Show resolved Hide resolved
build_system/clerk_driver.ml Outdated Show resolved Hide resolved
build_system/clerk_driver.ml Outdated Show resolved Hide resolved
build_system/clerk_driver.ml Outdated Show resolved Hide resolved
compiler/surface/lexer_common.mli Show resolved Hide resolved
compiler/surface/lexer_common.mli Show resolved Hide resolved
@@ -111,6 +111,9 @@
(* Directives *)

#define MR_LAW_INCLUDE "Include"
#define MR_MODULE_DEF "Module"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add these to all the syntax highlighters ? I know it's a pain but it's useful...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's in the tree-sitter syntax already 🙃

Joke aside, I think we'll want to add a specific declaration for externally-implemented modules (maybe just > Module Foo external ?) so might be a good idea to settle on this and do all of them in one go

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm I had a look but I can't figure if there is any support for directives at the moment... I didn't find any reference to the existing Include for example.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

> module Foo external is a good idea!

@@ -136,6 +136,9 @@
(* Directives *)

#define MR_LAW_INCLUDE "Include"
#define MR_MODULE_DEF "Module"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arcz we're adding modules to Catala, do you want to chime in with correct Polish terms for these?

tests/test_modules/good/mod_use.catala_en Show resolved Hide resolved
build_system/clerk_driver.ml Outdated Show resolved Hide resolved
@denismerigoux denismerigoux added ✨ enhancement New feature or request 🔧 compiler Issue concerns the compiler labels Sep 21, 2023
The base is there but not fully used yet in Clerk
(avoids multiple, possibly inconsistent, definitions ; also syncs up Clerk
versions with Catala)
With modules, it's not reasonable to allow two catala files with the same
basename but different langs
2.0 doesn't need biniou or easy_format anymore
I.e. our legacy tests with a separate output tests, to distinguish from inline
tests.
and add some sanity-checks for consistency of used modules w.r.t. actually
loaded modules.
that allows to build arbitrary targets passed through to Ninja
Rather than require all files to be listed on the command-line (and having to
check consistency with `> Using` directives), the main catala CLI is now a bit
more clever.

⇒ There is a new assumption that a module name definition must match the file
name (up to case and extension) — with appropriate error handling to enforce it.

In exchange, `> Using` directives are now used to more transparently lookup the
appropriate `.catala_*` interfaces and the compiled artifacts for the used modules (handling transitive dependencies), with just standard `-I` flags for when they need to be looked up in different places.
We need a concrete intermediate target for e.g. transitive uses of `> Include`
for Ninja to correctly handle them.

Of course we could also unroll all transitive dependencies, but meh.

Note also that now tests now just generate the outputs but facilities for
diffing and resetting are temporarily absent.
* Obsolete code for included tests has been removed

* The engine uses a proper lexer and is much simplified

* An inline test in the middle of the file now only "sees" the file up to that
  point. This fixes an issue where we had spurious errors when a type error was
  added at the end of a file, and it would pop up in tests before it. This makes
  files including many tests much more practical.

* diffing and resetting the tests has been reintroduced (done at the moment in
  Ninja, but for more control (count number of failed tests, etc.) we could put it
  back into Clerk at some point

* The Catala CLI can now take an input from stdin (with the possibility to link
  a (possibly fake) on-disk file for error reporting and file locations ; this
  is useful for running tests)
this is more consistent, avoids empty stamp files and should make things simpler
overall.

The slightly tricky `-C` option is added to Catala so that it can be run from
_build while paths to source and destination files are still specified relative
to CWD (Ninja doesn't provide string manipulation, so otherwise we would have to
explicit both the `_build/dir/` and `dir/` versions of each path).
@AltGr
Copy link
Contributor Author

AltGr commented Sep 27, 2023

Huge update (and rebase) of this PR, @denismerigoux you might want to have another look.

Clerk always generates parent dirs so using prefixes resulted in empty
`test@foo/bar` directories getting created
wow this remained unspotted for some time...
"Usage de" being two words was tripping it
Copy link
Contributor

@denismerigoux denismerigoux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's beeter than the last time :) You can merge for me!

@@ -75,7 +75,7 @@ let name : type a. a t -> string = function
| Div_int_int -> "o_div_int_int"
| Div_rat_rat -> "o_div_rat_rat"
| Div_mon_mon -> "o_div_mon_mon"
| Div_mon_rat -> "o_div_mon_mon"
| Div_mon_rat -> "o_div_mon_rat"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well caught!!!

@AltGr AltGr merged commit 9d12e15 into CatalaLang:master Oct 5, 2023
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🔧 compiler Issue concerns the compiler ✨ enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants