Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow passing x-opam-monorepo-* solver config fields via the command line. #307

Merged
merged 22 commits into from
Jun 7, 2022

Conversation

NathanReb
Copy link
Contributor

This PR adds command line arguments to either overwrite or complement x-opam-monorepo-* in local opam files.

Each existing field gets a --<field-name> option. When this option is passed, its value is used, ignoring local opam files fields.
It also gets a --add-<field-name> option to add extra on top of the local opam files fields. It is combined in the same way two fields from different opam files are combined.

It can be useful for experimenting or in certain scenarios where those configuration bits have no natural opam file to be written in.

Implementation

To be able to do this without writing too much repetitive code I introduced Serial_shape which allows to derive both opam and cmdliner parsers and printers from a single value. It simplifies the code of Source_opam_config quite a lot.

This required moving the URL rewriting a bit further away from the parsing code but overall it makes things a bit clearer and easier to test.

You'll note the PR comes with an extensive set of tests.

For now handles opam values and Cmdliner arg converters.
The Cmdliner part is a bit harder to deal with due to the
lack of structure of the cmdliner arguments which are just parsed
from raw strings. We might have to later enforce a more structured
format to work around the ambiguities in the cmdliner parsers.

Signed-off-by: Nathan Rebours <[email protected]>
This is required for `opam-provided` convenience format

Signed-off-by: Nathan Rebours <[email protected]>
Each opam extension now has a pair of corresponding CLI
options that can be used to either complement or overwrite
the values defined in your local opam files.

Signed-off-by: Nathan Rebours <[email protected]>
This was not the case to allow lighter syntax but having a stricter
syntax simplifies things quite a lot.

Signed-off-by: Nathan Rebours <[email protected]>
@NathanReb
Copy link
Contributor Author

I'm opening as draft as I need to write the documentation first, especially since the command line arguments format needs to be properly documented.

CHANGES.md Outdated Show resolved Hide resolved
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@NathanReb NathanReb marked this pull request as ready for review May 25, 2022 14:09
@NathanReb
Copy link
Contributor Author

It should be good now!

Copy link
Member

@Leonidas-from-XIV Leonidas-from-XIV left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My opinion on this is two-fold: I think technically it is a nice solution, with some nice abstractions to keep duplication to a minimum. There's some bits that I would hope to improve, e.g. I'm thinking on how to structure the list parser in a more readable way, but I believe this can be improved over time. The large amount of tests is great, especially that as they test the less readable parts of the code, where I have a much lower confidence of having a complete understanding of it (and my attempts to improve the code can be tested on examples directly). On the other hand, this creates a non-trivial amount of abstraction, introducing another kind of converter.

The reason why I am sceptical is because I am not sure this is a good feature. One of the nice things about opam-monorepo in my opinion is precisely the fact that its configuration is (for the most part) in the configuration file (and we plan to move towards this and away from inheriting the switch/global state). As such making changes to the configuration in the CLI seems to me to be the antithesis to that because users now have the monorepo setup not contained in their opam file but also need to remember to pass the same options as were used last time when the lock file was created to create a reproducible solution. So they either have to check their history, ask a colleague how they ran opam-monorepo or have the call to opam-monorepo in some kind of Makefile or documentation. This spreads the configuration over potentially multiple places, which will make it worse from a user perspective.

I think one of the big downsides of opam at the moment is specifically that it is overly configurable and a project has to document what complex set of flags is required to lead to the desired state of the project.

CHANGES.md Outdated Show resolved Hide resolved
doc/lock.mld Outdated Show resolved Hide resolved
doc/lock.mld Outdated Show resolved Hide resolved
doc/lock.mld Outdated Show resolved Hide resolved
doc/lock.mld Outdated Show resolved Hide resolved
test/lib/test_serial_shape.ml Show resolved Hide resolved
test/lib/test_source_opam_config.ml Outdated Show resolved Hide resolved
test/lib/test_source_opam_config.ml Outdated Show resolved Hide resolved
test/lib/test_source_opam_config.ml Show resolved Hide resolved
test/lib/test_source_opam_config.ml Show resolved Hide resolved
NathanReb and others added 2 commits May 30, 2022 10:48
Co-authored-by: Marek Kubica <[email protected]>
@Leonidas-from-XIV
Copy link
Member

I've been pondering how to make the list parsing easier to follow and make sure it works, I ended up with https://github.com/Leonidas-from-XIV/opam-monorepo/tree/config-via-cli-lexer (specifically d92ff9e at time of writing) which uses ocamllex to generate a list of tokens and then a fold to construct the split list. This two-stage approach has the advantage that the joining of characters is a bit more factored out (happening mostly in the lexer) while the verification of the validity in presence of [, ] and , is now in the parser.

After fixing some fairly obvious errors (forgetting to reverse, forgetting to add to buffer) it passed the extensive suite of tests which I would take as a good sign - both of the code and the test suite.

@NathanReb
Copy link
Contributor Author

I took a look at your alternative parser. I find the parser code to be quite similar although slightly harder to read in my opinion. It adds more code and that code spreads over two files.

I think it's mostly a matter of personal preference here.

I don't see any clear benefit switching to from one to the other so unless you have very strong opininons on this, I think we should stick to the original code.

I commented the split_list function and renamed a couple things hoping it would make it a bit clearer.

@NathanReb
Copy link
Contributor Author

The reason why I am sceptical is because I am not sure this is a good feature. One of the nice things about opam-monorepo in my opinion is precisely the fact that its configuration is (for the most part) in the configuration file (and we plan to move towards this and away from inheriting the switch/global state). As such making changes to the configuration in the CLI seems to me to be the antithesis to that because users now have the monorepo setup not contained in their opam file but also need to remember to pass the same options as were used last time when the lock file was created to create a reproducible solution. So they either have to check their history, ask a colleague how they ran opam-monorepo or have the call to opam-monorepo in some kind of Makefile or documentation. This spreads the configuration over potentially multiple places, which will make it worse from a user perspective.

I understand your concerns but I think those CLI option serves multiple purposes, as I described. They are useful for experimentation and debugging and can be used in Makefiles in contexts where there is no obivous opam file where to encode them. Some people do use Makefiles and I think they will appreciate having this ability. Tezos is one of those users.

It's also worth noting that these values are included in the generated lockfile so changing them by mistake will show in review!

Since we're also moving toward offering an update functionality to the tool, I think it's nice that it will read them from the existing lockfile so that as long as you simply want to update things, you're sure to re-use the initial configuration.

@Leonidas-from-XIV
Copy link
Member

It's also worth noting that these values are included in the generated lockfile so changing them by mistake will show in review!

The problem is, that the changed file will make it very hard to determine what CLI invocation produced it, so you'll know your new file is somehow wrong but aren't able to fix this state without asking whoever created it or your past self (or shell history).

While thinking about it, I had a random idea that could be a good compromise: we could write the invocation of opam-monorepo as a comment into the lockfile. From what I see this has several advantages:

  1. It allows people who want to reproduce the lock file to reproduce it. At least provided they use the same version of opam-monorepo but we don't really provide cross-version reproducibility for plenty of good reasons.
  2. It serves as documentation how the lockfile came to be, so newcomers to a project using opam-monorepo are aware of it and due to the invocation written there, they have a pointer to see how to use opam-monorepo

What do you think?

@NathanReb
Copy link
Contributor Author

Sure, that sounds reasonable! Do you mind if we do this in a separate PR as it will likely dump the content of argv and is therefore not strictly related to this?

@Leonidas-from-XIV
Copy link
Member

I find the parser code to be quite similar although slightly harder to read in my opinion.

I think the big advantage for me to read is the distinction because lexing and parsing (although calling what my split_lines does "parsing" is slightly ambitious), but being able to see the rules how the tokens are converted into a data structure without the distraction of managing the input buffer at the same time helps me to understand how this is constructed.

However, I totally agree that the separate file is quite distracting - I had hoped ocamllex would be more helpful in this situation, being able to parse longer sequences but alas, it is very similar to your implementation and also just reads a single character at a time. Do you think it would be more readable for you if I replaced the ocamllex tokenizer with a hand-written one? It might actually end up simpler and within one file.

@NathanReb
Copy link
Contributor Author

Ok yeah I think I'm starting to see what you'd like to simplify there. If I understand correctly, what you like from the lexer based implementation is the tokens abstraction, which is free from String.sub invocations.

The problem I see with this, which to be fair is not that much of a problem, is that you have to chose between the complexity in sub string extraction or the complexity of recomposing those strings correctly from the tokens. What I like about the first is that we don't go through temporary allocations as much, which is why it seemed more straightforward to me.

What would you think about merging this with the current parser and then open a PR with the proposed improved implementation with the hand-rolled tokenizer to see if we can bring it to something better?

I think the current state of this parser is good enough for merging and I'd like it if improving it wouldn't delay the merge of this PR which is otherwise quite large.

Overall I think we could quite significantly improve the cmdliner parsing here but I wanted to go for something that was good enough and worked. Ideally I did not want to impose the use of [ and ] delimiters when not necessary but ended up being limited by the implementation so that's something we could also tackle when improving it!

Copy link
Member

@Leonidas-from-XIV Leonidas-from-XIV left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok yeah I think I'm starting to see what you'd like to simplify there. If I understand correctly, what you like from the lexer based implementation is the tokens abstraction, which is free from String.sub invocations.

Exactly. While it goes over the stream once it the tokenizer (be it ocamllex or hand-written) has one responsibility: constructing the stream of relevant tokens out of the input stream. Then the parser uses this token list to construct the parsed data structure or fail, without having to deal with control of the input stream. As you can see in my branch the parser is quite trivial, it is essentially the same logic as your code but without the input stream logic intertwined.

What would you think about merging this with the current parser and then open a PR with the proposed improved implementation with the hand-rolled tokenizer to see if we can bring it to something better?

Yes, that's ok with me, so approving the PR. We deferred the other observation to (an much better) proposal in #308 and besides these two concerns I had the code is good.

@NathanReb NathanReb merged commit a76c324 into tarides:main Jun 7, 2022
NathanReb added a commit to NathanReb/opam-repository that referenced this pull request Jun 9, 2022
CHANGES:

### Added

- Add a `--minimal-update` flag to `lock` to generate a lockfile
  with minimum dependency changes from a previous lockfile. (tarides/opam-monorepo#305,
  @NathanReb)
- Add command line options to complement or overwrite `x-opam-monorepo-*`
  fields. (tarides/opam-monorepo#307, @NathanReb)
- Save the `lock` CLI arguments in `x-opam-monorepo-cli-args` when generating a
  lock file. (tarides/opam-monorepo#309, @NathanReb)
@NathanReb NathanReb deleted the config-via-cli branch June 10, 2022 15:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants