-
-
Notifications
You must be signed in to change notification settings - Fork 14.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Several issues in ACME certificates #84633
Comments
Shouldn’t the solution be:
I don’t see how any other configuration could work, if ACME runs as regular user you would need to give world-writable rights to .well-known/challenge (in case two jobs with different users would run simultaneously, which happens for instance when you define two new certificates during deployment). In any case the configuration in tempfiles cannot work if you have more than one user : it will try to create .well-known/challenge as both user, but only one of them will ever win... Another possibility would be to use groups and force every "user" using certificates to belong to this group. (EDIT: this would not allow to assign a different group to the resulting certificate, not good) About #48845, the way I see it is to have a acme-foo-setrights service whose only job would be to set rights on the final destination (and would have Wants/After acme-foo + WantedBy multi-user.target). This one would duplicate the last part of acme-foo’s job, but as it’s idempotent we don’t care much, and it could have the RemainAfterExit which would become harmless too in this situation (it would only be useful during deployment, which would anyway restart it regardless of this flag) What do you think? I may do the work to implement that if you agree with the ideas |
The lego client should definitely not be run as root, that's a huge attack surface that has no need for elevated privileges beyond reading and writing out certificates. Currently if you're using the nginx module with HTTP-01 challenges the certificates are owned as |
Not sure if we can assume ACL support, but if we do, we might also be able to just use |
I think we should probably revert #72056, and see if we can fix it differently. |
I just uncovered a new bug in Lego which might be blocking, but it might be a bug for upstream rather: when we change the domains list, it doesn’t regenerate the certificate... (It might be due to the fact that the script runs |
This issue has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/go-no-go-meeting-nixos-20-03-markhor/6495/25 |
(moved from #85152) @m1cr0man I think a rewrite of the module would be great, as long as compatibility with the basic configuration options was kept; there's a fair bit held over from simp_le that no longer makes sense, and a lot of nits with the current implementation. I know @yegortimoshenko was working on a refactor at some point; maybe they'd be able to share WIP code? For what it's worth, one possible way to fix the various renewal bugs in one shot would be to derive the certificate subdirectory from a hash of its configuration like Of course, in an ideal world, lego would be able to handle forcing renewals on certificate configuration changes itself, but I'm not sure it tracks enough state currently to be able to do this. |
I just opened #85223 |
I'm not sure if we're doing ourselves a favour by having this meta issue - maybe the individual issues should be put into separate issues… |
How about a project board? |
@flokli : merged them all because of inte-dependencies between them, feel free to cut it in pieces :) |
This issue has been mentioned on NixOS Discourse. There might be relevant details there: |
I agree we should revert #72056 now that I come to think of it doesn't it actually completely break renewal? not only activation?! as the Edit: Reverting PRs: |
Could other people jump in to fix the last two issues? @immae perhaps? I don't have enough time to work on all of them and I don't want to be blocking the 20.03 release |
Sure |
The third checkbox is the easy one: |
By writing the PR I found one more issue that could be problematic...
In the simp_le age, the key.pem file was reused at each run. It’s not the case in lego, unless we use the I’m not concerned with this change but I know this can have a big consequence for people who use key pinning (I’m not sure of the name, I’m thinking of the feature where the web server says to the browser "My public key is Foo, if I give you any certificate with a different public key in the next year be worried"). Maybe we should add something about it in the documentation? (note that since there is no migration from switching from simp_le to lego people who use that will already have issues about this feature) |
Web browsers do not support this feature, but it's often used in native apps like banking apps indeed. I think a note for now should be sufficient, but exposing this |
Note: I don’t think the remaining issues should be considered blocking for the release of 20.03, provided we have proper documentations that acme certificates is not an ideal world: If I understand correctly, the remaining issues are either just annoying or wontfix (regarding migration scripts from old setup) |
Should I propose a PR with --reuse-key @arianvp ? EDIT: |
I had a look at the last checkbox (changing the domains list, or as @emilazy said it can concern other options too), lego officially doesn’t support it (PR welcome though: go-acme/lego#507 the issue is slightly different but raises the same concern). I guess a way to solve it simply would be to store the whole command line in /var/lib/acme/my_cert/command_line, compare the current one with the previous one and switch "renew" to "run" in case it changed. However, the flags like reuse-key cannot be used in "run" so we’re back with the issue of key pinning... |
See my previous comment for a simple way to address renewing certificates on changed settings without any complicated hacks or manual comparisons. |
Ah I just reread it and now I understand it better, sorry. Ok I’ll try to implement that. |
@emilazy : I implemented that, but I’m not satisfied, because it breaks the reusePrivateKey option. I had a look at the code from lego, and the only way I found to honor this option is to have a
The reason being that If I use a hashed key like you suggested, then it will always be empty at the first run. I can trick and copy the files from the old location, but I feel a scenario like above where I compute the hash to determine the Does that sound good to you? |
I opened #85534 which is supposedly relevant. |
@immae could you update your bullet list with what is fixed please? |
Done @arianvp . I don’t know how to handle the last one yet, I made a sketch of a proposition above but got no answer so I don’t know what to do |
(and the reuse key option wasn’t merged finally, maybe I should reopen an issue for that one before it gets lost?) |
@immae There's also another solution for the last issue, though maybe a bit hacky, which would be to use I think that last issue is a serious one, as it breaks user expectations : I don't remember this happening with
|
@pstch : the standard way of lego to say "infinite" is to put 0 for Except that in my implementation I store a hash of the relevant arguments in the .lego folder, and whenever the hash changes it means that the certificate needs to be regenerated. I’m happy with that implementation as far as I am concerned, but it’s just implementation details: the "big" part in the issue is deciding when you need to force renew and when you don’t. |
I wrote a (big) change to acme handling. Its current state is here in my repository: https://git.immae.eu/?p=perso/Immae/Config/Nix.git;a=blob;f=modules/private/certificates.nix;hb=HEAD#l76 It should solve most issues reported here and there (note that contrary to my previous comment the advantages:
issues:
I may create PR for some of the issues if you’re happy with that change. |
For people tracking this, I've been taking a look at what @immae has done, what emilazy has shared, and my own work to rewrite the module. It's going well, just slowed down during the week by work. I will open a PR this weekend. I've resolved most of the issues immae highlighted above except for the backwards compatibility, although adding a migration script would be possible. There's a number of design decisions I've made that will be open for discussion on the PR, and some other issues I have thought of in the process of rewriting it. |
How are people finding the new module? 🙂 I addressed everything that was mentioned in this issue, and since the merge there's only been a small bit of feedback all of which was external to the module itself (either lego or LE related). I'll give it a month and then close this ticket. |
I'd say no news / new bug reports are good news :-) Thanks a lot for all the work that has been done - I think we're in a wayy better situation than before ❤️ |
Thanks a lot for your work @m1cr0man ! I've been using the new module for three weeks now, mostly without any problem. I think the changes are really good. However, I've find a (minor) problem : switching to wildcard certificates fails because the old domain names are still requested, and the request fails because although these names are already validated, LE refuses certificate certificate requests with both a wildcard and subdomains.
Using |
I believe this issue is solved now as the domainHash is used to check whether the certs generated in the last run match the set of configured domain + extraDomains in the configuration. So long as you strip your configuration to just include the wildcard domain it will do a clean cert request instead of a renewal and succeed. I'm going to close this issue now. It has been open long enough and I feel everything it raised has been addressed at this point. If something is still missing/broken feel free to pull me up on it. |
Describe the bug
This issue aggregates one "old" and several new issues with ACME in 20.03 and unstable. Since some of them are dependent on each other to be of concern, I write them all in the same issue.
man systemd.timer
for confirmation about this point, or check systemd logs after one day to confirm that the service was indeed never restarted)Due to the previous issue, the bug is currently not apparent when the timer triggers (every day), only when the service derivation is changed.
lego renew || lego run
where "run" would be the one to run when the list changes)Poke: @aanderse @arianvp @Mic92 @m1cr0man @emilazy @flokli since you were concerned with the previous issue about acme.
The text was updated successfully, but these errors were encountered: