Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplifying backend selection #414

Closed
tarcieri opened this issue Oct 18, 2022 · 31 comments · Fixed by #531
Closed

Simplifying backend selection #414

tarcieri opened this issue Oct 18, 2022 · 31 comments · Fixed by #531
Labels
do-for-4.0 This should be resolved before we do a 4.0 release

Comments

@tarcieri
Copy link
Contributor

curve25519-dalek currently defines the following backends, modeled as crate features:

  • u32_backend
  • u64_backend
  • fiat_u32_backend
  • fiat_u64_backend
  • simd_backend
  • avx2_backend

First, avx2_backend can be removed: it's a deprecated legacy alias for simd_backend.

Second, I think u32_backend and u64_backend could be selected automatically by gating on e.g. cfg(target_pointer_width = "64"). If this selection were automatic, I think these features could be removed completely, with u32_backend/u64_backend selected and used by default unless another backend has been explicitly enabled.

Likewise, fiat_u64_backend and fiat_u32_backend can be collapsed down to fiat_backend using similar target_pointer_width-based gating.

If all of the above were done, the list above could be simplified down to just fiat_backend and simd_backend.

@rozbb rozbb added the do-for-4.0 This should be resolved before we do a 4.0 release label Oct 18, 2022
@nox
Copy link
Contributor

nox commented Oct 25, 2022

Any chance to make the crate use additive features?

@tarcieri
Copy link
Contributor Author

tarcieri commented Oct 25, 2022

The best way to do that would probably be to get rid of feature-gating for simd_backend, and instead use runtime autodetection to choose when to enable it.

Then the only backend feature left is fiat_backend, which could override the other backends at compile-time.

@ryankurte
Copy link
Contributor

If this selection were automatic, I think these features could be removed completely, with u32_backend/u64_backend selected and used by default unless another backend has been explicitly enabled.

so long as there's still an override / way to disable the detection! sometimes it's useful to be able to run tests for 32-bit platforms under x86_64 with the u32_backend, and where target detection is involved, cpuid tends to break in miri and enclave builds...

Any chance to make the crate use additive features?

i was wondering whether y'all had considered refactoring the backends to traits w/ implementations so you could build the library with any / all enabled, then default based on runtime detection or availability? similar to the way allocators are implemented now. i looked at starting / PRing this briefly but, it would be a lot of work without knowing it'd be useful / desirable.

@tarcieri
Copy link
Contributor Author

tarcieri commented Oct 28, 2022

so long as there's still an override / way to disable the detection! sometimes it's useful to be able to run tests for 32-bit platforms under x86_64 with the u32_backend

FWIW, we used to have explicit features for this in the https://github.com/RustCrypto/elliptic-curves crates, but got rid of the because you can test 32-bit code on 64-bit Linux by compiling for --target i686-unknown-linux-gnu, or using cross.

and where target detection is involved, cpuid tends to break in miri and enclave builds...

We have a lot of experience with runtime feature detection in @RustCrypto using the cpufeatures crate which is on-by-default for many crates, including aes, chacha20, poly1305 and polyval, where chacha20 and poly1305 have explicit AVX2 backends.

cpufeatures already handles SGX and Miri by disabling runtime detection in these environments. SGX can still take advantage of specific features by enabling them as cargo features.

i was wondering whether y'all had considered refactoring the backends to traits w/ implementations so you could build the library with any / all enabled

That seems pretty onerous. Many, many types would need a generic parameter for the backend, and there would need to be different backends for e.g. base and scalar fields.

I think it's a lot simpler to make backend selection as automatic as possible, keeping all the complexity hidden so people don't have to think about it.

@tarcieri
Copy link
Contributor Author

tarcieri commented Nov 5, 2022

Now I remember the simd_backend feature is still needed as it's nightly-only.

I still think it's best for it to employ autodetection, but unfortunately that does still leave two features which are mutually incompatible (simd_backend and fiat_backend)

@nox
Copy link
Contributor

nox commented Nov 6, 2022

Now I remember the simd_backend feature is still needed as it's nightly-only.

Then it should be a --cfg flag you pass through RUSTFLAGS, it should not be a feature.

@tarcieri
Copy link
Contributor Author

tarcieri commented Nov 6, 2022

Yeah, that's what we've started doing in the @RustCrypto crates. We've had a few complaints that it's hard to configure --cfg attributes but it seems better than dealing with clashing feature activations

tarcieri added a commit that referenced this issue Nov 11, 2022
As proposed in #414, this commit changes the backend selection approach.
introspecting `target_pointer_width` to select `u32_backend` vs
`u64_backend` (or `fiat_u32_backend`/`fiat_u64_backend` if the
`fiat_backend` feature is enabled.

This helps eliminate the use of non-additive features, and also the
rather confusing errors that happen if multiple backends are selected
(i.e. thousands of lines of rustc errors).

The selection logic checks if `target_pointer_width = "64"` and uses the
64-bit backend, or falls back to the 32-bit backend otherwise. This
means the crate will always have a valid backend regardless of the
pointer width, although there may be odd edge cases for exotic platforms
which would optimally use the 64-bit backend but have a non-"64" target
pointer width for whatever reason. We can handle those cases as they
come up.
tarcieri added a commit that referenced this issue Nov 11, 2022
As proposed in #414, this commit changes the backend selection approach,
introspecting `target_pointer_width` to select `u32_backend` vs
`u64_backend` (or `fiat_u32_backend`/`fiat_u64_backend` if the
`fiat_backend` feature is enabled.

This helps eliminate the use of non-additive features, and also the
rather confusing errors that happen if multiple backends are selected
(i.e. thousands of lines of rustc errors).

The selection logic checks if `target_pointer_width = "64"` and uses the
64-bit backend, or falls back to the 32-bit backend otherwise. This
means the crate will always have a valid backend regardless of the
pointer width, although there may be odd edge cases for exotic platforms
which would optimally use the 64-bit backend but have a non-"64" target
pointer width for whatever reason. We can handle those cases as they
come up.
@tarcieri
Copy link
Contributor Author

PR with an initial minimal implementation here: #428

tarcieri added a commit that referenced this issue Nov 11, 2022
As proposed in #414, this commit changes the backend selection approach,
introspecting `target_pointer_width` to select `u32_backend` vs
`u64_backend` (or `fiat_u32_backend`/`fiat_u64_backend` if the
`fiat_backend` feature is enabled).

This helps eliminate the use of non-additive features, and also the
rather confusing errors that happen if multiple backends are selected
(i.e. thousands of lines of rustc errors).

The selection logic checks if `target_pointer_width = "64"` and uses the
64-bit backend, or falls back to the 32-bit backend otherwise. This
means the crate will always have a valid backend regardless of the
pointer width, although there may be odd edge cases for exotic platforms
which would optimally use the 64-bit backend but have a non-"64" target
pointer width for whatever reason. We can handle those cases as they
come up.
tarcieri added a commit that referenced this issue Nov 11, 2022
As proposed in #414, this commit changes the backend selection approach,
introspecting `target_pointer_width` to select `u32_backend` vs
`u64_backend` (or `fiat_u32_backend`/`fiat_u64_backend` if the
`fiat_backend` feature is enabled).

This helps eliminate the use of non-additive features, and also the
rather confusing errors that happen if multiple backends are selected
(i.e. thousands of lines of rustc errors).

The selection logic checks if `target_pointer_width = "64"` and uses the
64-bit backend, or falls back to the 32-bit backend otherwise. This
means the crate will always have a valid backend regardless of the
pointer width, although there may be odd edge cases for exotic platforms
which would optimally use the 64-bit backend but have a non-"64" target
pointer width for whatever reason. We can handle those cases as they
come up.
tarcieri added a commit that referenced this issue Nov 11, 2022
As proposed in #414, this commit changes the backend selection approach,
introspecting `target_pointer_width` to select `u32_backend` vs
`u64_backend` (or `fiat_u32_backend`/`fiat_u64_backend` if the
`fiat_backend` feature is enabled).

This helps eliminate the use of non-additive features, and also the
rather confusing errors that happen if multiple backends are selected
(i.e. thousands of lines of rustc errors).

The selection logic checks if `target_pointer_width = "64"` and uses the
64-bit backend, or falls back to the 32-bit backend otherwise. This
means the crate will always have a valid backend regardless of the
pointer width, although there may be odd edge cases for exotic platforms
which would optimally use the 64-bit backend but have a non-"64" target
pointer width for whatever reason. We can handle those cases as they
come up.
tarcieri added a commit that referenced this issue Nov 13, 2022
As proposed in #414, this commit changes the backend selection approach,
introspecting `target_pointer_width` to select `u32_backend` vs
`u64_backend` (or `fiat_u32_backend`/`fiat_u64_backend` if the
`fiat_backend` feature is enabled).

This helps eliminate the use of non-additive features, and also the
rather confusing errors that happen if multiple backends are selected
(i.e. thousands of lines of rustc errors).

The selection logic checks if `target_pointer_width = "64"` and uses the
64-bit backend, or falls back to the 32-bit backend otherwise. This
means the crate will always have a valid backend regardless of the
pointer width, although there may be odd edge cases for exotic platforms
which would optimally use the 64-bit backend but have a non-"64" target
pointer width for whatever reason. We can handle those cases as they
come up.
rozbb pushed a commit that referenced this issue Nov 13, 2022
As proposed in #414, this commit changes the backend selection approach,
introspecting `target_pointer_width` to select `u32_backend` vs
`u64_backend` (or `fiat_u32_backend`/`fiat_u64_backend` if the
`fiat_backend` feature is enabled).

This helps eliminate the use of non-additive features, and also the
rather confusing errors that happen if multiple backends are selected
(i.e. thousands of lines of rustc errors).

The selection logic checks if `target_pointer_width = "64"` and uses the
64-bit backend, or falls back to the 32-bit backend otherwise. This
means the crate will always have a valid backend regardless of the
pointer width, although there may be odd edge cases for exotic platforms
which would optimally use the 64-bit backend but have a non-"64" target
pointer width for whatever reason. We can handle those cases as they
come up.
@rozbb
Copy link
Contributor

rozbb commented Nov 14, 2022

Question, now that I've been looking at docs: when documenting private items how should we document all the backends in one place? There's no set of flags that will let us show everything at once. Obviously this doesn't apply to the public-facing docs, which don't include backend info.

@tarcieri
Copy link
Contributor Author

Seems like yet another argument for getting rid of target_feature-based gating.

There's also a --all-targets option that can be passed to rustdoc, which can also be configured via the [package.metadata.docs.rs] metadata section in Cargo.toml.

@tarcieri
Copy link
Contributor Author

tarcieri commented Nov 15, 2022

I took a look at replacing the fiat_backend and simd_backend features with cfg(dalek_backend = "fiat-crypto")-style attributes.

It didn't work out as well as I hoped. Both of them pull in an optional dependency when activated.

While that can be worked around with something like...

[target.'cfg(dalek_backend = "fiat-crypto")'.dependencies]
fiat-crypto = "0.1.6"

simd_backend also needs to activate the nightly feature, and transitive feature activation isn't possible with cfg attributes.

So while we could do this, it would involve either simd_backend throwing a compile error in the event cfg(dalek_backend = "simd") is enabled but the nightly feature is not, or refactoring simd_backend to be decoupled from the nightly feature.

@tarcieri
Copy link
Contributor Author

My vote is to call this issue done for now and postpone switching to a cfg attribute for backend selection until simd_backend has been migrated to std::simd (#415) and std::simd is stable. That would also eliminate the packed_simd_2 dependency.

@tarcieri
Copy link
Contributor Author

tarcieri commented Nov 16, 2022

Given the respective backends actually map to optional crate dependencies, I think features might actually be the best way of modeling them.

However, perhaps we can make things more additive/feature-friendly by choosing a "winner" between simd_backend and fiat_backend so the features actually act additively and one backend is chosen over another when they're both activated.

It's a tough call though because they really do have diametrical opposite properties: one says "I want performance" and the other says "I want formally verified code, even at the cost of performance" (I haven't measured lately but last I checked fiat-crypto 64-bit backend was slower than dalek's non-vectorized u64 backend even).

I guess if we pick one it should probably be fiat-crypto, because if someone's asking for formally verified code it would be bad to hand them something else.

Edit: opened #437 to track support for activating both backend features at once.

@nox
Copy link
Contributor

nox commented Nov 16, 2022

You can make simd_backend bring the features, and then make the cfg flag actually enable bits that use unstable code, right?

@tarcieri
Copy link
Contributor Author

Sounds like you're suggesting something like this? (my comment earlier)

So while we could do this, it would involve either simd_backend throwing a compile error in the event cfg(dalek_backend = "simd") is enabled but the nightly feature is not

I don't think it makes sense to do something that requires combinations of cfg attributes and features. It should be either one or the other, not both.

@nox
Copy link
Contributor

nox commented Nov 16, 2022

I don't think it makes sense to do something that requires combinations of cfg attributes and features. It should be either one or the other, not both.

Features shouldn't be the opt-in knob to make use of unstable features, IMO, even when it feels weird. The only reason to use a feature here is that features are the only way to enable optional dependencies.

@tarcieri
Copy link
Contributor Author

tarcieri commented Dec 9, 2022

#455 switches to --cfg curve25519_dalek_backend for backend selection

@tarcieri
Copy link
Contributor Author

Reopening this as the situation regressed somewhat in #523, which introduced a number of new crate features while removing curve25519_dalek_backend="simd".

Namely, it adds:

  • simd_avx2: enables the AVX2 backend on x86_64 targets; noop on others
  • simd_avx512: enables AVX-512 when the nightly compiler is used on x86_64 targets; noop on others
  • simd: enables the simd_avx2 and simd_avx512 features

So we're back to a mixture of cfg parameters and cargo features, which IMO is less than ideal.

I'm not sure what the purpose of exposing such fine-grained control of the available backends is. simd_avx512 is a noop on non-nightly platforms, so it's fine to always have enabled.

@koute can you explain the rationale/use cases for offering such fine-grained control? The observation of this issue was curve25519-dalek previously had too many knobs which was leading to a lot of incidental complexity, and I feel like we're back to that again. Why would we want to enable the AVX-512 backend, but not the AVX2 backend, for example?

Does it really make sense to have a crate feature associated with each backend, especially when the backends are platform specific? There are plans to add a NEON backend (#457) and potentially a GPU backend (#506) that need to be considered.

Crate features can't be controlled by the toplevel binary, so when using curve25519-dalek through end-user facing crates like ed25519-dalek or x25519-dalek means those features need to be propagated to the downstream crates which adds incidental complexity. Every time a new feature-gated backend gets added it needs to be propagated to the downstream crates (and potentially other crates that wrap those), which adds a lot of complexity, whereas cfg attributes can be controlled in curve25519-dalek directly via the toplevel binary and only the toplevel binary.

My personal preference would be to continue leaning on --cfg curve25519_dalek_backend, perhaps adding something like curve25519_dalek_backend="soft" or curve25519_dalek_backend="portable" to disable runtime autodetection and only use the portable software implementation. If no curve25519_dalek_backend is explicitly specified it would use runtime autodetection.

@tarcieri tarcieri reopened this Jun 12, 2023
@pinkforest
Copy link
Contributor

pinkforest commented Jun 12, 2023

Does it really make sense to have a crate feature associated with each backend

I think we should keep in the course of not having features -given majority would probably not require them - please correct if / when wrong here with this assumption.

Having too many options will cause a lot more complexity with education / documentation / auditing / testing / validating etc. and I think this would benefit being discussed together with all the cfg options including cfg(curve25519_dalek_bits) people seem to forget / not always know about that might change their mind how vastly complex wide matrix it is to test and validate it all e.g.:

To keep it simple a plain "override" for cfg(curve25519_dalek_backend = "backend") would be my recommendation as well:

This would also leave cfg(curve25519_dalek_bits) = 32 | 64 override which was to cater wasm / mobile target/s where it was found that some u64 is faster than u32 and who may need to override this manually - for which we may set good know defaults starting with 5.0.0.

This would keep our education consistent - which we have already spent a decent amount of energy creating - and we can point to a single resource of setting "overrides" without potentially undocumented / easy to miss deviations.

Keeping in mind that most people simply don't need overrides and it should always be controlled at binary level as the consensus here was - or is there a case where we would need education that would outweight the approach here ? Would be interested to hear.

I regularly see crates across dependency trees where some crate in the middle didn't set features correctly and trying to upkeep featuresets consistent - when these are really configuration settings. I think trying to upkeep featuresets like this is a lost cause in general sense in my opinion where people don't have sole control of all the crates in a dependency tree.

We already saw how broken 3.0.0 was by attempting to use the backend featuresets where even in-dalek ecosystem the feature flags were not relayed correctly in some cases or omitted together - main beneficiary would be the use-cases where one controls the whole dependency tree but I don't see a reason why they cannot use cfg() flags if they have a control of the whole tree in any case.

And for which we were going to put some effort for 5.0.0 release to more automatically set these as well known defaults:

From release PoV for 4.0.0 we should keep these plain overrides which will help testing and ensuring the implementation is correct given there is limited bandwidth to get changes for 4.0.0 and leave rest of the trickery for later release/s whether breaking or not - we should be less afraid of breaking major versions so we could potentially finally release 4.0.0 that people can anchor on.

@koute
Copy link
Contributor

koute commented Jun 12, 2023

@koute can you explain the rationale/use cases for offering such fine-grained control? The observation of this issue was curve25519-dalek previously had too many knobs which was leading to a lot of incidental complexity, and I feel like we're back to that again. Why would we want to enable the AVX-512 backend, but not the AVX2 backend, for example?

Well, I think it just makes sense to have those as discreet features that can be enabled or disabled, especially since those aren't supposed to change anything from a functional point of view. As a precedent the regex crate does something similar where they have a perf feature which enables all default performance-related features and a bunch of fine grained perf-* ones which can be used to enable them one-by-one. You of course also have the -C target-feature accepted by rustc which also allows fine grained control over what's used (without the fallback if unavailable at runtime).

I suppose the primary use case would be testing?

You're right though that it could be confusing to the end users as to which backend(s) is/are enabled, and that having a mix of cfg-based and feature-based flags is not ideal. Alas there probably isn't a perfect solution here and anything we go with will have some downsides.

This is purely a personal opinion, but honestly I would probably just do something like this:

  • use only cargo features for backend selection, with all backend selecting features disabled by default
  • if no backend feature is enabled then a default will be picked (so it will essentially implicitly enable the default backend features, but without enabling them in cargo through default = [...])
  • if any backend feature is enabled then whatever was the default implicit backend will be disabled
  • compiling curve25519-dalek with multiple incompatible backends selected will result in a compile error (using compile_error! with a nice descriptive error message telling the user why this happend and what they should do to rectify the situation)
  • libraries pulling in curve25519-dalek should not enable any backend features (explicitly document this!)

So then we would achieve the following:

  • the backend is either the default one, or it's explicitly selected
  • just including curve25519-dalek in the deps will work and will by default pick the "best" backend
  • no need to deal with the awkwardness of cfg-based backend selection
  • those users who want to have strict control over which backend is used can easily do so through cargo features
  • using default-features = false doesn't affect backend selection
  • libraries don't have to explicitly pick a backend (and in fact they shouldn't!) and can just completely ignore backend-related features
  • it will be clear which backend is being used if explicitly configured (because it won't compile otherwise)
  • if the user selects multiple incompatible backends in their binary they'll get a nice error telling them how to fix this
  • if a library selects a backend when they shouldn't have then all of the downstream users won't be able to pick a backend anymore, so they'll most likely complain to the library, and then the library will fix itself

main beneficiary would be the use-cases where one controls the whole dependency tree but I don't see a reason why they cannot use cfg() flags if they have a control of the whole tree in any case.

There are some projects (like the one I work on) which are huge frameworks where we control 98% of the dependency tree, but we don't have the control over the end user binary and how it's built. So we're the ones pulling curve25519-dalek in and we're supposed to configure its flags, and the end binary isn't supposed to know/care. So we physically can't use any cfg-based knobs.

@pinkforest
Copy link
Contributor

I think we need more of the Why here rather than How.

I don't think testing use-case justifies feature flags considering testing should be done at library level here in any case.

It is not the curve25519-dalek downstream library's responsibility to test curve25519_dalek configuration knobs.

If the downstream library from curve25519-dalek really wants to test the configuration knobs it can have a reference binary for that

What kind of testing would be required at mid-layer consumer library level that needs these features that cannot be done here ?

Is the only use-case to provide "white labeling" option for the mid-level library wants hide these overrides as point of abstraction ?

Then we should be asking why does the mid-level library need control for this -

I don't think there is use-case for giving control away to make it worthwhile effort to add further complexity just in case basis.

fwiw - If we look at those esoteric regex perf features - I don't know anyone using these really - not even burnsushi I suspect uses them.

Most people if they do end up knowing them in first place neither change these or never end up relaying the featureflags across the dependency chain.

This literally happened with dalek crates earlier where it was not possible to use backends because these were not relayed correctly and integration testing to test the feature-chaining was hit and miss if it worked to begin with (it didn't 😆)

Especially if these are at library in the middle of a long dependency chain where the whole regex affair gets buried deep down as implementation detail where everyone tends to forget these knobs going to category "never needed".

[ .. ] and the end binary isn't supposed to know/care. So we physically can't use any cfg-based knobs.

True - Most if not all users should not worry about these and I'm beginning to think we should not have any knobs for backend override at all that can be managed via dependency chain.

Let's keep in mind that backend override (not a selection) is not the only knob we have - we have another well defined knob with clear justified use-cases for the cfg(curve25519_dalek_bits = 32 | 64) as well.

If we do end up supporting this then we can do this at build script where the feature flag triggers appropriate cfg() and happy to send a PR for it.

It would be still great idea nonetheless to understand the actual use-case/s why somebody would be using these features so we can document and test them in the basis we do similarly with the bits override.

Only remote thing I could think about as use-case to disable e.g. AVX512 where some single CPU model / batch would have some sort of esoteric bug that leads to overheating or some other unfortunate effect - but I don't think this should be the responsibility of mid-layer library in the middle to mitigate but the one who controls the binary who would be affected by this by using certain CPU model that is affected by some CPU related bug - or curve25519-dalek's which has the CPU detect in first place to address this strongly hypothetical case.

So we're the ones pulling curve25519-dalek in and we're supposed to configure its flags

Why does a library in the middle need control of these configuration flags ? Could we please elaborate use-case/s for this ?

@tarcieri
Copy link
Contributor Author

tarcieri commented Jun 12, 2023

compiling curve25519-dalek with multiple incompatible backends selected will result in a compile error (using compile_error! with a nice descriptive error message telling the user why this happend and what they should do to rectify the situation)

This was exactly where we were before with curve25519-dalek v3.x, and it's was a nightmare, hence opening this issue in the first place. If two downstream dependencies activate multiple backend features at once, your build breaks. Semantically crate features are just a bad solution for modeling a 1-of-N decision.

The documentation for crate features has a lot of bad things to say about this approach as well:

https://doc.rust-lang.org/cargo/reference/features.html#mutually-exclusive-features

"This should be avoided if at all possible. [...] Instead of using mutually exclusive features, consider some other options"

libraries pulling in curve25519-dalek should not enable any backend features (explicitly document this!)
[...]
There are some projects (like the one I work on) which are huge frameworks where we control 98% of the dependency tree, but we don't have the control over the end user binary and how it's built. So we're the ones pulling curve25519-dalek in and we're supposed to configure its flags

Aren't you breaking your own rule there? Are you saying that "libraries" shouldn't provide any curve25519-dalek configuration, but for "frameworks" it's okay, and if so, why?

What configuration do you want your framework to set and why is it overriding the defaults? Can you give a concrete example of where cfg attributes would actually be problematic in your case, but features are not?

This is where it's nice that instead of documenting that as best practice (but still leaving the door open for clashing dependencies to break the build if developers overlook the fine print), cfg parameters have no misconfigurations which can break the build due to selecting mutually incompatible backends.

@koute
Copy link
Contributor

koute commented Jun 12, 2023

Why does a library in the middle need control of these configuration flags ? Could we please elaborate use-case/s for this ?

Aren't you breaking your own rule there? Are you saying that "libraries" shouldn't provide any curve25519-dalek configuration, but for "frameworks" it's okay, and if so, why?

In a strictly technical sense they're both libraries, but normal libraries don't pull in 1000 transitive dependencies (many of those are our own crates, but still) when you include them in a Cargo.toml. You can think of it as a huge binary that just happens to be a library crate so that the user can customize it.

This was exactly where we were before with curve25519-dalek v3.x, and it's was a nightmare, hence opening this issue in the first place.

But that's not exactly what I'm proposing here! (:

What v3.x had was that the backend feature was set as a default feature flag. This means that if I do cargo add curve25519-dalek --no-default-features then my crate won't compile because the backend feature is missing. So intermediate libraries were incentivized to also add a default backend feature when they depended on curve25519-dalek so that they can compile by default.

What I'm proposing here is to not set a backend in the default feature flags, and then when no backend feature flag is selected just to assume that the default backend is enabled. So if, let's say, I'm writing a library, and I do cargo add curve25519-dalek --no-default-features and then try to compile my crate then everything works just fine, and I can just not have any curve25519-dalek passthrough features in my Cargo.toml while still letting the users select the curve25519-dalek backend if they directly add curve25519-dalek as a dependency and add the relevant feature flag themselves.

If the intermediate libraries won't touch the backend selection features in any way then there won't be any conflicts. And if we don't break their compilation when --no-default-features is enabled then they won't have a reason to touch them.

Sure, there could still be libraries that would add passthrough feature-flags to select the backend and there wouldn't be much we could do about it, but that wouldn't be the default, and we could write in the docs "hey, don't do that, and if you do you're on your own!" or something along these lines.

"This should be avoided if at all possible. [...] Instead of using mutually exclusive features, consider some other options"

Indeed. But if there's no alternative then personally I prefer using it to --cfg-based configuration knobs though.

(Although here arguably there could be an alternative - allow backend selection at compile time through generics. It'd require some refactoring and increase compile times for end users though.)

What configuration do you want your framework to set and why is it overriding the defaults? Can you give a concrete example of where cfg attributes would actually be problematic in your case, but features are not?

Currently we wouldn't need to set those, but if we did we really can't use cfg-based configuration knobs. I don't have a concrete number of how many downstream users we have, but it's probably in the hundreds; it'd be very disruptive to tell them that "hey, you need to set this magic RUSTFLAGS environment variable every time you compile because one of the 1000 dependencies we pull in needs it". The downstream users don't know nor care that we use curve25519-dalek, and if need be we really don't want to make things harder on them.

This is because, again, we're not really a normal library. Normal libraries are not composed of 250+ crates and pull in 1000+ dependencies. We're essentially a binary masquerading as a library.

Most if not all users should not worry about these and I'm beginning to think we should not have any knobs for backend override at all that can be managed via dependency chain.

I wholly agree that most if not all users should not care about the backend and not touch it at all, which is why, yes, what we had in 3.x was not great because they were forced to care.

@tarcieri
Copy link
Contributor Author

So to be clear, you want to reserve the right to potentially override curve25519-dalek's defaults in the future, but don't have any pressing needs to do so for now? Seems like YAGNI.

Meanwhile going a 100% features route again would reintroduce a way for intermediate dependencies to break the build for their downstream users. Yes, we can document how to avoid that and hope people read the documentation, or alternatively we can prevent it from happening entirely via a design where the failure conditions are inexpressible.

@koute
Copy link
Contributor

koute commented Jun 12, 2023

So to be clear, you want to reserve the right to potentially override curve25519-dalek's defaults in the future, but don't have any pressing needs to do so for now? Seems like YAGNI.

Not necessarily reserve; you asked me why we wouldn't be able to use a --cfg-based config knob and I just wanted to explain. (:

It's just I personally think that the worse user experience of having to use a --cfg flag is not worth it if we can make it so that in practice intermediate libraries won't break their downstream users' builds because (unlike in 3.x) we can make the backend selection not be in the defaults so they won't have to touch them at all.

So, to summarize, personally I'd just 1) make it implicitly select a default backend if no features are provided, 2) name all of the backend features force_backend_* (so logically it should be then more-or-less obvious to an onlooker that you can't force multiple backends at the same time) and make them non-default, 3) put a comment in ALL CAPS next to them in Cargo.toml documenting that these are exclusive features and should only be used in downstream binaries and that libraries should not set them.

And yes, you're right, that would reintroduce a way for intermediate dependencies to break the build for their downstream users. It is a tradeoff.

Anyway, if I can't convince you then I don't intend to press the matter further. (: You're right that we technically don't need this now, so fair point!

@jrose-signal
Copy link
Contributor

I do wish there was some way to solve the discoverability problem; I have to look individually at the docs for every indirect dependency I have to know if it responds to custom cfg flags or environment variables. Cargo features are only a little better here in that they could be collected mechanically, but I don't think they are. I don't have any strong ideas, though, only things like "eprint a warning in build.rs if the user hasn't explicitly chosen a backend or 'auto'", which I think would get old fast for every intermediate library.

@tarcieri
Copy link
Contributor Author

Cargo features are only a little better here in that they could be collected mechanically, but I don't think they are.

We use doc_auto_cfg now which does collect information about features and presents it on docs.rs, but yeah, it still isn't great. It'd really be nice if crates.io collected that information as well.

That said, I think >99% of the time there will be no need to set a cfg attribute.

There are only three situations presently where you'd hypothetically set one:

  • Override target word size: 32-bit/64-bit (though we should have reasonable defaults now)
  • fiat-crypto backend because you care about formal verification
  • Force portable backend over SIMD w\ autodetection/fallback

Hypothetically we might add "use a GPU backend" but that's the only foreseeable other one.

All of these things seem like concerns for the toplevel binary.

@pinkforest
Copy link
Contributor

PR's up based on consensus to close this out

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-for-4.0 This should be resolved before we do a 4.0 release
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants