PEP 458: RSTUF Integration #2
Conversation
Force-pushed from `e032cea` to `e70a3b7`.
My comments about the infrastructure commit
`dev/rstuf-bootstrap-payload.json` (outdated):

```json
        "bins": 1
    },
    "services": {
        "number_of_delegated_bins": 256,
```
Should we choose a smaller number of delegations for easy debugging?
The number should fit the expected number of target files. How many target files do we expect in a dev setup? If only a few for illustration, we can definitely start with a smaller number.
(ideally, the number is chosen automatically anyway, see e.g. https://gitlab.com/rugged/rugged/-/issues/137)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it could be a feature in RSTUF, and the latest changes in the database model make it ready to implement.
The current number in PEP 458 is 16384. We can choose a smaller number, such as 16.
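For intuition, the trade-off behind that number can be sketched with a toy sizing helper (hypothetical, not part of RSTUF or Warehouse): pick the smallest power-of-two bin count that keeps the expected number of targets per bin under a cap.

```python
def bins_for_targets(expected_targets: int, max_targets_per_bin: int = 100) -> int:
    """Return the smallest power-of-two bin count keeping each bin under a size cap.

    Toy sizing helper; real deployments also weigh metadata size and
    client download overhead.
    """
    bins = 1
    while expected_targets / bins > max_targets_per_bin:
        bins *= 2
    return bins

# PEP 458 uses 16384 bins for PyPI-scale target counts; a dev setup
# with only a handful of targets can get away with far fewer.
print(bins_for_targets(1_000_000))  # 16384
print(bins_for_targets(50))         # 1
```

This is consistent with the numbers above: a million-target repository lands on 16384 bins, while a development database barely needs delegation at all.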
```python
@tasks.task(ignore_result=True, acks_late=True)
def generate_projects_simple_detail(request, projects):
```
About this task
- I need it to have the `request` object. I couldn't find a way to retrieve the `request` to be used by the `render_simple_detail` function. You can see it is used during the `warehouse tuf dev import-all` command (`warehouse/warehouse/cli/tuf.py`, line 136 in 706478a):

  ```python
  request = config.task(_generate_projects_simple_detail).get_request()
  ```

- I also intended to use it later during the "Manage packages/project and simple details to TUF" commit. Still, it wasn't possible because I could not send the simple detail generated in the background. I need the response to use in the TUF metadata.

If I can retrieve the `request` object or generate one with `render_simple_detail`, we can remove it.
Warehouse folks should be able to help with that.
Force-pushed from `e70a3b7` to `ec77bae`.
RSTUF, by design, is asynchronous: the `targets.*` endpoints always return the task id(s).

- Should Warehouse wait and monitor the task result to make it "synchronous"?
- Should Warehouse create asynchronous monitoring/actions for the tasks, to take some action depending on the result?

Currently, we store the task id in the events, but the task id remains available in RSTUF only for 24 hours (the Celery default). Should Warehouse store the task results?
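If Warehouse does decide to wait on a task, the polling loop itself can be kept independent of RSTUF's exact endpoint layout by injecting the status lookup as a callable. A minimal sketch; the terminal state names here are an assumption, roughly following Celery conventions, not a confirmed RSTUF API contract:

```python
import time


def wait_for_task(fetch_state, task_id, timeout=30.0, interval=0.5):
    """Poll fetch_state(task_id) until a terminal state or timeout.

    fetch_state is injected so this logic stays independent of the HTTP
    client and of the exact RSTUF endpoint. Terminal state names are an
    assumption (Celery-style); adjust to the real API.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = fetch_state(task_id)
        if state in ("SUCCESS", "FAILURE", "ERRORED"):
            return state
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} still pending after {timeout}s")


# Usage with a stubbed status lookup (a real one would call the RSTUF API):
states = iter(["PENDING", "STARTED", "SUCCESS"])
print(wait_for_task(lambda tid: next(states), "task-1", interval=0))  # SUCCESS
```

Keeping the loop transport-agnostic also makes it trivial to unit-test, which matters if Warehouse ends up owning this monitoring.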
Force-pushed from `ec77bae` to `5243442`.
""" | ||
|
||
|
||
def _fetch_succinct_roles(config): |
I think this could use some explanation. It looks like you manually TOFU the first version of root in RSTUF and then use the client updater to update timestamp, snapshot and targets, to get the hash-bin delegation info from the latest targets, without first updating root?

I have a few questions:

- Is it unsafe to expect that top-level targets has not yet been updated when calling import_all? If not, couldn't we just use the bin numbers from `rstuf-bootstrap-payload.json` to generate the `Delegations` object?
- If we do need to look at the metadata, does Warehouse have no other way to get the latest metadata than using the client updater?
- If we do need the client updater, why not use it all the way, via `refresh()`, and also update root?
- If Warehouse uses the client updater and needs to bootstrap trust, can't it use the initial root metadata from `rstuf-bootstrap-payload.json`?
> I think this could use some explanation. It looks like you manually TOFU the first version of root in RSTUF and then use the client updater to update timestamp, snapshot and targets, to get the hash-bin delegation info from the latest targets, without first updating root?
Let me explain the problem; I think you already gave some solutions for it.

The Warehouse user, in this specific case a Warehouse developer who wants to develop/test something related to RSTUF/Warehouse, will use this command to import all existing data from the development Warehouse database to the RSTUF database.

`warehouse tuf dev import-all` is a subcommand inside the `dev` command, to make clear it is for development only. (Maybe we need better protection to avoid running it in production?)

It requires Warehouse database initialization (`make initdb`) and TUF initialization (`make tufinit`) first.

What I'm trying to do here: I need to identify the delegated role name, to fetch from the RSTUF database (table `rstuf_target_roles`) the role id when I'm inserting the artifact (package or simple detail index).
> - Is it unsafe to expect that top-level targets has not yet been updated when calling import_all? If not, couldn't we just use the bin numbers from `rstuf-bootstrap-payload.json` to generate the `Delegations` object?
Do we need to care about this safety here in the development process?

I could use it from `rstuf-bootstrap-payload.json`, but what if I generated a custom ceremony with the RSTUF CLI? For example, as a developer I want to test a new feature or even test performance with 16384 delegated roles, and I used `kairo-payload.json` in the `Makefile` or just ran the bootstrap command from the `Makefile`. It could generate an inconsistency unless I change a specific call in `cli/tuf.py`.
> - If we do need to look at the metadata, does Warehouse have no other way to get the latest metadata than using the client updater?
I could do as I used to do before: loading the file from `/var/opt/warehouse/metadata` (line 114 in 2053563: `- rstuf-metadata:/var/opt/warehouse/metadata`).

Another option is to get it from `config.registry.settings["tuf.metadata.url"]` (http://files:9001/metadata), fetch `1.targets.json`, import it as TUF Metadata, and get the succinct roles.
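For the second option, the succinct-delegation parameters could also be read straight out of the serialized targets metadata without the client updater. A sketch, assuming the field names follow python-tuf's serialization of succinct delegations (`signed.delegations.succinct_roles`); verify against the actual metadata before relying on it:

```python
import json


def read_succinct_roles(targets_json: str) -> dict:
    """Extract succinct hash-bin delegation parameters from a serialized
    TUF targets metadata document (e.g. 1.targets.json)."""
    doc = json.loads(targets_json)
    sr = doc["signed"]["delegations"]["succinct_roles"]
    return {"bit_length": sr["bit_length"], "name_prefix": sr["name_prefix"]}


# Minimal example document with only the fields this sketch touches:
sample = json.dumps({
    "signed": {
        "delegations": {
            "succinct_roles": {
                "keyids": ["..."],
                "threshold": 1,
                "bit_length": 8,
                "name_prefix": "bins",
            }
        }
    },
    "signatures": [],
})
print(read_succinct_roles(sample))  # {'bit_length': 8, 'name_prefix': 'bins'}
```

This avoids pulling in the client updater when all that is needed is `bit_length` and `name_prefix`, at the cost of skipping signature verification entirely (acceptable in the dev scenario discussed here, not in production).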
> - If we do need the client updater, why not use it all the way, via `refresh()`, and also update root?
I assumed that there were no updates to the metadata and that the developer ran `make initdb`, `make tufinit` and `make tufimport`. I tried to cover only a scenario where they just run a custom ceremony to generate a custom payload, but not a root metadata update.
> - If Warehouse uses the client updater and needs to bootstrap trust, can't it use the initial root metadata from `rstuf-bootstrap-payload.json`?

I assumed yes. IMHO, we shouldn't care here about the root.
> What I'm trying to do here: I need to identify the delegated role name, to fetch from the RSTUF database (table `rstuf_target_roles`) the role id when I'm inserting the artifact (package or simple detail index).
Wait, why does Warehouse even need to be concerned with those technical details? Shouldn't `import-all` just throw target files at the RSTUF API, like normal release uploads would do?
IMO using the tuf client here makes sense for two reasons:

- security: the tuf client guarantees that we get the correct metadata
- abstraction: the tuf client hides the complexity of getting metadata

At the same time, we don't actually care about security here, because we ignore root updates. And we violate layering, because we handle TUF metadata and manually resolve delegations inside Warehouse, all of which could be hidden behind the RSTUF API.

If this is a workaround, because RSTUF does not yet implement an API endpoint for "backsigning" existing packages, and it is only used in a controlled dev environment, then we should use the easiest way of getting this information, e.g. `ls /var/opt/warehouse/metadata/*.targets.json | sort -r | head -1`, or just create the object ourselves:
```python
succinct_roles = SuccinctRoles(
    keyids,       # irrelevant
    threshold,    # irrelevant
    bit_length,   # Can't we get this from warehouse config or db?
    name_prefix,  # Can't we get this from warehouse config or db?
)
```
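For reference, the role name really does depend only on `bit_length` and `name_prefix`, which is why `keyids` and `threshold` are irrelevant here. A stdlib-only sketch of the bin computation as I understand python-tuf's `SuccinctRoles.get_role_for_target` implements it (check against `tuf.api.metadata` before relying on the exact values):

```python
from hashlib import sha256


def role_for_target(target_path: str, bit_length: int, name_prefix: str = "bins") -> str:
    """Map a target path to a succinct hash-bin role name.

    Sketch of the succinct-delegation scheme: the top bits of
    SHA-256(path) select one of 2**bit_length bins, and the bin number
    is rendered as a zero-padded hex suffix.
    """
    digest = sha256(target_path.encode("utf-8")).digest()
    # Use the first 32 bits of the digest, keep only the top bit_length bits.
    bin_number = int.from_bytes(digest[:4], "big") >> (32 - bit_length)
    suffix_len = (bit_length - 1) // 4 + 1  # hex digits needed for 2**bit_length bins
    return f"{name_prefix}-{bin_number:0{suffix_len}x}"


# With bit_length=8 (256 bins) role names run bins-00 .. bins-ff,
# matching the PEP 458 naming convention.
print(role_for_target("somepkg/somepkg-1.0.tar.gz", 8))
```

Since the mapping is a pure function of the path and two parameters, Warehouse only needs `bit_length` (and the fixed `name_prefix`) to resolve the role name locally.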
Please see previous comment for how I think we should do this. Below replies are just to sort out misunderstandings...
> I could use it from `rstuf-bootstrap-payload.json`, but what if I generated a custom ceremony with the RSTUF CLI? For example, as a developer I want to test a new feature or even test performance with 16384 delegated roles, and I used `kairo-payload.json` in the `Makefile` or just ran the bootstrap command from the `Makefile`. It could generate an inconsistency unless I change a specific call in `cli/tuf.py`.
I didn't mean the default `rstuf-bootstrap-payload.json` included in this PR, but the one that the admin/developer actually uses to initialise RSTUF. I assumed that Warehouse would somehow keep a copy of that.
> - If Warehouse uses the client updater and needs to bootstrap trust, can't it use the initial root metadata from `rstuf-bootstrap-payload.json`?
>
> I assumed yes. IMHO, we shouldn't care here about the root.
I didn't mean the initial root metadata that is already in RSTUF, but the one that is uploaded to RSTUF. Again, I assumed that Warehouse would somehow keep a copy of that when doing `tufinit`.
> Wait, why does Warehouse even need to be concerned with those technical details? Shouldn't `import-all` just throw target files at the RSTUF API, like normal release uploads would do?
We are doing an import, not the regular flow of adding targets through the PyPI API.

This is a procedure to speed up the process of importing existing targets.

Unfortunately, Warehouse doesn't support the existing import tool from RSTUF due to its SQLAlchemy version, so we implemented it in the Warehouse CLI for development.
> At the same time, we don't actually care about security here, because we ignore root updates. And we violate layering, because we handle TUF metadata and manually resolve delegations inside Warehouse, all of which could be hidden behind the RSTUF API.
>
> If this is a workaround, because RSTUF does not yet implement an API endpoint for "backsigning" existing packages, and is only used in a controlled dev environment, …
RSTUF supports adding all targets through the API and publishing them to the metadata files in storage with another API call.

But we don't want to run this procedure (designed for high-traffic operation) to add many artifacts (hundreds of thousands to millions) at once, so we would do a kind of data migration first.
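Such a migration would presumably submit the artifact list in chunks rather than as one giant API call. A minimal sketch of the chunking; the batch size and the `post_to_rstuf` call are hypothetical, not part of RSTUF or Warehouse:

```python
from itertools import islice


def batched(items, size):
    """Yield lists of at most `size` items from any iterable.

    Hypothetical helper for submitting an import in chunks, one RSTUF
    task per chunk, instead of a single massive request.
    """
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


# e.g. submit a million artifacts in 10k-item batches:
# for payload in batched(all_artifacts, 10_000):
#     post_to_rstuf(payload)  # hypothetical API call
print(list(batched(range(5), 2)))  # [[0, 1], [2, 3], [4]]
```

Chunking also gives each batch its own task id, which fits the asynchronous task-monitoring discussion earlier in this thread.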
> then we should use the easiest way of getting this information, e.g. `ls /var/opt/warehouse/metadata/*.targets.json | sort -r | head -1`, or just create the object ourselves: `succinct_roles = SuccinctRoles(keyids, threshold, bit_length, name_prefix)`
Yes, I think we can go for the simpler way.

- `bit_length`: we can get it from the RSTUF API
- `name_prefix`: I would say it is irrelevant; RSTUF has it predefined as "bins"
Oh wow, you did think all of this through! ❤️ I strongly suggest to put this info (plus link) into a code comment, i.e. why we don't want to use the API, and can't use the CLI here.
Force-pushed from `5243442` to `2053563`.
This is amazing, Kairo! Let's resolve the discussion about "backsigning" and share this with Warehouse folks. 🚀
One more thing: let's add a little more context about RSTUF and its current state to the PR description when submitting upstream.
Force-pushed from `0e59f72` to `818210a`.
Adds RSTUF to the Warehouse infrastructure

* Include the RSTUF ceremony payload file
  - It is generated using `rstuf admin ceremony`, and the keys
* Add the development dependencies - RSTUF CLI and dependencies
* Include RSTUF components in `docker-compose.yml`
  - RSTUF uses the same Redis server but unique Redis DB ids `1` and `2`
  - RSTUF uses the same PostgreSQL, but a specific database, rstuf
* Add the RSTUF environment configuration for development
* Define the Makefile commands for RSTUF
  - `make tufinit` to bootstrap the RSTUF service
  - `make tufimport` to import all project packages to the RSTUF service
* Define the basic commands for RSTUF within Warehouse
  - Command to import all existing packages and indexes into TUF metadata (`warehouse tuf dev import-all`)
* Add TUF development documentation

Signed-off-by: Kairo de Araujo <[email protected]>
* Adding packages
  After adding a package to the Warehouse database, it generates and stores the Simple Index, with a request to the RSTUF backend to include the package and its simple index in TUF metadata.
* Removing a package or project release
  In PyPI Management, when a user removes a file or a project release, it is also removed from TUF metadata and the simple details index is updated.

Co-authored-by: Lukas Puehringer <[email protected]>
Signed-off-by: Kairo de Araujo <[email protected]>

simplify code in warehouse.tuf.targets

Signed-off-by: Kairo de Araujo <[email protected]>
Reduce the number of delegated hash-bin roles for the development environment.

Signed-off-by: Kairo de Araujo <[email protected]>
Rename the environment variable setting `TUF_API_URL` to `RSTUF_API_URL`, as this API is provided by Repository Service for TUF (RSTUF).

Signed-off-by: Kairo de Araujo <[email protected]>
Force-pushed from `818210a` to `5751be8`.
This PR implements PEP 458 by adding a setup for Repository Service for TUF (RSTUF) and connecting Warehouse to it.
Context
Unlike previous attempts (pypi#7488, pypi#10870) to implement PEP 458, this PR does not deeply integrate TUF repository management into Warehouse. Instead, it configures a stand-alone RSTUF service, which maintains the TUF metadata repository as a black box and accepts calls to a REST API, so that Warehouse can indicate the addition or removal of release files and trigger TUF metadata changes.
Additionally, RSTUF provides a CLI for root signing, used to initialize the RSTUF metadata repository.
See RSTUF docs for details.
Description of this PR
- Configure RSTUF (development instance)
  - `docker-compose.yml`: RSTUF components share the existing Redis server, using Redis DB ids `1` and `2`
  - `Makefile`: `make tufinit` to bootstrap the RSTUF service; `make tufimport` to create TUF metadata for all existing release files from the example Warehouse database (a make command for the Warehouse CLI `warehouse tuf dev import-all`)
- Add calls to the RSTUF API upon package addition and removal
Status of RSTUF
RSTUF is close to releasing a beta version, called Minimum Working Version (MWV). Two of the three components (RSTUF Worker and API) are already tagged MWV. The third component (RSTUF CLI) is missing one feature, which is not relevant for this PR and is not expected to break compatibility for the MWV release.
TODO
- `docker-compose.yml`