Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Continuing #833 RE: Generating Checksum Named Files for AptByHash #884

Merged
merged 1 commit into from
Oct 23, 2023

Conversation

acheng-01
Copy link

This PR is to continue the work of adamsanaglo for AptByHash. It includes code changes for mitigating the Hash Sum Mismatch error by generating additional metadata files named by checksum for AptByHash, as well as changes advised by quba in the adamsanaglo's original PR.

Closes #795

@acheng-01 acheng-01 force-pushed the acheng01/aptbyhash-continuation branch 2 times, most recently from 012ca8d to 4d728f0 Compare September 5, 2023 16:17
@quba42
Copy link
Collaborator

quba42 commented Sep 6, 2023

Regarding the failed "rady-to-ship" CI test, I believe the CI wants closes #795 in the commit message instead of closes pulp#795. Automation is picky that way.

Apart from other small details, I believe the ball is back in our court to provide a final review for the feature itself. I am not sure I will get to it this week, but it is now near the top of our ToDo list.

@daviddavis
Copy link
Contributor

I think the commit message overall needs to be improved. It should describe the change/feature.

@acheng-01 acheng-01 force-pushed the acheng01/aptbyhash-continuation branch from 4d728f0 to 77d2849 Compare September 6, 2023 21:51
@quba42 quba42 added the .feature CHANGES/<issue_number>.feature label Sep 14, 2023
Copy link
Collaborator

@quba42 quba42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First off some more nitpicks (small things of minor importance). I will post my thoughts on the feature itself separately.

Final Nitpick: The latest iteration of the commit message is now:

Added functionality to generate checksum named metadata files for AptByHash that was initially started by adamsanaglo.

Addressed git commit and code formatting issues pointed out by quba42

closes #795
  • The title line should be shorter and use present tense.
  • The fact that something from previous review iterations was addressed is not interesting two years from now.

My suggestion is something like:

Add ability to publish package indices using AptByHash format

closes #795

pulp_deb/app/tasks/publishing.py Outdated Show resolved Hide resolved
pulp_deb/app/tasks/publishing.py Outdated Show resolved Hide resolved
pulp_deb/app/settings.py Outdated Show resolved Hide resolved
pulp_deb/app/tasks/publishing.py Outdated Show resolved Hide resolved
pulp_deb/app/tasks/publishing.py Outdated Show resolved Hide resolved
pulp_deb/app/tasks/publishing.py Outdated Show resolved Hide resolved
@quba42
Copy link
Collaborator

quba42 commented Sep 14, 2023

Now my thoughts on the feature itself:

I have verified that this works as designed and adds a by-hash/ folder with all the right files/paths next to the package indices.

It does not add the by-hash/ folder next to the Release file.

So my final question is, how useful/complete is this feature in its current form? If I read https://wiki.ubuntu.com/AptByHash it just talks about the Release file by-hash/ folder, and not the one we are adding here.

Is there a risk that advertising Acquire-By-Hash: yes in the release file, and not providing the release file by-hash/ folder might confuse some clients?

Since I have little experience with this feature, I don't know the answer to these questions. However the answer is essential to my final question before merging:

Right now this behaviour can be enabled/disabled via the APT_BY_HASH setting which defaults to true. Is this feature in a state where we want to enable it for everyone by default? Is there a case for letting us toggle this feature for individual publications rather than just by global setting? (My feeling is that this would be overkill. If the feature can't hurt we should simply enable it by default, if we have doubts about the completeness, we should disable it by default, and revisit as the need arises).

@daviddavis I am guessing some of these big picture questions are as much for you as for @acheng-01 😉

@daviddavis
Copy link
Contributor

daviddavis commented Sep 14, 2023

So my final question is, how useful/complete is this feature in its current form? If I read https://wiki.ubuntu.com/AptByHash it just talks about the Release file by-hash/ folder, and not the one we are adding here.

That's interesting and I had not noticed that in the spec. If I look at the debian repo for example, it has the by-hash folder in the directory containing the Packages but not in the Release folder. Ubuntu does have a by-hash folder in the Release folder.

We should probably add these extra by-hash folders for completeness and to align with the spec but also, I tested it out using Ubuntu and it seemed to work (I confirmed it used the by-hash files) so I think this change is probably safe as is.

Right now this behaviour can be enabled/disabled via the APT_BY_HASH setting which defaults to true. Is this feature in a state where we want to enable it for everyone by default?

I would argue that we should have this feature disabled by default. We're running Pulp behind a reverse proxy that caches various files and one of the important pieces of this feature is that the by-hash files should be cached and served by default for 3 days. Serving these files directly from Pulp doesn't provide for this so I think it makes sense to disable this feature for most users.

Moreover, I think we should probably provide accompanying docs with this feature in which we tell users that they are responsible for setting up a reverse proxy with cache in order to cache the by-hash files.

@acheng-01 acheng-01 marked this pull request as draft September 25, 2023 20:39
@acheng-01 acheng-01 force-pushed the acheng01/aptbyhash-continuation branch from 77d2849 to fe55235 Compare September 25, 2023 20:55
Copy link
Collaborator

@quba42 quba42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything looks very clean now.
Given what @daviddavis said, I made a suggestion regarding the changelog entry.

Given the feature is now disabled by default, I leave it up to you if you want me to merge the current state, or if you prefer to extend it first. I see the following possible follow up tasks (but I am fine if they do not materialize):

  • Add the release file by-hash/ folder as well.
  • Add documentation on how to use this feature with reverse proxy.
  • Extend a publication test to ensure that indices are in fact being published in the by-hash location.

CHANGES/795.feature Outdated Show resolved Hide resolved
@acheng-01
Copy link
Author

acheng-01 commented Sep 28, 2023

Regarding adding the release file by-hash folder, after taking a closer look at the specs and doing more thinking, I have decided not to do it. To expand:

I think Adams's original PR was aimed at following the specs for debian rather than for ubuntu as @quba42 had linked. In that regard, the code works and now there is a by-hash generated for each architecture's packages within the repo.

In addition to the difference in spec, I believe pulp reads directly from the Release file that already contains all the hash digests for the repository's packages; it doesn't touch the by-hash folder at all. This being the case, a by-hash folder in release wouldn't add much value other than caching.

@daviddavis do you have anything to add?

@daviddavis
Copy link
Contributor

I think that makes sense. When I tested this out with Ubuntu here, apt pulled the Packages files from the by-hash folder in the Packages folder. I'm not sure what value the Releases by-hash folder serves. At least for us, the problem was that clients were pulling Packages files that didn't match the Release so providing the Release by-hash folder seems pointless.

It might be worth stating in the docs and changelog entry that this implementation conforms to the debian spec that you linked to (and not the Ubuntu although it does appear that apt on Ubuntu does properly handle the hash files).

@quba42
Copy link
Collaborator

quba42 commented Oct 2, 2023

@acheng-01 If you update the release notes you can set this to "ready for review" and we can merge it.

@acheng-01
Copy link
Author

@quba42 Sorry for the delays here. I've been trying to extend one of the publishing tests to include verification that the by-hash folder is created when APT_BY_HASH is enabled without much success. Because that hashed_index_path is being saved to hashed_index, I feel that I might not be able to access it in the test without changing the APIs in pulp_deb-client or modifying some pre-existing model also within that plugin. (Please let me know if I overlooked something, overthough something, or if there is a way to test for this without altering stuff in pulp_deb-client...)

I have tested this outside of the testing environment though, and I can confirm that the by-hash folder is generated upon publishing a repo. It seems like you would be ok with me just setting this as ready to merge though, so I'll go ahead and mark this as ready for now.

@acheng-01 acheng-01 marked this pull request as ready for review October 6, 2023 23:16
Copy link
Collaborator

@quba42 quba42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to squish all your commits together for the "Deb CI / ready-to-ship (pull_request)" test to turn green.

Apart from that this looks ready to me. The additional "Feature Overview" docs look good to me.

Regarding tests: I am not sure how simple it is to retrieve the checksum of the publication artifact, though if worst came to worst, one could simply download the published/hosted Release file, retrieve a Package index from that, download it, calculate its checksum, and then check for the presence of the relevant APT by hash path. This would work completely without use of Pulp client, but would not be very elegant. I think we can postpone this and live without that test for now.

@acheng-01 acheng-01 force-pushed the acheng01/aptbyhash-continuation branch 2 times, most recently from 49bf40f to 59a0a2b Compare October 9, 2023 16:49
@acheng-01 acheng-01 force-pushed the acheng01/aptbyhash-continuation branch from 59a0a2b to e242b8c Compare October 9, 2023 16:51
Copy link
Collaborator

@quba42 quba42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ongoing discussions about how to develop this feature further not withstanding, I am fine with merging the current state.

@quba42 quba42 merged commit bfe5364 into pulp:main Oct 23, 2023
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
.feature CHANGES/<issue_number>.feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AptByHash
4 participants