Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hg: Add a cache for mercurial repositories. #372

Merged
merged 12 commits into from
Jun 12, 2024

Conversation

cdevienne
Copy link
Contributor

@cdevienne cdevienne commented Apr 2, 2024

hg: Add a cache for mercurial repositories.

The idea is to save the whole untouched clone (with no checkout) in the cache.

If already present, the fetch (a pull actually for hg) is done directly in the cache, and is
faster (except on very small repos) because only new changesets are
transfered.

If the ref is a changeset id (not a tag, branch, topic or bookmark), and
the changeset is already known in the cached clone, no pull is done
which avoid any network exchange.

Then we copy the cached entry and do the checkout.

The same feature for git will be proposed very soon, so any feedback on this one is welcome if it helps doing things better for git!

@cdevienne cdevienne force-pushed the hg-cache branch 2 times, most recently from ad46667 to 1f7ac92 Compare April 2, 2024 19:18
@cdevienne cdevienne changed the title Hg cache hg: Add a cache for mercurial repositories. Apr 2, 2024
pkg/vendir/fetch/hg/hg.go Outdated Show resolved Hide resolved
pkg/vendir/fetch/hg/hg.go Outdated Show resolved Hide resolved
@kumaritanushree
Copy link
Contributor

@cdevienne not sure how much effort will require but if possible can you please add test for this?

@cdevienne
Copy link
Contributor Author

@kumaritanushree I will write some tests. Do you need more precisions on the CacheID logic I implemented?

@cdevienne cdevienne force-pushed the hg-cache branch 2 times, most recently from 2a8ea07 to ff75179 Compare May 7, 2024 11:41
@cdevienne cdevienne marked this pull request as ready for review May 7, 2024 11:44
Copy link
Member

@joaopapereira joaopapereira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some questions and left some comments.

test/e2e/hg_test.go Outdated Show resolved Hide resolved
test/e2e/hg_test.go Outdated Show resolved Hide resolved
pkg/vendir/fetch/hg/sync.go Outdated Show resolved Hide resolved
pkg/vendir/fetch/hg/hg.go Outdated Show resolved Hide resolved
pkg/vendir/fetch/hg/hg.go Outdated Show resolved Hide resolved
@cdevienne cdevienne force-pushed the hg-cache branch 2 times, most recently from df01894 to b738d70 Compare May 10, 2024 09:47
Copy link
Member

@joaopapereira joaopapereira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mind fixing the rebase so that we do not have all the other changes in this PR?

Signed-off-by: Christophe de Vienne <[email protected]>
The idea is to save the whole untouched clone (with no checkout) in the cache.

If already present, the pull is done directly in the cache, and is
faster (except on very small repos) because only new changeset are
transfered.

If the ref is a changeset id (not a tag, branch, topic or bookmark), and
the changeset is already known in the cached clone, no pull is done
which avoid any network exchange.

Then we copy the cached entry and do the checkout.

Signed-off-by: Christophe de Vienne <[email protected]>
Signed-off-by: Christophe de Vienne <[email protected]>
The repo URL must be in the cache id.
The ref is purposely not included in it because we want to reuse the cached repository
when the ref moves.
And finally, we use a sha256 hash to mask any authentication data because we don't
want them to be readable in the cache folder name.

Signed-off-by: Christophe de Vienne <[email protected]>
Signed-off-by: Christophe de Vienne <[email protected]>
Signed-off-by: Christophe de Vienne <[email protected]>
Use only the repository URL as a cacheID

Signed-off-by: Christophe de Vienne <[email protected]>
@cdevienne
Copy link
Contributor Author

cdevienne commented May 21, 2024

I un-exported the "Hg" type completely, as it has no reasonable use outside this package.

@Zebradil
Copy link
Member

@joaopapereira should I be able to approve workflow runs?

@praveenrewar
Copy link
Member

@Zebradil Are you not able to approve? Let me check the permissions in that case. For now I am approving.

@Zebradil
Copy link
Member

@praveenrewar No, I didn't have the button.

@Zebradil
Copy link
Member

@cdevienne thank you for the adjustments. Now we can see that the localClone function is unused. Could you remove it? Or was it supposed to be used at some point?

@praveenrewar
Copy link
Member

No, I didn't have the button.

@Zebradil Please check now. I am still figuring out some of the permissions stuff, so please excuse me if it still doesn't work 😅

@Zebradil
Copy link
Member

I can't test if it works now, as there are no pipelines to approve :)
Let's wait until we have another PR.

Signed-off-by: Christophe de Vienne <[email protected]>
@cdevienne
Copy link
Contributor Author

@cdevienne thank you for the adjustments. Now we can see that the localClone function is unused. Could you remove it? Or was it supposed to be used at some point?

my bad I forgot to run golangci-lint. I cleaned up the code.

@Zebradil
Copy link
Member

@cdevienne sorry, I didn't notice earlier that with the new functions some existing structs (Hg, HgInfo, ...) and functions (NewHg, ...) became unexported too. This could be a breaking change for someone who might use them in their code. Could you revert this change to the said structs and functions?

@Zebradil
Copy link
Member

@praveenrewar I don't see the button to approve workflow runs.

image

@cdevienne
Copy link
Contributor Author

@cdevienne sorry, I didn't notice earlier that with the new functions some existing structs (Hg, HgInfo, ...) and functions (NewHg, ...) became unexported too. This could be a breaking change for someone who might use them in their code. Could you revert this change to the said structs and functions?

It is a breaking change indeed, but I would be surprised it breaks anything. I don't see a point in keeping these public if it doesn't break any code. If it does I would happily revert the change.

@cdevienne
Copy link
Contributor Author

Precising my point: I think the internal of the fetchers are not a part of the contract vendir has with its users. So it should not have been public in the first place.

@Zebradil
Copy link
Member

I think the internal of the fetchers are not a part of the contract vendir has with its users.

I doubt that this code is used outside of vendir. However, it is located under pkg which, if I understand correctly, makes it "public". If it were under internal I wouldn't bother much.

I don't see a point in keeping these public if it doesn't break any code.

Yes, but how do we know if it doesn't break someone else's code?

@joaopapereira could you give a suggestion here?

Removing symbols from public API is a breaking change according to the semantic versioning spec and we should implement it carefully with adequate communication. But vendir's major version is still 0 which gives us some freedom to break the public API. I personally would commit this (breaking) change and see who reacts on this and what are their use cases. Also, ytt, for example, discourages using it as a module: https://github.com/carvel-dev/ytt/blob/develop/examples/integrating-with-ytt/apis.md#as-a-go-module. I think for vendir these considerations make sense too.

@joaopapereira
Copy link
Member

As a generic rule of thumb in Carvel we tend to:

  • Minimize the breaking changes, our tools are still all in the major 0 version, which gives us some leeway to do breaking changes. Nevertheless, we should aim to minimize it
  • We try to keep consistency whenever possible

I like the idea of exposing only the sync, but we should standardize what this sync interface should have before we do that. This being said, I would recommend we keep exported Structs exported, and in the future, if we want to invest time in getting an interface going that would support all the fetching types, we can create the interface and make all things private that we want to make private.

In terms of API consumption for the majority of the tools, we expect people to use the Cobra commands interface (not the best experience) until we are in a position where we have a supported API for each tool like we currently have in imgpkg.

@joaopapereira
Copy link
Member

@praveenrewar I don't see the button to approve workflow runs.

image

Only folks with write access can see that button, and write access comes with the Approver Role.

@cdevienne
Copy link
Contributor Author

Should I make the structs public again ? What about their functions ?
Or can it be merged as it is ?

@joaopapereira
Copy link
Member

Let us make them public again and we will think in the future on a way to make a good interface API for this

@cdevienne
Copy link
Contributor Author

ok doing it. Be aware that the 'Retrieve' function does not exist anymore, and that the Hg struct has no exposed function aside from a constructor and a Close function.

@cdevienne
Copy link
Contributor Author

Just to be clear: the Hg public interface is changing, an the new one is useless as it is. Which is why I unexposed it completely.
Pushing the change you asked for anyway, let me know if you need any other change.

Signed-off-by: Christophe de Vienne <[email protected]>
@cdevienne
Copy link
Contributor Author

Hi @joaopapereira, any chance we merge this PR soon ?

Copy link
Member

@joaopapereira joaopapereira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@joaopapereira
Copy link
Member

@Zebradil can you give it a last review please and if you are ok with it I will merge the PR

Copy link
Member

@Zebradil Zebradil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@joaopapereira joaopapereira merged commit f59017e into carvel-dev:develop Jun 12, 2024
4 checks passed
@cdevienne
Copy link
Contributor Author

Thanks!

@github-actions github-actions bot added the carvel-triage This issue has not yet been reviewed for validity label Jun 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
carvel-triage This issue has not yet been reviewed for validity
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

5 participants