WIP: Search the active project for artifacts #1727
This conversation offers some context: https://docs.google.com/document/d/1Z63eqhkLYIg5iviwmZnq741zp5kIwGO1spf7zeSjAX8/edit
src/Artifacts.jl
Outdated
@@ -45,7 +45,9 @@ current set of depot paths and the current artifact directory override via the m
 """
 function artifacts_dirs(args...)
     if ARTIFACTS_DIR_OVERRIDE[] === nothing
-        return [abspath(depot, "artifacts", args...) for depot in depots()]
+        depot_artifacts = [abspath(depot, "artifacts", args...) for depot in depots()] # search all depots
+        project_artifacts = [abspath(dirname(Base.active_project()), "artifacts", args...)] # search the active project
Can `active_project()` be `nothing` in some circumstances?
I'm not sure, but it can't hurt to be prepared for that case.
Technically, `Base.ACTIVE_PROJECT[]` can be `nothing`.
But what about `active_project()`? I think yes, but it would need to be looked at.
Alright, so now we check to make sure that `active_project()` is not `nothing` and `active_project()` is not empty.
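For readers following along, the check described above might look roughly like the sketch below. This is not the code from the PR: `project_artifacts_dirs` is a hypothetical helper name, and it takes the project path as an argument (rather than calling `Base.active_project()` itself) only so that it is easy to exercise.

```julia
# Hypothetical helper (not part of Pkg): returns the project-local artifact
# directory as a one-element vector, or an empty vector when there is no
# usable active project.
function project_artifacts_dirs(project::Union{Nothing,AbstractString}, args...)
    # Guard against both cases raised in review: `nothing` and an empty path.
    if project === nothing || isempty(project)
        return String[]
    end
    # `project` is assumed to be the path to the project file, so its parent
    # directory is the project root.
    return [abspath(dirname(project), "artifacts", args...)]
end
```

In real use the argument would presumably be `Base.active_project()`, e.g. `project_artifacts_dirs(Base.active_project())`.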
src/Artifacts.jl
Outdated
+        else
+            project_artifacts = [abspath(dirname(Base.active_project()), "artifacts", args...)] # search the active project
+        end
+        return vcat(depot_artifacts, project_artifacts)
Switching the order seems better—might as well look in the project first.
I thought about that, but here's the problem. If we put the project first, then, if the artifact needs to be downloaded, it will always be downloaded into the project.
But for most users, we want to download artifacts into the first depot.
I guess we could return them in this order when searching/reading:
- project first, then depots
But return them in this order when downloading/creating/writing:
- depots first, then project
But... this might make things more confusing?
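To make the two orderings concrete, here is a minimal sketch. The function names `artifact_search_dirs` and `artifact_write_dirs` are made up for illustration and are not Pkg's API; the project directory and depot list are passed in explicitly so the example is self-contained.

```julia
# Hypothetical helpers illustrating the two orderings proposed above.

# Build the "artifacts" subdirectory for each depot.
depot_artifact_dirs(depots::Vector{String}) = [joinpath(d, "artifacts") for d in depots]

# Read/search order: project first, then depots.
function artifact_search_dirs(project_dir::Union{Nothing,AbstractString}, depots::Vector{String})
    dirs = depot_artifact_dirs(depots)
    project_dir === nothing || pushfirst!(dirs, joinpath(project_dir, "artifacts"))
    return dirs
end

# Write/create order: depots first, then project.
function artifact_write_dirs(project_dir::Union{Nothing,AbstractString}, depots::Vector{String})
    dirs = depot_artifact_dirs(depots)
    project_dir === nothing || push!(dirs, joinpath(project_dir, "artifacts"))
    return dirs
end
```

With this split, a caller that downloads an artifact would take the first entry of `artifact_write_dirs`, so the first depot stays the default install location even though the project is searched first.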
It may be interesting to look at DataDeps.jl's solution which is basically what you propose.
https://white.ucc.asn.au/DataDeps.jl/stable/z10-for-end-users/#The-Load-Path-1
Basically, the search order starts looking in the project, then continues through more and more general locations (it allows many locations).
The save order attempts to save in the same list of locations (since by design a save can fail, and it then wants to move on to the next one), but skips the project-specific one at the top.
However, that is almost completely irrelevant for you.
Search order doesn't matter because content addressing promises that no matter where you find a match, you know that match is good and is identical to anywhere else you might find a match.
With artifacts, search order doesn't matter for correctness (since it's content-addressed, if you find it, it's the right thing), but it could matter for performance. IMO you're more likely to have something sitting in your overall location (project-local deps will not be the norm, I don't think), so keeping it this way is better.
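As a rough illustration of that point, the sketch below looks up a content hash in a list of search directories and also reports how many locations it had to probe. `find_artifact` is a hypothetical name, and the layout (one directory per hash under each search directory) is an assumption made for the example.

```julia
# Hypothetical lookup: return the first existing directory for `hash`,
# together with the number of locations probed before finding it.
function find_artifact(hash::AbstractString, search_dirs::Vector{String})
    for (probes, dir) in enumerate(search_dirs)
        path = joinpath(dir, hash)
        # Because artifacts are content-addressed, any hit is the right content.
        isdir(path) && return (path, probes)
    end
    # Not found anywhere: every location was probed.
    return (nothing, length(search_dirs))
end
```

Changing the order of `search_dirs` only changes the probe count when the artifact exists in a single location; the directory that is found is the same.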
This is another reason why treating the project as a depot more generally is wrong: it should only be a place you look in when loading artifacts. If it's not there, you download in the normal fashion into the standard user depot. Looking in `$project/artifacts` first is absolutely the right thing to do, however: if the project has bothered to vendor its dependencies, we should use those vendored dependencies.
@staticfloat, I don't understand the performance consideration. Why would the system copy of an artifact be better than the project copy?
So, to be clear, the plan is that we will prepend the project to the front of the list of search paths when we are searching for an artifact. But when we are creating artifacts, we are going to use the user depot, correct?
> @staticfloat, I don't understand the performance consideration. Why would the system copy of an artifact be better than the project copy?

It's literally just saving a `stat()` or two. It's not the performance of the binaries themselves, but merely how quickly you find them. I don't think it's a big issue, even on Windows.
Does it make sense to use
Personally I'd like to keep this as simple as possible. The motivation is to make a single self-contained tarball, which we have dubbed "standalone bundles". The only things we need to accomplish that are artifacts and packages.
How does using
Oh, I misunderstood. I thought you wanted me to add the "automatically load system image" feature to this PR.
Yeah, doing Certainly putting everything under one common parent folder will make it easier to e.g. add this stuff to a If we don't want to do a hidden folder, I suppose we could do
So, here are some candidates to vote on:
A non-hidden path like
I was thinking maybe we do Or, since we are calling these "standalone bundles", maybe
I don't have a strong opinion on the name itself (other than using a single parent directory), but my preference is
Hidden folder (dot) or regular folder (no dot)?
I slightly prefer hidden directory
It seems non-general to me to single out artifacts for this. Why not packages or other things in the depot (precompile files)? To me, this would be better implemented as adding a new entry into
My point is that if this is implemented correctly, there should be no need to change Pkg at all (nor code loading like in that PR) because everything should fall out from just having an extra DEPOT_PATH entry which is already handled by everything.
Ah, I see what you mean. I’m not sure how this special entry would be implemented though.
Specifically, how do you make sure that whenever the active project is changed, you change the corresponding depot entry?
This is why I keep suggesting making it a proper depot path, because the next guy is gonna ask to also bundle the I still think it is not too bad to make it explicit and put
in some initialization script
or w/e. If this for some reason has to be done automatically, then we should just add
Switching projects at runtime is a bit iffy anyway, but my suggestion above should work just as switching the load path entry
Yeah, the "correct" implementation of this seems to me to add a special entry to
IIUC,
When @StefanKarpinski, @davidanthoff, and I were having the conversation on Slack, my intent was purposefully not to create a full depot in the project. I very intentionally only want to look for artifacts and packages, because that is the bare minimum we need to make the "self-contained bundles". If there is an overwhelming consensus that we should instead modify the way But personally, my vote is that we only implement the absolute bare minimum that we need to have self-contained bundles.
To elaborate on that a little bit: the only things that I am suggesting that we allow people to store inside projects are artifacts and packages. Artifacts are content-addressed with the hash. Packages are content-addressed with the package name plus the slug. The artifacts and packages could exist anywhere. No matter what location we find them in, the content is exactly the same. So we are just adding another location to the search path. This does not apply to the other things that people store inside regular depots (registries, config files, This is why I only want to expand the search for artifacts and packages.
I have started a pull request that implements what you describe: JuliaLang/julia#35207 I don't think that is the correct approach. But I figured I would start work on implementing it while we continue this discussion.
HARD DISAGREE with making the project a full depot. I'll quote my entire comment here since this seems to be where the discussion ended up taking place: I can see the appeal of this since it allows the user to choose whether to look in a project for vendored dependencies (both packages and artifacts). However, I have a couple of issues with it:
An alternative would be to make the default However, I think that unconditionally using vendored packages and artifacts when loading resources for a project would be fine and would be simpler. After all, what's the benefit of opting out? It's not like there's a security issue here: if the project wants to load code that it ships with, it can do that much more easily than by shipping a vendored artifact or package.
I agree with all of that.
Stefan's PR JuliaLang/julia#35222 implements searching the project first for packages. I'll update this PR to search the project first for artifacts. Here's the one caveat: we are going to search the project first for artifacts, then we will search the other locations. But when we download new artifacts, I don't think we should put them in the project by default. I think we should put them in the user depot (which is the current behavior). Is everyone fine with that? So we will change the search order for reading artifacts, but we won't change the behavior for writing artifacts, because I think by default we should still write new artifacts to the user depot.
Absolutely. The only role I see for the project package/artifact directories is as a collection of content-addressed resources that might be needed when loading the project. Package & artifact installation by Pkg should still be to the normal place. A completely separate tool should be used to copy content-addressed resources into a project to make it standalone, with options to control whether to copy lazy artifacts or not and whether to copy artifacts for all platforms or just some.
Alright, I’ve updated the PR. We now search the project first before searching other locations. When writing artifacts, we write them to the user depot.
Just FYI, searching the project first makes the code significantly more complicated than if we search the project last. This is because currently a lot of places in Pkg hardcode the assumption that the first artifact search path is also the place we would write new artifacts to.
Closes #1726
The other half is located here: JuliaLang/julia#35222
Needs tests.
Needs docs.
Needs changelog.
This pull request adds `/path_to_active_project/artifacts` to the end of the list of locations that we search when looking for artifacts.

h/t: @oxinabox for suggesting an expansion to the artifact search path
h/t: @fredrikekre for making an example of a depot stored inside a project
cc: @davidanthoff
cc: @StefanKarpinski
Also contributed to the discussion: Kristoffer Carlsson, Elliot Saba, Sascha Mann