Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Search the active project for artifacts #1727

Closed
wants to merge 7 commits into from
Closed

WIP: Search the active project for artifacts #1727

wants to merge 7 commits into from

Conversation

DilumAluthge
Copy link
Member

@DilumAluthge DilumAluthge commented Mar 21, 2020

Closes #1726
The other half is located here: JuliaLang/julia#35222

Needs tests.
Needs docs.
Needs changelog.


This pull request adds /path_to_active_project/artifacts to the end of the list of locations that we search when looking for artifacts.


h/t: @oxinabox for suggesting an expansion to the artifact search path
h/t: @fredrikekre for making an example of a depot stored inside a project
cc: @davidanthoff
cc: @StefanKarpinski
Also contributed to the discussion: Kristoffer Carlsson, Elliot Saba, Sascha Mann

@StefanKarpinski
Copy link
Member

This conversation offers some context:

https://docs.google.com/document/d/1Z63eqhkLYIg5iviwmZnq741zp5kIwGO1spf7zeSjAX8/edit

src/Artifacts.jl Outdated
@@ -45,7 +45,9 @@ current set of depot paths and the current artifact directory override via the m
"""
function artifacts_dirs(args...)
if ARTIFACTS_DIR_OVERRIDE[] === nothing
return [abspath(depot, "artifacts", args...) for depot in depots()]
depot_artifacts = [abspath(depot, "artifacts", args...) for depot in depots()] # search all depots
project_artifacts = [abspath(dirname(Base.active_project()), "artifacts", args...)] # search the active project
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can active_project() be nothing in some circumstances?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure, but it can't hurt to be prepared for that case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, Base.ACTIVE_PROJECT[] can be nothing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But what about active_project(). I think yes, but would need to be looked at.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, so now we check to make sure that active_project() is not nothing and active_project() is not empty.

src/Artifacts.jl Outdated
else
project_artifacts = [abspath(dirname(Base.active_project()), "artifacts", args...)] # search the active project
end
return vcat(depot_artifacts, project_artifacts)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switching the order seems better—might as well look in the project first.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about that, but here's the problem. If we put the project first, then, if the artifact needs to be downloaded, it will always be downloaded into the project.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But for most users, we want to download artifacts into the first depot.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we could return them in this order when searching/reading:

  • project first, then depots

But return them in this order when downloading/creating/writing:

  • depots first, then project

But... this might make things more confusing?

Copy link
Contributor

@oxinabox oxinabox Mar 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be interesting to look at DataDeps.jl's solution which is basically what you propose.
https://white.ucc.asn.au/DataDeps.jl/stable/z10-for-end-users/#The-Load-Path-1
Basically the search order starts looking in project, then continues to more and more general (it allows many locations).
the save order attempts to save in the same list of locations (since by design it can fail, and want to move on to next), but skips the project specific one at the top.

however. that is almost completely irrelevent for you.
Search order doesn't matter because content addressing promises that no matter where you find a match, you know that match is good and is identical to anywhere else you might find a match.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With artifacts, search order doesn't matter for correctness, (since it's content-addressed, if you find it, it's the right thing) but it could matter for performance. IMO you're more likely to have something sitting in your overall location (project-local deps will not be the norm, I don't think) so keeping it this way is better, IMO.

Copy link
Member

@StefanKarpinski StefanKarpinski Mar 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is another reason why treating the project as a depot more generally is wrong: it should only be a place you look in when loading artifacts. If it's not there, you download in the normal fashion into the standard user depot. Looking in $project/artifacts first is absolutely the right thing to do, however: if the project has bothered to vendor its dependencies, we should use those vendored dependencies.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@staticfloat, I don't understand the performance consideration. Why would the system copy of an artifact be better than the project copy?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, to be clear, the plan is that we will prepend the project to the front of the list of search paths when we are searching for an artifact. But when we are creating artifacts, we are going to use the user depot, correct?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@staticfloat, I don't understand the performance consideration. Why would the system copy of an artifact be better than the project copy?

It's literally just saving a stat() or two. It's not the performance of the binaries themselves, but merely how quickly you find them. I don't think it's a big issue, even on Windows.

@tkf
Copy link
Member

tkf commented Mar 21, 2020

Does it make sense to use .julia as the parent directory (i.e., /path_to_active_project/.julia/artifacts and /path_to_active_project/.julia/packages), to avoid cluttering the package directory? I think there are other things you might want to make project-local. For example, it would also be nice if the system image is automatically loaded from, e.g., /path_to_active_project/.julia/sys.so JuliaLang/julia#33973 (comment)

@DilumAluthge
Copy link
Member Author

DilumAluthge commented Mar 21, 2020

Personally I'd like to keep this as simple as possible. The motivation is to make a single self-contained tarball, which we have dubbed "standalone bundles". The only things we need to accomplish that are artifacts and packages.

@tkf
Copy link
Member

tkf commented Mar 21, 2020

How does using /path_to_active_project/.julia/artifacts complicate the PRs? It's just one more argument to abspath. I'm just trying to point out that you might need other kinds of project-specific storage in the future.

@DilumAluthge
Copy link
Member Author

How does using /path_to_active_project/.julia/artifacts complicate the PRs? It's just one more argument to abspath. I'm just trying to point out that you might need other kinds of project-specific storage in the future.

Oh, I misunderstood. I thought you wanted me to add the "automatically load system image" feature to this PR.

@DilumAluthge
Copy link
Member Author

DilumAluthge commented Mar 21, 2020

Yeah, doing /path_to_active_project/.julia/artifacts instead of /path_to_active_project/artifacts seems fine to me!

Certainly putting everything under one common parent folder will make it easier to e.g. add this stuff to a .gitignore file, etc.

If we don't want to do a hidden folder, I suppose we could do /path_to_active_project/julia/artifacts instead.

@DilumAluthge
Copy link
Member Author

DilumAluthge commented Mar 21, 2020

So, here are some candidates to vote on:

Artifacts Packages
/path_to_active_project/artifacts /path_to_active_project/packages
/path_to_active_project/.julia/artifacts /path_to_active_project/.julia/packages
/path_to_active_project/julia/artifacts /path_to_active_project/julia/packages
/path_to_active_project/.julia_modules/artifacts /path_to_active_project/.julia_modules/packages
/path_to_active_project/julia_modules/artifacts /path_to_active_project/julia_modules/packages
/path_to_active_project/.julia_bundles/artifacts /path_to_active_project/.julia_bundles/packages
/path_to_active_project/julia_bundles/artifacts /path_to_active_project/julia_bundles/packages

@tkf
Copy link
Member

tkf commented Mar 21, 2020

A non-hidden path like /path_to_active_project/julia/* sounds good to me, too. The original inspiration was from the default depot path.

@DilumAluthge
Copy link
Member Author

DilumAluthge commented Mar 21, 2020

I was thinking maybe we do julia_modules, in a parallel to node_modules?

Or, since we are calling these "standalone bundles", maybe julia_bundles?

@tkf
Copy link
Member

tkf commented Mar 21, 2020

I don't have a strong opinion on the name itself (other than using a single parent directory), but my preference is julia > julia_bundles > julia_modules.

@DilumAluthge
Copy link
Member Author

Hidden folder (dot) or regular folder (no dot)?

@tkf
Copy link
Member

tkf commented Mar 21, 2020

I slightly prefer hidden directory .julia etc. more.

@KristofferC
Copy link
Member

KristofferC commented Mar 21, 2020

It seems non-general to me to single out artifacts for this. Why not packages or other things in the depot (precompile files)? To me, this would be better implemented as adding a new entry into DEPOT_PATH that resolves to the current active project. Then the same thing can be used for packages and precompile files. This would then not require any changes to Pkg and you could "instantiate" your whole project depot by setting that entry as DEPOT_PATH[1], removing all other depot entries and instantiate.

@DilumAluthge
Copy link
Member Author

DilumAluthge commented Mar 21, 2020

Why not packages

See JuliaLang/julia#35195

@KristofferC
Copy link
Member

KristofferC commented Mar 21, 2020

My point is that if this is implemented correctly, there should be no need to change Pkg at all (nor code loading like in that PR) because everything should fall out from just having an extra DEPOT_PATH entry which is already handled by everything.

@DilumAluthge
Copy link
Member Author

My point is that if this is implemented correctly, there should be no need to change Pkg at all (nor code loading like in that PR) because everything should fall out from just having an extra DEPOT_PATH entry which is already handled by everything.

Ah, I see what you mean.

I’m not sure how this special entry would be implemented though.

@DilumAluthge
Copy link
Member Author

Specifically, how do you make sure that whenever the active project is changed, you change the corresponding depot entry?

@fredrikekre
Copy link
Member

fredrikekre commented Mar 21, 2020

My point is that if this is implemented correctly, there should be no need to change Pkg at all (nor code loading like in that PR) because everything should fall out from just having an extra DEPOT_PATH entry which is already handled by everything.

This is why I keep suggesting making it a proper depot path, because the next guy is gonna ask to also bundle the registries directory, and then someone wants to include config etc.

I still think it is not too bad to make it explicit and put

push!(DEPOT_PATH, @__DIR__) # or push!(DEPOT_PATH, joinpath(@__DIR__, ".julia")) if you prefer

in some initialization script init.jl and then document that you run this project by

julia -L init.jl main.jl

or w/e.

If this for some reason has to be done automatically, then we should just add "@" at the end of the default DEPOT_PATH just like we have "@" at the front of the default LOAD_PATH. This would then expand to joinpath(dirname(active_project()), ".julia"). (Edit: See e.g. JuliaLang/julia@173a367 -- needs some more changes to expand this at every use of DEPOT_PATH but yea.)

Specifically, how do you make sure that whenever the active project is changed, you change the corresponding depot entry?

Switching projects at runtime is a bit iffy anyway, but my suggestion above should work just as switching the load path entry "@" does.

@KristofferC
Copy link
Member

KristofferC commented Mar 21, 2020

Yeah, the "correct" implementation of this seems to me to add a special entry to DEPOT_PATH that resolves to the current project, similar to how things are handled with LOAD_PATH. Then we need a function like Base.depot_path() (equivalent to Base.load_path()) and everyone uses that and then things agree with each other.

@tkf
Copy link
Member

tkf commented Mar 21, 2020

Base.depot_path() sounds like the way to go.

and then someone wants to include config etc.

IIUC, startup.jl file won't be included if the project-local depot is at the end of the list? Though this probably just has to be clearly documented (or special-cased).

@DilumAluthge
Copy link
Member Author

When @StefanKarpinski, @davidanthoff, and I were having the conversation on Slack, my intent was purposefully not to create a full depot in the project. I very intentionally only want to look for artifacts and packages, because that is the bare minimum we need to make the "self-contained bundles".

If there is an overwhelming consensus that we should instead modify the way Base.DEPOT_PATH works, I’ll defer to that consensus. I think that change will be more complicated. For example, we will need to implement this new Base.depot_path() function. We will also need to go through every location in Base Julia and in Pkg that currently uses Base.DEPOT_PATH and replace that with a use of Base.depot_path(). Furthermore, adding a special entry like @ will also break every user’s code that currently assumes that Base.DEPOT_PATH is a list of depot paths.

But personally, my vote is that we only implement the absolute bare minimum that we need to have self-contained bundles.

@DilumAluthge
Copy link
Member Author

DilumAluthge commented Mar 21, 2020

To elaborate on that a little bit: the only things that I am suggesting that we allow people to store inside projects are artifacts and packages.

Artifacts are content-addressed with the hash. Packages are content-addressed with the package name plus the slug.

The artifacts and packages could exist anywhere. No matter what location we find them in, the content is exactly the same. So we are just adding another location to the search path.

This does not apply to the other things that people store inside regular depots (registries, config files, startup.jl files, etc.).

This is why I only want to expand the search for artifacts and packages.

@DilumAluthge
Copy link
Member Author

Yeah, the "correct" implementation of this seems to me to add a special entry to DEPOT_PATH that resolves to the current project, similar to how things are handled with LOAD_PATH. Then we need a function like Base.depot_path() (equivalent to Base.load_path()) and everyone uses that and then things agree with each other.

I have started a pull request that implements what you describe: JuliaLang/julia#35207

I don't think that is the correct approach. But I figured I would start work on implementing it while we continue this discussion.

@StefanKarpinski
Copy link
Member

HARD DISAGREE with making the project a full depot. I'll quote my entire comment here since this seems to be where the discussion ended up taking place:


I can see the appeal of this since it allows the user to choose whether to look in a project for vendored dependencies (both packages and artifacts). However, I have a couple of issues with it:

  1. I think we should look in the project for packages and artifacts first. After all, if the project has bothered to ship vendored resources, shouldn't we use the project's version of those resources?

  2. I don't think we should be treating the project as a depot for all purposes. If $project/registries exists, should we look in registries that are included there? Seems like a bad idea. Similarly, if the project contains $project/servers should we let the authentication info included there apply to talking to authenticated servers and provide telemetry info? If there is an $project/environments directory, should we consider those to be named environments when this project happens to be active? These all seem like a bad idea.

An alternative would be to make the default DEPOT_PATH have @ as its first entry (and replace Pkg.depots1() with Pkg.user_depot() which skips the @ and is a better name anyway). We could then have a toggle for whether we should expand the @ or drop it and expand it when looking for packages and artifacts and drop it when looking for anything else. That would allow people to opt-out of loading vendored packages and artifacts.

However, I think that unconditionally using vendored packages and artifacts when loading resources for a project would be fine and would be simpler. After all, what's the benefit of opting out? It's not like there's a security issue here: if the project wants to load code that it ships with, it can do that much more easily than by shipping a vendored artifact or package.

@DilumAluthge
Copy link
Member Author

I agree with all of that.

@DilumAluthge
Copy link
Member Author

Stefan's PR JuliaLang/julia#35222 implements searching the project first for packages.

I'll update this PR to search the project first for artifacts.

Here's the one caveat: we are going to search the project first for artifacts, then we will search the other locations.

But when we download new artifacts, I don't think we should put them in the project by default. I think we should put them in the user depot (which is the current behavior). Is everyone fine with that? So we will change the search order for reading artifacts, but we won't chance the behavior for writing artifacts.

Because I think by default we should still write new artifacts to the user depot

@StefanKarpinski
Copy link
Member

Absolutely. The only role I see for the project package/artifact directories is as collection of content-addressed resources that might be needed when loading the project. Package & artifact installation by Pkg should still be to the normal place. A completely separate tool should be used to copy content-addressed resources into a project to make it standalone, with options to control whether to copy lazy artifacts or not and whether to copy artifacts for all platforms or just some.

@DilumAluthge
Copy link
Member Author

Alright, I’ve updated the PR. We now search the project first before searching other locations.

When writing artifacts, we write them to the user depot.

@DilumAluthge
Copy link
Member Author

Just FYI, searching the project first makes the code significantly more complicated than if we search the project last. This is because currently a lot of places in Pkg hardcode the assumption that the first artifact search path is also the place we would write new artifacts to.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for Overrides.toml next to Project.toml/Manifest.toml
7 participants