exp run: copy (certain) git-ignored files to tmp folder on --run-all #5800
Comments
Hey @jdonzallaz, per #5029 it's possible to include untracked files in tmp exp dirs by staging them before queueing the experiment. So I think you could try something like:
$ git add --force secret.txt
$ dvc exp run --queue
$ git rm --cached secret.txt
Let's see what Engineering thinks of this one 🙂 |
I don't think that workaround is viable here given that the request here is specifically regarding:
And if an experiment is run this way, those API keys and password will end up in git history |
Do we need a separate issue to specifically prevent this workaround? B/c if I thought about it someone else may figure it out and it can lead to some hard-to-predict security risks. |
There's not really anything we can do to prevent people from doing this, users can add whatever they want into git. I think anyone using |
Not without I keep coming back to the idea of not using Git as a proxy UI to DVC's behavior, but that's discussed in #5801. BTW, from #5801 (comment) @pmrowla:
Feels like too much. Is it common to have something like |
And anything that is staged will be included in the experiment commit, so that it can be reproduced again later.
I don't think it would be listed as a dependency, but some user stages may still depend on that directory existing. For example, I could have a pipeline stage with the command set to |
@jdonzallaz For your particular use case, how important is it to have the credentials files as relative paths within your repo? Are there workarounds for you like using environment variables, absolute paths, or making paths relative to your home dir? This isn't meant to dismiss the issue but to gather some more context about when and why it's important to have files like this accessible as gitignored files within the repo itself. |
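The workarounds mentioned above (an environment variable, an absolute path, or a path relative to the home dir) could look roughly like the following in a stage script. This is only a sketch: the variable name `MYPROJECT_CREDENTIALS`, the fallback location under `~/.config/myproject/`, and the function name are all hypothetical and not something DVC defines.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Hypothetical names, chosen only for illustration.
CREDENTIALS_VAR = "MYPROJECT_CREDENTIALS"
FALLBACK_RELATIVE = Path(".config") / "myproject" / "credentials.json"


def find_credentials(
    env: Optional[Mapping[str, str]] = None,
    home: Optional[Path] = None,
) -> Optional[Path]:
    """Return the credentials file path, or None if no file is found.

    Lookup order:
    1. the path given in the environment variable, if it is set;
    2. a fixed location relative to the user's home directory.
    Neither location depends on the current working directory, so the
    lookup behaves the same in the workspace and in a temp experiment dir.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else home

    explicit = env.get(CREDENTIALS_VAR)
    if explicit:
        path = Path(explicit).expanduser()
        return path if path.is_file() else None

    fallback = home / FALLBACK_RELATIVE
    return fallback if fallback.is_file() else None
```

The trade-off raised in the thread still applies: the file then lives outside the project folder, and the project needs one more setting to say where it is.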
Any script used in |
Standard practice would only be to include And yes, standard practice would also be for users to gitignore a venv directory, the same way that they should be gitignoring |
Ah, I see this issue is tricky. I've also checked the other related issues. To answer your question @dberenbaum, there is no technical reason the credentials files could not be at another, absolute path. However, that feels like a hack: it does not feel right to keep project-related files outside the project folder, and we would have to add another config entry to say where the credentials file is. At the moment, the only possibility is to I feel that git-ignored files that are explicitly listed as stage dependencies (like my credentials files) should be copied to the temp folder (without relying on Git), either automatically or through a CLI option. |
That seems different.
Agree but I'm not sure what you're suggesting @pmrowla, how do we even detect things that *should* be gitignored? I don't know the answer but again, it feels like a separate Q. I'd open another issue but TBH I don't know there's much DVC can do about that. |
@jdonzallaz again please be aware that the files will end up in the Git history this way, even if you don't explicitly
Yes! It's something we're considering (#5816): to have a way (or ways) to tell DVC which (if any) untracked files to include in the queued exp, but knowing that DVC won't ever put them in Git. DVC would just cache them locally. And when/if you |
My point was that we can't do this, and that is a reason why we should not allow blanket adding untracked files. We should make the user be explicit. |
@jorgeorpinel Ah, you're right, I forgot the part about the files ending up in the Git history. Does having the untracked file in the stage dependencies count as being explicit? |
@jdonzallaz Are you collaborating with others on your project? Are these personal or shared credentials? I'm wondering how this would work on a team where everyone has personal credentials as dependencies, since it seems everyone will have to re-run each pipeline since the credentials will differ for everyone. @pmrowla @jorgeorpinel Would this be solved by being able to cache data locally only (data gets ignored by dvc push and pull)? It seems like that would enable users to cache sensitive data. We have some other issues open related to fine-grained control of file-level permissions and remotes, so maybe this is a use case for those features. |
Yep, I think that's the idea that's forming in #5816 (comment). |
On second thought, I don't think a local cache would even be the right solution for credentials, because if they change, you probably want to use the updated version and not the cached version. |
I'm not collaborating with others on this project. We have another project with DVC and collaboration, but there the credentials (for the remote) are in the config.local. In this project, the credentials are (or can be) shared, as they are specific to the project (and not to the user). |
Hi! Another idea from #1416 (comment) is to set up some mechanism (e.g. env vars) so that queued experiments can use certain assets from the original workspace directly (without moving them to the tmp dir or committing them to the exp commit). |
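One way to picture the env-var idea: whatever launches the queued experiment exports the original workspace location, and stage code resolves selected assets against it instead of the temp dir. The sketch below is an assumption about how that might look. The variable name `EXP_ORIGINAL_WORKSPACE` is made up for illustration and is not set by DVC.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Hypothetical variable name; DVC is not known to set this.
WORKSPACE_VAR = "EXP_ORIGINAL_WORKSPACE"


def resolve_asset(
    rel_path: str,
    env: Optional[Mapping[str, str]] = None,
    cwd: Optional[Path] = None,
) -> Path:
    """Resolve rel_path against the original workspace when it is known.

    If the variable is set, the asset is read straight from the original
    workspace, so it is neither copied to the temp dir nor committed.
    Otherwise fall back to the current working directory.
    """
    env = os.environ if env is None else env
    cwd = Path.cwd() if cwd is None else cwd

    root = env.get(WORKSPACE_VAR)
    base = Path(root) if root else cwd
    return base / rel_path
```

A design consequence worth noting: assets read this way are shared by all queued experiments, so a change to the file in the workspace affects experiments that have not run yet.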
Hello, when running experiments in parallel, I am finding that local cache settings are ignored. I am using dvc 2.9.2. When I am in my repo:
but when I cd to one of the temporary experiment directories:
I mentioned this on Discord and was told that this could be relevant to this issue. |
List of paths to copy inside the temp directory. Only used if `--temp` or `--queue` is specified. Closes #5800
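To make the behavior described in that commit message concrete, here is a minimal sketch of copying an explicit list of paths from the workspace into a temp experiment directory. It is not DVC's implementation. The function name, the arguments, and the check that rejects paths outside the workspace are assumptions made for illustration.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def copy_paths(workspace: Path, tmp_dir: Path, paths: Iterable[str]) -> List[Path]:
    """Copy the listed workspace-relative paths into tmp_dir.

    The relative layout is preserved, so a stage that expects
    ./secret.txt finds it at the same relative location in tmp_dir.
    Git is not involved: nothing is staged or committed.
    """
    workspace = workspace.resolve()
    copied: List[Path] = []

    for rel in paths:
        src = (workspace / rel).resolve()
        # Assumption: only paths strictly inside the workspace are allowed.
        if workspace not in src.parents:
            raise ValueError(f"{rel!r} is not inside the workspace")

        dst = tmp_dir / src.relative_to(workspace)
        dst.parent.mkdir(parents=True, exist_ok=True)

        if src.is_dir():
            # dirs_exist_ok requires Python 3.8 or newer.
            shutil.copytree(src, dst, dirs_exist_ok=True)
        else:
            shutil.copy2(src, dst)
        copied.append(dst)

    return copied
```

Because the user names each path, this keeps the "be explicit" property argued for earlier in the thread, as opposed to blanket-copying every untracked file.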
UPDATE: Jump to #5800 (comment)
When running experiments with --run-all (with some experiments in the --queue), the repo is copied to .dvc/tmp/ for each experiment to run. The problem is that git-ignored files are not copied to this folder, so the commands that depend on those files fail.
The files that are git-ignored but are defined as stage dependencies should be copied.
The use case is the following: some config files containing sensitive data (API keys, passwords) are not pushed to the Git repo but are still needed by certain commands.
See discord discussion.
CC: @shcheklein @pmrowla