pull: granular pull of a single file triggers full imported dataset download #6124

Closed
shcheklein opened this issue Jun 7, 2021 · 1 comment
Labels
A: data-sync Related to dvc get/fetch/import/pull/push product: VSCode Integration with VSCode extension

Comments

@shcheklein
Member

Bug Report

Description

It looks like dvc pull dataset/file triggers a download of the full dataset when the dataset is an import from another repo.

Reproduce

mkdir test-granular-pull
cd test-granular-pull
git init
dvc init
git add .
git commit -a -m "init dvc"
dvc import https://github.com/iterative/dataset-registry fashion-mnist/raw
git add raw.dvc .gitignore
git commit -a -m "import raw data"
rm -rf raw/*
rm -rf .dvc/cache
cd raw
dvc pull t10k-images-idx3-ubyte.gz

This reports 1 file modified and 4 files fetched, and all files in the dataset are downloaded.
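
To check independently of the summary message how many objects were actually downloaded, one option is to count the files in the local cache after the pull. The following is a minimal sketch, not part of DVC: count_cache_files is a hypothetical helper, and it assumes the cache sits at the default .dvc/cache location in the repo root. The count may also include a .dir manifest object for the imported directory, so it can be one higher than the number of data files.

```shell
# Hypothetical helper (not a DVC command): count regular files under a cache directory.
# Assumption: the cache lives at the default .dvc/cache path unless a path is passed in.
count_cache_files() {
    cache_dir="${1:-.dvc/cache}"
    # A cache directory that was removed and never recreated holds zero objects.
    if [ ! -d "$cache_dir" ]; then
        echo 0
        return 0
    fi
    # tr strips the padding that some wc implementations (e.g. on macOS) add.
    find "$cache_dir" -type f | wc -l | tr -d ' '
}
```

Since the reproduction steps end inside raw/, the call there would be count_cache_files ../.dvc/cache. A count that matches the whole dataset rather than the single requested file would confirm the behavior described above.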

Expected

I would expect to see 1 file modified and 1 file(s) fetched, with only the requested file downloaded.

Environment information

DVC version: 2.3.0+ed2a0b
---------------------------------
Platform: Python 3.8.9 on macOS-10.15.6-x86_64-i386-64bit
Supports: All remotes
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk1s1
Caches: local
Remotes: local
Workspace directory: apfs on /dev/disk1s1
Repo: dvc, git
@pmrowla
Contributor

pmrowla commented Apr 4, 2023

Resolved by #9246


4 participants