
exp show --show-json: outs does not return full path for keys #7790

Closed
mattseddon opened this issue May 20, 2022 · 4 comments · Fixed by #7690
Labels
A: experiments (Related to dvc exp), product: VSCode (Integration with VSCode extension)


@mattseddon
Member

Bug Report

Description

The exp show output returns incomplete paths as keys for outs (and possibly deps).

Reproduce

The demo project structure is as follows:

stages:
  train:
    cmd: python train.py
    deps:
    - data/MNIST
    - train.py
    params:
    - params.yaml:
    outs:
    - model.pt:
        checkpoint: true
    metrics:
      - training_metrics.json:
          persist: true
          cache: false
    plots:
    - training_metrics
    - misclassified.jpg
    - predictions.json:
        template: confusion
        x: actual
        y: predicted
.
├── data
│   └── MNIST
│       ├── raw
│       │   ├── t10k-images-idx3-ubyte
│       │   ├── t10k-images-idx3-ubyte.gz
│       │   ├── t10k-labels-idx1-ubyte
│       │   ├── t10k-labels-idx1-ubyte.gz
│       │   ├── train-images-idx3-ubyte
│       │   ├── train-images-idx3-ubyte.gz
│       │   ├── train-labels-idx1-ubyte
│       │   └── train-labels-idx1-ubyte.gz
│       └── raw.dvc
├── dvc.lock
├── dvc.yaml
├── misclassified.jpg
├── model.pt
├── params.yaml
├── predictions.json
├── requirements.txt
├── train.py
├── training_metrics
│   └── scalars
│       ├── acc.tsv
│       └── loss.tsv
└── training_metrics.json

dvc exp show --show-json returns:

{
  "workspace": {
    "baseline": {
      "data": {
        "timestamp": null,
        "params": {
          "params.yaml": {
            "data": {
              "lr": 0.003,
              "weight_decay": 0,
              "epochs": 15
            }
          }
        },
        "deps": {
          "data/MNIST": {
            "hash": "0aed307494600d178fbdc0d000d6db38.dir",
            "size": 66544866,
            "nfiles": 10
          },
          "train.py": {
            "hash": "90f29a92c178927514c7f4d61a984a8a",
            "size": 4865,
            "nfiles": null
          }
        },
        "outs": {
          "model.pt": {
            "hash": "38126781764ca9fb04496ca2c2173056",
            "size": 439383,
            "nfiles": null
          },
          "raw": {
            "hash": "8c257df187855c681f88bde92d721ccd.dir",
            "size": 66544770,
            "nfiles": 8
          }
        },
        "queued": false,
        "running": false,
        "executor": null,
        "metrics": {
          "training_metrics.json": {
            "data": {
              "step": 14,
              "loss": 0.9596208930015564,
              "acc": 0.7735
            }
          }
        }
      }
    }
  },
  "024aa30e3e0b226f101b8323eeaea7ebb6537316": {
    "baseline": {
      "data": {
        "timestamp": "2022-05-20T08:34:01",
        "params": {
          "params.yaml": {
            "data": {
              "lr": 0.003,
              "weight_decay": 0,
              "epochs": 15
            }
          }
        },
        "deps": {
          "data/MNIST": {
            "hash": "0aed307494600d178fbdc0d000d6db38.dir",
            "size": 66544866,
            "nfiles": 10
          },
          "train.py": {
            "hash": "90f29a92c178927514c7f4d61a984a8a",
            "size": 4865,
            "nfiles": null
          }
        },
        "outs": {
          "model.pt": {
            "hash": "38126781764ca9fb04496ca2c2173056",
            "size": 439383,
            "nfiles": null
          },
          "raw": {
            "hash": "8c257df187855c681f88bde92d721ccd.dir",
            "size": 66544770,
            "nfiles": 8
          }
        },
        "queued": false,
        "running": false,
        "executor": null,
        "metrics": {
          "training_metrics.json": {
            "data": {
              "step": 14,
              "loss": 0.9596208930015564,
              "acc": 0.7735
            }
          }
        },
        "name": "main"
      }
    }
  }
}

Notice the "raw": key in the outs output. I believe this should be data/MNIST/raw.
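The bare "raw" key is consistent with the out being recorded as raw inside data/MNIST/raw.dvc, i.e. relative to the directory of the file that declares it. A minimal sketch, assuming that rule (and ignoring any custom stage wdir), of how the expected repo-relative key follows from it; repo_relative is a hypothetical helper, not a DVC function.

```python
import posixpath


def repo_relative(dvcfile_path, declared_path):
    """Join a path declared in a .dvc/dvc.yaml file with that file's directory.

    Assumption: declared paths are relative to the directory containing the
    file that declares them; a stage-level wdir is not handled here.
    """
    return posixpath.normpath(
        posixpath.join(posixpath.dirname(dvcfile_path), declared_path)
    )


print(repo_relative("data/MNIST/raw.dvc", "raw"))  # data/MNIST/raw
print(repo_relative("dvc.yaml", "model.pt"))  # model.pt
```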

Expected

Full paths are returned for all deps/outs.

Environment information

Output of dvc doctor:

$ dvc doctor
DVC version: 2.10.2 (pip)
---------------------------------
Platform: Python 3.9.9 on macOS-12.4-x86_64-i386-64bit
Supports:
        webhdfs (fsspec = 2022.3.0),
        http (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
        https (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
        s3 (s3fs = 2022.3.0, boto3 = 1.21.21)
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk1s5s1
Caches: local
Remotes: s3
Workspace directory: apfs on /dev/disk1s5s1
Repo: dvc (subdir), git

Additional Information (if any):

Please LMK if I have misunderstood something and this is expected. Thank you.

mattseddon added the product: VSCode and A: experiments labels on May 20, 2022
daavoo self-assigned this on May 20, 2022
@daavoo
Contributor

daavoo commented May 20, 2022

Hi @mattseddon. This is somewhat expected, because the keys in outs come from the internal DVC property def_path.

We could change that, but in #7690 I introduced a new fs_path property that contains the absolute path to the artifact:

        "outs": {
          "model.pt": {
            "hash": "eeec0b39f3054c5a229a4bbbae47007d",
            "size": 439383,
            "nfiles": null,
            "fs_path": "/Users/daviddelaiglesiacastro/Desktop/iterative/vscode-dvc/demo/model.pt",
            "use_cache": true,
            "is_data_source": false
          },
          "raw": {
            "hash": "8c257df187855c681f88bde92d721ccd.dir",
            "size": 66544770,
            "nfiles": 8,
            "fs_path": "/Users/daviddelaiglesiacastro/Desktop/iterative/vscode-dvc/demo/data/MNIST/raw",
            "use_cache": true,
            "is_data_source": true
          }
        },

Would fs_path be enough for you?
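A minimal sketch of how a consumer could derive repo-root-relative keys from fs_path. Assumptions: fs_path values are POSIX-style absolute paths (as in the macOS example above) and the DVC repo root is the demo directory; ROOT_DIR and rekey_by_repo_path are illustrative names, not DVC API.

```python
import posixpath

# Assumption: the DVC repo root is the `demo` directory seen in fs_path above
# (dvc doctor reports "Repo: dvc (subdir), git").
ROOT_DIR = "/Users/daviddelaiglesiacastro/Desktop/iterative/vscode-dvc/demo"

# Trimmed copy of the "outs" object from the example above.
outs = {
    "model.pt": {
        "fs_path": ROOT_DIR + "/model.pt",
        "use_cache": True,
        "is_data_source": False,
    },
    "raw": {
        "fs_path": ROOT_DIR + "/data/MNIST/raw",
        "use_cache": True,
        "is_data_source": True,
    },
}


def rekey_by_repo_path(entries, root_dir):
    """Return the same entries keyed by fs_path relative to root_dir."""
    return {
        posixpath.relpath(info["fs_path"], root_dir): info
        for info in entries.values()
    }


print(sorted(rekey_by_repo_path(outs, ROOT_DIR)))
# ['data/MNIST/raw', 'model.pt']
```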

@mattseddon
Member Author

We are going to have to build the table headers and the columns tree entries from this information, i.e. these sections in the UI:

[Screenshot of those UI sections]

I think a path relative to the repository root would be more useful, if we can get it, but I need to think about it more.
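A hypothetical sketch (Python, not the extension's actual code; build_tree is an illustrative name) of why repo-root-relative keys matter for a columns tree: with the bare "raw" key the out lands at the top level, whereas data/MNIST/raw nests under its parent directories.

```python
def build_tree(paths):
    """Nest slash-separated paths into dicts; each path component is a node."""
    tree = {}
    for path in paths:
        node = tree
        for part in path.split("/"):
            # setdefault reuses an existing child node or creates an empty one
            node = node.setdefault(part, {})
    return tree


# Keys of "outs" as reported in this issue vs. repo-root-relative keys.
print(build_tree(["model.pt", "raw"]))
# {'model.pt': {}, 'raw': {}}
print(build_tree(["model.pt", "data/MNIST/raw"]))
# {'model.pt': {}, 'data': {'MNIST': {'raw': {}}}}
```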

@pmrowla
Contributor

pmrowla commented May 23, 2022

Related: #7575 (comment)

@daavoo
Contributor

daavoo commented May 23, 2022

Should be fixed in #7690

daavoo added a commit that referenced this issue May 24, 2022
Use relpath to repo.root_dir as keys in deps and outs.
Add `use_cache` and `is_data_source` for outs.

This makes it possible to differentiate git-tracked dependencies and
intermediate outputs.

Closes #7575
Closes #7790
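A hypothetical sketch of how a consumer might label outs using the two new flags. The flag meanings encoded here are assumptions read off the fs_path example earlier in the thread (model.pt is a stage output, data/MNIST/raw comes from raw.dvc), not documented DVC behavior; describe_out is an illustrative name.

```python
# Trimmed "outs" entries, keyed relative to the repo root as after the fix.
outs = {
    "model.pt": {"use_cache": True, "is_data_source": False},
    "data/MNIST/raw": {"use_cache": True, "is_data_source": True},
}


def describe_out(info):
    """Hypothetical label for an out; the flag meanings are assumptions."""
    # Assumed: is_data_source is True for data tracked by a standalone .dvc
    # file and False for an output produced by a pipeline stage.
    kind = "data source" if info["is_data_source"] else "stage output"
    # Assumed: use_cache is False for outs declared with `cache: false`,
    # which are typically committed to Git rather than the DVC cache.
    storage = "DVC cache" if info["use_cache"] else "Git"
    return f"{kind}, stored in {storage}"


for path, info in outs.items():
    print(f"{path}: {describe_out(info)}")
# model.pt: stage output, stored in DVC cache
# data/MNIST/raw: data source, stored in DVC cache
```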
daavoo added a commit that referenced this issue May 25, 2022