exp show csv/tsv option #5446

Closed
dberenbaum opened this issue Feb 10, 2021 · 5 comments · Fixed by #6468
Labels: A: experiments (Related to dvc exp) · feature request (Requesting a new feature) · p1-important (Important, aka current backlog of things to do)

Comments

@dberenbaum
Collaborator

dberenbaum commented Feb 10, 2021

See #5381 (comment) for context.

This is NOT a requirement for the 2.0 release, but a nice-to-have that could be added after the release.

Proposed format:

"Branch","Experiment","Revision","Queued","Checkpoint Tip","Checkpoint Parent","Created",...(params and metrics columns as they appear in CLI)
"workspace","baseline",,false,,,,...
"master","baseline","bcbee294fc6df6d096acdf1872a54e05818ce427",false,,,"2021-02-09T13:05:04",...
"master","exp-cd493","bcbee294fc6df6d096acdf1872a54e05818ce427",false,,,"2021-02-09T13:05:04",...
"master","named-exp","d4caebb5162878d16ea89ec125f88cb595a08491",false,,,"2021-02-10T11:18:00",...
"master","exp-deb9f","24f0f31b6daa46d57b7cd55d9238d55971b8a2b2",true,,,"2021-02-10T14:13:00",...
"checkpoints_branch","exp-c5edd","73b69e032634e69a0f96fb8f6c41d51485a9428d",false,"73b69e032634e69a0f96fb8f6c41d51485a9428d","0a7d8aa973c72bb88cb308b06273777260f21de6","2021-02-10T14:46:14",...
"checkpoints_branch","exp-c5edd","0a7d8aa973c72bb88cb308b06273777260f21de6",false,"73b69e032634e69a0f96fb8f6c41d51485a9428d","bcbee294fc6df6d096acdf1872a54e05818ce427","2021-02-10T14:46:13",...
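To make the proposal concrete, here is a minimal Python sketch that reads rows in the proposed format with the standard csv module. It assumes, based only on the sample rows, that an empty cell means "not applicable", that Queued is a lowercase true/false literal, and that Created is an ISO 8601 timestamp without a timezone. The trailing params/metrics columns are dropped, and parse_rows is a hypothetical helper, not part of DVC.

```python
import csv
import io
from datetime import datetime

# Two rows copied from the proposed format, with the trailing
# params/metrics columns ("...") dropped for brevity.
SAMPLE = '''"Branch","Experiment","Revision","Queued","Checkpoint Tip","Checkpoint Parent","Created"
"workspace","baseline",,false,,,
"master","exp-deb9f","24f0f31b6daa46d57b7cd55d9238d55971b8a2b2",true,,,"2021-02-10T14:13:00"
'''


def parse_rows(text):
    """Yield one dict per row, with empty cells as None and typed Queued/Created."""
    for row in csv.DictReader(io.StringIO(text)):
        rec = {key: (value if value != "" else None) for key, value in row.items()}
        # Assumed convention: the literal string "true" marks a queued experiment.
        rec["Queued"] = rec["Queued"] == "true"
        if rec["Created"] is not None:
            rec["Created"] = datetime.fromisoformat(rec["Created"])
        yield rec


rows = list(parse_rows(SAMPLE))
```

Note that with plain CSV an empty cell and an empty quoted string ("") read back identically, so "missing" and "empty" cannot be told apart without an extra convention.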

Questions:

  • Do the column names make sense?
  • Is any info missing?
  • Is "Checkpoint Tip" redundant if "Experiment ID" is included in each row? Would a checkpoint boolean column be more helpful (indicating whether or not this is a checkpoint)?

@pmrowla @dmpetrov

@dberenbaum dberenbaum added the A: experiments Related to dvc exp label Feb 10, 2021
@pmrowla pmrowla added feature request Requesting a new feature p2-medium Medium priority, should be done, but less important labels Feb 11, 2021
@pmrowla
Contributor

pmrowla commented Feb 11, 2021

The checkpoint tip column is redundant here; that JSON field is pretty specific to how we render the table in the CLI and the VS Code extension. (You can figure out the tip of a checkpoint branch by tracing the checkpoint parent field in reverse.)
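A minimal sketch of that reverse tracing, assuming each row contributes one revision -> checkpoint parent link and that the checkpoint history is a single linear chain. The links are the ones from the exp-c5edd rows in the proposed format; checkpoint_tip is a hypothetical helper, not DVC code.

```python
# Checkpoint Parent links from the exp-c5edd rows: revision -> its checkpoint parent.
PARENT_OF = {
    "73b69e032634e69a0f96fb8f6c41d51485a9428d": "0a7d8aa973c72bb88cb308b06273777260f21de6",
    "0a7d8aa973c72bb88cb308b06273777260f21de6": "bcbee294fc6df6d096acdf1872a54e05818ce427",
}


def checkpoint_tip(rev, parent_of):
    """Return the tip of the checkpoint chain containing rev.

    Reverses the parent links (parent -> child) and walks forward until it
    reaches a revision with no child. Assumes a linear chain and raises
    ValueError if some revision has more than one checkpoint child.
    """
    child_of = {}
    for child, parent in parent_of.items():
        if parent in child_of and child_of[parent] != child:
            raise ValueError(f"{parent} has more than one checkpoint child")
        child_of[parent] = child
    while rev in child_of:
        rev = child_of[rev]
    return rev
```

Starting from any checkpoint in the chain (or from its baseline commit) yields the same tip, which is why a separate tip column carries no extra information for linear histories.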

I'm also not sure whether branch should be a column here vs. using the Git commit SHAs, since you can have multiple branches/tags/etc. pointing to a single Git SHA. We just choose one when we display the table in the CLI, but it might make more sense to have a Git SHA column and then a separate column for the branch/tag names that point to that SHA.
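One way to fill such a separate refs column is to invert a ref -> SHA mapping into SHA -> list of ref names. The sketch below assumes such a mapping is already available; the refs and the dev SHA are made-up placeholders, and refs_by_sha is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical ref -> commit SHA mapping; master and the v1.0 tag share a commit.
REFS = {
    "refs/heads/master": "bcbee294fc6df6d096acdf1872a54e05818ce427",
    "refs/tags/v1.0": "bcbee294fc6df6d096acdf1872a54e05818ce427",
    "refs/heads/dev": "0123456789abcdef0123456789abcdef01234567",
}


def refs_by_sha(ref_to_sha):
    """Group short ref names (branch or tag) by the commit SHA they point to."""
    grouped = defaultdict(list)
    for ref, sha in ref_to_sha.items():
        short = ref
        for prefix in ("refs/heads/", "refs/tags/"):
            if ref.startswith(prefix):
                short = ref[len(prefix):]
                break
        grouped[sha].append(short)
    # Sort names so the column content is stable across runs.
    return {sha: sorted(names) for sha, names in grouped.items()}


BY_SHA = refs_by_sha(REFS)
```

Keeping the SHA as the row key avoids having to pick one arbitrary branch name per commit, which is the ambiguity described above.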

@dberenbaum
Collaborator Author

We could have a "Branches" or "Refs" (to include tags) column like this:

"Experiment","Git Hash","Branches","Queued","Checkpoint Parent","Created",...(params and metrics columns as they appear in CLI)
"workspace",,"['master']",false,,,...
"baseline","bcbee294fc6df6d096acdf1872a54e05818ce427","['master']",false,,"2021-02-09T13:05:04",...
"exp-cd493","bcbee294fc6df6d096acdf1872a54e05818ce427","['master']",false,,"2021-02-09T13:05:04",...
"named-exp","d4caebb5162878d16ea89ec125f88cb595a08491","['master']",false,,"2021-02-10T11:18:00",...
"exp-deb9f","24f0f31b6daa46d57b7cd55d9238d55971b8a2b2","['master']",true,,"2021-02-10T14:13:00",...
"exp-c5edd","73b69e032634e69a0f96fb8f6c41d51485a9428d","['checkpoints_branch', 'master']",false,"0a7d8aa973c72bb88cb308b06273777260f21de6","2021-02-10T14:46:14",...
"exp-c5edd","0a7d8aa973c72bb88cb308b06273777260f21de6","['checkpoints_branch', 'master']",false,"bcbee294fc6df6d096acdf1872a54e05818ce427","2021-02-10T14:46:13",...

Or maybe it makes more sense to drop this column completely?
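If the Branches column is kept, the cells in the example above are Python list literals inside a quoted CSV field. A minimal sketch of reading them back with the standard library, plus an alternative ";"-joined encoding that spreadsheet users could split more easily; the ";" separator and the branches helper are assumptions for illustration only.

```python
import ast
import csv
import io

# One row from the "Branches" proposal, without the trailing params/metrics columns.
SAMPLE = '''"Experiment","Git Hash","Branches","Queued","Checkpoint Parent","Created"
"exp-c5edd","73b69e032634e69a0f96fb8f6c41d51485a9428d","['checkpoints_branch', 'master']",false,"0a7d8aa973c72bb88cb308b06273777260f21de6","2021-02-10T14:46:14"
'''


def branches(cell):
    """Decode a Branches cell written as a Python list literal; empty cell -> []."""
    return ast.literal_eval(cell) if cell else []


rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Hypothetical alternative encoding that non-Python tools can split on ";".
joined = ";".join(branches(rows[0]["Branches"]))
```

The list-literal form round-trips cleanly only for Python consumers, which is one argument for either a flat delimiter or dropping the column.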

@dberenbaum
Collaborator Author

Following up on this since it's related to #5980 (reply in thread) and other discussions around plotting metrics from multiple experiments.

This issue is not specifically about the csv/tsv format, but about how easy it is to parse data from exp show. One option is to clean up the existing --show-json output, but is it acceptable to keep the true tree structure from the Git history that --show-json currently shows, or do we need a 2D table format that looks like the one in the terminal? We might need both.
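To illustrate the difference between the two shapes, here is a minimal sketch that flattens a tree keyed by baseline revision, then by experiment revision, into one row per experiment. The nested schema below is a simplified assumption, not the actual --show-json layout, and the lr/acc values are made up; flatten and header are hypothetical helpers.

```python
# Simplified, hypothetical tree: baseline rev -> experiment rev -> info.
TREE = {
    "workspace": {
        "workspace": {"name": "workspace", "params": {"lr": 0.01}, "metrics": {"acc": 0.9}},
    },
    "bcbee29": {
        "bcbee29": {"name": "baseline", "params": {"lr": 0.01}, "metrics": {"acc": 0.88}},
        "d4caebb": {"name": "named-exp", "params": {"lr": 0.1}, "metrics": {"acc": 0.91}},
    },
}


def flatten(tree):
    """Turn the baseline -> experiment tree into a list of flat row dicts."""
    rows = []
    for baseline, exps in tree.items():
        for rev, info in exps.items():
            row = {"Baseline": baseline, "Revision": rev, "Experiment": info.get("name")}
            # Params and metrics become ordinary columns; a name clash between
            # them would overwrite here, so a real format needs disambiguation.
            for section in ("params", "metrics"):
                row.update(info.get(section, {}))
            rows.append(row)
    return rows


def header(rows):
    """Union of column names across rows, in first-seen order."""
    cols = []
    for row in rows:
        for key in row:
            if key not in cols:
                cols.append(key)
    return cols


ROWS = flatten(TREE)
```

The flat form loses nothing here because the Baseline column records each experiment's parent, so the tree can be rebuilt by grouping on it.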

@robSanders818
Copy link

I think it'd be really useful to include a to_csv option. Personally, I find the dvc exp show table impossible to read: I track a lot of metrics, and some of them have long names/fields, so the table gets cut off. I can't scroll right on the table to view the hidden columns, so a CSV export would provide the basic functionality of seeing every column.

In addition, I plan on running my pipeline many times. The table this command shows is perfect; it would just be nice to store many iterations of the pipeline in a CSV file so the data can be easily viewed and manipulated in Excel. That would be a really helpful feature! Thanks!

@dberenbaum dberenbaum added p1-important Important, aka current backlog of things to do and removed p2-medium Medium priority, should be done, but less important labels Aug 12, 2021
@dberenbaum
Collaborator Author

This should be easy to do if it's simply a matter of taking the existing table shown in the terminal and putting it into CSV format (instead of designing a specific CSV format like the ones in the comments above).
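A minimal sketch of that approach, assuming the terminal table is already available as a header list plus rows of cell values, with None for missing cells. Passing delimiter="\t" gives the TSV variant from the issue title. table_to_delimited is a hypothetical helper and the acc values are made up.

```python
import csv
import io


def table_to_delimited(header, rows, delimiter=","):
    """Serialize a header plus rows as CSV (or TSV with delimiter="\\t")."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerow(header)
    # csv.writer writes None as an empty field, so missing values stay blank.
    writer.writerows(rows)
    return buf.getvalue()


HEADER = ["Experiment", "Created", "acc"]
ROWS = [
    ["workspace", None, 0.9],
    ["exp-cd493", "2021-02-09T13:05:04", 0.91],
]

csv_text = table_to_delimited(HEADER, ROWS)
tsv_text = table_to_delimited(HEADER, ROWS, delimiter="\t")
```

Because csv.writer handles quoting, cells containing commas (or tabs, for TSV) remain unambiguous without a custom format.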

karajan1001 added a commit that referenced this issue Sep 8, 2021
* Show csv format of experiments

fix #5989 and #5446 

1. add --show-csv to dvc exp show
2. add a new functional test for csv format
3. add a new unit test for show_experiments
4. solve name duplication in table common headers (Created, Parent, etc.), metrics files and param files.

Co-authored-by: Peter Rowlands (변기호) <peter@pmrowla.com>
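Item 4 refers to column names that collide across the common headers and the metrics/params files. The sketch below shows one possible disambiguation scheme: prefix a colliding name with its source file and leave built-in columns unprefixed. This is an assumption for illustration, not necessarily what the merged change does; the file names and disambiguate helper are hypothetical.

```python
from collections import Counter


def disambiguate(columns):
    """columns: list of (source, name) pairs; source is "" for built-in columns.

    Names that appear more than once get a "source:name" prefix, except
    built-in columns, which keep their plain name.
    """
    counts = Counter(name for _, name in columns)
    out = []
    for source, name in columns:
        if counts[name] > 1 and source:
            out.append(f"{source}:{name}")
        else:
            out.append(name)
    return out


COLUMNS = [
    ("", "Created"),
    ("metrics.json", "acc"),
    ("params.yaml", "acc"),
    ("metrics.json", "Created"),
]
```

Only colliding names get a prefix, so the common case keeps short, readable headers.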

4 participants