Show csv format of experiments #6468

Merged
merged 11 commits into from
Sep 8, 2021
74 changes: 53 additions & 21 deletions dvc/command/experiments.py
@@ -141,6 +141,8 @@ def _collect_rows(
precision=DEFAULT_PRECISION,
sort_by=None,
sort_order=None,
fill_value=FILL_VALUE,
iso=False,
):
from dvc.scm.git import Git

@@ -161,8 +163,8 @@ def _collect_rows(
elif exp.get("queued"):
state = "Queued"
else:
state = FILL_VALUE
executor = exp.get("executor", FILL_VALUE)
state = fill_value
executor = exp.get("executor", fill_value)
is_baseline = rev == "baseline"

if is_baseline:
@@ -202,12 +204,12 @@ def _collect_rows(
exp_name,
name_rev,
typ,
_format_time(exp.get("timestamp")),
_format_time(exp.get("timestamp"), fill_value, iso),
parent,
state,
executor,
]
fill_value = FILL_VALUE_ERRORED if results.get("error") else FILL_VALUE
fill_value = FILL_VALUE_ERRORED if results.get("error") else fill_value
_extend_row(
row,
metric_names,
@@ -274,14 +276,18 @@ def _sort(item):
return ret


def _format_time(timestamp):
if timestamp is None:
return FILL_VALUE
if timestamp.date() == date.today():
def _format_time(datetime_obj, fill_value=FILL_VALUE, iso=False):
if datetime_obj is None:
return fill_value

if iso:
return datetime_obj.isoformat()

if datetime_obj.date() == date.today():
fmt = "%I:%M %p"
else:
fmt = "%b %d, %Y"
return timestamp.strftime(fmt)
return datetime_obj.strftime(fmt)
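
To make the three branches of the reworked `_format_time` easier to follow, here is a minimal standalone sketch. It mirrors the new version shown in this hunk; the `FILL_VALUE = "-"` constant and the public name `format_time` are assumptions for illustration, not copied from DVC.

```python
from datetime import date, datetime

FILL_VALUE = "-"  # assumed placeholder value, for illustration only


def format_time(datetime_obj, fill_value=FILL_VALUE, iso=False):
    # Missing timestamps render as the caller-supplied placeholder.
    if datetime_obj is None:
        return fill_value

    # Machine-readable output (CSV) wants an unambiguous ISO 8601 string.
    if iso:
        return datetime_obj.isoformat()

    # Human-readable table: time of day for today, calendar date otherwise.
    if datetime_obj.date() == date.today():
        fmt = "%I:%M %p"
    else:
        fmt = "%b %d, %Y"
    return datetime_obj.strftime(fmt)


stamp = datetime(2021, 9, 8, 14, 30, 5)
print(repr(format_time(None, fill_value="")))  # ''
print(format_time(stamp, iso=True))  # 2021-09-08T14:30:05
print(format_time(stamp))  # locale-dependent short date
```

The ISO branch is checked before the "today" branch, so CSV output never depends on the day the command is run.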


def _extend_row(row, names, items, precision, fill_value=FILL_VALUE):
@@ -327,6 +333,8 @@ def experiments_table(
sort_by=None,
sort_order=None,
precision=DEFAULT_PRECISION,
fill_value=FILL_VALUE,
iso=False,
) -> "TabularData":
from funcy import lconcat

@@ -342,7 +350,7 @@ def experiments_table(
"Executor",
]
td = TabularData(
lconcat(headers, metric_headers, param_headers), fill_value=FILL_VALUE
lconcat(headers, metric_headers, param_headers), fill_value=fill_value
)
for base_rev, experiments in all_experiments.items():
rows = _collect_rows(
@@ -353,6 +361,8 @@ def experiments_table(
sort_by=sort_by,
sort_order=sort_order,
precision=precision,
fill_value=fill_value,
iso=iso,
)
td.extend(rows)

@@ -385,7 +395,7 @@ def baseline_styler(typ):


def show_experiments(
all_experiments, pager=True, no_timestamp=False, **kwargs
all_experiments, pager=True, no_timestamp=False, show_csv=False, **kwargs
):
include_metrics = _parse_filter_list(kwargs.pop("include_metrics", []))
exclude_metrics = _parse_filter_list(kwargs.pop("exclude_metrics", []))
@@ -399,8 +409,13 @@ def show_experiments(
include_params=include_params,
exclude_params=exclude_params,
)
metric_headers = _normalize_headers(metric_names)
param_headers = _normalize_headers(param_names)

names = {**metric_names, **param_names}
counter = Counter(
Contributor Author:

For #5989.
The duplicated column name foo in the test stage would cause wrong output in the test.
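
The collision described here can be illustrated with a small standalone sketch. It assumes metric and parameter names arrive as `{path: [names]}` mappings, as the diff suggests; the sample file contents are made up, and the counting expression is a simplification (each name counted once per file) rather than a copy of the generator in the diff.

```python
from collections import Counter


def normalize_headers(names, count):
    # Keep a bare name only when it is unique across every file;
    # otherwise qualify it with the path it came from.
    return [
        name if count[name] == 1 else f"{path}:{name}"
        for path in names
        for name in names[path]
    ]


# Hypothetical inputs: "foo" appears both as a metric and as a param.
metric_names = {"metrics.yaml": ["foo", "acc"]}
param_names = {"params.yaml": ["foo", "lr"]}

# One shared counter over metrics AND params, so cross-kind duplicates
# are detected as well as duplicates within one kind.
names = {**metric_names, **param_names}
count = Counter(name for path in names for name in names[path])

print(normalize_headers(metric_names, count))  # ['metrics.yaml:foo', 'acc']
print(normalize_headers(param_names, count))  # ['params.yaml:foo', 'lr']
```

Counting metrics and params separately would leave both columns named `foo`, which is ambiguous in a CSV header row.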

Member (skshetry, Sep 6, 2021):

@karajan1001, let's do the same for other headers that we have (like Experiments, rev, etc.) to reduce chances of collision. You can hoist the headers from the following and reuse them here:

headers = [
"Experiment",
"rev",
"typ",
"Created",
"parent",
"State",
"Executor",
]

Member:

Also, should we rename the column typ to Type?

Member (skshetry, Sep 6, 2021):

Also, the lowercase names, should we capitalize it?

Contributor Author:

> Also, the lowercase names, should we capitalize it?

For rev and parent I think we should capitalize them, but for the user-defined names, capitalization might confuse users and make the names hard to manage in code (it is easy to capitalize a string but hard to recover the original).

Member:

@karajan1001, I was only talking about that particular list: {typ, rev, parent}.

Contributor Author (karajan1001, Sep 8, 2021):

@skshetry
One problem, though: "rev" is consistent with some other code, for example dvc/repo/plots/template.py, ./dvc/scm/git/__init__.py and ./dvc/api.py, while typ and parent are only used here. So I will only modify typ and parent here, which is still imperfect.

Member:

But they are not part of a UI, they are mostly part of an API or a schema where that makes sense. Please check dvc metrics show --all-commits for example.

name for path in names for name in names[path] for path in names
)
metric_headers = _normalize_headers(metric_names, counter)
param_headers = _normalize_headers(param_names, counter)

td = experiments_table(
all_experiments,
@@ -411,6 +426,8 @@ def show_experiments(
kwargs.get("sort_by"),
kwargs.get("sort_order"),
kwargs.get("precision"),
kwargs.get("fill_value"),
kwargs.get("iso"),
)

if no_timestamp:
@@ -422,9 +439,12 @@ def show_experiments(

row_styles = lmap(baseline_styler, td.column("typ"))

merge_headers = ["Experiment", "rev", "typ", "parent"]
td.column("Experiment")[:] = map(prepare_exp_id, td.as_dict(merge_headers))
td.drop(*merge_headers[1:])
if not show_csv:
merge_headers = ["Experiment", "rev", "typ", "parent"]
td.column("Experiment")[:] = map(
prepare_exp_id, td.as_dict(merge_headers)
)
td.drop(*merge_headers[1:])

headers = {"metrics": metric_headers, "params": param_headers}
styles = {
@@ -453,13 +473,11 @@ def show_experiments(
rich_table=True,
header_styles=styles,
row_styles=row_styles,
show_csv=show_csv,
)


def _normalize_headers(names):
count = Counter(
name for path in names for name in names[path] for path in names
)
def _normalize_headers(names, count):
return [
name if count[name] == 1 else f"{path}:{name}"
for path in names
@@ -493,6 +511,10 @@ def run(self):

ui.write(json.dumps(all_experiments, default=_format_json))
else:
precision = None if self.args.show_csv else self.args.precision
fill_value = "" if self.args.show_csv else FILL_VALUE
iso = True if self.args.show_csv else False

show_experiments(
all_experiments,
include_metrics=self.args.include_metrics,
@@ -502,8 +524,11 @@ def run(self):
no_timestamp=self.args.no_timestamp,
sort_by=self.args.sort_by,
sort_order=self.args.sort_order,
precision=self.args.precision or DEFAULT_PRECISION,
precision=precision,
fill_value=fill_value,
iso=iso,
pager=not self.args.no_pager,
show_csv=self.args.show_csv,
)
return 0
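
The way `run` derives its display options from `--show-csv` can be summarised in a short sketch. The helper name `display_options` and the `FILL_VALUE = "-"` constant are hypothetical; only the three conditionals come from the hunk above.

```python
from argparse import Namespace

FILL_VALUE = "-"  # assumed table placeholder, for illustration only


def display_options(args):
    # CSV is meant for machines: no rounding, empty cells instead of a
    # placeholder glyph, and ISO 8601 timestamps.
    show_csv = bool(args.show_csv)
    return {
        "precision": None if show_csv else args.precision,
        "fill_value": "" if show_csv else FILL_VALUE,
        "iso": show_csv,
        "show_csv": show_csv,
    }


print(display_options(Namespace(show_csv=True, precision=5)))
print(display_options(Namespace(show_csv=False, precision=5)))
```

Keeping the three overrides in one place makes it clear that they always change together when CSV output is requested.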

@@ -882,9 +907,16 @@ def add_parser(subparsers, parent_parser):
default=False,
help="Print output in JSON format instead of a human-readable table.",
)
experiments_show_parser.add_argument(
"--show-csv",
action="store_true",
default=False,
help="Print output in csv format instead of a human-readable table.",
)
experiments_show_parser.add_argument(
"--precision",
type=int,
default=DEFAULT_PRECISION,
help=(
"Round metrics/params to `n` digits precision after the decimal "
f"point. Rounds to {DEFAULT_PRECISION} digits by default."
5 changes: 4 additions & 1 deletion dvc/compare.py
@@ -168,7 +168,10 @@ def row_from_dict(self, d: Mapping[str, "CellT"]) -> None:
def render(self, **kwargs: Any):
from dvc.ui import ui

ui.table(self, headers=self.keys(), **kwargs)
if kwargs.pop("show_csv", False):
ui.write(self.to_csv(), end="")
else:
ui.table(self, headers=self.keys(), **kwargs)
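
A rough standalone sketch of this dispatch follows. The free functions `to_csv` and `render`, and the `write` callback, are stand-ins invented for illustration; they are not DVC's `TabularData` or `ui` objects. The point is that the serialized CSV already ends with a newline, which is presumably why the real call passes `end=""`.

```python
import csv
import io


def to_csv(headers, rows):
    # Serialize the header row and data rows into one CSV string.
    buff = io.StringIO()
    writer = csv.writer(buff, lineterminator="\n")
    writer.writerow(headers)
    writer.writerows(rows)
    return buff.getvalue()


def render(headers, rows, show_csv=False, write=print):
    if show_csv:
        # The CSV text carries its own trailing newline, so suppress
        # the extra one that the writer would otherwise append.
        write(to_csv(headers, rows), end="")
    else:
        # Stand-in for the rich table: one space-joined line per row.
        for row in [headers, *rows]:
            write("  ".join(str(cell) for cell in row))


render(
    ["Experiment", "rev"],
    [["", "workspace"], ["master", "abc1234"]],
    show_csv=True,
)
```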

def as_dict(
self, cols: Iterable[str] = None
73 changes: 69 additions & 4 deletions tests/func/experiments/test_show.py
@@ -9,6 +9,7 @@
from dvc.main import main
from dvc.repo.experiments.base import EXPS_STASH, ExpRefInfo
from dvc.repo.experiments.executor.base import BaseExecutor, ExecutorInfo
from dvc.repo.experiments.utils import exp_refs_by_rev
from dvc.utils.fs import makedirs
from dvc.utils.serialize import YAMLFileCorruptedError, dump_yaml
from tests.func.test_repro_multistage import COPY_SCRIPT
@@ -269,6 +270,10 @@ def test_show_filter(
included,
excluded,
):
from contextlib import contextmanager

from dvc.ui import ui

capsys.readouterr()
div = "│" if os.name == "nt" else "┃"

@@ -312,13 +317,30 @@ def test_show_filter(
if e_params is not None:
command.append(f"--exclude-params={e_params}")

assert main(command) == 0
@contextmanager
def console_with(console, width):
console_options = console.options
original = console_options.max_width
con_width = console._width

try:
console_options.max_width = width
console._width = width
yield
finally:
console_options.max_width = original
console._width = con_width

with console_with(ui.rich_console, 255):
assert main(command) == 0
cap = capsys.readouterr()

for i in included:
assert f"{div} {i} {div}" in cap.out
assert f"{div} params.yaml:{i} {div}" in cap.out
assert f"{div} metrics.yaml:{i} {div}" in cap.out
for e in excluded:
assert f"{div} {e} {div}" not in cap.out
assert f"{div} params.yaml:{e} {div}" not in cap.out
assert f"{div} metrics.yaml:{e} {div}" not in cap.out


def test_show_multiple_commits(tmp_dir, scm, dvc, exp_stage):
@@ -412,7 +434,6 @@ def test_show_running_checkpoint(
):
from dvc.repo.experiments.base import EXEC_BRANCH
from dvc.repo.experiments.executor.local import TempDirExecutor
from dvc.repo.experiments.utils import exp_refs_by_rev

baseline_rev = scm.get_rev()
dvc.experiments.run(
@@ -471,3 +492,47 @@ def test_show_with_broken_repo(tmp_dir, scm, dvc, exp_stage, caplog):

paths = ["workspace", "baseline", "error"]
assert isinstance(get_in(result, paths), YAMLFileCorruptedError)


def test_show_csv(tmp_dir, scm, dvc, exp_stage, capsys):
Member (skshetry, Sep 8, 2021):

Do we need this test? Can this test be replaced with a mocked test that checks if show_experiments is being called correctly? WDYT? I don't have a strong opinion though.

Eg:

def test_experiments_show(dvc, scm, mocker):

Contributor Author:

Agreed. In my previous version, the plan was to test only that _show_csv is called (you asked why I had this function), the value of all_experiments (already tested properly), and the to_csv function. But that left the code between all_experiments and to_csv in show_experiments untested. Now that show_experiments has been tested fully, we can just test the call from the interface to show_experiments.

baseline_rev = scm.get_rev()

def _get_rev_isotimestamp(rev):
return datetime.fromtimestamp(
scm.gitpython.repo.rev_parse(rev).committed_date
).isoformat()

result1 = dvc.experiments.run(exp_stage.addressing, params=["foo=2"])
rev1 = first(result1)
ref_info1 = first(exp_refs_by_rev(scm, rev1))
result2 = dvc.experiments.run(exp_stage.addressing, params=["foo=3"])
rev2 = first(result2)
ref_info2 = first(exp_refs_by_rev(scm, rev2))

capsys.readouterr()
assert main(["exp", "show", "--show-csv"]) == 0
cap = capsys.readouterr()
print(cap.out)
assert (
"Experiment,rev,typ,Created,parent,metrics.yaml:foo,params.yaml:foo"
in cap.out
)
assert ",workspace,baseline,,,3,3" in cap.out
assert (
"master,{},baseline,{},,1,1".format(
baseline_rev[:7], _get_rev_isotimestamp(baseline_rev)
)
in cap.out
)
assert (
"{},{},branch_base,{},,2,2".format(
ref_info1.name, rev1[:7], _get_rev_isotimestamp(rev1)
)
in cap.out
)
assert (
"{},{},branch_commit,{},,3,3".format(
ref_info2.name, rev2[:7], _get_rev_isotimestamp(rev2)
)
in cap.out
)