Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add log_param/log_params #292

Merged
merged 2 commits into from
Sep 23, 2022
Merged

add log_param/log_params #292

merged 2 commits into from
Sep 23, 2022

Conversation

dtrifiro
Copy link
Contributor

@dtrifiro dtrifiro commented Sep 12, 2022

Provides two new methods which log parameters to a yaml file (params.yaml):

  • log_param(name, value) logs the given parameter name/value
  • log_params(params) logs the given set of parameters (dict).

@dtrifiro dtrifiro force-pushed the add-log-param branch 3 times, most recently from 3c47348 to cb94d93 Compare September 12, 2022 13:21
@codecov-commenter
Copy link

codecov-commenter commented Sep 12, 2022

Codecov Report

Base: 94.81% // Head: 51.83% // Decreases project coverage by -42.97% ⚠️

Coverage data is based on head (cad9909) compared to base (9de1e71).
Patch coverage: 39.13% of modified lines in pull request are covered.

Additional details and impacted files
@@             Coverage Diff             @@
##             main     #292       +/-   ##
===========================================
- Coverage   94.81%   51.83%   -42.98%     
===========================================
  Files          34       34               
  Lines        1523     1634      +111     
  Branches      166      183       +17     
===========================================
- Hits         1444      847      -597     
- Misses         58      782      +724     
+ Partials       21        5       -16     
Impacted Files Coverage Δ
src/dvclive/error.py 0.00% <0.00%> (-86.96%) ⬇️
src/dvclive/live.py 0.00% <0.00%> (-95.28%) ⬇️
src/dvclive/serialize.py 0.00% <0.00%> (ø)
tests/test_main.py 100.00% <100.00%> (ø)
src/dvclive/env.py 0.00% <0.00%> (-100.00%) ⬇️
src/dvclive/lgbm.py 0.00% <0.00%> (-100.00%) ⬇️
src/dvclive/keras.py 0.00% <0.00%> (-100.00%) ⬇️
src/dvclive/catalyst.py 0.00% <0.00%> (-100.00%) ⬇️
src/dvclive/huggingface.py 0.00% <0.00%> (-100.00%) ⬇️
src/dvclive/data/__init__.py 0.00% <0.00%> (-100.00%) ⬇️
... and 14 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@shcheklein
Copy link
Member

Should we log into params.yaml ? To simplify the migration and so that we can show them as params in Studio/VS Code?

Otherwise, what is the plan to support this in both - Studio/VS Code?

src/dvclive/live.py Outdated Show resolved Hide resolved
Copy link
Contributor

@daavoo daavoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

A couple of comments. I have tried to differentiate implementation vs product comments.

Not sure if product discussions should happen here or in Notion cc @dberenbaum (I am Ok with discussing here)

Comment on lines 209 to 224
output_dir = Path(self.dir) / self.PARAMS_DIR
output_dir.mkdir(exist_ok=True)
params_path = output_dir / (
f"{name}.json" if not name.endswith(".json") else name
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Implementation)
Not a blocker but, for consistency, I think would be nice if we implement a new Data subclass (i.e. Param) and move this logic there.

It would look more similar to the other log_x methods and could launch explicit exceptions for invalid data types, etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about that, and that's the first approach I went for, although if you look at the Data class most of its (abstract) methods are tightly connected to the "stepped" workflow: step, no_step_output_path, first_step_dump, no_step_dump, step_dump. Maybe have a new superclass from which Data and Param derive, although I think they would share only few methods/attributes.

I think a Param class is something we will have to add, especially in relation to the nested dictionaries/complex objects parameters support.

src/dvclive/live.py Outdated Show resolved Hide resolved
src/dvclive/live.py Outdated Show resolved Hide resolved
src/dvclive/live.py Outdated Show resolved Hide resolved
src/dvclive/live.py Outdated Show resolved Hide resolved
src/dvclive/live.py Outdated Show resolved Hide resolved
@dberenbaum
Copy link
Collaborator

Should we log into params.yaml ? To simplify the migration and so that we can show them as params in Studio/VS Code?

I'm not sure if it simplifies migration to put them into params.yaml, since it then requires understanding that you need to remove your log_param() calls or end up overwriting your param dependencies. Saving them in some other YAML file might be less confusing, although it looks a bit odd to be saving data in both YAML and JSON (no strong opinion there).

Otherwise, what is the plan to support this in both - Studio/VS Code?

This is still under discussion, but a custom params file could be supported through dvc.yaml config or through some DVCLive convention.

@dtrifiro
Copy link
Contributor Author

I'm not sure if it simplifies migration to put them into params.yaml, since it then requires understanding that you need to remove your log_param() calls or end up overwriting your param dependencies.

+1

Saving them in some other YAML file might be less confusing, although it looks a bit odd to be saving data in both YAML and JSON (no strong opinion there).

One reason I see for preferring YAML is to provide users with an example on how a "standard" params.yaml file would look like, although this does not provide a clear path on how to go forward from there: "ok, I have a params.yaml file which I could use as a params dependency, although I still have to rewrite my code in order to read/use that file"

@dtrifiro
Copy link
Contributor Author

Note: linting failures on mac/win 3.10 seem unrelated

@dtrifiro dtrifiro marked this pull request as ready for review September 19, 2022 15:42
src/dvclive/live.py Outdated Show resolved Hide resolved
@dberenbaum
Copy link
Collaborator

Should we log into params.yaml ?

@dtrifiro Do you want to migrate to a YAML file? I actually don't think params.yaml is that problematic as long as it's inside the dvclive dir.

@dtrifiro
Copy link
Contributor Author

dtrifiro commented Sep 20, 2022

@dtrifiro Do you want to migrate to a YAML file? I actually don't think params.yaml is that problematic as long as it's inside the dvclive dir.

@dberenbaum That sounds good. Although in order to remain consistent with dvc, I think we'd have to use the facilities included in dvc.utils.serialize to load/dump yaml files. I'd avoid requiring dvc as a dependency though. Maybe it's worth extracting dvc.utils.serialize into a separate package? What do you think?

@dtrifiro dtrifiro force-pushed the add-log-param branch 3 times, most recently from 2196bad to cad9909 Compare September 21, 2022 17:34
@dberenbaum
Copy link
Collaborator

Discussed with @dtrifiro and decided that we should be able to implement basic YAML parsing without copying all of the DVC YAML serialization utilities.

Copy link
Contributor

@daavoo daavoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments and good to go.

Main doubt:

(Product) I am missing the point of YAML vs JSON but I have no strong opinion. I just think the right direction would be to implement iterative/dvc#7939 and save us the new dependency / extra utils code.

Must address:

(Implementation) @dtrifiro Need to either make all imports of ruamel.yaml optional (with try/except ImporError or lazy import) or add as a install_requires dependency.

src/dvclive/live.py Outdated Show resolved Hide resolved
src/dvclive/live.py Outdated Show resolved Hide resolved
src/dvclive/live.py Outdated Show resolved Hide resolved
src/dvclive/serialize.py Outdated Show resolved Hide resolved
@dberenbaum
Copy link
Collaborator

(Product) I am missing the point of YAML vs JSON but I have no strong opinion. I just think the right direction would be to implement iterative/dvc#7939 and save us the new dependency / extra utils code.

@daavoo Not sure I understand your point (I also have no strong opinion on YAML vs JSON). How does a configurable params file save code/dependencies here?

@daavoo
Copy link
Contributor

daavoo commented Sep 23, 2022

@daavoo Not sure I understand your point (I also have no strong opinion on YAML vs JSON). How does a configurable params file save code/dependencies here?

If I understood the motivation for params.yaml was to simplify the migration, right?
If we allow DVC to set custom default params file, migration would be equally simple with json format

@dtrifiro dtrifiro force-pushed the add-log-param branch 3 times, most recently from c90f723 to 15805f0 Compare September 23, 2022 11:09
@dtrifiro
Copy link
Contributor Author

dtrifiro commented Sep 23, 2022

Addressed the remaining comments.

If we allow DVC to set custom default params file

Do we really need this? We can already specify custom params files, although we "nudge" users towards using params.yaml

@dberenbaum
Copy link
Collaborator

dberenbaum commented Sep 23, 2022

If I understood the motivation for params.yaml was to simplify the migration, right?
If we allow DVC to set custom default params file, migration would be equally simple with json format

Good point. I would say YAML is arguably more easily readable and editable and maybe more common for parameters, but JSON is not so different, has fewer issues, and is already used for metrics, so 🤷 . Let's merge this and we can come back to it if needed in the future.

Edit: @daavoo Not sure if you were thinking that a default params file would save DVC from needing to specify parameters outside of a pipeline. Even if params are written to the default file, dvc exp show and other params-related commands will not read these parameters unless they are included in a stage and added to dvc.lock, so I'm not sure a default file helps that much for "pipelineless" parameters.

@daavoo daavoo merged commit 167fe68 into iterative:main Sep 23, 2022
@dtrifiro dtrifiro deleted the add-log-param branch September 26, 2022 09:10
@daavoo daavoo linked an issue Sep 26, 2022 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

logger: extend with log_param
5 participants