Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add py support for ParamsDependency #4456

Merged
merged 2 commits into from
Sep 24, 2020

Conversation

aandrusenko
Copy link
Contributor

@aandrusenko aandrusenko commented Aug 24, 2020

  • add py support for ParamsDependency

  • ❗ I have followed the Contributing to DVC checklist.

  • πŸ“– If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.

Thank you for the contribution - we'll try to review it as soon as possible. πŸ™

This PR adds support for parameters file in ".py" format. Example:

dvc run -n train -d stages/train.py -p params.py:SEED,Train python stages/train.py

Where params.py looks something like this:

SEED = 1

class Train:
    lr = 0.0041
    epochs = 75
    layers = 9

dvc.lock looks like this:

train:
  cmd: python stages/train.py
  deps:
  - path: stages/train.py
    md5: 23be4307b23dcd740763d5fc67993f11
  params:
    params.py:
      SEED: 1
      Train:
        epochs: 75
        layers: 9
        lr: 0.0041

Copy link
Member

@skshetry skshetry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @aandrusenko, thanks for the PR. I haven't looked into parse_class() yet, but could this be made more generic instead of always looking for Config class?

tests/func/params/test_show.py Outdated Show resolved Hide resolved
Comment on lines 46 to 86
def parse_py_for_update(text, path):
return parse_py(text, path)


def _dump(data, stream):
"""Save dict as .py file."""
return NotImplemented


def dump_py(path, data, tree=None):
return _dump_data(path, data, dumper=_dump, tree=tree)


@contextmanager
def modify_py(path, tree=None):
with _modify_data(path, parse_py_for_update, dump_py, tree=tree) as d:
yield d
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to not implement, unless you plan on to support modify_*.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to consult. How realistic is it to update the ".py" file from the dictionary. For example, something like this:

dict = {"foo": 1, "train": {"bar": 2}}

param = "    {name} = {value}\n"
if dict in dict:
    text = "class {class_name}:\n{params}\n".format(
        class_name="Train",
        params=[param.format(name=n, value=v) for n, v in dict["train"].items()]
    )

text += "class Config:\n{params}\n".format(
    params=[param.format(name=n, value=v) for n, v in dict.items()]
)

# save text to ".py" file

Even the simplest version looks very strange. I think this will be extremely difficult to maintain. Maybe you have some ideas.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will take quite an effort to do correctly, especially with preserving comments, typed comments, etc. It'll need to use tokenize and ast (which btw does not support typing comments till 3.8, so we might need to use typed_ast).

And, again using these, it's hard to do round-trip correctly, needs more time and effort.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, modify_* is used for hyperparameter optimizations, which is an under-development feature. We can make without it as well for now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to implement "modification" with AST now? We are working on a new feature, Experiments (working draft), which automatically modifies parameters based on hyper-parameter optimization, so it'd be great to have that if it's possible.

Copy link
Member

@skshetry skshetry Sep 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems to work for the simpler cases we are targeting, it's actually pretty clever. But, I am worried about edge-cases as dump is line-based (+ AST of course) whereas load is pure-ast based, which might leave users confused on why something got modified vs some that didn't. Also, it does not work with nested values, eg:

class Config:
    value = {"hello": "world"}

^ with Config.value.hello modification, but I'm not sure if this cases should even be supported though.

cc @efiop @pared @pmrowla

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what options do we have here?

  1. Support modify and just accept the fact that users might get their params file modified if they use modify - that will be confusing, will result in noise in PR's and commits.
  2. Don't support it - we need this feature for experiments, but for now we can just "Error: Use yaml or json please" our way out of this for now.
  3. Support modify and write our own dump-er - that seems like a lot of maintaining

I think out of those 3, the best is still 1 - cosmetic changes during dump will be present, but it can be simply avoided using black in pre-commit, and we still preserve the functionality required for experiments.

As to value = {"hello": "world"} case, I would forbid that, at least for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@skshetry, so what should I do, leave it as it is now?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pmrowla ping

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like leaving this as-is for now will be best, I think having this partial support for modifying python params is better than nothing at all.

When experiments is ready for release we will just have to document that support for python params is limited to these simple cases, and that yaml/toml should be preferred if using the experiments features.

@aandrusenko
Copy link
Contributor Author

@skshetry, I changed the approach to parsing the file. Now all global variables and classes are converted to a parameter dictionary by ast module. Parameters are used with Config.foo.

@skshetry skshetry requested review from pared and pmrowla September 3, 2020 06:42
dvc/utils/serialize/_py.py Outdated Show resolved Hide resolved
Copy link
Member

@skshetry skshetry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @aandrusenko πŸ™‚
This looks great and works amazingly well.
Due to the nature of this PR, it might take a few days/deep-dives to get it merged. I hope that's okay.

Meanwhile, can you rebase?
Thanks again, this is a great and very helpful feature. πŸŽ‰

@@ -0,0 +1,89 @@
import ast
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is not cross-python-version compatible, right? Not a blocker though.

Copy link
Member

@skshetry skshetry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall. Most of the suggestions below are minor refactoring and a question about variables inside __init__ of the class.

dvc/utils/serialize/_py.py Outdated Show resolved Hide resolved
dvc/utils/serialize/_py.py Outdated Show resolved Hide resolved
elif (
isinstance(_body, ast.FunctionDef) and _body.name == "__init__"
):
result.update(ast_tree_to_dict(_body))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this reads from variable instead of self.* right?

class Klass:
    def __init__(self):
       self.a = 3

I'd expect Klass.a be 3 here. If it's difficult, we could remove this functionality.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, Klass.a be 3. It's works. Added to tests.

Copy link
Member

@skshetry skshetry Sep 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, I thought it didn't. Now, I see how it works, but might be confusing to the users, especially in the following scenarios:
1.

x = 3
config.x = 3

parse_py gives me result {"x": 3}.
2.

class Config:

   def __init__(self):
      self.container.value = 5
       self.value = 1
       value = 4  # used for something else

parse_py gives me result {"Config": {"value": "4"}}, instead of expected "1" (neither 4 or 5 is expected).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, added a flag to read only self parameters

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @aandrusenko, this problem still exists, right? See the other two scenarios with self.container.value and container.x above.

Copy link
Member

@skshetry skshetry Sep 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could either ignore these, or return a dict containing

{"Config": {"container": {"value": 5}}}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, reads only self.value without any implementations

Copy link
Member

@skshetry skshetry Sep 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, this still fails for non-class attr assignment, right?

>>> parse_py("x = 4; config.x = 3", "file name does not matter")
{'x': 3}

Or, in terms of class

>>> parse_py("""
   ... class Config:
   ...    value = 3
   ...    def __init__(self):
   ...        container.value = 4
   ... """, "D")
{'Config': {'value': 4}}

Looks like the name extraction logic needs overhaul.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@skshetry, fixed, but I have no more ideas how to ignore self.x outside the class, ast has no more functionality. In my opinion, it is strange to meet self.x outside the class, because self is not defined and params.py cannot even be imported

Comment on lines 58 to 159
if isinstance(assign.value, ast.Dict):
_dct = {}
for key, val in zip(assign.value.keys, assign.value.values):
_dct[get_ast_value(key)] = get_ast_value(val)
result[name] = _dct
elif isinstance(assign.value, ast.List):
result[name] = [get_ast_value(val) for val in assign.value.elts]
elif isinstance(assign.value, ast.Set):
values = [get_ast_value(val) for val in assign.value.elts]
result[name] = set(values)
elif isinstance(assign.value, ast.Tuple):
values = [get_ast_value(val) for val in assign.value.elts]
result[name] = tuple(values)
else:
result[name] = get_ast_value(assign.value)

return result
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not work with recursive data. Not a problem though, just a note.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I meant by that is:

lst = [1, 2, 3, 4, [1, 2, 3, 4]]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For future reference only. Not sure how dot-based access would even work here. Again, no need to implement.

Comment on lines +88 to +180
if isinstance(value, ast.Num):
result = value.n
elif isinstance(value, ast.Str):
result = value.s
elif isinstance(value, ast.NameConstant):
result = value.value
else:
raise ValueError
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that ast.{Num/Str/NameConstant} has been deprecated in Python3.8, but still works on 3.8.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine as-is for now though.

dvc/utils/serialize/_py.py Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Sep 4, 2020

Codecov Report

Merging #4456 into master will decrease coverage by 0.15%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4456      +/-   ##
==========================================
- Coverage   91.57%   91.41%   -0.16%     
==========================================
  Files         186      185       -1     
  Lines       13184    13055     -129     
==========================================
- Hits        12073    11934     -139     
- Misses       1111     1121      +10     
Impacted Files Coverage Ξ”
dvc/repo/metrics/show.py 86.48% <0.00%> (-10.89%) ⬇️
dvc/command/version.py 91.83% <0.00%> (-8.17%) ⬇️
dvc/tree/dvc.py 95.89% <0.00%> (-1.46%) ⬇️
dvc/cache/base.py 95.09% <0.00%> (-1.35%) ⬇️
dvc/stage/cache.py 93.23% <0.00%> (-1.33%) ⬇️
dvc/tree/repo.py 92.20% <0.00%> (-1.23%) ⬇️
dvc/repo/plots/__init__.py 92.53% <0.00%> (-0.85%) ⬇️
dvc/cache/local.py 99.07% <0.00%> (-0.46%) ⬇️
dvc/repo/experiments/__init__.py 87.74% <0.00%> (-0.44%) ⬇️
dvc/cache/__init__.py 90.42% <0.00%> (-0.40%) ⬇️
... and 23 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Ξ” = absolute <relative> (impact), ΓΈ = not affected, ? = missing data
Powered by Codecov. Last update 46bc893...9802cbe. Read the comment docs.

Copy link
Contributor

@pared pared left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, great job @aandrusenko. πŸ‘

@skshetry skshetry added the feature is a feature label Sep 7, 2020
@efiop efiop requested a review from skshetry September 12, 2020 00:34
Copy link
Member

@skshetry skshetry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, @aandrusenko. Would you mind creating a docs PR on the https://github.com/iterative/dvc.org?

@jorgeorpinel
Copy link
Contributor

jorgeorpinel commented Sep 23, 2020

Awesome! I haven't read the whole conversation above but I have a Q: What happens if the user doesn't have Python installed and tries to use this feature? I know, it's a probably an edge case but it's possible to install DVC without Python with binary builds so... Just curious.

Does it depend on the system libraries, or the ones that come with the DVC installation?

@skshetry
Copy link
Member

skshetry commented Sep 24, 2020

@jorgeorpinel, it depends on the python the DVC was installed with, so it's tied to the same python version as well.

@skshetry
Copy link
Member

@aandrusenko, can you rebase? There are merge conflicts.

Copy link
Contributor

@efiop efiop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @aandrusenko ! πŸ™

@efiop efiop merged commit e928034 into iterative:master Sep 24, 2020
@jorgeorpinel
Copy link
Contributor

jorgeorpinel commented Sep 27, 2020

it depends on the python the DVC was installed with, so it's tied to the same python version

@skshetry what about binary installers? Those don't require a pre-existing Python in the system. Will this work without a system Python in that case? (That's what I meant.) Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature is a feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants