-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add py support for ParamsDependency #4456
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @aandrusenko, thanks for the PR. I haven't looked into parse_class()
yet, but could this be made more generic instead of always looking for Config
class?
dvc/utils/serialize/_py.py
Outdated
def parse_py_for_update(text, path): | ||
return parse_py(text, path) | ||
|
||
|
||
def _dump(data, stream): | ||
"""Save dict as .py file.""" | ||
return NotImplemented | ||
|
||
|
||
def dump_py(path, data, tree=None): | ||
return _dump_data(path, data, dumper=_dump, tree=tree) | ||
|
||
|
||
@contextmanager | ||
def modify_py(path, tree=None): | ||
with _modify_data(path, parse_py_for_update, dump_py, tree=tree) as d: | ||
yield d |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to not implement, unless you plan on to support modify_*
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to consult. How realistic is it to update the ".py" file from the dictionary. For example, something like this:
dict = {"foo": 1, "train": {"bar": 2}}
param = " {name} = {value}\n"
if dict in dict:
text = "class {class_name}:\n{params}\n".format(
class_name="Train",
params=[param.format(name=n, value=v) for n, v in dict["train"].items()]
)
text += "class Config:\n{params}\n".format(
params=[param.format(name=n, value=v) for n, v in dict.items()]
)
# save text to ".py" file
Even the simplest version looks very strange. I think this will be extremely difficult to maintain. Maybe you have some ideas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will take quite an effort to do correctly, especially with preserving comments, typed comments, etc. It'll need to use tokenize
and ast
(which btw does not support typing comments till 3.8, so we might need to use typed_ast
).
And, again using these, it's hard to do round-trip correctly, needs more time and effort.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, modify_*
is used for hyperparameter optimizations, which is an under-development feature. We can make without it as well for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to implement "modification" with AST now? We are working on a new feature, Experiments (working draft), which automatically modifies parameters based on hyper-parameter optimization, so it'd be great to have that if it's possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems to work for the simpler cases we are targeting, it's actually pretty clever. But, I am worried about edge-cases as dump
is line-based (+ AST of course) whereas load
is pure-ast based, which might leave users confused on why something got modified vs some that didn't. Also, it does not work with nested values, eg:
class Config:
value = {"hello": "world"}
^ with Config.value.hello
modification, but I'm not sure if this cases should even be supported though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So what options do we have here?
- Support
modify
and just accept the fact that users might get their params file modified if they usemodify
- that will be confusing, will result in noise in PR's and commits. - Don't support it - we need this feature for experiments, but for now we can just "Error: Use yaml or json please" our way out of this for now.
- Support modify and write our own
dump
-er - that seems like a lot of maintaining
I think out of those 3, the best is still 1 - cosmetic changes during dump will be present, but it can be simply avoided using black
in pre-commit, and we still preserve the functionality required for experiments.
As to value = {"hello": "world"}
case, I would forbid that, at least for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@skshetry, so what should I do, leave it as it is now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pmrowla ping
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like leaving this as-is for now will be best, I think having this partial support for modifying python params is better than nothing at all.
When experiments is ready for release we will just have to document that support for python params is limited to these simple cases, and that yaml/toml should be preferred if using the experiments features.
@skshetry, I changed the approach to parsing the file. Now all global variables and classes are converted to a parameter dictionary by |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @aandrusenko π
This looks great and works amazingly well.
Due to the nature of this PR, it might take a few days/deep-dives to get it merged. I hope that's okay.
Meanwhile, can you rebase?
Thanks again, this is a great and very helpful feature. π
@@ -0,0 +1,89 @@ | |||
import ast |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is not cross-python-version compatible, right? Not a blocker though.
46ba369
to
489c461
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall. Most of the suggestions below are minor refactoring and a question about variables inside __init__
of the class.
dvc/utils/serialize/_py.py
Outdated
elif ( | ||
isinstance(_body, ast.FunctionDef) and _body.name == "__init__" | ||
): | ||
result.update(ast_tree_to_dict(_body)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this reads from variable instead of self.*
right?
class Klass:
def __init__(self):
self.a = 3
I'd expect Klass.a
be 3 here. If it's difficult, we could remove this functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, Klass.a
be 3. It's works. Added to tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, I thought it didn't. Now, I see how it works, but might be confusing to the users, especially in the following scenarios:
1.
x = 3
config.x = 3
parse_py
gives me result {"x": 3}
.
2.
class Config:
def __init__(self):
self.container.value = 5
self.value = 1
value = 4 # used for something else
parse_py
gives me result {"Config": {"value": "4"}}
, instead of expected "1" (neither 4 or 5 is expected).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, added a flag to read only self parameters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @aandrusenko, this problem still exists, right? See the other two scenarios with self.container.value
and container.x
above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could either ignore these, or return a dict containing
{"Config": {"container": {"value": 5}}}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, reads only self.value
without any implementations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, this still fails for non-class attr assignment, right?
>>> parse_py("x = 4; config.x = 3", "file name does not matter")
{'x': 3}
Or, in terms of class
>>> parse_py("""
... class Config:
... value = 3
... def __init__(self):
... container.value = 4
... """, "D")
{'Config': {'value': 4}}
Looks like the name extraction logic needs overhaul.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@skshetry, fixed, but I have no more ideas how to ignore self.x
outside the class, ast
has no more functionality. In my opinion, it is strange to meet self.x
outside the class, because self
is not defined and params.py cannot even be imported
dvc/utils/serialize/_py.py
Outdated
if isinstance(assign.value, ast.Dict): | ||
_dct = {} | ||
for key, val in zip(assign.value.keys, assign.value.values): | ||
_dct[get_ast_value(key)] = get_ast_value(val) | ||
result[name] = _dct | ||
elif isinstance(assign.value, ast.List): | ||
result[name] = [get_ast_value(val) for val in assign.value.elts] | ||
elif isinstance(assign.value, ast.Set): | ||
values = [get_ast_value(val) for val in assign.value.elts] | ||
result[name] = set(values) | ||
elif isinstance(assign.value, ast.Tuple): | ||
values = [get_ast_value(val) for val in assign.value.elts] | ||
result[name] = tuple(values) | ||
else: | ||
result[name] = get_ast_value(assign.value) | ||
|
||
return result |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not work with recursive data. Not a problem though, just a note.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I meant by that is:
lst = [1, 2, 3, 4, [1, 2, 3, 4]]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For future reference only. Not sure how dot-based access would even work here. Again, no need to implement.
if isinstance(value, ast.Num): | ||
result = value.n | ||
elif isinstance(value, ast.Str): | ||
result = value.s | ||
elif isinstance(value, ast.NameConstant): | ||
result = value.value | ||
else: | ||
raise ValueError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that ast.{Num/Str/NameConstant}
has been deprecated in Python3.8, but still works on 3.8.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine as-is for now though.
Codecov Report
@@ Coverage Diff @@
## master #4456 +/- ##
==========================================
- Coverage 91.57% 91.41% -0.16%
==========================================
Files 186 185 -1
Lines 13184 13055 -129
==========================================
- Hits 12073 11934 -139
- Misses 1111 1121 +10
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, great job @aandrusenko. π
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, @aandrusenko. Would you mind creating a docs PR on the https://github.com/iterative/dvc.org?
Awesome! I haven't read the whole conversation above but I have a Q: What happens if the user doesn't have Python installed and tries to use this feature? I know, it's a probably an edge case but it's possible to install DVC without Python with binary builds so... Just curious. Does it depend on the system libraries, or the ones that come with the DVC installation? |
@jorgeorpinel, it depends on the python the DVC was installed with, so it's tied to the same python version as well. |
@aandrusenko, can you rebase? There are merge conflicts. |
* add py support by ast
703fcef
to
9802cbe
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @aandrusenko ! π
@skshetry what about binary installers? Those don't require a pre-existing Python in the system. Will this work without a system Python in that case? (That's what I meant.) Thanks |
add py support for ParamsDependency
β I have followed the Contributing to DVC checklist.
π If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.
Thank you for the contribution - we'll try to review it as soon as possible. π
This PR adds support for parameters file in ".py" format. Example:
Where params.py looks something like this:
dvc.lock looks like this: