Suggestion for Neural Architecture Search with Reinforcement Learning #339
Conversation
We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
I signed it!
Force-pushed from d9b388c to a8709d0
CLAs look good, thanks!
/assign @YujiOshima
/assign @richardsliu @gaocegege |
/assign @johnugeorge |
pkg/suggestion/nasrl_service.py
Outdated
for w in worker_list:
    if w.Worker.status == api_pb2.COMPLETED:
        for ml in w.metrics_logs:
            if ml.name != self.objective_name:
If w.metrics_logs is like [objective_metrics, other_metrics], line 214 will return the other_metrics value (ml will be other_metrics).
How about:

    if ml.name == self.objective_name:
        self.logger.info("Evaluation result of previous candidate: {}".format(ml.values[-1].value))
        return float(ml.values[-1].value)

and keep the lines below at the same indent as line 204 (completed_count can be removed):

    self.logger.warning("Error. No trial has completed.")
    return None
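To make the suggested control flow concrete, here is a minimal, self-contained sketch of it. The class and constant are stand-ins: `SuggestionSketch`, `COMPLETED`, and the simplified method signature are hypothetical, while `worker_list`, `metrics_logs`, and `objective_name` follow the diff above.

```python
import logging

COMPLETED = "COMPLETED"  # stand-in for api_pb2.COMPLETED

class SuggestionSketch:
    """Hypothetical container mirroring the suggestion service's state."""

    def __init__(self, objective_name):
        self.objective_name = objective_name
        self.logger = logging.getLogger(__name__)

    def get_evaluation_result(self, worker_list):
        # Scan completed trials and return the objective metric of the
        # first one found; skip non-objective metrics entirely.
        for w in worker_list:
            if w.Worker.status == COMPLETED:
                for ml in w.metrics_logs:
                    if ml.name == self.objective_name:
                        self.logger.info(
                            "Evaluation result of previous candidate: {}".format(
                                ml.values[-1].value))
                        return float(ml.values[-1].value)
        # No completed trial yet: warn and return None so the caller can wait.
        self.logger.warning("Error. No trial has completed.")
        return None
```

With this shape, a non-objective metric appearing first in `metrics_logs` can no longer be returned by mistake.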
I believe w.metrics_logs will only return the one selected metric defined in https://github.com/kubeflow/katib/blob/master/examples/nasjob-example-RL.yaml#L12. The metrics collector parses the pattern <metrics_name>: <metrics_value> from the log of the training container and passes it to the suggestion.
Currently, the algorithm only spawns one trial at a time and uses the metrics of the very last trial to update. That is why I go through all the available trials and index the last one. In the future, the suggestion will be able to spawn multiple trials, so I think it is better to store all the metrics first and process the data afterwards. Currently the processing just takes the last value and returns it; in the future, it may take the average instead.
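The "store all metrics first, process afterwards" idea can be sketched as a collection step plus a pluggable reduction. All names here (`collect_results`, `reduce_last`, `reduce_mean`) are hypothetical, not from the actual codebase; the worker/metric shapes follow the diff above.

```python
def collect_results(completed_workers, objective_name):
    """Gather the objective value of every completed trial, in order."""
    results = []
    for w in completed_workers:
        for ml in w.metrics_logs:
            if ml.name == objective_name:
                results.append(float(ml.values[-1].value))
    return results

def reduce_last(results):
    # Current behavior: use only the most recent trial's metric.
    return results[-1] if results else None

def reduce_mean(results):
    # Possible future behavior once multiple trials run per step.
    return sum(results) / len(results) if results else None
```

Swapping the reduction function is then the only change needed when the controller starts consuming more than one trial per update.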
w.metrics_logs will return a collection of https://github.com/kubeflow/katib/blob/master/examples/nasjob-example-RL.yaml#L12 and https://github.com/kubeflow/katib/blob/master/examples/nasjob-example-RL.yaml#L15. Since only_latest_log=True, each of them only returns the last value, that is, ml.values[-1].value == ml.values[0].value.
You can verify this with a debugger.
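A tiny illustration of the only_latest_log=True behavior described above: each metrics log carries a single (latest) value, so indexing the first or last element is equivalent. The `SimpleNamespace` stand-in is hypothetical, mimicking the protobuf message shape.

```python
from types import SimpleNamespace

# With only_latest_log=True the collector keeps just one value per metric.
ml = SimpleNamespace(name="accuracy",
                     values=[SimpleNamespace(value="0.93")])

# First and last element are the same object, so either index works.
assert ml.values[-1].value == ml.values[0].value
```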
Yes, you are right! The program has been modified as suggested.
However, in my test run, writing it this way made the program repeatedly print "Error. No trial has completed." So I removed the return None and the error message, and everything works fine. It is OK not to return anything, so that GetEvaluationResult yields no result and the suggestion does not give any trials. And it is not an error if the status is not completed, so there is no need to handle that case.
pkg/suggestion/nasrl_service.py
Outdated
# However, if the user wants to minimize the metrics, we can take the negative of the result
if self.opt_direction == api_pb2.MINIMIZE:
    result = -result
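In isolation, the sign flip in this diff looks like the following sketch: the RL controller maximizes its reward, so a metric the user wants minimized is negated before being used as the reward. `MINIMIZE`, `MAXIMIZE`, and `as_reward` are hypothetical stand-ins for the api_pb2 enum values and the surrounding method.

```python
# Stand-ins for api_pb2.MAXIMIZE / api_pb2.MINIMIZE enum values.
MAXIMIZE = 0
MINIMIZE = 1

def as_reward(result, opt_direction):
    """Convert a metric value into a reward the controller maximizes."""
    if opt_direction == MINIMIZE:
        # Minimizing the metric == maximizing its negative.
        return -result
    return result
```

This way a lower loss produces a higher reward, and accuracy-style metrics pass through unchanged.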
Why delete the code above?
Sorry, I made a mistake. I should have deleted the code above that instead. Already fixed.
ok, thanks
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: hougangliu. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Fixes: #338.
The first suggestion we are working on is reinforcement learning. The algorithm follows the paper Neural Architecture Search with Reinforcement Learning by Zoph & Le (https://arxiv.org/abs/1611.01578), and the implementation is based on the GitHub repository of Efficient Neural Architecture Search via Parameter Sharing (https://github.com/melodyguan/enas).
Related to #257