logger: Add notifier to next_step
#90
Comments
@daavoo I'm trying to understand the motivation behind this? 😄 EDIT: do you have any references in other ml logger frameworks to this functionality?
Sorry about the lack of clarity. The motivation comes from working with deep learning models that take a long time to train (i.e. hours or days). When working under those circumstances, we always ended up writing some sort of "notification" code to complement or integrate into the ml logger. The main reason was to be able to monitor the training loop remotely (i.e. without needing to look at the stdout in a terminal). This notification code takes care of sending a message to some platform (e.g. e-mail, a Slack / Discord / Telegram channel, etc.) containing information like the number of finished epochs (a.k.a
I think that other ml loggers usually have an associated UI with a view that is automatically updated as the plots/information are logged (related to this Studio issue: iterative/studio-support#13). In addition to that, some ml loggers also provide "notification" utilities:
Beyond existing functionality in other ml loggers, I have found different teams and open source communities solving this problem, including some I work/have worked with:
I've just discovered another open-source tool focused on this kind of functionality:
Related to #91
Another open source tool:
Interesting integration between DagsHub and New Relic highlighting https://dagshub.com/blog/real-time-machine-learning-monitroing-new-relic-dagshub/
Related to #91 (comment), I think the most useful integration here would be making it dead simple to send full reports (similar to the html today) through supported channels. For example, the slack api could probably be used to generate a message with the metrics and plot images, and similarly for email (personally, I would prioritize slack because it's more collaborative and probably easier for users to set up). The local html generated now could just be one report/alert format in that case (and the cml markdown report another).
That would be the way to go, and it was the original idea using https://github.com/liiight/notifiers . For metrics, this is very feasible. However, the images / rendered plots would be kind of tricky because most channels don't support sending images directly. We could rely on
Rather than wrapping a general-purpose text-based notifier with support for many providers, it might be more useful to focus on providers to which we can send the entire report, including images/rendered plots. AFAIK this should be feasible without hosting, via Slack (https://api.slack.com/methods/files.upload) and email (https://docs.python.org/3/library/email.examples.html). I'm not sure text-based alerts add enough value (we could instead have a doc or blog post showing how to use dvclive + https://github.com/liiight/notifiers). Full reports with plots seem like a more unique feature, and they extend dvclive's initial value prop of lightweight live monitoring for model training, providing serverless alerting and reporting anywhere without needing to access the training machine. Since a lot of training happens in headless environments anyway, this seems pretty useful to me. What do you think?
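To make the email variant of a "full report" concrete, here is a minimal sketch using only the standard library `email` package. It assumes the metrics are available as a plain dict and the rendered plot as PNG bytes; `build_report_email`, the addresses, and the file name are hypothetical and not part of dvclive. The message is only built here, not sent.

```python
from email.message import EmailMessage
from typing import Dict


def build_report_email(
    metrics: Dict[str, float],
    plot_png: bytes,
    sender: str,
    recipient: str,
    step: int,
) -> EmailMessage:
    """Build (but do not send) a report email with metrics and one plot image."""
    msg = EmailMessage()
    msg["Subject"] = f"Training report: step {step}"
    msg["From"] = sender
    msg["To"] = recipient

    # Plain-text body: one "name: value" line per metric, sorted by name.
    lines = [f"{name}: {value:.4f}" for name, value in sorted(metrics.items())]
    msg.set_content("\n".join(lines))

    # Attach the rendered plot; this turns the message into multipart/mixed.
    msg.add_attachment(
        plot_png, maintype="image", subtype="png", filename="metrics.png"
    )
    return msg


# Hypothetical inputs: placeholder PNG bytes stand in for a real rendered plot.
report = build_report_email(
    {"loss": 0.25, "acc": 0.9},
    b"\x89PNG\r\n\x1a\n",
    "trainer@example.com",
    "team@example.com",
    step=3,
)
```

Sending would then be a matter of handing `report` to `smtplib.SMTP(...).send_message(report)` with whatever host and credentials the user configures.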
I think it's useful and would directly add value to DVCLive. I'm a little "worried" about how easy it would be to maintain, because Report Providers sound like integrations that could grow perpendicular to ML Frameworks. So far, looking at the slack and email APIs, it doesn't look that bad.
@shcheklein mentioned that it might be worthwhile to look into RSS feed aggregators. There are some parallels in how RSS expects a particular schema of elements (https://validator.w3.org/feed/docs/rss2.html) and can publish them in a consistent format, so maybe it can give some ideas for how to implement this.
Sorry @casperdcl, I missed this comment. It's closer to the latter advanced usage. Probably channel, token, etc. can be set in environment variables, and the method can be something like
I don't think we are likely to do this now that we have live metrics in Studio and other solutions exist for alerting.
Depending on the type of model to be trained, the time between calls to `next_step` may vary significantly. In common deep learning scenarios, e.g. the keras callback, `next_step` is called at the end of an epoch, which could result in long times (maybe hours) between calls.

It could be useful to have built-in support for optionally sending a notification each time `next_step` is called.

Without changing `dvclive`, the user could just call a custom library (e.g. https://github.com/liiight/notifiers) after each `next_step` call.

But having the notification step built inside `MetricLogger` would have some benefits, like access to internals (e.g. `_metrics`) and configuration options, in addition to hiding complexity from the end user.

However, I'm not sure if it is worth implementing this feature inside `dvclive` or if it would be better to keep `dvclive` as lightweight as possible.
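A minimal sketch of the "no changes to the logger" option from the description, where the user sends a notification right after advancing the step. `StubLogger` and `step_and_notify` are hypothetical names used only for illustration and are not dvclive's actual classes; `notify` stands for any callable that takes a message string (for example, a thin wrapper around a notifiers provider).

```python
from typing import Callable, Dict, List


class StubLogger:
    """Hypothetical stand-in for a metrics logger; not dvclive's real class."""

    def __init__(self) -> None:
        self.step = 0
        self.metrics: Dict[str, float] = {}

    def log(self, name: str, value: float) -> None:
        self.metrics[name] = value

    def next_step(self) -> None:
        self.step += 1


def step_and_notify(logger: StubLogger, notify: Callable[[str], None]) -> str:
    """Summarize the step being closed, advance the logger, then notify."""
    summary = ", ".join(
        f"{name}={value:.4f}" for name, value in sorted(logger.metrics.items())
    )
    message = f"step {logger.step} finished: {summary}"
    logger.next_step()
    notify(message)
    return message


# Usage: collect messages in a list instead of sending them anywhere.
sent: List[str] = []
logger = StubLogger()
for epoch_loss in (0.9, 0.5):
    logger.log("loss", epoch_loss)
    step_and_notify(logger, sent.append)
```

Keeping the notifier as a plain callable leaves the logger itself untouched, which matches the "keep it lightweight" concern, at the cost of the user having to format the message themselves.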