Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

log: optimize dumping to files #238

Closed
Tracked by #223
dberenbaum opened this issue Apr 6, 2022 · 5 comments
Closed
Tracked by #223

log: optimize dumping to files #238

dberenbaum opened this issue Apr 6, 2022 · 5 comments
Assignees
Labels
A: log_metric Area: `live.log_metric` A: summary Area: `live.make_summary`

Comments

@dberenbaum
Copy link
Collaborator

Seems odd to do this. If there are multiple things to log per iteration, you dump multiple times?

  • Seems better to dump only on next_step()
  • Could also make configurable default log(..., write_summary=False)
  • Could make next_step() internally use log("step", ..., write_summary=True)

I don't quite follow the wording in #145 but from what I can tell it seems identical to this issue (#232) so shouldn't have been closed?

Originally posted by @casperdcl in #232 (comment)

@dberenbaum dberenbaum changed the title optimize dumping to files log: optimize dumping to files Apr 6, 2022
@daavoo daavoo added A: log_metric Area: `live.log_metric` A: summary Area: `live.make_summary` labels Sep 24, 2022
@daavoo
Copy link
Contributor

daavoo commented Oct 27, 2022

We could take #338 one step further and remove implicit make_summary calls.

API would look a little polluted but I think it is still not too bad:

for step in range(2):
    live.log_metric("foo", 1)
    live.log_metric("bar", 2)
    live.make_summary()
    live.next_step()

I also think it makes it kind of easier to document/explain the API as now live.log is the only method that dumps to 2 different places

Considering callbacks as the main entry point would make the change more like an internal detail

@shcheklein
Copy link
Member

How about live.dump? (and autodump mode)?

Dumping on the next_step makes sense also.

@dberenbaum
Copy link
Collaborator Author

I don't see this actually causing problems now, so not sure it's worth trying to address if it's about optimization, but if it cleans up the API, I'm open to it.

@daavoo Your idea would mean either make_summary or some other method should be called in no-step scenarios, right? That might be okay, just want to clarify the tradeoff.

As an alternative, the idea of #145 is still not really implemented IMO. I expected it to look like:

  1. log_metric updates metrics.json, which is not really a summary but instead the output of log_metric (none of the other log methods generate step-specific output now).
  2. A new method like make_hist generates the tsv file, and this could be called during next_step.

That workflow would have the tradeoff that the last step might not get logged to the tsv file.

In either case, if we have a context manager and recommend that in 1.0, it can solve these tradeoffs if additional methods are called on exit.

@casperdcl
Copy link

casperdcl commented Oct 29, 2022

for me it's about (in order):

  1. functionality (e.g. resume interruptions)
  2. intuitive (and clean) API
  3. configurability (e.g. log every N steps)
  4. optimisation (e.g. don't write summary multiple times in 1 step)

@dberenbaum dberenbaum mentioned this issue Nov 1, 2022
13 tasks
@dberenbaum
Copy link
Collaborator Author

Discussed with @daavoo and decided that:

  1. It's fine to only call make_summary at the end of each step.
  2. We should still call it within next_step or have some method that can call this along with the methods removed in live.set_step: Remove make_checkpoint and make_report calls. #338 so that non-callback users don't need to manually call something like:
for step in range(2):
    live.log_metric("foo", 1)
    live.log_metric("bar", 2)
    live.make_summary()
    live.make_checkpoint()
    live.make_report()
    live.next_step()

@daavoo daavoo self-assigned this Nov 2, 2022
@daavoo daavoo added this to DVC Nov 2, 2022
@daavoo daavoo moved this to Backlog in DVC Nov 2, 2022
@daavoo daavoo moved this from Backlog to In Progress in DVC Nov 2, 2022
daavoo added a commit that referenced this issue Nov 3, 2022
`log_metric` now stores logged value in `live.summary` but doesn't call `make_summary` as before.

Call `make_summary` inside `live.end`.

Closes #238
Closes #232
daavoo added a commit that referenced this issue Nov 4, 2022
`log_metric` now stores logged value in `live.summary` but doesn't call `make_summary` as before.

Call `make_summary` inside `live.end`.

Closes #238
Closes #232
@daavoo daavoo closed this as completed Nov 4, 2022
Repository owner moved this from In Progress to Done in DVC Nov 4, 2022
daavoo added a commit that referenced this issue Nov 4, 2022
`log_metric` now stores logged value in `live.summary` but doesn't call `make_summary` as before.

Call `make_summary` inside `live.end`.

Closes #238
Closes #232
daavoo added a commit that referenced this issue Nov 4, 2022
`log_metric` now stores logged value in `live.summary` but doesn't call `make_summary` as before.

Call `make_summary` inside `live.end`.

Closes #238
Closes #232
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A: log_metric Area: `live.log_metric` A: summary Area: `live.make_summary`
Projects
No open projects
Archived in project
Development

No branches or pull requests

4 participants