Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

post_live_metrics: Add _post_in_chunks. #80

Merged
merged 2 commits into from
Sep 1, 2023
Merged

post_live_metrics: Add _post_in_chunks. #80

merged 2 commits into from
Sep 1, 2023

Conversation

daavoo
Copy link
Contributor

@daavoo daavoo commented Aug 21, 2023

  • Post metrics and parameters separately
  • Try to post as many plots as possible and stop before reaching 30MB.
  • Discard plots higher than 10MB.

Optimistically? expect metrics and parameters to not exceed 30MB.

@daavoo daavoo requested a review from amritghimire August 21, 2023 17:06
@daavoo daavoo self-assigned this Aug 21, 2023
@daavoo daavoo added the enhancement New feature or request label Aug 21, 2023
@dberenbaum
Copy link
Contributor

Any concerns about this spamming studio too much? Thinking of https://iterativeai.slack.com/archives/C01116VLT7D/p1692452930044229.

@daavoo
Copy link
Contributor Author

daavoo commented Aug 22, 2023

Any concerns about this spamming studio too much? Thinking of https://iterativeai.slack.com/archives/C01116VLT7D/p1692452930044229.

As I understood it, the problem in the link was not about too many requests from the client to the server.
The problem was (still is) in the backend querying the database in a very inefficient way, so a single request in a project with a lot of steps per linear plot caused / can trigger this (https://github.com/iterative/studio/issues/7256).

We still need to fix that but the change here mainly affects images, which are handled differently from linear plots (the source of the linked problem).

The only change is when there the size of the images exceeds the server limit per single request:

Without this P.R, we just fail to send data on each step and try to send it in the next one. Failing forever and the request keeps getting bigger.
With this P.R:, we are able to send the data at each step bypassing the server limit.

@daavoo daavoo marked this pull request as ready for review August 22, 2023 09:24
@codecov-commenter
Copy link

codecov-commenter commented Aug 22, 2023

Codecov Report

Patch coverage is 100.00% of modified lines.

Files Changed Coverage
tests/test_post_live_metrics.py 100.00%

📢 Thoughts on this report? Let us know!.

@daavoo daavoo marked this pull request as draft August 22, 2023 09:36
@daavoo daavoo requested a review from shcheklein August 22, 2023 11:05
@daavoo daavoo marked this pull request as ready for review August 22, 2023 12:50
@daavoo
Copy link
Contributor Author

daavoo commented Aug 23, 2023

@amritghimire / @shcheklein any concerns with merging this or do you think it needs to wait on a solution for https://github.com/iterative/studio/pull/7310 ?

As I commented, this should mainly affect images not linear plots

@shcheklein
Copy link
Member

I think it doesn't have to wait. The only thing that comes to my mind to check (besides the basic review, @amritghimire can you do that please?)- align the limits with Studio big_files check. See here for example - https://github.com/iterative/studio/pull/7299/files#diff-a67d283c73e8021405df0b1de3fb8dcf570563c21c70de5324cc2d8b661e561fR36

We skip 10Mb files on Studio (so they will disappear as we then parse an experiment).

Also, I now put a limit of 200 files per directory (not ideal, see the discussion and chime in the PR)- but as a safe guard. Here (or on the Studio side? or both?) we probably also want to limit the number of chunks, or number of files, etc.

src/dvc_studio_client/post_live_metrics.py Outdated Show resolved Hide resolved
@daavoo
Copy link
Contributor Author

daavoo commented Sep 1, 2023

I think it doesn't have to wait. The only thing that comes to my mind to check (besides the basic review, @amritghimire can you do that please?)- align the limits with Studio big_files check. See here for example - https://github.com/iterative/studio/pull/7299/files#diff-a67d283c73e8021405df0b1de3fb8dcf570563c21c70de5324cc2d8b661e561fR36

We skip 10Mb files on Studio (so they will disappear as we then parse an experiment).

Also, I now put a limit of 200 files per directory (not ideal, see the discussion and chime in the PR)- but as a safe guard. Here (or on the Studio side? or both?) we probably also want to limit the number of chunks, or number of files, etc.

Added both checks

@daavoo daavoo merged commit 8d15544 into main Sep 1, 2023
@daavoo daavoo deleted the chunk-plots branch September 1, 2023 14:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants