Sharing experiments #3077

Closed
3 tasks
dberenbaum opened this issue Jan 9, 2023 · 42 comments · Fixed by #3422
Assignees
Labels
A: experiments Area: experiments table webview and everything related discussion 📦 product Needs product input or is being actively worked on story Product feature aka epic. Discussion, progress, checkboxes for implementation, etc

Comments

@dberenbaum
Contributor

dberenbaum commented Jan 9, 2023

Related to #2855, the extension can make it easier to share experiments. Let's discuss what's needed here.

My initial thoughts on what's needed:

  1. Show a comparison of all or a subset of experiments like what you see in the table and plots views, except that it's not stuck on your local machine.
  2. Merge or otherwise move forward with an experiment that you think is a keeper.

For 1, I think it makes sense to use Studio since it already has all this functionality. The extension can upload the params, metrics, and plots to Studio like dvclive is doing for live metrics (except for the "live" part). After selecting any number of experiments, there could be an option to post to Studio. The only user friction should be having a Studio token.

For 2, I think there are a lot of ways to do it in DVC already, so it's probably not as critical, but maybe VS Code can make it smoother. With one click, the extension could create a branch with the same name as the experiment, push that to GitHub, and show the URL to create a PR (like the git cli message Create a pull request on GitHub by visiting...). Regardless of the decided UX, it might be better to choose one and not overwhelm the user with options/choices here.
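A minimal sketch of the one-click flow in point 2, written as pure helpers so that nothing is executed until the caller decides to run the commands. The function names, the `origin` remote, and the GitHub-only URL parsing are assumptions for illustration; `dvc exp branch` and `git push` are existing commands, and the branch name is left as a parameter in case reusing the experiment name verbatim is not possible.

```python
import re
from urllib.parse import quote


def promote_commands(experiment: str, branch: str, remote: str = "origin") -> list[list[str]]:
    """Commands that turn an experiment into a pushed Git branch."""
    return [
        ["dvc", "exp", "branch", experiment, branch],
        ["git", "push", "--set-upstream", remote, branch],
    ]


def github_pr_url(remote_url: str, branch: str) -> str:
    """Build the 'create a pull request' URL for a GitHub remote (SSH or HTTPS form)."""
    match = re.search(
        r"github\.com[:/](?P<owner>[^/]+)/(?P<repo>[^/]+?)(?:\.git)?/?$",
        remote_url.strip(),
    )
    if match is None:
        raise ValueError(f"not a recognised GitHub remote: {remote_url!r}")
    owner, repo = match.group("owner"), match.group("repo")
    return f"https://github.com/{owner}/{repo}/pull/new/{quote(branch, safe='/')}"
```

Each command list can be handed to `subprocess.run(..., check=True)` in order, after which the URL can be shown to the user.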


@shcheklein shcheklein added discussion 📦 product Needs product input or is being actively worked on A: experiments Area: experiments table webview and everything related labels Jan 10, 2023
@daavoo
Contributor

daavoo commented Jan 14, 2023

With the current endpoint for live metrics, an existing experiment could be shared with 3 REST API calls:

  • start:
json={
    "type": "start",
    "repo_url": "STUDIO_REPO_URL",
    "baseline_sha": "BASELINE_SHA",
    "name": "EXP_NAME",
    "client": "vscode",  # I think `client` is just ignored by studio
},
headers={
    "Authorization": "token STUDIO_TOKEN",
    "Content-type": "application/json",
}
  • data

Include here metrics, params, and plots (only linear plots are accepted by the API).

The API was designed for sending incremental updates of the plots on each step, but it would still work if the full data is sent and step is set to the latest:

json={
    "type": "data",
    "repo_url": "STUDIO_REPO_URL",
    "baseline_sha": "BASELINE_SHA",
    "name": "EXP_NAME",
    "step": 2,  
    "metrics": {"metrics.json": {"data": {"step": 2, "foo": 3}}},
    "params": {"params.yaml": {"fooparam": 1}},
    "plots": {"plots/foo.tsv": {"data": 
        [{"step": 0, "foo": 1.0}, {"step": 1, "foo": 2.0}, {"step": 2, "foo": 3.0}]}
    },
    "client": "vscode",
},
headers={
    "Authorization": "token STUDIO_TOKEN",
    "Content-type": "application/json",
},
  • done:
json={
    "type": "done",
    "repo_url": "STUDIO_REPO_URL",
    "baseline_sha": "BASELINE_SHA",
    "name": "EXP_NAME",
    "client": "vscode",
},
headers={
    "Authorization": "token STUDIO_TOKEN",
    "Content-type": "application/json",
}
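Put together, the three calls above could be sketched roughly as follows. This is an illustration under assumptions, not the extension's or dvc-studio-client's implementation: the helper names are made up, and the endpoint URL is a parameter because it is not stated in this thread. The `step` of the `data` event is derived from the highest `step` found in the plot rows, matching the "full data, latest step" approach described above.

```python
import json
import urllib.request


def base_event(event_type: str, repo_url: str, baseline_sha: str, name: str) -> dict:
    """Fields shared by the start, data and done events."""
    return {
        "type": event_type,
        "repo_url": repo_url,
        "baseline_sha": baseline_sha,
        "name": name,
        "client": "vscode",
    }


def build_events(repo_url, baseline_sha, name, metrics, params, plots) -> list[dict]:
    """Return the start, data and done payloads for one finished experiment."""
    # Send the full plot history at once and report the latest step seen.
    steps = [
        row["step"]
        for plot in plots.values()
        for row in plot["data"]
        if "step" in row
    ]
    data = base_event("data", repo_url, baseline_sha, name)
    data.update(
        {"step": max(steps, default=0), "metrics": metrics, "params": params, "plots": plots}
    )
    return [
        base_event("start", repo_url, baseline_sha, name),
        data,
        base_event("done", repo_url, baseline_sha, name),
    ]


def post_event(endpoint: str, token: str, event: dict) -> int:
    """POST one event with the headers shown above; returns the HTTP status."""
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(event).encode("utf-8"),
        headers={"Authorization": f"token {token}", "Content-type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Sharing an experiment would then amount to calling `post_event` once for each payload returned by `build_events`, in order.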

@daavoo
Contributor

daavoo commented Jan 14, 2023

Schema is defined in https://github.com/iterative/dvc-studio-client/blob/main/src/dvc_studio_client/schema.py

@mattseddon mattseddon self-assigned this Jan 31, 2023
@shcheklein shcheklein added the story Product feature aka epic. Discussion, progress, checkboxes for implementation, etc label Jan 31, 2023
@mattseddon
Member

@daavoo how/where does a user get the STUDIO_TOKEN?

@daavoo
Contributor

daavoo commented Feb 3, 2023

> @daavoo how/where does a user get the STUDIO_TOKEN?

From their profile in Studio UI: https://dvc.org/doc/studio/user-guide/projects-and-experiments/live-metrics-and-plots#set-up-an-access-token

@daavoo
Contributor

daavoo commented Feb 3, 2023

@mattseddon To clarify, STUDIO_REPO_URL is not the URL that you see in Studio UI and the format described in the current docs is outdated per https://github.com/iterative/studio/issues/4801

In the Python client, we try to set STUDIO_REPO_URL automatically from: git ls-remote --get-url
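As a rough sketch of that lookup (the function name and the injectable `run` parameter are assumptions made so the example can be exercised without a real repository; the Python client may do this differently):

```python
import subprocess


def get_repo_url(cwd: str = ".", run=subprocess.run) -> str:
    """Return the URL of the default Git remote, as printed by `git ls-remote --get-url`."""
    result = run(
        ["git", "ls-remote", "--get-url"],
        cwd=cwd,
        capture_output=True,
        text=True,
        check=True,  # raise if git exits with an error (e.g. not a repository)
    )
    return result.stdout.strip()
```

The returned value can then be used as STUDIO_REPO_URL in the payloads described earlier in this thread.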

@mattseddon
Member

Sharing experiments from the extension to Studio

I can see from the docs that all that is needed to start live metrics to Studio is for the user to invoke exp run like this:

STUDIO_TOKEN=**** dvc exp run

@daavoo @dberenbaum what are the current plans for dvc-studio-client + DVC? I have some ideas/questions.

Authentication:

Is there any plan to have the DVC config support the STUDIO_TOKEN environment variable? This way users can simply save their token as an entry in a Git ignored .dvc/config.local and they won't have to bother with it again.

If the use of a token is supported in this way we could then add a CLI command which either:

  • prompts for the user's username/password for Studio and then fetches a token and saves it into their local config (or creates a new one if it doesn't exist).
  • or does the same thing but authenticates them through their browser.

Sharing experiments

Is there any plan to add functionality into exp push which will also push a completed experiment to Studio? Again if the DVC config supports a Studio token entry maybe this can be done by default and/or flag(s) can be added to make it happen.

The extension would be able to leverage the above functionality to effectively auth with Studio and push experiments without doing any chaining of commands/running custom code.

WDYT?

Note: If DVC starts supporting a STUDIO_TOKEN config value, we would need to add some flag(s) to exp run so that not all jobs are sent to Studio by default.

The obvious alternative to the above is for me to recreate the parts of dvc-studio-client mentioned by @daavoo here. Ideally, I don't think we should be supporting multi-language implementations of the same code. I would still have to build the auth flow and I think it should be replaced pretty quickly. IMO this feels like it would be a wasted effort. It would probably be better for someone to point me in the right direction(s) in the DVC codebase so that I can contribute there.

@dberenbaum it could be a good idea for us to have a call to discuss this before the next cross-team meeting. WDYT? I can be flexible to fit in with your TZ.

@daavoo
Contributor

daavoo commented Feb 6, 2023

> Is there any plan to have the DVC config support the STUDIO_TOKEN environment variable? This way users can simply save their token as an entry in a Git ignored .dvc/config.local and they won't have to bother with it again.

I don't have a strong opinion but my feeling is that there are already a lot of existing tools/ways to handle environment variables and users might already have a preferred one to handle the usage of frequent variables

@mattseddon
Member

Ok, to get started I will build the capability within the extension and use a new VS Code config entry (dvc.studioToken) to store the required token. I'll post regular updates here to let everyone know where I'm up to. If anyone feels this is the wrong way to go then please LMK.

@dberenbaum
Contributor Author

I need to follow up here with my thoughts/plans so far. I'll try to write something thorough by tomorrow.

@mattseddon
Member

mattseddon commented Feb 7, 2023

I've thrown together a quick prototype for a very interim auth solution at #3235.

@dberenbaum
Contributor Author

@mattseddon That looks really good as a starting point, although I think we do want to save the token in DVC as you suggested. I put a full proposal into https://github.com/iterative/studio/issues/5050. I'd suggest we discuss general product-facing questions there but maybe keep this or another issue open to discuss details that are only interesting to VS Code. WDYT?

@mattseddon
Member

Demo of basic auth flow (it is rough):

Screen.Recording.2023-02-13.at.11.54.51.am.mov

I think this will be (more or less) good enough for a one-time action once I've ironed it out but we can iterate over time.

As discussed previously the token will move back into DVC somewhere. It would be good to expose an endpoint in Studio that validates the token without having to send any data other than the token itself and a command in DVC that checks whether or not Studio is correctly "connected". This would mean the extension would know exactly when and when not to show any details regarding "Connect to Studio". We could also avoid issues created by users getting "stuck" not having a valid token and not being able to update it.

@dberenbaum
Contributor Author

@shcheklein Could Studio have a redirect so that one link would take you to either the token (if you are logged in) or the sign in page (if not)?

@mattseddon Can the connect screen provide a place to enter the token instead of having to take you back to the settings? Otherwise, LGTM as a first step.

@mattseddon
Member

updated demo:

Screen.Recording.2023-02-14.at.1.28.14.pm.mov

@dberenbaum
Contributor Author

Sorry @mattseddon, I missed the first time that you enter the token in the command palette. What's the difference in the updated demo? Regardless, I think it looks like a good enough start for now and we can refine later.

@mattseddon
Member

> Sorry @mattseddon, I missed the first time that you enter the token in the command palette. What's the difference in the updated demo?

We are now saving the token in VS Code's SecretStorage and the add/remove commands are exposed outside of the "welcome screen".

> Regardless, I think it looks like a good enough start for now and we can refine later.

I am now going to knock out "Share to Studio" as quickly as possible.

@mattseddon
Member

With the token in place sharing live metrics from the extension to Studio is seamless:

Screen.Recording.2023-02-15.at.11.58.58.am.mov

Do we want to add this as an option when the user has a token? "Run and Share", something like that? TBH I am not sure what value this adds to the local experience outside of allowing users to "work in the open". If all team members sent all experiments to Studio then everyone in that team would know exactly what experiments are being run and by whom. This seems outside of the normal data science workflow, but it is a step towards best practice and better collaboration.

For the first iteration of this process, I am going to recreate parts of dvc-studio-client inside the extension. I do think that we should provide the option in exp push to push directly to Studio. Is this something that we are interested in, i.e. giving users the ability to retroactively share experiment results from the CLI? If it is, then maybe diverting my effort to contributing that functionality inside DVC would be the best use of my time. WDYT?

@mattseddon
Member

mattseddon commented Feb 15, 2023

Also found/ran into https://github.com/iterative/studio/issues/5009.

image

I think I could easily get bogged down here. For the time being/the first prototype, I will not send plot information.

Note: Sharing plot data outside of the happy path is definitely more tricky. E.g., if a user changes a template/plot type locally for an experiment and then shares it with Studio, what happens? Could we limit the types of plots sent to Studio to a few basic plot types, or do we have to send the contents of the dvc.yaml/templates to Studio with each experiment... 😢?

@shcheklein
Member

I think we need a clear way to enable / disable sharing the experiments as people run them (live sharing). As we discussed:

  • It can be a toggle in the side panel
  • It can be a toggle in the settings panel
  • It can be a toggle in the experiments table itself / or under the table

But it should be visible, clear. I don't think that action in the command palette is enough for this.

When we first collect the token we should probably show this toggle (and enable by default?), we should also introduce a section on the Settings page that we already have with the token and with this toggle.

In the DVCLive snippet we should show a way to enable sharing via code.

@dberenbaum
Contributor Author

@shcheklein What's the user scenario you have in mind? I can imagine it could be useful if I have a long-running experiment and I or others need to check on it after I have closed my laptop, but I think that would be more of a niche scenario compared to something like training in CI where I have no other way to check on it easily. I want to make sure I understand what the goal is and whether it's driven by a particular user scenario or by a desire to show the feature.

@dberenbaum
Contributor Author

Despite what I wrote above, I agree it makes sense as a toggle more than an action, since it does not need to be specific to each experiment. It's probably more of a general workflow preference.

@shcheklein
Member

Yes, this is primarily to expose the feature. But it is also practical: I might run an experiment on a remote machine via SSH, or in Codespaces, and still want to share it so that other people can track the progress. Or, let's say, to compare it with something else that I have only in Studio, etc.

Since it's low-hanging fruit, I don't see any major concerns with enabling this, and we can get more insights from usage in the end.

@mattseddon
Member

mattseddon commented Mar 3, 2023

There are a couple of updates at #3387 & #3379.

Next steps (next week):

  1. Once Share New Experiments Live is enabled start the queue with the required environment variables to share live results from queued experiments directly to Studio (need both STUDIO_TOKEN and STUDIO_REPO_URL).
  2. Split into two options (Share New Workspace Experiments Live & Share New Queued Experiments Live). This is more for visibility than anything else.
  3. Expose Open Studio Settings in the command palette.
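Step 1 boils down to deciding which environment the queue process is started with. A minimal sketch, assuming a hypothetical `studio_env` helper and that an inherited STUDIO_TOKEN should be dropped when sharing is switched off (that last part is a design choice made for the example, not something settled in this thread):

```python
import os
from typing import Mapping, Optional


def studio_env(
    token: Optional[str],
    repo_url: Optional[str],
    share_live: bool,
    base_env: Optional[Mapping[str, str]] = None,
) -> dict:
    """Return a copy of the environment with the Studio variables set or removed."""
    env = dict(os.environ if base_env is None else base_env)
    if share_live and token and repo_url:
        # Both variables are needed for queued experiments to report live results.
        env["STUDIO_TOKEN"] = token
        env["STUDIO_REPO_URL"] = repo_url
    else:
        # Sharing disabled or incompletely configured: do not leak inherited values.
        env.pop("STUDIO_TOKEN", None)
        env.pop("STUDIO_REPO_URL", None)
    return env
```

The result can be passed as `env=` to `subprocess.run(["dvc", "queue", "start"], ...)`, or to a `dvc exp run` invocation for workspace experiments.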

@dberenbaum
Contributor Author

> 2. Split into two options (Share New Workspace Experiments Live & Share New Queued Experiments Live). This is more for visibility than anything else.

Sorry, I'm not following what you mean by "visibility" here or what this part is for. Otherwise, all makes sense to me, thanks!

@mattseddon
Member

> Sorry, I'm not following what you mean by "visibility" here or what this part is for. Otherwise, all makes sense to me, thanks!

The current dvc.studio.shareExperimentsLive option will become dvc.studio.shareWorkspaceExperimentsLive & dvc.studio.shareQueuedExperimentsLive and there will be two checkboxes on the settings page instead of one. Users will be able to send none, one or both types. Does that make sense?

@dberenbaum
Contributor Author

I guess I was wondering more why we want to have two separate checkboxes?

@mattseddon
Member

If you don't think it is necessary to give that level of control and/or that it won't provide value then I won't do the work 🙏🏻.

@dberenbaum
Contributor Author

Up to @shcheklein. I just didn't see the motivation to have that granularity of control over live sharing.

@shcheklein
Member

Yep, I also don't see the need for this for now. We can keep it simpler.

@omesser

omesser commented Mar 6, 2023

I share the opinion that it's best to make this a simple user-facing feature of "live sharing experiments" (for everything). Users will probably have the control they need by toggling this on and off while running queued/workspace experiments. If this is used and more granularity is requested, we can always "complicate" this in the future 😄

@daavoo
Contributor

daavoo commented Mar 7, 2023

For the record @mattseddon, with the latest Studio release you should now be able to send only the done event.

@mattseddon
Member

#3422 will close this as all of the discussion/scoping is on the Studio side right now.

5 participants