This issue was moved to a discussion.

Is there an UI ? #1074

Closed
seeb0h opened this issue Aug 30, 2018 · 26 comments
Labels
question I have a question?

Comments

@seeb0h

seeb0h commented Aug 30, 2018

Hi, I am discovering dvc, so excuse me if I missed an obvious answer.

dvc looks great to use with the cli.
Now I am considering building a web UI to list dvc entities (and trigger experiment runs on cloud resources). Is there a convenient way (like an API) to access the entities?

Thanks

@efiop
Contributor

efiop commented Aug 30, 2018

Hi @seeb0h !

We do have a Python API (dvc/project.py), but it is not stable or documented, and thus not yet ready to be relied on. We will definitely make it stable and release it to the public in the future. The only API that is stable, and for which we guarantee backward compatibility, is the CLI. If you only need to access .dvc files, you can do that easily with any YAML parser of your choice.
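
As a rough illustration of the YAML route, here is a minimal sketch that lists the dependencies and outputs recorded in a stage file. It assumes the third-party PyYAML package and a stage file with `deps` and `outs` lists whose entries carry `path` and `md5` keys; the sample content, paths and hashes are invented for the example, and `list_entries` is a hypothetical helper, not part of dvc.

```python
import yaml  # PyYAML, a third-party package (pip install pyyaml)

# Illustrative stage file content; the command, paths and hashes are made up.
SAMPLE_DVC_FILE = """\
cmd: python train.py
deps:
- path: data/train.csv
  md5: 0123456789abcdef0123456789abcdef
outs:
- path: model.pkl
  md5: fedcba9876543210fedcba9876543210
  cache: true
"""


def list_entries(dvc_text):
    """Return (kind, path, md5) tuples for every dependency and output."""
    stage = yaml.safe_load(dvc_text) or {}  # an empty file parses to None
    entries = []
    for kind in ("deps", "outs"):
        for item in stage.get(kind) or []:
            entries.append((kind, item.get("path"), item.get("md5")))
    return entries


for kind, path, md5 in list_entries(SAMPLE_DVC_FILE):
    print(kind, path, md5)
```

In a real project the text would come from reading a `.dvc` file on disk rather than from an inline string.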

The web UI is a very interesting idea, could you please elaborate on it? We have been thinking about creating something like that and would be interested to hear more about what you would like to see it look like.

Thanks,
Ruslan

@efiop efiop added the "question" and "awaiting response" labels Sep 1, 2018
@elgalu

elgalu commented Sep 4, 2018

By "web UI" @seeb0h probably means something like ModelDB's UI:
http://modeldb.csail.mit.edu:3000/projects

[Image: screen shot 2018-09-04 at 06 42 54]

@AKuederle

To expand on this question, it would be really cool if there were a way to get specific datasets stored with dvc without initializing the respective git repo. Often it is required to just share a specific version of your data with your client. I am thinking it might be an interesting addition if we could write a little self-hosted webserver/webservice that, given a specific data version, just provides the dataset as an HTTP download.

I would be interested in working on something like this (probably as a separate project).

@seeb0h
Author

seeb0h commented Sep 6, 2018

I am asking with a very forward-looking view.

We are currently updating an ML R&D workflow, and dvc is a good candidate for our versioning. We aim (in a more or less distant future) to automate part of the workflow through a web application, so the presence of an API in dvc will become an important point.

There are many functions that could be provided in a web UI: tracking metrics, as @elgalu mentions, but also a convenient way to explore the versioning and traceability of code, models and data.

@efiop efiop removed the "awaiting response" label Sep 6, 2018
@efiop
Contributor

efiop commented Sep 6, 2018

Thank you all guys for the feedback!

@elgalu @seeb0h Thanks for the info! We definitely had similar thoughts on features that we wanted to see implemented in a WebUI on top of core dvc. We are also actively discussing it within our team and think that we will begin working on it by the end of the year. Btw, there is a mailing list in which we sometimes (no spam though) ask the community to answer questions that help us improve dvc. We will definitely ask more detailed questions/polls there once we start working on the WebUI, so please feel free to subscribe 🙂

@AKuederle

> To expand on this question, it would be really cool if there were a way to get specific datasets stored with dvc without initializing the respective git repo. Often it is required to just share a specific version of your data with your client. I am thinking it might be an interesting addition if we could write a little self-hosted webserver/webservice that, given a specific data version, just provides the dataset as an HTTP download.
>
> I would be interested in working on something like this (probably as a separate project).

Sorry, I'm not sure I follow. What you are describing sounds like any file sharing service (e.g. Dropbox, S3, etc.) to me. Could you elaborate please?

Thanks,
Ruslan

@AKuederle

Thank you for your response. What I basically meant was that it would be nice to have a way to get the data that corresponds to a specific project without installing git/dvc, similar to downloading code from GitHub as a .zip. Of course, you could go right to the specific storage location in your AWS bucket, SSH server, etc., but it would be nice to have a little webservice (with or without a GUI) that abstracts that away and, in simple terms, provides a file download given a specific commit hash of the corresponding git repository.
There would be some issues to solve, e.g. the credentials needed to access the data store, but I think it would be a really nice feature that would allow dvc to also be used as a kind of "frontend" for an ML data archive.

I think this might still be a little confusing. I will try to expand on that and explain the big picture as soon as I have time.
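
To make the idea a bit more concrete, here is a minimal sketch of the two lookups such a service would need: fetching a stage file as it existed at a given commit, and mapping the hash recorded in it to an object in the data store. Both function names are hypothetical. The two-character prefix layout is an assumption based on how dvc organised its cache and remotes around the time of this thread; newer releases may use a different layout, so check your own storage before relying on it.

```python
import subprocess


def stage_file_at_commit(rev, dvc_file, repo_dir="."):
    """Return the text of a .dvc file as it existed at a given Git revision."""
    result = subprocess.run(
        ["git", "show", f"{rev}:{dvc_file}"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def remote_object_path(md5):
    """Map a content hash to its assumed relative location in the data store.

    Assumption: objects live under <first two hash characters>/<rest of hash>.
    """
    if len(md5) < 3:
        raise ValueError("hash is too short")
    return f"{md5[:2]}/{md5[2:]}"


# Example with a made-up hash:
print(remote_object_path("0123456789abcdef0123456789abcdef"))
```

A download endpoint would then parse the stage file text, look up the hash of the requested output, and stream the object at the resulting path, with credentials handled on the server side.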

@efiop
Contributor

efiop commented Sep 10, 2018

@AKuederle Thank you for sharing your thoughts! This is indeed a very interesting idea! It would be a great addition to the WebUI mentioned above. I think it is definitely going to be implemented in the future.

Thanks,
Ruslan

@piccolbo

You could use hug to simultaneously maintain a web and python API and a CLI. It seems to reduce the work of maintaining the three and keeping them in alignment, plus of course the bonus of being able to run dvc as a service. Just my 2c.

@efiop
Contributor

efiop commented Sep 14, 2018

Hi @piccolbo !

Thank you for the tip! Hug looks very promising! We will be sure to take a closer look at it when developing WebUI.

Thanks,
Ruslan

@efiop efiop closed this as completed Apr 24, 2019
@elgalu

elgalu commented Apr 25, 2019

Hi @efiop , since you closed this issue, does it mean Hug is the solution we should look at?

@efiop
Contributor

efiop commented Apr 25, 2019

@elgalu I don't know, we haven't been looking at it. I just felt that the discussion is done, since webui is not yet on our radar these days. Looks like I was wrong 🙂 Reopening.

@efiop efiop reopened this Apr 25, 2019
@Amir-Abushanab

@AKuederle sounds a lot like https://dagshub.com - you can click on the file in the pipeline view and download it from its source

@AKuederle

AKuederle commented May 14, 2019 via email

@shcheklein shcheklein changed the title from "Is there an API ?" to "Is there an UI ?" Jul 18, 2019
@gerardsimons

gerardsimons commented Oct 8, 2019

Hi there! I just wanted to offer my feedback / needs that might possibly be feature requests.

I am working at a start-up and definitely see a need for a tool such as this. The evaluation and comparison features are especially interesting. But similar to the OP, I envision having a dashboard that tracks our team's progress through a web app.

I guess we could make something like this ourselves if there is a Python client that we can use to grab the latest results. What I gather from previous comments is that the Python client needs to become stable first, and I am sure that is solvable!

But when that is said and done, I wonder if there are any plans for providing tooling for hosting and displaying the results in a web app, or would you guys integrate with existing open-source tools to do this? Or would that be left to the user altogether?

@PeterFogh
Contributor

@gerardsimons, in my team we use MLflow tracking. It has a great web UI with nice experiment comparison features. Saving one's scores and artefacts can simply be done in a DVC stage.

@shcheklein
Member

@gerardsimons thanks! could you please summarize the requirements for the dashboard you have in mind? DVC provides the dvc metrics show command and it can cover some part of it I believe. I'm just wondering what else would be valuable to show via UI on top of DVC. Or should we include more stuff into DVC first to visualize it then. Your input can significantly help us prioritize this.
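
Since the CLI is described earlier in this thread as the stable interface, a dashboard backend could collect metrics by shelling out to it. Below is a minimal sketch under that assumption; `run_dvc` and `metrics_report` are hypothetical helper names, and the `runner` parameter exists only so the subprocess call can be swapped out in tests. The output is returned as raw text and parsing it is left to the caller.

```python
import subprocess


def run_dvc(args, repo_dir=".", runner=subprocess.run):
    """Run `dvc <args>` inside repo_dir and return its standard output."""
    completed = runner(
        ["dvc", *args],
        cwd=repo_dir,
        check=True,           # raise if dvc exits with an error
        capture_output=True,
        text=True,
    )
    return completed.stdout


def metrics_report(repo_dir=".", runner=subprocess.run):
    """Return the raw text printed by `dvc metrics show`."""
    return run_dvc(["metrics", "show"], repo_dir=repo_dir, runner=runner)
```

Calling `metrics_report("/path/to/project")` requires dvc to be installed and the directory to be a dvc project; the path here is a placeholder.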

@gerardsimons

gerardsimons commented Oct 8, 2019

@PeterFogh Thanks for the suggestion, that looks very interesting indeed! Any resources you can share to set DVC up like that?

@shcheklein: Thank you for the help. I will try my best to elucidate a bit. If I understand correctly, I can create my own metrics and output those with dvc metrics show in the terminal? That is already very useful. In my specific use case I want to compare object detections, so I would measure IoU at different confidence thresholds and have things like precision and recall visualised on the fly. Often AP is computed at set intervals, and we could look up in the metrics which values match the given settings (with some kind of slider). It does mean that the UI would need a way to filter the metrics by certain settings, I guess, but I am not sure. Basic precision-recall graphs are also useful when we have to decide, based on our business / clients, where on the graph we want to be.

Of course, nothing beats a couple of good examples, in my case a few bounding boxes drawn on the image. I think this would be more difficult to do in the current system, and it is very specialised to the type of data (images in this case).
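
For readers unfamiliar with the metrics mentioned above, here is a minimal sketch of IoU and of precision/recall at a confidence threshold for a single image. It uses a simple greedy matching of detections to ground-truth boxes, which is one common simplification and not necessarily the evaluation protocol used by the commenter; the boxes and scores are invented.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(detections, truths, conf_threshold, iou_threshold=0.5):
    """detections: list of (box, confidence); truths: list of boxes."""
    # Keep detections above the confidence threshold, most confident first.
    kept = sorted(
        (d for d in detections if d[1] >= conf_threshold),
        key=lambda d: d[1],
        reverse=True,
    )
    unmatched = list(truths)
    true_positives = 0
    for box, _ in kept:
        # Greedily match each detection to its best remaining ground truth.
        best = max(unmatched, key=lambda t: iou(box, t), default=None)
        if best is not None and iou(box, best) >= iou_threshold:
            unmatched.remove(best)
            true_positives += 1
    # Convention chosen here: report 1.0 when the denominator is empty.
    precision = true_positives / len(kept) if kept else 1.0
    recall = true_positives / len(truths) if truths else 1.0
    return precision, recall


truths = [(0, 0, 10, 10)]
detections = [((0, 0, 10, 10), 0.9), ((20, 20, 30, 30), 0.3)]
for threshold in (0.1, 0.5):
    print(threshold, precision_recall(detections, truths, threshold))
```

Writing one such pair per threshold to a metrics file would give a UI the data it needs for a threshold slider or a precision-recall curve.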

@dmpetrov
Member

dmpetrov commented Feb 3, 2020

@gerardsimons thank you for the detailed explanation of your vision on the dashboard. We just started working on visualization in DVC that could be a part of the future dashboard.

I’d love to hear your opinions on DVC visualization:

  1. We need a tool to visualize dvc metrics. What tool or library would you prefer?
  2. We need to compare metrics (scalars like AUC as well as plots like ROC curves) from one commit/branch with those from another. What is the best way to generalize this?
  3. It would be great if you could provide some examples with the intervals.

@NyanSet

NyanSet commented Feb 3, 2020

Isn't there a Python API yet?
Can't see the dvc/project.py mentioned above

@dmpetrov
Member

dmpetrov commented Feb 3, 2020

@NyanSet the public API is in dvc/api.py. Also, you can potentially use the Repo class from dvc, with some risks. See #3278.

Unfortunately, no metrics or visualization API yet. This is what I'd like to discuss here 😃
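
A minimal sketch of reading a tracked file through the public API mentioned above, assuming `dvc.api.read` accepts `path`, `repo` and `rev` as it does in the releases I am aware of. `read_tracked_file` is a hypothetical wrapper, and the `reader` parameter is there only so the sketch can be exercised without dvc installed.

```python
def read_tracked_file(path, repo=None, rev=None, reader=None):
    """Return the contents of a DVC-tracked file at a given Git revision."""
    if reader is None:
        import dvc.api  # imported lazily so this module loads without dvc

        reader = dvc.api.read
    return reader(path, repo=repo, rev=rev)
```

A call might look like `read_tracked_file("data/train.csv", repo="https://example.com/org/project.git", rev="v1.0")`, where the path, URL and tag are placeholders.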

@florianblume

Similarly to @PeterFogh I want to use a combination of MLflow and dvc in my future ML projects - the two just seem predestined to be used together. So, logging experiment metrics and parameters would take place primarily in MLflow (although of course logging the metrics in dvc could be done, too). Tracking data and experiment artifacts then happens through dvc.

The option to visually edit the pipeline - as @Amir-Abushanab mentioned is possible in DAGsHub's online tool - would be super cool. I just don't want to host my ML projects on DAGsHub so something like the MLflow UI would be nice. Actually, I was thinking, maybe it's possible to integrate this functionality into the MLflow UI. When selecting an experiment, have a section where you can edit the pipeline. Then, from within MLflow you could launch a new experiment run - the run stage of dvc could expect a config file which MLflow writes the parameters to that the user has to specify before the run (the config file stored in dvc being the default config).

I hope my comment is not offensive/off-topic because I'm suggesting to incorporate this functionality into another tool. I just had this idea and wanted to share my thoughts to see what others think about it.

@deanp70

deanp70 commented Mar 10, 2020

@florianblume May I ask why you’d prefer not to host the project on DAGsHub? Is it because it’s not hosted anywhere (need a desktop client?) or because it’s hosted on GitHub and editing the pipeline is not an option in mirrors? Maybe we can solve the issue.

Disclaimer: I’m one of the founders of DAGsHub

@florianblume

@deanp70 actually I thought that you had to push data to your service (my misunderstanding, I'm new to dvc) and that a free user wouldn't be able to create a private repository (sounded like it on the plan's page). I'll give it a try - it definitely looks like a very powerful tool. I still think it would be nice to integrate dvc and MLflow just because they seem to complement each other so well. There's also an issue on the MLflow page related to this.

@deanp70

deanp70 commented Mar 11, 2020

@florianblume I see, thanks for the clarification, it's really helpful. Please let me know if you have any feedback. When we built the experiment tracking into the platform we tried to imagine an ideal combination between DVC and MLflow, but you can always use all of them (DVC, MLflow and DAGsHub)

@dmpetrov
Member

@florianblume just out of curiosity... have you considered tensorboard for graph instead of mlflow?

@florianblume

@dmpetrov that's what I've been doing until now. I ran each experiment in a different folder, together with a copy of its config file (parameters, path to data sets, etc.), its events and logs and output by the network (of training and prediction). I then started tensorboard in a certain top-level folder to be able to compare experiments - this is referred to as the caveman way in this medium.com article. There are multiple drawbacks with this approach, the most annoying was that it was difficult to compare experiments properly. I had to create an Excel sheet where I entered the results of the runs (exporting by script into a csv was not an option because I needed a certain order in the table). Tensorboard has probably more visualization features compared to MLflow but I feel like MLflow would give me a much better overview over experiments and much better comparison features. That's why I'd go for a combination of DVC, MLflow and TensorBoard (which I have left out for simplicity in my earlier comments).

@efiop efiop closed this as completed May 3, 2021
@iterative iterative locked and limited conversation to collaborators May 3, 2021
