Compare to other ML e2e platforms #58
Comments
Our blog post at https://databricks.com/blog/2018/06/05/introducing-mlflow-an-open-source-machine-learning-platform.html has the overall motivation, though it might be good to write more direct comparisons on the website at some point. (I personally don't like having that because they invariably get out of date and then people are worried that the website compared against an older version of their platform.) In a nutshell, I think there are two main goals in MLflow that are different from several of the platforms you list:
Basically, we didn't find anything that supported the large-scale, multi-library experimentation and deployment workflow that we saw people wanting to do, so we decided to focus on that. In general though, MLflow should also be pretty complementary to many of the tools you listed. For example, you can deploy your jobs to Kubernetes using one of these but use MLflow Tracking to track experiments or MLflow Models as a format for deploying the model. MLflow's goal is mainly to let you manage the ML lifecycle regardless of which tools you use to train or run the model.
Outsider view, but I agree with @mateiz: Polyaxon has some similarities, but the others above focus on completely different problems and are not really ML experimentation platforms for the ML development lifecycle. I've been following pretty much every experimentation framework for quite some time, and we are actually using and evolving our own; it pretty much changes the way you do ML, for the better :) IMO. I still find MLflow one of the best open source efforts so far. It was very refreshing to see this published and a lot of needs and ideas validated (grabbing some new ideas and contributing if possible). Some of the ideas we were already using on our platform (local storage; experiments as a group of runs, but without blocking comparison of pretty much anything to anything; everything gets stored; minimum overhead), plus job queue/scale-out/Docker, notifications, and saving a huge amount of metadata at a huge scale (thousands up to millions of runs). We are currently targeting a better UI (here MLflow is better :) ). Some references for similar work or workflows: (must read, this is the feeling & inspiration) this one, very recent: others:
Glad you like it, Rui! We're still early on, so we'd love input on what to improve or what would make it easier to run. We've also tried to design MLflow in a fairly modular way, where you can pick up some pieces but not others in your own platform.
@elgalu That's an ongoing issue that I see on GitHub with many other libraries as well. For instance, I see that MLflow is highly influenced by Sacred, which was influenced by Sumatra, but it's a shame that people don't contribute to existing libraries. Even at this point I still find it hard to see the differences between MLflow and Sacred, for instance (not meant as a criticism). Not only that, but some old libraries like Sumatra still have features that I haven't seen in any of the new libraries being offered.
Hi Mateiz, what a great reply! I'm deciding what stack to adopt for my current employer and I'm having a hard time figuring out whether it's possible (or whether it even makes sense) to have TFX on the models side but adopt MLflow to manage libraries, artifacts, lifecycle, etc. What are your thoughts on this? Thanks in advance.
Yup, it should be possible to do that. MLflow already supports saving and managing TensorFlow models, as well as automatic logging of metrics that you send to TensorBoard. Which other pieces of TFX are important to you? We might be able to add built-in integrations if needed, or you can just use them alongside the MLflow APIs.
Thanks a lot for your quick reply @mateiz! Basically, the idea is to have a relatively common pipeline implemented and to use it for a variety of applications (from tabular data to images or NLP). Since I'm building up the machine learning area in this company, we are still discussing with the stakeholders which use cases will be tackled first. However, I've been working on ML/DS for a while now and I know the importance of defining a pipeline in order to reuse and share models, data prep, and data validation. My only fear about integrating MLflow and TFX is that things might get out of control in either one of the tools at some point.
First of all, congratulations on releasing all this hard work to the public!
I went through the examples to see if I could figure out how exactly this project differs from others, but only saw some minor technical differences.
Could you provide a summary of why you decided to create a completely new ML pipeline instead of joining some of the other ongoing efforts?