Add Explanation for Fine-Tuning API
Signed-off-by: Andrey Velichkevich <[email protected]>
andreyvelich committed May 6, 2024
1 parent cf1d0b5 commit 7d30f12
Showing 4 changed files with 73 additions and 36 deletions.
5 changes: 5 additions & 0 deletions content/en/docs/components/training/explanation/_index.md
@@ -0,0 +1,5 @@
+++
title = "Explanation"
description = "Explanation for Training Operator Features"
weight = 60
+++
63 changes: 63 additions & 0 deletions content/en/docs/components/training/explanation/fine-tuning.md
@@ -0,0 +1,63 @@
+++
title = "LLM Fine-Tuning with Training Operator"
description = "Why Training Operator needs fine-tuning API"
weight = 10
+++

{{% alert title="Warning" color="warning" %}}
This feature is in the **alpha** stage, and the Kubeflow community is looking for your feedback. Please
share your experience using the [#kubeflow-training-operator Slack channel](https://kubeflow.slack.com/archives/C985VJN9F)
or [Kubeflow Training Operator GitHub](https://github.com/kubeflow/training-operator/issues/new).
{{% /alert %}}

This page explains how the [Training Operator fine-tuning API](/docs/components/training/user-guides/fine-tuning)
fits into the Kubeflow ecosystem.

In the rapidly evolving landscape of machine learning (ML) and artificial intelligence (AI),
the ability to fine-tune pre-trained models represents a significant leap towards achieving custom
solutions with less effort and time. Fine-tuning allows practitioners to adapt large language models
(LLMs) like BERT or GPT to their specific needs by training these models on custom datasets.
This process maintains the model's architecture and learned parameters while making it more relevant
to particular applications. Whether you're working in natural language processing (NLP),
image classification, or another ML domain, fine-tuning can drastically improve the performance and
applicability of pre-existing models to new datasets and problems.

## Why the Training Operator Fine-Tuning API Matters

The introduction of the fine-tuning API in the Training Operator Python SDK is a game-changer for
ML practitioners operating within the Kubernetes ecosystem. Historically, the Training Operator has
streamlined the orchestration of ML workloads on Kubernetes, making distributed training more
accessible. However, fine-tuning tasks often require extensive manual intervention, including the
configuration of training environments and the distribution of data across nodes. The fine-tuning
API aims to simplify this process, offering an easy-to-use Python interface that abstracts away the
complexity involved in setting up and executing fine-tuning tasks on distributed systems.
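
As a concrete illustration, here is a minimal sketch of the single-call interface this paragraph
describes, assuming the SDK's `TrainingClient` as it existed around this commit. The parameter
values are illustrative only, and the commented-out provider parameters stand in for the model,
dataset, and trainer configuration covered in the user guide:

```python
from kubeflow.training import TrainingClient

# One Python call instead of hand-written PyTorchJob manifests: the SDK
# creates the distributed PyTorchJob and wires up the model, dataset, and
# trainer configuration on the user's behalf.
TrainingClient().train(
    name="fine-tune-llm",
    num_workers=2,  # number of PyTorchJob worker replicas
    num_procs_per_worker=1,  # processes (e.g. GPUs) per worker
    resources_per_worker={"gpu": 1, "cpu": 4, "memory": "16G"},
    # model_provider_parameters=...,  # which pre-trained model to download
    # dataset_provider_parameters=...,  # which dataset to fine-tune on
    # trainer_parameters=...,  # HuggingFace training arguments, LoRA config, etc.
)
```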

## The Rationale Behind Kubeflow's Fine-Tuning API

Implementing the fine-tuning API within the Training Operator is a logical step in enhancing the
platform's capabilities. By providing this API, the Training Operator not only simplifies the user
experience for ML practitioners but also leverages its existing infrastructure for distributed
training. This approach aligns with Kubeflow's mission to democratize distributed ML training,
making it more accessible and less cumbersome for users. The API facilitates a seamless transition
from model development to deployment, supporting the fine-tuning of LLMs on custom datasets without
the need for extensive manual setup or specialized knowledge of Kubernetes internals.

## Roles and Interests

Different user personas can benefit from this feature:

- **MLOps Engineers:** Can leverage this API to automate and streamline the setup and execution of
fine-tuning tasks, reducing operational overhead.

- **Data Scientists:** Can focus more on model experimentation and less on the logistical aspects of
distributed training, speeding up the iteration cycle.

- **Business Owners:** Can expect quicker turnaround times for tailored ML solutions, enabling faster
response to market needs or operational challenges.

- **Platform Engineers:** Can utilize this API to better operationalize the ML toolkit, ensuring
scalability and efficiency in managing ML workflows.

## Next Steps

- Understand [the architecture behind the `train` API](/docs/components/training/reference/fine-tuning).
2 changes: 1 addition & 1 deletion content/en/docs/components/training/reference/fine-tuning.md
@@ -1,5 +1,5 @@
+++
title = "Fine-Tuning with Training Operator"
title = "LLM Fine-Tuning with Training Operator"
description = "How Training Operator performs fine-tuning on Kubernetes"
weight = 10
+++
39 changes: 4 additions & 35 deletions content/en/docs/components/training/user-guides/fine-tuning.md
@@ -10,38 +10,11 @@ share your experience using [#kubeflow-training-operator Slack channel](https://kubeflow.slack.com/archives/C985VJN9F)
or [Kubeflow Training Operator GitHub](https://github.com/kubeflow/training-operator/issues/new).
{{% /alert %}}

In the rapidly evolving landscape of machine learning (ML) and artificial intelligence (AI),
the ability to fine-tune pre-trained models represents a significant leap towards achieving custom
solutions with less effort and time. Fine-tuning allows practitioners to adapt large language models
(LLMs) like BERT or GPT to their specific needs by training these models on custom datasets.
This process maintains the model's architecture and learned parameters while making it more relevant
to particular applications. Whether you're working in natural language processing (NLP),
image classification, or another ML domain, fine-tuning can drastically improve the performance and
applicability of pre-existing models to new datasets and problems.
This page describes how to use the [`train` API from the Training Operator Python SDK](https://github.com/kubeflow/training-operator/blob/6ce4d57d699a76c3d043917bd0902c931f14080f/sdk/python/kubeflow/training/api/training_client.py#L112), which simplifies
fine-tuning LLMs with distributed PyTorchJob workers.

## Why the Training Operator Fine-Tuning API Matters

The introduction of the fine-tuning API in the Training Operator Python SDK is a game-changer for
ML practitioners operating within the Kubernetes ecosystem. Historically, the Training Operator has
streamlined the orchestration of ML workloads on Kubernetes, making distributed training more
accessible. However, fine-tuning tasks often require extensive manual intervention, including the
configuration of training environments and the distribution of data across nodes. The fine-tuning
APIs aim to simplify this process, offering an easy-to-use Python interface that abstracts away the
complexity involved in setting up and executing fine-tuning tasks on distributed systems.

Different user personas can benefit from this feature:

- **MLOps Engineers:** Can leverage these APIs to automate and streamline the setup and execution of
fine-tuning tasks, reducing operational overhead.

- **Data Scientists:** Can focus more on model experimentation and less on the logistical aspects of
distributed training, speeding up the iteration cycle.

- **Business Owners:** Can expect quicker turnaround times for tailored ML solutions, enabling faster
response to market needs or operational challenges.

- **Platform Engineers:** Can utilize these APIs to better operationalize the ML toolkit, ensuring
scalability and efficiency in managing ML workflows.
If you want to learn more about how the fine-tuning API fits into the Kubeflow ecosystem, see the
[explanation guide](/docs/components/training/explanation/fine-tuning).

## Prerequisites

@@ -50,10 +23,6 @@ to run this API.

## How to use the Fine-Tuning API?

[Training Operator Python SDK](/docs/components/training/installation/#installing-training-python-sdk)
implements a [`train` Python API](https://github.com/kubeflow/training-operator/blob/6ce4d57d699a76c3d043917bd0902c931f14080f/sdk/python/kubeflow/training/api/training_client.py#L112)
that simplifies the ability to fine-tune LLMs with distributed PyTorchJob workers.

You need to provide the following parameters to use the `train` API:

- Pre-trained model parameters.
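
A sketch of how these parameters come together in a single `train` call follows. The HuggingFace
helper classes (`HuggingFaceModelParams`, `HfDatasetParams`, `HuggingFaceTrainerParams`) and the
concrete model, dataset, and resource values below are assumptions drawn from the SDK's storage
initializer around the time of this commit, not a definitive reference; check the `train` API
source linked above for the exact signatures:

```python
import transformers
from peft import LoraConfig  # PEFT provides the LoRA configuration

from kubeflow.training import TrainingClient
from kubeflow.storage_initializer.hugging_face import (
    HuggingFaceModelParams,
    HuggingFaceTrainerParams,
    HfDatasetParams,
)

TrainingClient().train(
    name="fine-tune-bert",
    # Pre-trained model parameters: model URI and Transformer type.
    model_provider_parameters=HuggingFaceModelParams(
        model_uri="hf://google-bert/bert-base-cased",
        transformer_type=transformers.AutoModelForSequenceClassification,
    ),
    # Dataset parameters: use 3000 samples from the Yelp reviews dataset.
    dataset_provider_parameters=HfDatasetParams(
        repo_id="yelp_review_full",
        split="train[:3000]",
    ),
    # Trainer parameters: HuggingFace TrainingArguments plus a LoRA config
    # to reduce the number of trainable parameters.
    trainer_parameters=HuggingFaceTrainerParams(
        training_parameters=transformers.TrainingArguments(
            output_dir="test_trainer",
            save_strategy="no",
            learning_rate=1e-5,
            num_train_epochs=3,
        ),
        lora_config=LoraConfig(r=8, lora_alpha=8, lora_dropout=0.1, bias="none"),
    ),
    # Distributed PyTorchJob topology and per-worker resources.
    num_workers=4,
    num_procs_per_worker=2,
    resources_per_worker={"gpu": 2, "cpu": 5, "memory": "10G"},
)
```

Under these assumptions, the SDK downloads the model and dataset, distributes them across the
PyTorchJob workers, and runs the HuggingFace trainer on each worker.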
