Add documentation for Models from Code (#12381)
Signed-off-by: Ben Wilson <[email protected]>
Signed-off-by: Daniel Lok <[email protected]>
BenWilson2 authored and daniellok-db committed Jun 20, 2024
1 parent 2e57698 commit e09f559
Showing 4 changed files with 176 additions and 0 deletions.
[2 files could not be displayed]
91 changes: 91 additions & 0 deletions docs/source/llms/langchain/index.rst
@@ -196,6 +196,11 @@ I can't load my chain!
I can't save my chain, agent, or retriever with MLflow.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. tip::

    If you're encountering issues with logging or saving LangChain components with MLflow, see the `models from code <../../models.html#models-from-code>`_
    feature documentation to determine if logging your model from a script file provides a simpler and more robust logging solution!

- **Serialization Challenges with Cloudpickle**: Serialization with cloudpickle can encounter limitations depending on the complexity of the objects.

Some objects, especially those with intricate internal states or dependencies on external system resources, are not inherently pickleable. This limitation
@@ -240,3 +245,89 @@ How can I use a streaming API with LangChain?
As of the MLflow 2.12.2 release, LangChain models that support streaming responses and have been saved with MLflow 2.12.2 (or higher) can be loaded and used for
streaming inference via the ``predict_stream`` API. Ensure that you consume the return value correctly, as the return from these models is a ``Generator`` object.
To learn more, refer to the `predict_stream guide <https://mlflow.org/docs/latest/models.html#how-to-load-and-score-python-function-models>`_.
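
For illustration, here is a minimal sketch of consuming the streamed output (the model URI and input payload below are placeholders; the expected input format
depends on how your chain was defined):

.. code-block:: python

    import mlflow

    # Load a previously saved LangChain model that supports streaming responses
    # (the model URI below is a placeholder for your own logged model).
    model = mlflow.pyfunc.load_model("runs:/<run_id>/chain")

    # predict_stream returns a Generator; iterate over it to consume the chunks
    for chunk in model.predict_stream({"question": "What is MLflow?"}):
        print(chunk, end="")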

How can I log my chain from code?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- **Models from Code**: MLflow 2.12.2 introduced the ability to log LangChain models directly from a code definition.

In order to use this feature, you will utilize the :py:func:`mlflow.models.set_model` API to designate the chain that you would like to log as an MLflow model.
After calling this API within the code that defines your chain, you will specify the **path** to that file (rather than a chain object) when logging your model.

For example, here is a simple chain defined in a file named ``langchain_code_chain.py``:

.. code-block:: python

    import os
    from operator import itemgetter

    from langchain_core.output_parsers import StrOutputParser
    from langchain_core.prompts import PromptTemplate
    from langchain_core.runnables import RunnableLambda
    from langchain_openai import OpenAI

    import mlflow

    mlflow.set_experiment("Homework Helper")

    mlflow.langchain.autolog()


    # Helper functions used by the chain below to extract fields from the chat
    # payload (the payload structure is assumed from the invocation example).
    def get_question(messages):
        return messages[-1]["content"]["question"]


    def get_answer(messages):
        return messages[-1]["content"]["answer"]


    def extract_chat_history(messages):
        return messages[:-1]


    prompt = PromptTemplate(
        template="You are a helpful tutor that evaluates my homework assignments and provides suggestions on areas for me to study further."
        " Here is the question: {question} and my answer which I got wrong: {answer}",
        input_variables=["question", "answer"],
    )

    model = OpenAI(temperature=0.95)

    chain = (
        {
            "question": itemgetter("messages") | RunnableLambda(get_question),
            "answer": itemgetter("messages") | RunnableLambda(get_answer),
            "chat_history": itemgetter("messages") | RunnableLambda(extract_chat_history),
        }
        | prompt
        | model
        | StrOutputParser()
    )

    mlflow.models.set_model(chain)

From a different file (in this case, a Jupyter Notebook), log the model directly by supplying the path to the file that defines the chain:

.. code-block:: python

    from pprint import pprint

    import mlflow

    chain_path = "langchain_code_chain.py"

    with mlflow.start_run():
        info = mlflow.langchain.log_model(lc_model=chain_path, artifact_path="chain")

    # Load the model and run inference
    homework_chain = mlflow.langchain.load_model(model_uri=info.model_uri)

    exam_question = {
        "messages": [
            {
                "role": "user",
                "content": {
                    "question": "What is the primary function of control rods in a nuclear reactor?",
                    "answer": "To stir the primary coolant so that the neutrons are mixed well.",
                },
            },
        ]
    }

    response = homework_chain.invoke(exam_question)

    pprint(response)

The model will be logged as a script within the MLflow UI:

.. figure:: ../../_static/images/tutorials/llms/langchain-code-model.png
    :alt: Logging a LangChain model from a code script file
    :width: 100%
    :align: center
85 changes: 85 additions & 0 deletions docs/source/models.rst
@@ -22,6 +22,10 @@ Each MLflow Model is a directory containing arbitrary files, together with an ``MLmodel``
file in the root of the directory that can define multiple *flavors* that the model can be viewed
in.

The **model** aspect of the MLflow Model can either be a serialized object (e.g., a pickled ``scikit-learn`` model)
or a Python script (or notebook, if running in Databricks) that contains the model instance that has been defined
with the :py:func:`mlflow.models.set_model` API.

Flavors are the key concept that makes MLflow Models powerful: they are a convention that deployment
tools can use to understand the model, which makes it possible to write tools that work with models
from any ML library without having to integrate each tool with each library. MLflow defines
@@ -147,6 +151,87 @@ class has four key functions:
* :py:func:`load <mlflow.models.Model.load>` to load a model from a local directory or
from an artifact in a previous run.


Models From Code
^^^^^^^^^^^^^^^^

.. note::

    The Models from Code feature is available in MLflow versions 2.12.2 and later. This feature is experimental and may change in future releases.

The Models from Code feature allows you to define and log models directly from Python code. This feature is particularly useful when you want to
log models that can be effectively stored as a code representation (models that do not need optimized weights through training) or applications
that rely on external services (e.g., LangChain chains). Another benefit is that this approach entirely bypasses the use of the ``pickle`` or
``cloudpickle`` modules within Python, which can carry security risks when loading untrusted models.

.. note::

    This feature is only supported for **LangChain** and **PythonModel** models.

In order to log a model from code, you can leverage the :py:func:`mlflow.models.set_model` API, which designates an instance of the model class directly within
the file where the model is defined. When logging such a model, you specify a file path (instead of an object) that points to the Python file containing both the
model class definition and the call to ``set_model`` on an instance of your custom model.

The figure below compares the standard model logging process with the Models from Code feature for models that are eligible to be
saved using this approach:

.. figure:: _static/images/models/models_from_code.png
    :alt: Models from Code
    :width: 60%
    :align: center

For example, defining a model in a separate file named ``my_model.py``:

.. code-block:: python

    import mlflow
    from mlflow.models import set_model


    class MyModel(mlflow.pyfunc.PythonModel):
        def predict(self, context, model_input):
            return model_input


    # Define the custom PythonModel instance that will be used for inference
    set_model(MyModel())

.. note::

    The Models from Code feature does not support capturing import statements that reference external files. If you have dependencies that are not captured
    via a ``pip`` install, those dependencies will need to be included and resolved via appropriate absolute path import references by using the
    `code_paths feature <https://mlflow.org/docs/latest/model/dependencies.html#saving-extra-code-with-an-mlflow-model-manual-declaration>`_.
    For simplicity's sake, it is recommended to encapsulate all of the required local dependencies for a model defined from code within the same
    Python script file due to limitations around ``code_paths`` dependency path resolution.
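
As a purely illustrative sketch of the ``code_paths`` approach (the file names below are hypothetical), an extra local module can be bundled when logging a
model from code:

.. code-block:: python

    import mlflow

    # "my_model.py" defines the model from code and imports helpers from
    # "helpers.py"; both file names are hypothetical examples.
    with mlflow.start_run():
        model_info = mlflow.pyfunc.log_model(
            python_model="my_model.py",
            artifact_path="my_model",
            code_paths=["helpers.py"],  # bundle the extra local dependency
        )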

.. tip::

    When defining a model from code and using the :py:func:`mlflow.models.set_model` API, the code defined in the script being logged will be executed
    internally to ensure that it is valid. If your script connects to external services (e.g., you are connecting to a GenAI service within LangChain),
    be aware that you will incur a connection request to that service when the model is logged.

Then, logging the model from the file path in a different Python script:

.. code-block:: python

    import mlflow

    model_path = "my_model.py"

    with mlflow.start_run():
        model_info = mlflow.pyfunc.log_model(
            python_model=model_path,  # Define the model as the path to the Python file
            artifact_path="my_model",
        )

    # Loading the model behaves exactly as if an instance of MyModel had been logged
    my_model = mlflow.pyfunc.load_model(model_info.model_uri)
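
As a quick usage check (the input values below are purely illustrative), the loaded model can be invoked through the standard pyfunc ``predict`` API:

.. code-block:: python

    # MyModel.predict simply echoes its input, so the loaded pyfunc model
    # returns the data that is passed in.
    print(my_model.predict(["hello", "world"]))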

.. warning::

    The :py:func:`mlflow.models.set_model` API is **not thread-safe**. Do not attempt to use this feature if you are logging models concurrently from
    multiple threads. This fluent API utilizes a global active model state that has no consistency guarantees. If you are interested in thread-safe
    logging APIs, please use the :py:class:`mlflow.client.MlflowClient` APIs for logging models.


.. _models_built-in-model-flavors:

Built-In Model Flavors

0 comments on commit e09f559
