GitHub - frostming/BentoML at e8ab6c426e5e34efb8d30f9430bb2dbbb2eb6035

Unified Model Serving Framework

🍱 Build model inference APIs and multi-model serving systems with any open-source or custom AI models. 👉 Join our Slack community!

What is BentoML?

BentoML is an open-source model serving framework that simplifies how AI/ML models get into production:

🍱 Easily build APIs for Any AI/ML Model. Turn any model inference script into a REST API server with just a few lines of code and standard Python type hints.
🐳 Docker Containers made simple. No more dependency hell! Manage your environments, dependencies and models with a simple config file. BentoML automatically generates Docker images, ensures reproducibility, and simplifies how you run inference across different environments.
🧭 Maximize CPU/GPU utilization. Improve your API throughput and latency by leveraging built-in serving optimization features like dynamic batching, model parallelism, multi-stage pipelines and multi-model inference-graph orchestration.
👩‍💻 Build Custom AI Applications. BentoML is highly flexible for advanced customizations. Easily implement your own API specifications and asynchronous inference tasks, customize pre/post-processing and model inference logic, and define model composition, all in Python code. Supports any ML framework, modality, and inference runtime.
🚀 Build for Production. Develop, run and debug locally. Seamlessly deploy to production with Docker containers or BentoCloud.

Getting started

Install BentoML:

# Requires Python≥3.8
pip install bentoml torch transformers

Define APIs in a service.py file.

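import bentoml
from transformers import pipeline
from typing import List

@bentoml.service
class Summarization:
    def __init__(self):
        # Load the Hugging Face summarization pipeline once, when the service starts.
        self.pipeline = pipeline('summarization')

    # batchable=True allows several incoming requests to be handled in one call.
    @bentoml.api(batchable=True)
    def summarize(self, texts: List[str]) -> List[str]:
        results = self.pipeline(texts)
        # Each result is a dict; keep only the generated summary text.
        return [res['summary_text'] for res in results]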
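The `batchable=True` flag above is what lets several small requests share one model call. The following is a minimal, framework-free sketch of that idea, not BentoML's actual implementation: the helper `run_in_batches` and its `max_batch_size` parameter are hypothetical names chosen for illustration, and a real server batches concurrent requests over a time window rather than a pre-collected list.

```python
from typing import Callable, List, Sequence


def run_in_batches(
    requests: Sequence[List[str]],
    handler: Callable[[List[str]], List[str]],
    max_batch_size: int = 4,
) -> List[List[str]]:
    """Merge small requests into larger handler calls, then split results back.

    Hypothetical helper for illustration only; it assumes the handler
    returns exactly one output per input, in the same order.
    """
    outputs: List[List[str]] = []
    pending: List[List[str]] = []

    def flush() -> None:
        if not pending:
            return
        # Concatenate all pending requests into a single batch.
        merged = [text for req in pending for text in req]
        results = handler(merged)  # one handler call for the whole batch
        # Slice the batched results back into per-request outputs.
        start = 0
        for req in pending:
            outputs.append(results[start:start + len(req)])
            start += len(req)
        pending.clear()

    for req in requests:
        # Flush first if adding this request would exceed the batch limit.
        if pending and sum(len(r) for r in pending) + len(req) > max_batch_size:
            flush()
        pending.append(req)
    flush()
    return outputs
```

Each caller still receives only its own results, while the handler (here standing in for the model pipeline) is invoked far fewer times than there are requests.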

Run the service code locally (serving at http://localhost:3000 by default):

bentoml serve service.py:Summarization

Now you can run inference from your browser at http://localhost:3000 or with a Python script:

import bentoml

with bentoml.SyncHTTPClient('http://localhost:3000') as client:
    text_to_summarize: str = input("Enter text to summarize: ")
    summarized_text: str = client.summarize([text_to_summarize])[0]
    print(f"Summarized text: {summarized_text}")
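Because the service is a plain REST API, it can also be called without the BentoML client. The sketch below builds such a request with the standard library only. It rests on two assumptions that are not stated above: that the endpoint path matches the API method name (`/summarize`), and that the JSON body is keyed by the parameter name (`texts`). The helper `build_summarize_request` is a hypothetical name.

```python
import json
import urllib.request
from typing import List


def build_summarize_request(base_url: str, texts: List[str]) -> urllib.request.Request:
    """Build a POST request for the summarize API.

    Assumption: the route is /summarize and the body is {"texts": [...]}.
    """
    payload = json.dumps({"texts": texts}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/summarize",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Requires the service from service.py to be running locally.
    request = build_summarize_request("http://localhost:3000", ["Some long text to summarize."])
    with urllib.request.urlopen(request) as response:
        print(json.load(response))
```

Building the request separately from sending it keeps the payload shape easy to inspect if the server rejects the call.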

Deploying your first Bento

To deploy your BentoML Service code, first create a bentofile.yaml file to define its dependencies and environments. Find the full list of bentofile options here.

service: "service:Summarization" # Entry service import path
include:
  - "*.py" # Include all .py files in current directory
python:
  packages: # Python dependencies to include
  - torch
  - transformers

Then, choose one of the following ways for deployment:

🐳 Docker Container

Run bentoml build to package the necessary code, models, and dependency configs into a Bento, the standardized deployable artifact in BentoML:

bentoml build

Ensure Docker is running. Generate a Docker container image for deployment:

bentoml containerize summarization:latest

Run the generated image:

docker run --rm -p 3000:3000 summarization:latest
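The container may take some time to load the model before it can answer requests. A small polling loop, sketched here with the standard library, can wait for it. The `/readyz` path in the usage note is an assumption about the server's health endpoint, and `http_probe` and `wait_until_ready` are hypothetical helper names.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str) -> bool:
    """Return True if a GET request to the URL answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=2) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: not ready yet.
        return False


def wait_until_ready(
    probe: Callable[[], bool],
    timeout: float = 60.0,
    interval: float = 1.0,
) -> bool:
    """Call probe repeatedly until it succeeds or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_until_ready(lambda: http_probe("http://localhost:3000/readyz"))` returns `True` once the probe succeeds, or `False` after the timeout.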

☁️ BentoCloud

BentoCloud is the AI inference platform for fast-moving AI teams. It lets you easily deploy your BentoML code on fast-scaling infrastructure. Sign up for BentoCloud for personal access; for enterprise use cases, contact our team.

# After signup, follow login instructions upon API token creation:
bentoml cloud login --api-token <your-api-token>

# Deploy from current directory:
bentoml deploy .

For detailed explanations, read Quickstart.

Use cases

LLMs: Llama 3, Mixtral, Solar, Mistral, and more
Image Generation: Stable Diffusion, Stable Video Diffusion, Stable Diffusion XL Turbo, ControlNet, LCM LoRAs
Text Embeddings: SentenceTransformers
Audio: XTTS, WhisperX, Bark
Computer Vision: YOLO
Multimodal: BLIP, CLIP
Compound AI systems: Serving RAG with custom models

Check out the examples folder for more sample code and usage.

Advanced topics

See Documentation for more tutorials and guides.

Community

Get involved and join our Community Slack 💬, where thousands of AI/ML engineers help each other, contribute to the project, and talk about building AI products.

To report a bug or suggest a feature request, use GitHub Issues.

Contributing

There are many ways to contribute to the project:

Report bugs and give a "thumbs up" to issues that are relevant to you.
Investigate issues and review other developers' pull requests.
Contribute code or documentation to the project by submitting a GitHub pull request.
Check out the Contributing Guide and Development Guide to learn more.
Share your feedback and discuss roadmap plans in the #bentoml-contributors channel here.

Thanks to all of our amazing contributors!

Usage tracking and feedback

The BentoML framework collects anonymous usage data that helps our community improve the product. Only BentoML's internal API calls are reported. This excludes any sensitive information, such as user code, model data, model names, or stack traces. Here's the code used for usage tracking. You can opt out of usage tracking with the --do-not-track CLI option:

bentoml [command] --do-not-track

Or by setting the environment variable:

export BENTOML_DO_NOT_TRACK=True
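When BentoML is launched from a Python script rather than a shell, the same variable can be passed through the child process environment. This is a minimal sketch: `env_without_tracking` is a hypothetical helper, and it assumes the variable is honored in the same way as when it is exported in the shell.

```python
import os
from typing import Dict, Optional


def env_without_tracking(base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with usage tracking disabled."""
    env = dict(os.environ if base is None else base)
    env["BENTOML_DO_NOT_TRACK"] = "True"
    return env


if __name__ == "__main__":
    import subprocess

    # Requires BentoML to be installed and service.py to be present.
    subprocess.run(
        ["bentoml", "serve", "service.py:Summarization"],
        env=env_without_tracking(),
        check=True,
    )
```

Copying the environment instead of mutating `os.environ` keeps the setting scoped to the launched process.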

License

Apache License 2.0
