From ead4e6042c9195ae5b2953449953c6e46ff7af90 Mon Sep 17 00:00:00 2001
From: themantalope
Date: Tue, 4 Apr 2023 02:24:16 +0000
Subject: [PATCH] updated readme

---
 README.md | 130 ++++++++++++++++++++++++++++--------------------------
 1 file changed, 68 insertions(+), 62 deletions(-)

diff --git a/README.md b/README.md
index f3f1292d..b308ccb4 100644
--- a/README.md
+++ b/README.md
@@ -10,7 +10,6 @@


-

 PyPI
@@ -22,15 +21,15 @@
 CLIP-as-service is a low-latency high-scalability service for embedding images and text. It can be easily integrated as a microservice into neural search solutions.

-⚡ **Fast**: Serve CLIP models with TensorRT, ONNX runtime and PyTorch w/o JIT with 800QPS[*]. Non-blocking duplex streaming on requests and responses, designed for large data and long-running tasks.
+⚡ **Fast**: Serve CLIP models with TensorRT, ONNX runtime and PyTorch w/o JIT with 800QPS[*]. Non-blocking duplex streaming on requests and responses, designed for large data and long-running tasks.

 🫐 **Elastic**: Horizontally scale up and down multiple CLIP models on single GPU, with automatic load balancing.

-🐥 **Easy-to-use**: No learning curve, minimalist design on client and server. Intuitive and consistent API for image and sentence embedding.
+🐥 **Easy-to-use**: No learning curve, minimalist design on client and server. Intuitive and consistent API for image and sentence embedding.

 👒 **Modern**: Async client support. Easily switch between gRPC, HTTP, WebSocket protocols with TLS and compression.

-🍱 **Integration**: Smooth integration with neural search ecosystem including [Jina](https://github.com/jina-ai/jina) and [DocArray](https://github.com/jina-ai/docarray). Build cross-modal and multi-modal solutions in no time.
+🍱 **Integration**: Smooth integration with neural search ecosystem including [Jina](https://github.com/jina-ai/jina) and [DocArray](https://github.com/jina-ai/docarray). Build cross-modal and multi-modal solutions in no time.

 [*] with default config (single replica, PyTorch no JIT) on GeForce RTX 3090.

@@ -39,17 +38,16 @@ CLIP-as-service is a low-latency high-scalability service for embedding images a
 ## Try it!

 An always-online server `api.clip.jina.ai` loaded with `ViT-L-14-336::openai` is there for you to play & test.
-Before you start, make sure you have obtained a personal access token from the [Jina AI Cloud](https://cloud.jina.ai/settings/tokens),
+Before you start, make sure you have obtained a personal access token from the [Jina AI Cloud](https://cloud.jina.ai/settings/tokens),
 or via CLI as described in [this guide](https://docs.jina.ai/jina-ai-cloud/login/#create-a-new-pat):

-```bash
+```bash
 jina auth token create -e
 ```

 Then, you need to configure the access token in the parameter `credential` of the client in python or set it in the HTTP request header `Authorization` as ``.

-⚠️ Our demo server `demo-cas.jina.ai` is sunset and no longer available after **15th of Sept 2022**.
-
+⚠️ Our demo server `demo-cas.jina.ai` is sunset and no longer available after **15th of Sept 2022**.

 ### Text & image embedding

@@ -66,10 +64,10 @@ curl \
 -X POST https://api.clip.jina.ai:8443/post \
 -H 'Content-Type: application/json' \
 -H 'Authorization: ' \
--d '{"data":[{"text": "First do it"},
-             {"text": "then do it right"},
-             {"text": "then do it better"},
-             {"uri": "https://picsum.photos/200"}],
+-d '{"data":[{"text": "First do it"},
+             {"text": "then do it right"},
+             {"text": "then do it better"},
+             {"uri": "https://picsum.photos/200"}],
      "execEndpoint":"/"}'
 ```
@@ -94,6 +92,7 @@ r = c.encode(
 )
 print(r)
 ```
+
@@ -160,6 +159,7 @@ curl \
 ```

 gives:
+
 ```
 "the blue car is on the left, the red car is on the right"
 0.5232442617416382
@@ -174,7 +174,6 @@ gives:
-
@@ -198,6 +197,7 @@ curl \
 ```

 gives:
+
 ```
 "this is a photo of three berries"
 0.48507222533226013
@@ -216,15 +216,13 @@ gives:
-
-
 ## [Documentation](https://clip-as-service.jina.ai)

 ## Install

-CLIP-as-service consists of two Python packages `clip-server` and `clip-client` that can be installed _independently_. Both require Python 3.7+.
+CLIP-as-service consists of two Python packages `clip-server` and `clip-client` that can be installed _independently_. Both require Python 3.7+.
 ### Install server
@@ -252,9 +250,10 @@ pip install "clip-server[onnx]"

 ```bash
-pip install nvidia-pyindex
+pip install nvidia-pyindex
 pip install "clip-server[tensorrt]"
 ```
+
@@ -271,7 +270,6 @@ pip install clip-client

 You can run a simple connectivity check after install.

-
@@ -282,12 +280,12 @@ You can run a simple connectivity check after install.
-
-
 C/S Server
+
 ```bash
 python -m clip_server
 ```
-
+
@@ -299,7 +297,7 @@ python -m clip_server
 Client
+
 ```python
 from clip_client import Client
@@ -307,7 +305,7 @@ from clip_client import Client
 c = Client('grpc://0.0.0.0:23456')
 c.profile()
 ```
-
+
@@ -317,9 +315,7 @@ c.profile()
-
-You can change `0.0.0.0` to the intranet or public IP address to test the connectivity over private and public network.
-
+You can change `0.0.0.0` to the intranet or public IP address to test the connectivity over private and public network.

 ## Get Started

@@ -327,25 +323,30 @@ You can change `0.0.0.0` to the intranet or public IP address to test the connec

 1. Start the server: `python -m clip_server`. Remember its address and port.
 2. Create a client:
+
    ```python
    from clip_client import Client
-
+
    c = Client('grpc://0.0.0.0:51000')
-   ```
+   ```
+
 3. To get sentence embedding:
-   ```python
-   r = c.encode(['First do it', 'then do it right', 'then do it better'])
-
-   print(r.shape)  # [3, 512]
-   ```
+
+   ```python
+   r = c.encode(['First do it', 'then do it right', 'then do it better'])
+
+   print(r.shape)  # [3, 512]
+   ```
+
 4. To get image embedding:
-   ```python
-   r = c.encode(['apple.png',  # local image
-                 'https://clip-as-service.jina.ai/_static/favicon.png',  # remote image
-                 ''])  # in image URI
-
-   print(r.shape)  # [3, 512]
-   ```
+
+   ```python
+   r = c.encode(['apple.png',  # local image
+                 'https://clip-as-service.jina.ai/_static/favicon.png',  # remote image
+                 ''])  # in image URI
+
+   print(r.shape)  # [3, 512]
+   ```

 More comprehensive server and client user guides can be found in the [docs](https://clip-as-service.jina.ai/).

@@ -415,7 +416,7 @@ da = DocumentArray.pull('ttl-embedding', show_progress=True, local_cache=True)
-#### Search via sentence
+#### Search via sentence

 Let's build a simple prompt to allow a user to type sentence:

@@ -461,7 +462,6 @@ Now you can input arbitrary English sentences and view the top-9 matching images
-
@@ -493,7 +493,7 @@ Now you can input arbitrary English sentences and view the top-9 matching images
"professor cat is very serious"
-Let's save the embedding result for our next example:
+Let's save the embedding result for our next example:

 ```python
 da.save_binary('ttl-image')
@@ -503,7 +503,7 @@ da.save_binary('ttl-image')

 We can also switch the input and output of the last program to achieve image-to-text search. Precisely, given a query image find the sentence that best describes the image.

-Let's use all sentences from the book "Pride and Prejudice".
+Let's use all sentences from the book "Pride and Prejudice".

 ```python
 from docarray import Document, DocumentArray
@@ -521,23 +521,23 @@ da.summary()
 ```

 ```text
-                  Documents Summary
-
-  Length                 6403
-  Homogenous Documents   True
-  Common Attributes      ('id', 'text')
-
-                     Attributes Summary
-
-  Attribute   Data type   #Unique values   Has empty value
- ──────────────────────────────────────────────────────────
-  id          ('str',)    6403             False
-  text        ('str',)    6030             False
+                  Documents Summary
+
+  Length                 6403
+  Homogenous Documents   True
+  Common Attributes      ('id', 'text')
+
+                     Attributes Summary
+
+  Attribute   Data type   #Unique values   Has empty value
+ ──────────────────────────────────────────────────────────
+  id          ('str',)    6403             False
+  text        ('str',)    6030             False
 ```

 #### Encode sentences

-Now encode these 6,403 sentences, it may take 10 seconds or less depending on your GPU and network:
+Now encode these 6,403 sentences, it may take 10 seconds or less depending on your GPU and network:

 ```python
 from clip_client import Client
@@ -575,7 +575,7 @@ for d in img_da.sample(10):

 #### Showcase

-Fun time! Note, unlike the previous example, here the input is an image and the sentence is the output. All sentences come from the book "Pride and Prejudice".
+Fun time! Note, unlike the previous example, here the input is an image and the sentence is the output. All sentences come from the book "Pride and Prejudice".

@@ -584,7 +584,6 @@ Fun time! Note, unlike the previous example, here the input is an image and the
 Visualization of the image sprite of Totally looks like dataset

-
@@ -632,7 +631,6 @@ Fun time! Note, unlike the previous example, here the input is an image and the
 Visualization of the image sprite of Totally looks like dataset

-
@@ -673,8 +671,6 @@ Fun time! Note, unlike the previous example, here the input is an image and the
-
-
 ### Rank image-text matches via CLIP model

 From `0.3.0` CLIP-as-service adds a new `/rank` endpoint that re-ranks cross-modal matches according to their joint likelihood in CLIP model. For example, given an image Document with some predefined sentence matches as below:
@@ -706,7 +702,7 @@ print(r['@m', ['text', 'scores__clip_score__value']])
 ```

 ```text
-[['a photo of a television studio', 'a photo of a conference room', 'a photo of a lecture room', 'a photo of a control room', 'a photo of a podium indoor'],
+[['a photo of a television studio', 'a photo of a conference room', 'a photo of a lecture room', 'a photo of a control room', 'a photo of a podium indoor'],
 [0.9920725226402283, 0.006038925610482693, 0.0009973491542041302, 0.00078492151806131, 0.00010626466246321797]]
 ```

@@ -748,7 +744,17 @@ class ReRank(Executor):

 Intrigued? That's only scratching the surface of what CLIP-as-service is capable of. [Read our docs to learn more](https://clip-as-service.jina.ai).

+## Build locally with Docker
+
+You need to be in the `server` directory to build the Docker image.
+
+```bash
+cd server
+docker build . -f ../Dockerfiles/cuda.Dockerfile -t clip-as-service-gpu:latest
+```
+
+
 ## Support

 - Join our [Slack community](https://slack.jina.ai) and chat with other community members about ideas.