FEAT: Support downloading models from modelscope (#475)
aresnow1 authored Sep 22, 2023
1 parent 82ed561 commit 1b4e14f
Showing 12 changed files with 722 additions and 180 deletions.
1 change: 1 addition & 0 deletions .github/workflows/python.yaml
@@ -95,6 +95,7 @@ jobs:
pip install ctransformers
pip install sentence-transformers
pip install s3fs
pip install modelscope
pip install -e ".[dev]"
working-directory: .

75 changes: 40 additions & 35 deletions README.md
@@ -194,47 +194,52 @@ To view the builtin models, run the following command:
$ xinference registrations
```

| Type | Name | Language | Ability |
|------|---------------------|--------------|-----------------------|
| LLM | baichuan | ['en', 'zh'] | ['embed', 'generate'] |
| LLM | baichuan-2 | ['en', 'zh'] | ['embed', 'generate'] |
| LLM | baichuan-chat | ['en', 'zh'] | ['embed', 'chat'] |
| LLM | baichuan-2-chat | ['en', 'zh'] | ['embed', 'chat'] |
| LLM | chatglm | ['en', 'zh'] | ['embed', 'chat'] |
| LLM | chatglm2 | ['en', 'zh'] | ['embed', 'chat'] |
| LLM | chatglm2-32k | ['en', 'zh'] | ['embed', 'chat'] |
| LLM | code-llama | ['en'] | ['generate'] |
| LLM | code-llama-instruct | ['en'] | ['chat'] |
| LLM | code-llama-python | ['en'] | ['generate'] |
| LLM | falcon | ['en'] | ['embed', 'generate'] |
| LLM | falcon-instruct | ['en'] | ['embed', 'chat'] |
| Type | Name | Language | Ability |
|------|---------------------|--------------|------------------------|
| LLM | baichuan | ['en', 'zh'] | ['embed', 'generate'] |
| LLM | baichuan-2 | ['en', 'zh'] | ['embed', 'generate'] |
| LLM | baichuan-chat | ['en', 'zh'] | ['embed', 'chat'] |
| LLM | baichuan-2-chat | ['en', 'zh'] | ['embed', 'chat'] |
| LLM | chatglm | ['en', 'zh'] | ['embed', 'chat'] |
| LLM | chatglm2 | ['en', 'zh'] | ['embed', 'chat'] |
| LLM | chatglm2-32k | ['en', 'zh'] | ['embed', 'chat'] |
| LLM | code-llama | ['en'] | ['generate'] |
| LLM | code-llama-instruct | ['en'] | ['chat'] |
| LLM | code-llama-python | ['en'] | ['generate'] |
| LLM | falcon | ['en'] | ['embed', 'generate'] |
| LLM | falcon-instruct | ['en'] | ['embed', 'chat'] |
| LLM | glaive-coder | ['en'] | ['chat'] |
| LLM | gpt-2 | ['en'] | ['generate'] |
| LLM | internlm | ['en', 'zh'] | ['embed', 'generate'] |
| LLM | internlm-16k | ['en', 'zh'] | ['embed', 'generate'] |
| LLM | internlm-chat | ['en', 'zh'] | ['embed', 'chat'] |
| LLM | internlm-chat-8k | ['en', 'zh'] | ['embed', 'chat'] |
| LLM | internlm-chat-16k | ['en', 'zh'] | ['embed', 'chat'] |
| LLM | llama-2 | ['en'] | ['embed', 'generate'] |
| LLM | llama-2-chat | ['en'] | ['embed', 'chat'] |
| LLM | opt | ['en'] | ['embed', 'generate'] |
| LLM | orca | ['en'] | ['embed', 'chat'] |
| LLM | qwen-chat | ['en', 'zh'] | ['embed', 'chat'] |
| LLM | starchat-beta | ['en'] | ['embed', 'chat'] |
| LLM | starcoder | ['en'] | ['generate'] |
| LLM | starcoderplus | ['en'] | ['embed', 'generate'] |
| LLM | vicuna-v1.3 | ['en'] | ['embed', 'chat'] |
| LLM | vicuna-v1.5 | ['en'] | ['embed', 'chat'] |
| LLM | vicuna-v1.5-16k | ['en'] | ['embed', 'chat'] |
| LLM | wizardlm-v1.0 | ['en'] | ['embed', 'chat'] |
| LLM | wizardmath-v1.0 | ['en'] | ['embed', 'chat'] |
| LLM | OpenBuddy-v11.1 | ['en', 'zh'] | ['embed', 'chat'] |
| LLM | gpt-2 | ['en'] | ['generate'] |
| LLM | internlm-7b | ['en', 'zh'] | ['embed', 'generate'] |
| LLM | internlm-chat-7b | ['en', 'zh'] | ['embed', 'chat'] |
| LLM | internlm-chat-20b | ['en', 'zh'] | ['embed', 'chat'] |
| LLM | llama-2 | ['en'] | ['embed', 'generate'] |
| LLM | llama-2-chat | ['en'] | ['embed', 'chat'] |
| LLM | opt | ['en'] | ['embed', 'generate'] |
| LLM | orca | ['en'] | ['embed', 'chat'] |
| LLM | qwen-chat | ['en', 'zh'] | ['embed', 'chat'] |
| LLM | starchat-beta | ['en'] | ['embed', 'chat'] |
| LLM | starcoder | ['en'] | ['generate'] |
| LLM | starcoderplus | ['en'] | ['embed', 'generate'] |
| LLM | vicuna-v1.3 | ['en'] | ['embed', 'chat'] |
| LLM | vicuna-v1.5 | ['en'] | ['embed', 'chat'] |
| LLM | vicuna-v1.5-16k | ['en'] | ['embed', 'chat'] |
| LLM | wizardlm-v1.0 | ['en'] | ['embed', 'chat'] |
| LLM | wizardmath-v1.0 | ['en'] | ['embed', 'chat'] |
| LLM | OpenBuddy | ['en', 'zh'] | ['embed', 'chat'] |

For in-depth details on the built-in models, please refer to [built-in models](https://inference.readthedocs.io/en/latest/models/builtin/index.html).

**NOTE**:
- Xinference will download models automatically for you, and by default the models will be saved under `${USER}/.xinference/cache`.
- If you have trouble downloading models from Hugging Face, run `export XINFERENCE_MODEL_SRC=xorbits` to download models from our mirror site.
- If you have trouble downloading models from Hugging Face, run `export XINFERENCE_MODEL_SRC=modelscope` to download models from [modelscope](https://modelscope.cn/) instead (a usage sketch follows this list). Models supported by modelscope:
- llama-2
- llama-2-chat
- baichuan-2
- baichuan-2-chat
- chatglm2
- chatglm2-32k
- internlm-chat-20b
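
For the models above, a minimal usage sketch (not part of this commit) is shown below. It assumes the variable is exported in the environment of the Xinference server process — the server, not the client, performs the download — and reuses the `Client` API and default endpoint from the examples in this README; exact `launch_model` arguments may differ across versions:

```python
# Assumed workflow: start the server with the variable set, e.g.
#   $ export XINFERENCE_MODEL_SRC=modelscope
#   $ xinference   # the server process must see the variable
from xinference.client import Client

client = Client("http://127.0.0.1:9997")  # default local endpoint

# chatglm2 is in the modelscope-supported list above, so its weights
# are fetched from modelscope rather than Hugging Face.
model_uid = client.launch_model(model_name="chatglm2")
model = client.get_model(model_uid)
print(model.chat("What is the largest animal?"))
```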

## Custom models
Please refer to [custom models](https://inference.readthedocs.io/en/latest/models/custom.html).
2 changes: 1 addition & 1 deletion README_ja_JP.md
@@ -205,7 +205,7 @@ $ xinference registrations
| LLM | vicuna-v1.5-16k | ['en'] | ['embed', 'chat'] |
| LLM | wizardlm-v1.0 | ['en'] | ['embed', 'chat'] |
| LLM | wizardmath-v1.0 | ['en'] | ['embed', 'chat'] |
| LLM | OpenBuddy-v11.1 | ['en', 'zh'] | ['embed', 'chat'] |
| LLM | OpenBuddy | ['en', 'zh'] | ['embed', 'chat'] |

**NOTE**:
- Xinference will download models automatically for you, and by default the models will be saved under `${USER}/.xinference/cache`.
77 changes: 41 additions & 36 deletions README_zh_CN.md
@@ -176,47 +176,52 @@ model.chat(
$ xinference registrations
```

| Type | Name | Language | Ability |
|------|---------------------|--------------|-----------------------|
| LLM | baichuan | ['en', 'zh'] | ['embed', 'generate'] |
| LLM | baichuan-2 | ['en', 'zh'] | ['embed', 'generate'] |
| LLM | baichuan-chat | ['en', 'zh'] | ['embed', 'chat'] |
| LLM | baichuan-2-chat | ['en', 'zh'] | ['embed', 'chat'] |
| LLM | chatglm | ['en', 'zh'] | ['embed', 'chat'] |
| LLM | chatglm2 | ['en', 'zh'] | ['embed', 'chat'] |
| LLM | chatglm2-32k | ['en', 'zh'] | ['embed', 'chat'] |
| LLM | code-llama | ['en'] | ['generate'] |
| LLM | code-llama-instruct | ['en'] | ['chat'] |
| LLM | code-llama-python | ['en'] | ['generate'] |
| LLM | falcon | ['en'] | ['embed', 'generate'] |
| LLM | falcon-instruct | ['en'] | ['embed', 'chat'] |
| Type | Name | Language | Ability |
|------|---------------------|--------------|------------------------|
| LLM | baichuan | ['en', 'zh'] | ['embed', 'generate'] |
| LLM | baichuan-2 | ['en', 'zh'] | ['embed', 'generate'] |
| LLM | baichuan-chat | ['en', 'zh'] | ['embed', 'chat'] |
| LLM | baichuan-2-chat | ['en', 'zh'] | ['embed', 'chat'] |
| LLM | chatglm | ['en', 'zh'] | ['embed', 'chat'] |
| LLM | chatglm2 | ['en', 'zh'] | ['embed', 'chat'] |
| LLM | chatglm2-32k | ['en', 'zh'] | ['embed', 'chat'] |
| LLM | code-llama | ['en'] | ['generate'] |
| LLM | code-llama-instruct | ['en'] | ['chat'] |
| LLM | code-llama-python | ['en'] | ['generate'] |
| LLM | falcon | ['en'] | ['embed', 'generate'] |
| LLM | falcon-instruct | ['en'] | ['embed', 'chat'] |
| LLM | glaive-coder | ['en'] | ['chat'] |
| LLM | gpt-2 | ['en'] | ['generate'] |
| LLM | internlm | ['en', 'zh'] | ['embed', 'generate'] |
| LLM | internlm-16k | ['en', 'zh'] | ['embed', 'generate'] |
| LLM | internlm-chat | ['en', 'zh'] | ['embed', 'chat'] |
| LLM | internlm-chat-8k | ['en', 'zh'] | ['embed', 'chat'] |
| LLM | internlm-chat-16k | ['en', 'zh'] | ['embed', 'chat'] |
| LLM | llama-2 | ['en'] | ['embed', 'generate'] |
| LLM | llama-2-chat | ['en'] | ['embed', 'chat'] |
| LLM | opt | ['en'] | ['embed', 'generate'] |
| LLM | orca | ['en'] | ['embed', 'chat'] |
| LLM | qwen-chat | ['en', 'zh'] | ['embed', 'chat'] |
| LLM | starchat-beta | ['en'] | ['embed', 'chat'] |
| LLM | starcoder | ['en'] | ['generate'] |
| LLM | starcoderplus | ['en'] | ['embed', 'generate'] |
| LLM | vicuna-v1.3 | ['en'] | ['embed', 'chat'] |
| LLM | vicuna-v1.5 | ['en'] | ['embed', 'chat'] |
| LLM | vicuna-v1.5-16k | ['en'] | ['embed', 'chat'] |
| LLM | wizardlm-v1.0 | ['en'] | ['embed', 'chat'] |
| LLM | wizardmath-v1.0 | ['en'] | ['embed', 'chat'] |
| LLM | OpenBuddy-v11.1 | ['en', 'zh'] | ['embed', 'chat'] |
| LLM | gpt-2 | ['en'] | ['generate'] |
| LLM | internlm-7b | ['en', 'zh'] | ['embed', 'generate'] |
| LLM | internlm-chat-7b | ['en', 'zh'] | ['embed', 'chat'] |
| LLM | internlm-chat-20b | ['en', 'zh'] | ['embed', 'chat'] |
| LLM | llama-2 | ['en'] | ['embed', 'generate'] |
| LLM | llama-2-chat | ['en'] | ['embed', 'chat'] |
| LLM | opt | ['en'] | ['embed', 'generate'] |
| LLM | orca | ['en'] | ['embed', 'chat'] |
| LLM | qwen-chat | ['en', 'zh'] | ['embed', 'chat'] |
| LLM | starchat-beta | ['en'] | ['embed', 'chat'] |
| LLM | starcoder | ['en'] | ['generate'] |
| LLM | starcoderplus | ['en'] | ['embed', 'generate'] |
| LLM | vicuna-v1.3 | ['en'] | ['embed', 'chat'] |
| LLM | vicuna-v1.5 | ['en'] | ['embed', 'chat'] |
| LLM | vicuna-v1.5-16k | ['en'] | ['embed', 'chat'] |
| LLM | wizardlm-v1.0 | ['en'] | ['embed', 'chat'] |
| LLM | wizardmath-v1.0 | ['en'] | ['embed', 'chat'] |
| LLM | OpenBuddy | ['en', 'zh'] | ['embed', 'chat'] |

For more information, please refer to [built-in models](https://inference.readthedocs.io/en/latest/models/builtin/index.html).

**NOTE**:
- Xinference will download models automatically for you; the default model storage path is `${USER}/.xinference/cache`.
- If you have trouble downloading models from Hugging Face, run `export XINFERENCE_MODEL_SRC=xorbits` to download models from our mirror site.
- If you have trouble downloading models from Hugging Face, run `export XINFERENCE_MODEL_SRC=modelscope`; downloads will then come from modelscope by default. Models currently supported by modelscope:
- llama-2
- llama-2-chat
- baichuan-2
- baichuan-2-chat
- chatglm2
- chatglm2-32k
- internlm-chat-20b

## Custom models
Please refer to [custom models](https://inference.readthedocs.io/en/latest/models/custom.html).
1 change: 1 addition & 0 deletions setup.cfg
@@ -39,6 +39,7 @@ install_requires =
typing_extensions
fsspec
s3fs
modelscope

[options.packages.find]
exclude =
7 changes: 7 additions & 0 deletions xinference/model/llm/__init__.py
@@ -19,6 +19,7 @@
from .core import LLM
from .llm_family import (
BUILTIN_LLM_FAMILIES,
BUILTIN_MODELSCOPE_LLM_FAMILIES,
LLM_CLASSES,
GgmlLLMSpecV1,
LLMFamilyV1,
@@ -83,6 +84,12 @@ def _install():
for json_obj in json.load(codecs.open(json_path, "r", encoding="utf-8")):
BUILTIN_LLM_FAMILIES.append(LLMFamilyV1.parse_obj(json_obj))

modelscope_json_path = os.path.join(
os.path.dirname(os.path.abspath(__file__)), "llm_family_modelscope.json"
)
for json_obj in json.load(codecs.open(modelscope_json_path, "r", encoding="utf-8")):
BUILTIN_MODELSCOPE_LLM_FAMILIES.append(LLMFamilyV1.parse_obj(json_obj))

from ...constants import XINFERENCE_MODEL_DIR

user_defined_llm_dir = os.path.join(XINFERENCE_MODEL_DIR, "llm")
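
To make the loading pattern above easier to follow: each entry in `llm_family_modelscope.json` is validated into an `LLMFamilyV1` pydantic object via `parse_obj`. Below is a self-contained sketch of that pattern; `FamilySketch` is a hypothetical, heavily simplified stand-in for `LLMFamilyV1`, keeping only fields visible in the JSON diffs of this commit:

```python
import codecs
import json
from typing import List

from pydantic import BaseModel


class FamilySketch(BaseModel):
    """Hypothetical, simplified stand-in for LLMFamilyV1."""

    version: int
    context_length: int
    model_name: str
    model_lang: List[str]
    model_ability: List[str]


# Mirrors the loop in _install(): read the JSON file, then validate
# each entry into a pydantic object. Extra keys such as model_specs
# are ignored under pydantic v1's default config; the real
# LLMFamilyV1 defines them explicitly.
with codecs.open("llm_family_modelscope.json", "r", encoding="utf-8") as f:
    families = [FamilySketch.parse_obj(obj) for obj in json.load(f)]

print([fam.model_name for fam in families])
```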
52 changes: 4 additions & 48 deletions xinference/model/llm/llm_family.json
@@ -1015,7 +1015,7 @@
{
"version": 1,
"context_length": 8192,
"model_name": "internlm",
"model_name": "internlm-7b",
"model_lang": [
"en",
"zh"
@@ -1042,7 +1042,7 @@
{
"version": 1,
"context_length": 4096,
"model_name": "internlm-chat",
"model_name": "internlm-chat-7b",
"model_lang": [
"en",
"zh"
@@ -1083,54 +1083,10 @@
]
}
},
{
"version": 1,
"context_length": 8192,
"model_name": "internlm-chat-8k",
"model_lang": [
"en",
"zh"
],
"model_ability": [
"embed",
"chat"
],
"model_description": "Internlm-chat-8k is a special version of Internlm-chat, with a context window of 8k tokens instead of 4k.",
"model_specs": [
{
"model_format": "pytorch",
"model_size_in_billions": 7,
"quantizations": [
"4-bit",
"8-bit",
"none"
],
"model_id": "internlm/internlm-chat-7b-8k",
"model_revision": "8bd146e7dc41ba5f3eba95679554a03acc9f0043"
}
],
"prompt_style": {
"style_name": "INTERNLM",
"system_prompt": "",
"roles": [
"<|User|>",
"<|Bot|>"
],
"intra_message_sep": "<eoh>\n",
"inter_message_sep": "<eoa>\n",
"stop_token_ids": [
1,
103028
],
"stop": [
"<eoa>"
]
}
},
{
"version": 1,
"context_length": 16384,
"model_name": "internlm-16k",
"model_name": "internlm-20b",
"model_lang": [
"en",
"zh"
@@ -1157,7 +1113,7 @@
{
"version": 1,
"context_length": 16384,
"model_name": "internlm-chat-16k",
"model_name": "internlm-chat-20b",
"model_lang": [
"en",
"zh"