Update README.md & Add more data into csv & change UI of Pipelines #3237

Merged 1 commit on Sep 9, 2022
25 changes: 25 additions & 0 deletions pipelines/FAQ.md
Original file line number Diff line number Diff line change
@@ -127,3 +127,28 @@ document_store.update_embeddings(retriever, batch_size=256)
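#### Running the backend service fails with: `Exception: Failed loading pipeline component 'DocumentStore': RequestError(400, 'illegal_argument_exception', 'Mapper for [embedding] conflicts with existing mapper:\n\tCannot update parameter [dims] from [312] to [768]')`

Taking semantic search as an example, this error means the embedding dimensions do not match. Check that the dimension of the text vectors stored in Elasticsearch equals the `embedding_dim` configured for `DocumentStore` in `semantic_search.yaml`. If they differ, rebuild the index with `utils/offline_ann.py`. In short, make sure the model used to build the index is the same model configured in `semantic_search.yaml`.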
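
The dimension check described above can be sketched in Python. This is a minimal illustration, not part of PaddleNLP Pipelines: the helper names `find_embedding_dims` and `dims_match` are hypothetical, and `sample_mapping` is a hand-written dictionary that imitates the shape Elasticsearch returns for `GET /<index>/_mapping`. The field name `embedding` is taken from the error message; the index name is only an example.

```python
from typing import Optional


def find_embedding_dims(mapping_response: dict, index_name: str,
                        field: str = "embedding") -> Optional[int]:
    """Return the `dims` of a vector field from a mapping response, or None if absent."""
    properties = (mapping_response.get(index_name, {})
                  .get("mappings", {})
                  .get("properties", {}))
    return properties.get(field, {}).get("dims")


def dims_match(mapping_response: dict, index_name: str, embedding_dim: int) -> bool:
    """True when the index has no embedding field yet, or its dims equal embedding_dim."""
    existing = find_embedding_dims(mapping_response, index_name)
    return existing is None or existing == embedding_dim


# Hand-written sample (an assumption): an index that was built with 312-dim vectors.
sample_mapping = {
    "dureader_robust_query_encoder": {
        "mappings": {
            "properties": {
                "content": {"type": "text"},
                "embedding": {"type": "dense_vector", "dims": 312},
            }
        }
    }
}

print(dims_match(sample_mapping, "dureader_robust_query_encoder", 768))  # False
print(dims_match(sample_mapping, "dureader_robust_query_encoder", 312))  # True
```

A `False` result corresponds to the situation in the error message: the configured `embedding_dim` (768) differs from the dimension already stored in the index (312), so the index has to be rebuilt.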

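#### After installation, this error appears: `cannot import name '_registerMatType' from 'cv2'`

This is caused by mismatched OpenCV package versions. Upgrade each package to the latest release so that all installed OpenCV packages are on the same version.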

```
pip install opencv-contrib-python --upgrade
pip install opencv-contrib-python-headless --upgrade
pip install opencv-python --upgrade
```
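To confirm that the upgrade left the OpenCV packages on one version, a small check along these lines may help. It is a sketch under assumptions: the helper names are hypothetical, only the three packages named in the commands above are inspected, and "consistent" is taken to mean that every installed package reports the identical version string.

```python
from importlib import metadata

# The three distributions named in the pip commands above.
OPENCV_PACKAGES = (
    "opencv-contrib-python",
    "opencv-contrib-python-headless",
    "opencv-python",
)


def installed_opencv_versions() -> dict:
    """Map each installed OpenCV distribution to its version; skip missing ones."""
    versions = {}
    for name in OPENCV_PACKAGES:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return versions


def versions_consistent(versions: dict) -> bool:
    """True when at most one distinct version string is present."""
    return len(set(versions.values())) <= 1


found = installed_opencv_versions()
for package, version in found.items():
    print(f"{package}=={version}")
print("consistent:", versions_consistent(found))
```

If the last line prints `consistent: False`, the listed packages disagree and the upgrade commands above are worth running again.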

#### Installation or startup fails with `RuntimeError: Can't load weights for 'rocketqa-zh-nano-query-encoder'`

The rocketqa models were only added after version 2.3.7, so PaddleNLP needs to be upgraded:
```
pip install paddlenlp --upgrade
```
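Whether an upgrade is needed can be decided by comparing version strings. The sketch below is illustrative only: `version_tuple` and `needs_upgrade` are hypothetical helpers, the parser looks only at the leading numeric parts of a version, and the threshold 2.3.7 is taken from the sentence above. Whether 2.3.7 itself already ships the model is not stated, so the default is left as a parameter.

```python
import re


def version_tuple(version: str) -> tuple:
    """Turn '2.3.7' or '2.4.0.post1' into its leading numeric parts, e.g. (2, 4, 0)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component such as 'post1'
        parts.append(int(match.group()))
    return tuple(parts)


def needs_upgrade(installed: str, minimum: str = "2.3.7") -> bool:
    """True when the installed version is older than the assumed minimum."""
    return version_tuple(installed) < version_tuple(minimum)


print(needs_upgrade("2.3.4"))  # True
print(needs_upgrade("2.4.0"))  # False
```

In practice the installed version string could be read with `importlib.metadata.version("paddlenlp")` and passed to `needs_upgrade`.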

#### Installation fails with `The repository located at mirrors.aliyun.com is not a trusted or secure host and is being ignored.`

Set the pip index to the Tsinghua mirror and reinstall. The following command sets it:

```
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
```
25 changes: 22 additions & 3 deletions pipelines/README.md
@@ -3,10 +3,10 @@
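PaddleNLP Pipelines is an end-to-end framework for intelligent text pipelines. It targets **all NLP scenarios** and helps users build powerful **production-grade systems** with a **low barrier to entry**.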

<div align="center">
<img src="https://user-images.githubusercontent.com/11793384/168514868-1babe981-c675-4f89-9168-dd0a3eede315.gif" width="500">
<img src="https://user-images.githubusercontent.com/12107462/189293482-1ba0d500-9726-4a67-bdc5-f71339cfe773.gif" width="500">
</div>


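For more demos, see [Demo Showcase](#demo-showcase)

## Pipeline Features
* **Full scenario support**: built on a flexible, pluggable, pipeline-oriented component design, it supports a wide range of NLP tasks, including information extraction, sentiment analysis, reading comprehension, retrieval systems, question answering systems, text classification, and text generation.
@@ -31,9 +31,28 @@ For several high-frequency NLP scenarios, the PaddleNLP Pipelines library has open-sourced

* Quickly build a production-grade [**semantic search**](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/pipelines/examples/semantic-search) system: query documents by meaning using natural-language text, rather than by keyword matching
* Quickly build a production-grade [**question answering**](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/pipelines/examples/question-answering) system: ask a question in natural language and get precise answer spans
* Quickly build a production-grade [**FAQ question answering**](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/pipelines/examples/frequently-asked-question) system: ask a question in natural language, match related frequently asked questions, and return the answers to the matched questions
* Quickly build a production-grade [**FAQ question answering**](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/pipelines/examples/frequently-asked-question) system: ask a question in natural language, match related frequently asked questions, and return the answers to the matched questions
* Quickly build a production-grade **multimodal information extraction** system (coming soon, stay tuned)

### Demo Showcase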

+ Semantic search

<div align="center">
<img src="https://user-images.githubusercontent.com/12107462/189293482-1ba0d500-9726-4a67-bdc5-f71339cfe773.gif" width="400">
</div>

+ Question answering

<div align="center">
<img src="https://user-images.githubusercontent.com/12107462/189299496-70e5a4d9-862f-45ce-a036-64b91e587035.gif" width="400">
</div>

+ FAQ question answering

<div align="center">
<img src="https://user-images.githubusercontent.com/12107462/189297769-dd2658d3-5a0e-4d79-96a4-12903b6d3acd.gif" width="400">
</div>

| | |
|-|-|
3 changes: 2 additions & 1 deletion pipelines/examples/semantic-search/README.md
@@ -106,7 +106,8 @@ curl http://localhost:9200/_aliases?pretty=true
```
# Build the ANN index, using the DuReader-Robust dataset as an example
python utils/offline_ann.py --index_name dureader_robust_query_encoder \
--doc_dir data/dureader_dev
--doc_dir data/dureader_dev \
--delete_index
```
You can view the data with the following command:

3 changes: 3 additions & 0 deletions pipelines/ui/baike_qa.csv
@@ -1,3 +1,6 @@
"Question Text";"Answer"
"中国的首都在哪里?";"北京"
"湖北的省会在哪里?";"武汉"
"湘西土家族苗族自治州在哪儿?";"湖南省辖自治州(地级行政区),地处湖南省西北部"
"湖北省人口有多少人?";"5830万人"
"厦门市的生产总值是多少?";"7033.89亿元"
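
The rows above use a semicolon as the field separator and double quotes around every field. As a rough illustration of that layout, the sketch below parses two of the rows with Python's standard `csv` module. How the web UI itself loads the file is not shown in this diff, so `load_examples` is a hypothetical helper and not the project's loader.

```python
import csv
import io

# Two rows copied from baike_qa.csv, including the header line.
sample = '''"Question Text";"Answer"
"中国的首都在哪里?";"北京"
"湖北的省会在哪里?";"武汉"
'''


def load_examples(text: str) -> list:
    """Parse semicolon-delimited, double-quoted rows into (question, answer) pairs."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";", quotechar='"')
    return [(row["Question Text"], row["Answer"]) for row in reader]


examples = load_examples(sample)
print(len(examples))  # 2
for question, answer in examples:
    print(question, "->", answer)
```

Passing `delimiter=";"` matters here: with the default comma delimiter each line would be read as a single column.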
4 changes: 4 additions & 0 deletions pipelines/ui/dureader_search.csv
@@ -1,3 +1,7 @@
"Question Text";"Answer"
"期货交易手续费指的是什么?";"期货交易者买卖期货成交后按成交合约总价值的一定比例所支付的费用。"
"衡量酒水的价格的因素有哪些?";"酒水的血统,存储的时间等"
"母亲节是那一天?";"每年5月的第二个星期日,是母亲节"
"1P空调一般是制冷量是多少?";"2300W--2600W"
"个人认证的微博帐号的申请条件";"绑定手机、有头像、粉丝数不低于30、关注数不低于30。"
"国内现货原油交易的手续费";"万分之十二到万分之十六之间"
6 changes: 3 additions & 3 deletions pipelines/ui/webapp_faq.py
@@ -59,7 +59,7 @@ def on_change_text():
def main():

st.set_page_config(
page_title="pipelines FAQ智能问答",
page_title="PaddleNLP Pipelines FAQ智能问答",
page_icon=
"https://github.com/PaddlePaddle/Paddle/blob/develop/doc/imgs/logo.png")

@@ -76,7 +76,7 @@ def reset_results(*args):
st.session_state.raw_json = None

# Title
st.write("# PaddleNLP 保险FAQ问答")
st.write("# PaddleNLP Pipelines FAQ智能问答")
# Sidebar
st.sidebar.header("选项")
top_k_reader = st.sidebar.slider(
@@ -199,7 +199,7 @@ def reset_results(*args):
markdown(context),
unsafe_allow_html=True,
)
st.write("**FAQ答案:** ", result["answer"])
st.write("**答案:** ", result["answer"])
st.write("**Relevance:** ", result["relevance"])

st.write("___")
4 changes: 2 additions & 2 deletions pipelines/ui/webapp_question_answering.py
@@ -55,7 +55,7 @@ def on_change_text():
def main():

st.set_page_config(
page_title="PaddleNLP 智能问答",
page_title="PaddleNLP Pipelines 智能问答",
page_icon=
"https://github.com/PaddlePaddle/Paddle/blob/develop/doc/imgs/logo.png")

@@ -73,7 +73,7 @@ def reset_results(*args):
st.session_state.raw_json = None

# Title
st.write("# PaddleNLP 智能问答")
st.write("# PaddleNLP Pipelines 智能问答")
# Sidebar
st.sidebar.header("选项")
top_k_retriever = st.sidebar.slider(
4 changes: 2 additions & 2 deletions pipelines/ui/webapp_semantic_search.py
@@ -58,7 +58,7 @@ def on_change_text():
def main():

st.set_page_config(
page_title="pipelines 语义检索",
page_title="PaddleNLP Pipelines 语义检索",
page_icon=
"https://github.com/PaddlePaddle/Paddle/blob/develop/doc/imgs/logo.png")

@@ -75,7 +75,7 @@ def reset_results(*args):
st.session_state.raw_json = None

# Title
st.write("# PaddleNLP语义检索")
st.write("# PaddleNLP Pipelines 语义检索")
# Sidebar
st.sidebar.header("选项")
top_k_reader = st.sidebar.slider(