2023-06-08-instructor_base_en (#13850)
* Add model 2023-06-08-instructor_base_en

* Update 2023-06-08-instructor_base_en.md

* Add model 2023-06-21-e5_base_v2_en

* Add model 2023-06-21-e5_base_en

* Add model 2023-06-21-e5_large_v2_en

* Add model 2023-06-21-e5_large_en

* Add model 2023-06-21-e5_small_v2_en

* Add model 2023-06-21-e5_small_en

* Add model 2023-06-21-instructor_large_en

---------

Co-authored-by: prabod <[email protected]>
Co-authored-by: Maziyar Panahi <[email protected]>
3 people authored Jul 1, 2023
1 parent ced98b6 commit dfaabd4
Showing 8 changed files with 572 additions and 0 deletions.
75 changes: 75 additions & 0 deletions docs/_posts/prabod/2023-06-08-instructor_base_en.md
@@ -0,0 +1,75 @@
---
layout: model
title: Instructor Base Sentence Embeddings
author: John Snow Labs
name: instructor_base
date: 2023-06-08
tags: [instructor, sentence_embeddings, t5, text_semantic_similarity, text_reranking, sentence_similarity, en, open_source, tensorflow]
task: Embeddings
language: en
edition: Spark NLP 5.0.0
spark_version: 3.0
supported: true
engine: tensorflow
annotator: InstructorEmbeddings
article_header:
type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

Instructor👨‍🏫 is an instruction-finetuned text embedding model that generates text embeddings tailored to any task (e.g., classification, retrieval, clustering, text evaluation) and domain (e.g., science, finance) simply by providing the task instruction, without any further finetuning. Its authors report state-of-the-art results on 70 diverse embedding tasks.
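The INSTRUCTOR model card on Hugging Face describes a unified instruction template, "Represent the `domain` `text_type` for `task_objective`:", in which only `text_type` is required. The sketch below builds such a string to pass to `setInstruction`. It assumes that template as we understand it from the model card; `build_instruction` is a hypothetical helper, not part of Spark NLP.

```python
def build_instruction(text_type: str, domain: str = "", objective: str = "") -> str:
    """Build an INSTRUCTOR-style instruction string (hypothetical helper).

    Assumed template, as described on the INSTRUCTOR model card:
    "Represent the {domain} {text_type} for {objective}:"
    where domain and objective are optional.
    """
    parts = ["Represent the"]
    if domain:
        parts.append(domain)
    parts.append(text_type)
    if objective:
        parts.append(f"for {objective}")
    # The trailing space separates the instruction from the input text,
    # mirroring the "Instruction here: " example on this page (an assumption
    # about how the annotator joins instruction and text).
    return " ".join(parts) + ": "


print(build_instruction("title", domain="Science"))
print(build_instruction("document", objective="retrieval"))
```

For instance, passing `build_instruction("sentence", domain="Medicine", objective="clustering")` to `setInstruction` would set the instruction to `"Represent the Medicine sentence for clustering: "`.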

## Predicted Entities



{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/instructor_base_en_5.0.0_3.0_1686224519068.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/instructor_base_en_5.0.0_3.0_1686224519068.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use



<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
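from sparknlp.base import DocumentAssembler
from sparknlp.annotator import InstructorEmbeddings
from pyspark.ml import Pipeline

document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

embeddings = InstructorEmbeddings.pretrained("instructor_base", "en") \
    .setInstruction("Instruction here: ") \
    .setInputCols(["document"]) \
    .setOutputCol("instructor")

pipeline = Pipeline().setStages([document_assembler, embeddings])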
```
```scala
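import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.embeddings.InstructorEmbeddings
import org.apache.spark.ml.Pipeline

val document = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val embeddings = InstructorEmbeddings
  .pretrained("instructor_base", "en")
  .setInstruction("Instruction here: ")
  .setInputCols(Array("document"))
  .setOutputCol("instructor")

val pipeline = new Pipeline().setStages(Array(document, embeddings))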
```
</div>
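The tags list `text_semantic_similarity` and `sentence_similarity`; a common way to compare two vectors from the `instructor` output column is cosine similarity. A minimal pure-Python sketch follows. Collecting the vectors from the Spark DataFrame is not shown; we assume each output annotation carries its vector in the annotation's `embeddings` field, and the toy vectors below only stand in for real model output.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for two sentence embeddings.
print(cosine_similarity([1.0, 0.0, 1.0], [1.0, 0.0, 1.0]))  # identical direction
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))            # orthogonal
```

Scores close to 1.0 indicate sentences the model places close together; the same function works for reranking by scoring each candidate against a query vector.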

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|instructor_base|
|Compatibility:|Spark NLP 5.0.0+|
|License:|Open Source|
|Edition:|Official|
|Input Labels:|[documents]|
|Output Labels:|[instructor]|
|Language:|en|
|Size:|406.6 MB|

## References

https://huggingface.co/hkunlp/instructor-base
71 changes: 71 additions & 0 deletions docs/_posts/prabod/2023-06-21-e5_base_en.md
@@ -0,0 +1,71 @@
---
layout: model
title: E5 Base Sentence Embeddings
author: John Snow Labs
name: e5_base
date: 2023-06-21
tags: [en, open_source, tensorflow]
task: Embeddings
language: en
edition: Spark NLP 5.0.0
spark_version: 3.0
supported: true
engine: tensorflow
annotator: E5Embeddings
article_header:
type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

E5 Base is an English sentence embedding model from the E5 family, introduced in *Text Embeddings by Weakly-Supervised Contrastive Pre-training* by Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, and Furu Wei (arXiv, 2022).
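The E5 model cards on Hugging Face ask that each input text start with `query: ` or `passage: `, and suggest `query: ` for tasks other than retrieval; the Spark NLP `E5Embeddings` docstring example likewise feeds prefixed text. Assuming the annotator does not add these prefixes itself, a minimal sketch for preparing input strings might look like this; `add_e5_prefix` is a hypothetical helper.

```python
from typing import Iterable, List

VALID_PREFIXES = ("query", "passage")


def add_e5_prefix(texts: Iterable[str], kind: str = "query") -> List[str]:
    """Prefix raw texts with "query: " or "passage: " as the E5 cards ask.

    Surrounding whitespace is stripped, and texts that already carry one of
    the two prefixes are returned unchanged.
    """
    if kind not in VALID_PREFIXES:
        raise ValueError(f"kind must be one of {VALID_PREFIXES}, got {kind!r}")
    known = tuple(p + ": " for p in VALID_PREFIXES)
    prefixed = []
    for text in texts:
        stripped = text.strip()
        if stripped.startswith(known):
            prefixed.append(stripped)
        else:
            prefixed.append(f"{kind}: {stripped}")
    return prefixed


queries = add_e5_prefix(["how do sentence embeddings work"])
passages = add_e5_prefix(["Sentence embeddings map text to fixed-size vectors."], kind="passage")
print(queries + passages)
```

The prefixed strings would then populate the `text` column that `DocumentAssembler` reads, e.g. `spark.createDataFrame([[t] for t in queries + passages]).toDF("text")`.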

## Predicted Entities



{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/e5_base_en_5.0.0_3.0_1687350215936.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/e5_base_en_5.0.0_3.0_1687350215936.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use



<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
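from sparknlp.base import DocumentAssembler
from sparknlp.annotator import E5Embeddings
from pyspark.ml import Pipeline

document_assembler = DocumentAssembler().setInputCol("text").setOutputCol("document")
embeddings = E5Embeddings.pretrained("e5_base", "en") \
    .setInputCols(["document"]) \
    .setOutputCol("e5_embeddings")

pipeline = Pipeline().setStages([document_assembler, embeddings])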
```
```scala
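import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.embeddings.E5Embeddings
import org.apache.spark.ml.Pipeline

val document = new DocumentAssembler().setInputCol("text").setOutputCol("document")
val embeddings = E5Embeddings.pretrained("e5_base", "en")
  .setInputCols(Array("document")).setOutputCol("e5_embeddings")
val pipeline = new Pipeline().setStages(Array(document, embeddings))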
```
</div>

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|e5_base|
|Compatibility:|Spark NLP 5.0.0+|
|License:|Open Source|
|Edition:|Official|
|Input Labels:|[documents]|
|Output Labels:|[e5]|
|Language:|en|
|Size:|260.5 MB|

## References

https://huggingface.co/intfloat/e5-base
68 changes: 68 additions & 0 deletions docs/_posts/prabod/2023-06-21-e5_base_v2_en.md
@@ -0,0 +1,68 @@
---
layout: model
title: E5 Base v2 Sentence Embeddings
author: John Snow Labs
name: e5_base_v2
date: 2023-06-21
tags: [e5, sentence_embeddings, en, open_source, tensorflow]
task: Embeddings
language: en
edition: Spark NLP 5.0.0
spark_version: 3.4
supported: true
engine: tensorflow
annotator: E5Embeddings
article_header:
type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

E5 Base v2 is an English sentence embedding model from the E5 family, introduced in *Text Embeddings by Weakly-Supervised Contrastive Pre-training* by Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, and Furu Wei (arXiv, 2022).

## Predicted Entities



{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/e5_base_v2_en_5.0.0_3.4_1687349803929.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/e5_base_v2_en_5.0.0_3.4_1687349803929.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use



<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
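from sparknlp.base import DocumentAssembler
from sparknlp.annotator import E5Embeddings
from pyspark.ml import Pipeline

document_assembler = DocumentAssembler().setInputCol("text").setOutputCol("document")
embeddings = E5Embeddings.pretrained("e5_base_v2", "en") \
    .setInputCols(["document"]) \
    .setOutputCol("e5_embeddings")

pipeline = Pipeline().setStages([document_assembler, embeddings])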
```
```scala
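import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.embeddings.E5Embeddings
import org.apache.spark.ml.Pipeline

val document = new DocumentAssembler().setInputCol("text").setOutputCol("document")
val embeddings = E5Embeddings.pretrained("e5_base_v2", "en")
  .setInputCols(Array("document"))
  .setOutputCol("e5_embeddings")

val pipeline = new Pipeline().setStages(Array(document, embeddings))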
```
</div>
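Since E5 is trained with retrieval in mind, a typical use of these embeddings is ranking passages against a query. The E5 model cards L2-normalize the vectors and score them by dot product, which equals cosine similarity. A minimal pure-Python sketch of that ranking step, assuming the query and passage vectors have already been collected from the output column (collection is not shown; `l2_normalize` and `rank_passages` are hypothetical helpers):

```python
import math
from typing import List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def rank_passages(query_vec: Sequence[float],
                  passage_vecs: Sequence[Sequence[float]]) -> List[Tuple[int, float]]:
    """Return (passage index, cosine score) pairs, best match first."""
    q = l2_normalize(query_vec)
    scores = []
    for idx, p_vec in enumerate(passage_vecs):
        p = l2_normalize(p_vec)
        scores.append((idx, sum(a * b for a, b in zip(q, p))))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy 2-d vectors standing in for "query: ..." and "passage: ..." embeddings.
print(rank_passages([1.0, 0.0], [[0.0, 1.0], [2.0, 0.1], [1.0, 1.0]]))
```

Normalizing once per vector keeps scoring to a plain dot product, which is what makes this pattern easy to move to a vector index later.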

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|e5_base_v2|
|Compatibility:|Spark NLP 5.0.0+|
|License:|Open Source|
|Edition:|Official|
|Input Labels:|[documents]|
|Output Labels:|[e5]|
|Language:|en|
|Size:|260.6 MB|
71 changes: 71 additions & 0 deletions docs/_posts/prabod/2023-06-21-e5_large_en.md
@@ -0,0 +1,71 @@
---
layout: model
title: E5 Large Sentence Embeddings
author: John Snow Labs
name: e5_large
date: 2023-06-21
tags: [en, open_source, tensorflow]
task: Embeddings
language: en
edition: Spark NLP 5.0.0
spark_version: 3.0
supported: true
engine: tensorflow
annotator: E5Embeddings
article_header:
type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

E5 Large is an English sentence embedding model from the E5 family, introduced in *Text Embeddings by Weakly-Supervised Contrastive Pre-training* by Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, and Furu Wei (arXiv, 2022).
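On the Hugging Face E5 cards, a sentence embedding is the average of the encoder's last hidden states over the non-padding tokens, i.e. a mean weighted by the attention mask. The sketch below shows that pooling for a single sequence in pure Python. We assume, without having checked the annotator's source, that `E5Embeddings` applies equivalent pooling internally, so this is for understanding rather than something you would normally run yourself; `average_pool` is a hypothetical name.

```python
from typing import List, Sequence


def average_pool(hidden_states: Sequence[Sequence[float]],
                 attention_mask: Sequence[int]) -> List[float]:
    """Mean of the token vectors whose attention_mask value is 1.

    hidden_states: one vector per token, all of the same dimension.
    attention_mask: 1 for real tokens, 0 for padding, one value per token.
    """
    if len(hidden_states) != len(attention_mask):
        raise ValueError("need exactly one mask value per token")
    kept = [vec for vec, m in zip(hidden_states, attention_mask) if m]
    if not kept:
        raise ValueError("attention mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Three tokens, the last one padding: its [9.0, 9.0] is excluded from the mean.
print(average_pool([[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]], [1, 1, 0]))  # prints [2.0, 3.0]
```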

## Predicted Entities



{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/e5_large_en_5.0.0_3.0_1687350762773.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/e5_large_en_5.0.0_3.0_1687350762773.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use



<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
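from sparknlp.base import DocumentAssembler
from sparknlp.annotator import E5Embeddings
from pyspark.ml import Pipeline

document_assembler = DocumentAssembler().setInputCol("text").setOutputCol("document")
embeddings = E5Embeddings.pretrained("e5_large", "en") \
    .setInputCols(["document"]) \
    .setOutputCol("e5_embeddings")

pipeline = Pipeline().setStages([document_assembler, embeddings])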
```
```scala
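import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.embeddings.E5Embeddings
import org.apache.spark.ml.Pipeline

val document = new DocumentAssembler().setInputCol("text").setOutputCol("document")
val embeddings = E5Embeddings.pretrained("e5_large", "en")
  .setInputCols(Array("document")).setOutputCol("e5_embeddings")
val pipeline = new Pipeline().setStages(Array(document, embeddings))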
```
</div>

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|e5_large|
|Compatibility:|Spark NLP 5.0.0+|
|License:|Open Source|
|Edition:|Official|
|Input Labels:|[documents]|
|Output Labels:|[e5]|
|Language:|en|
|Size:|799.1 MB|

## References

https://huggingface.co/intfloat/e5-large
71 changes: 71 additions & 0 deletions docs/_posts/prabod/2023-06-21-e5_large_v2_en.md
@@ -0,0 +1,71 @@
---
layout: model
title: E5 Large V2 Sentence Embeddings
author: John Snow Labs
name: e5_large_v2
date: 2023-06-21
tags: [en, open_source, tensorflow]
task: Embeddings
language: en
edition: Spark NLP 5.0.0
spark_version: 3.0
supported: true
engine: tensorflow
annotator: E5Embeddings
article_header:
type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

E5 Large v2 is an English sentence embedding model from the E5 family, introduced in *Text Embeddings by Weakly-Supervised Contrastive Pre-training* by Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, and Furu Wei (arXiv, 2022).

## Predicted Entities



{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/e5_large_v2_en_5.0.0_3.0_1687350498606.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/e5_large_v2_en_5.0.0_3.0_1687350498606.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use



<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
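from sparknlp.base import DocumentAssembler
from sparknlp.annotator import E5Embeddings
from pyspark.ml import Pipeline

document_assembler = DocumentAssembler().setInputCol("text").setOutputCol("document")
embeddings = E5Embeddings.pretrained("e5_large_v2", "en") \
    .setInputCols(["document"]) \
    .setOutputCol("e5_embeddings")

pipeline = Pipeline().setStages([document_assembler, embeddings])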
```
```scala
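import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.embeddings.E5Embeddings
import org.apache.spark.ml.Pipeline

val document = new DocumentAssembler().setInputCol("text").setOutputCol("document")
val embeddings = E5Embeddings.pretrained("e5_large_v2", "en")
  .setInputCols(Array("document")).setOutputCol("e5_embeddings")
val pipeline = new Pipeline().setStages(Array(document, embeddings))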
```
</div>

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|e5_large_v2|
|Compatibility:|Spark NLP 5.0.0+|
|License:|Open Source|
|Edition:|Official|
|Input Labels:|[documents]|
|Output Labels:|[e5]|
|Language:|en|
|Size:|799.1 MB|

## References

https://huggingface.co/intfloat/e5-large-v2