
feat: add pinecone destination connector #1774

Merged
merged 118 commits into main from ahmet/pinecone-connector
Nov 29, 2023
Changes from 115 commits
118 commits
0389b52
add index creation script
ahmetmeleq Oct 17, 2023
f5fe2a2
rebase off main for the changes in ingest cli
ahmetmeleq Oct 18, 2023
655ffb6
trials on bugfix
ahmetmeleq Oct 19, 2023
33f5054
fix dependency name
ahmetmeleq Oct 19, 2023
473f73c
apply roman's updates to pinecone
ahmetmeleq Oct 19, 2023
0ee9e6b
trials on pinecone example
ahmetmeleq Oct 19, 2023
c6b1dc5
serially batched upsert with embeddings issue workaround
ahmetmeleq Oct 23, 2023
4bb0b1b
parallelized upsert with session handles
ahmetmeleq Oct 24, 2023
2239494
skip chunking to avoid missing embeddings, remove zipping (another wo…
ahmetmeleq Oct 25, 2023
d781698
fix for logging error
ahmetmeleq Oct 25, 2023
6a96193
alphabetic order setup.py
ahmetmeleq Oct 25, 2023
0289dcc
add docs
ahmetmeleq Oct 25, 2023
4bec171
docs
ahmetmeleq Oct 25, 2023
10fb5e7
docs
ahmetmeleq Oct 25, 2023
5a3975c
rearrange imports
ahmetmeleq Oct 25, 2023
397328f
add dependencies
ahmetmeleq Oct 25, 2023
1f6aacb
update example
ahmetmeleq Oct 25, 2023
ab73a49
add tests
ahmetmeleq Oct 25, 2023
d040010
add pinecone ingest test
ahmetmeleq Oct 25, 2023
03b32bc
obfuscate embedding api keys
ahmetmeleq Oct 25, 2023
9895077
update pinecone cli based on the new cli rebase
ahmetmeleq Oct 25, 2023
9b3096e
shellcheck
ahmetmeleq Oct 26, 2023
099fc4f
changelog and version
ahmetmeleq Oct 26, 2023
d1b1045
linting
ahmetmeleq Oct 26, 2023
4fb14b0
linting
ahmetmeleq Oct 26, 2023
5c33688
linting
ahmetmeleq Oct 26, 2023
b1069d4
fix chunking node logs
ahmetmeleq Oct 26, 2023
67ccfaf
remove redundant secret from test fixtures update pr job
ahmetmeleq Oct 26, 2023
9dbad76
remove redundant helper script
ahmetmeleq Oct 26, 2023
2470dc7
remove redundant comments in test
ahmetmeleq Oct 26, 2023
2e4dda2
update example
ahmetmeleq Oct 26, 2023
ae8598e
fix log in pipeline embedding node
ahmetmeleq Oct 26, 2023
15d2459
change pinecone batching size
ahmetmeleq Oct 26, 2023
0c28c17
add debugging tip
ahmetmeleq Oct 26, 2023
5f39a64
update ingest test with chunking
ahmetmeleq Oct 26, 2023
f307a0a
update example with chunking
ahmetmeleq Oct 26, 2023
daeecf9
organize requirements
ahmetmeleq Oct 27, 2023
134a8bf
update expected uploads based on the updates in main
ahmetmeleq Oct 27, 2023
9c12d3b
session handle fix
ahmetmeleq Oct 27, 2023
6d84efc
doc, comment and logging updates
ahmetmeleq Oct 30, 2023
84e65e5
test and session creation updates
ahmetmeleq Oct 30, 2023
8d7612e
Merge branch 'main' into ahmet/pinecone-connector
ahmetmeleq Oct 30, 2023
5a438b8
update for cli changes
ahmetmeleq Oct 30, 2023
cf77315
do not exclude metadata
ahmetmeleq Nov 3, 2023
17c724b
multiple attempts for testing
ahmetmeleq Nov 3, 2023
0257769
fix path typos on setup.py
ahmetmeleq Nov 3, 2023
fec2263
Merge branch 'main' into ahmet/pinecone-connector
ahmetmeleq Nov 3, 2023
830e387
reorder test, update path in test
ahmetmeleq Nov 3, 2023
5ef236f
Merge branch 'main' into ahmet/pinecone-connector
ahmetmeleq Nov 7, 2023
ca94947
setup py changes from main
ahmetmeleq Nov 7, 2023
791bf03
ingest test uses huggingface embedder
ahmetmeleq Nov 7, 2023
f434c3d
remove comment
ahmetmeleq Nov 7, 2023
b7345c7
add secret to test_ingest_dest job
ahmetmeleq Nov 7, 2023
766d485
make batch size a parameter
ahmetmeleq Nov 8, 2023
69e1949
bugfix on chunking params and implementing related test
ahmetmeleq Nov 8, 2023
3fd8c62
Merge branch 'main' into ahmet/pinecone-connector
ahmetmeleq Nov 8, 2023
f2786ce
pass metadata fields individually
ahmetmeleq Nov 8, 2023
d1b1cd2
Merge branch 'ahmet/pinecone-connector' of https://github.com/Unstruc…
ahmetmeleq Nov 8, 2023
c945851
Merge branch 'main' into ahmet/pinecone-connector
ahmetmeleq Nov 8, 2023
4b49fbd
implement check_connection
ahmetmeleq Nov 9, 2023
1ff1fd6
expose writer num_processes, apply parallelization in ingest test
ahmetmeleq Nov 9, 2023
007ad36
fix session handles
ahmetmeleq Nov 13, 2023
b4b858a
Merge branch 'main' into ahmet/pinecone-connector
ahmetmeleq Nov 13, 2023
3d81cfd
logging updates
ahmetmeleq Nov 13, 2023
fac751e
changelog and version
ahmetmeleq Nov 13, 2023
35be64a
random index names to avoid test run collisions
ahmetmeleq Nov 13, 2023
0440eb2
re-add --chunk-new-after-n-chars
ahmetmeleq Nov 13, 2023
1e8f34e
add support for new_after_n_chars
ahmetmeleq Nov 13, 2023
e700a75
check existence of num_processes (dest) when logging
ahmetmeleq Nov 13, 2023
80fed3b
update docs
ahmetmeleq Nov 13, 2023
00b123e
update example and docs
ahmetmeleq Nov 14, 2023
1ead5e0
Merge branch 'main' into ahmet/pinecone-connector
ahmetmeleq Nov 14, 2023
bac73f0
changelog
ahmetmeleq Nov 14, 2023
1e6ff4c
fix typo in example
ahmetmeleq Nov 14, 2023
c44ab12
index creation retry logic for when another index is being deleted in…
ahmetmeleq Nov 14, 2023
94e66b3
index creation retry logic for when another index is being deleted in…
ahmetmeleq Nov 14, 2023
be87dd4
Merge branch 'ahmet/pinecone-connector' of https://github.com/Unstruc…
ahmetmeleq Nov 14, 2023
567ed4e
Merge branch 'ahmet/pinecone-connector' of https://github.com/Unstruc…
ahmetmeleq Nov 14, 2023
074c1ca
Merge branch 'ahmet/pinecone-connector' of https://github.com/Unstruc…
ahmetmeleq Nov 14, 2023
9adefb7
update project variables, update sleep amounts
ahmetmeleq Nov 16, 2023
65fce1c
update docs
ahmetmeleq Nov 16, 2023
3a16a08
update docs
ahmetmeleq Nov 16, 2023
362eb81
Merge branch 'main' into ahmet/pinecone-connector
ahmetmeleq Nov 16, 2023
c3266f0
update docs
ahmetmeleq Nov 16, 2023
8a6a0cb
Merge branch 'ahmet/pinecone-connector' of https://github.com/Unstruc…
ahmetmeleq Nov 16, 2023
387a7ad
remove download_dir, remove index creation loop
ahmetmeleq Nov 16, 2023
f884123
update example
ahmetmeleq Nov 16, 2023
cbd734f
pythonic approach in docs
ahmetmeleq Nov 16, 2023
fa083ff
update log
ahmetmeleq Nov 17, 2023
29758f8
move upsert method
ahmetmeleq Nov 17, 2023
14d4e51
Merge branch 'main' into ahmet/pinecone-connector
ahmetmeleq Nov 17, 2023
5723467
shellcheck
ahmetmeleq Nov 17, 2023
992b60d
Merge branch 'main' into ahmet/pinecone-connector
ahmetmeleq Nov 17, 2023
4939113
Update docs/source/ingest/destination_connectors/pinecone.rst
ahmetmeleq Nov 20, 2023
9812d93
Merge branch 'main' into ahmet/pinecone-connector
ahmetmeleq Nov 20, 2023
b6e9773
version
ahmetmeleq Nov 20, 2023
4f65b49
s3 docs pythonic approach and local connector
ahmetmeleq Nov 21, 2023
738d75c
add comment on why we use random rather than uuidgen
ahmetmeleq Nov 21, 2023
fe818e4
check if test variables are defined before setting
ahmetmeleq Nov 22, 2023
937bdfa
shellcheck double quotes
ahmetmeleq Nov 22, 2023
a2b2fc3
update parent classes for cliconfig
ahmetmeleq Nov 23, 2023
940f72d
different number of processes for processor and writer in test
ahmetmeleq Nov 23, 2023
e51b88f
Merge branch 'main' into ahmet/pinecone-connector
ahmetmeleq Nov 23, 2023
35701dc
add comment, add field selection from element, add list items separat…
ahmetmeleq Nov 24, 2023
1278379
walrus syntax := instead of if [-z $...] for default parameters
ahmetmeleq Nov 28, 2023
0ec7cae
better type checking for session handles
ahmetmeleq Nov 28, 2023
7b9e02b
implement check_connection
ahmetmeleq Nov 28, 2023
1cf1290
move log for number of (upload) processes from pipeline to connector
ahmetmeleq Nov 28, 2023
ca0785e
update embedding docs to have embedding prepend for cli args
ahmetmeleq Nov 28, 2023
83518b0
add potter's flatten lists to flatten dicts
ahmetmeleq Nov 29, 2023
e1a6365
make all element fields indexable, add element_serialized
ahmetmeleq Nov 29, 2023
f8688e5
Merge branch 'main' into ahmet/pinecone-connector
ahmetmeleq Nov 29, 2023
e353c6b
unique ids for pinecone entries rather than using element ids
ahmetmeleq Nov 29, 2023
7e0c7e7
Merge branch 'ahmet/pinecone-connector' of https://github.com/Unstruc…
ahmetmeleq Nov 29, 2023
b54e5ce
an additional error wrapper for check connection
ahmetmeleq Nov 29, 2023
07dfbd8
Merge branch 'main' into ahmet/pinecone-connector
ahmetmeleq Nov 29, 2023
cebfcb6
changelog and version
ahmetmeleq Nov 29, 2023
5cfef3c
Merge branch 'main' into ahmet/pinecone-connector
ahmetmeleq Nov 29, 2023
2 changes: 2 additions & 0 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
@@ -287,6 +287,7 @@ jobs:
AZURE_SEARCH_ENDPOINT: ${{ secrets.AZURE_SEARCH_ENDPOINT }}
AZURE_SEARCH_API_KEY: ${{ secrets.AZURE_SEARCH_API_KEY }}
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
PINECONE_API_KEY: ${{ secrets.PINECONE_API_KEY }}
TABLE_OCR: "tesseract"
OCR_AGENT: "tesseract"
CI: "true"
@@ -347,6 +348,7 @@ jobs:
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
MONGODB_URI: ${{ secrets.MONGODB_URI }}
MONGODB_DATABASE_NAME: ${{ secrets.MONGODB_DATABASE_NAME }}
PINECONE_API_KEY: ${{ secrets.PINECONE_API_KEY }}
TABLE_OCR: "tesseract"
OCR_AGENT: "tesseract"
CI: "true"
4 changes: 3 additions & 1 deletion CHANGELOG.md
@@ -1,4 +1,4 @@
## 0.11.1-dev4
## 0.11.1-dev5

### Enhancements

@@ -7,13 +7,15 @@
### Features

* **Adds HubSpot connector** Adds connector to retrieve call, communications, emails, notes, products and tickets from HubSpot
* **Add Pinecone destination connector.** Problem: after ingesting data from a source, users may want to produce embeddings for their data and write them into a vector DB; Pinecone is one such vector database. Feature: added a Pinecone destination connector to ingest documents from any supported source, embed them, and write the embeddings / documents into Pinecone.

### Fixes

* **Do not extract text of `<style>` tags in HTML.** `<style>` tags containing CSS in invalid positions previously contributed to element text. Do not consider text node of a `<style>` element as textual content.
* **Fix DOCX merged table cell repeats cell text.** Only include text for a merged cell, not for each underlying cell spanned by the merge.
* **Fix tables not extracted from DOCX header/footers.** Headers and footers in DOCX documents skip tables defined in the header and commonly used for layout/alignment purposes. Extract text from tables as a string and include in the `Header` and `Footer` document elements.
* **Fix output filepath for fsspec-based source connectors.** Previously the base directory was being included in the output filepath unnecessarily.
* **Process chunking parameter names in ingest correctly** Solves a bug where chunking parameters weren't being processed and used by the ingest CLI, by renaming faulty parameter names and prefixes; adds the relevant parameters to the ingest Pinecone test to verify that they are functional.

## 0.11.0

4 changes: 4 additions & 0 deletions Makefile
@@ -211,6 +211,10 @@ install-ingest-jira:
install-ingest-hubspot:
python3 -m pip install -r requirements/ingest-hubspot.txt

.PHONY: install-ingest-pinecone
install-ingest-pinecone:
python3 -m pip install -r requirements/ingest/pinecone.txt

.PHONY: install-embed-huggingface
install-embed-huggingface:
python3 -m pip install -r requirements/ingest/embed-huggingface.txt
4 changes: 2 additions & 2 deletions docs/source/core/chunking.rst
@@ -26,11 +26,11 @@ that span between pages. This kwarg is ``True`` by default.
not split elements, it is possible for a section to exceed that length, for
example if a ``NarrativeText`` element exceeds ``1500`` characters on its own.

Similarly, sections under ``combine_under_n_chars`` will be combined if they
Similarly, sections under ``combine_text_under_n_chars`` will be combined if they
do not exceed the specified threshold, which defaults to ``500``. This will combine
a series of ``Title`` elements that occur one after another, which sometimes
happens in lists that are not detected as ``ListItem`` elements. Set
``combine_under_n_chars=0`` to turn off this behavior.
``combine_text_under_n_chars=0`` to turn off this behavior.

The following shows an example of how to use ``chunk_by_title``. You will
see the document chunked into sections instead of elements.
1 change: 1 addition & 0 deletions docs/source/ingest/configs/chunking_config.rst
@@ -16,4 +16,5 @@ Configs
* ``chunk_elements (default False)``: Boolean flag whether to run chunking as part of the ingest process.
* ``multipage_sections (default True)``: If True, sections can span multiple pages.
* ``combine_text_under_n_chars (default 500)``: Combines elements (for example a series of titles) until a section reaches a length of n characters. Defaults to `max_characters` which combines chunks whenever space allows. Specifying 0 for this argument suppresses combining of small chunks. Note this value is "capped" at the `new_after_n_chars` value since a value higher than that would not change this parameter's effect.
* ``new_after_n_chars (default 1500)``: Cuts off new sections once they reach a length of n characters (soft max). Defaults to `max_characters` when not specified, which effectively disables any soft window. Specifying 0 for this argument causes each element to appear in a chunk by itself (although an element with text longer than `max_characters` will still be split into two or more chunks).
* ``max_characters (default 1500)``: Chunks each element's text and text_as_html (if present) into chunks of at most n characters (hard max)
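Taken together, the defaulting and capping rules above amount to a small resolution step. The sketch below is illustrative only — the helper name and structure are hypothetical, not part of the ingest codebase:

```python
# Illustrative sketch of how the chunking defaults above interact; the helper
# `resolve_chunking_params` is hypothetical, not an actual ingest API.
from typing import Optional, Tuple

def resolve_chunking_params(
    combine_text_under_n_chars: Optional[int] = None,
    new_after_n_chars: Optional[int] = None,
    max_characters: int = 1500,
) -> Tuple[int, int]:
    # new_after_n_chars defaults to max_characters, disabling the soft window
    soft_max = max_characters if new_after_n_chars is None else new_after_n_chars
    # combine_text_under_n_chars defaults to max_characters and is capped at the
    # soft max, since a larger value would not change this parameter's effect
    combine = max_characters if combine_text_under_n_chars is None else combine_text_under_n_chars
    combine = min(combine, soft_max)
    return combine, soft_max

print(resolve_chunking_params())            # (1500, 1500)
print(resolve_chunking_params(500, 1500))   # (500, 1500)
print(resolve_chunking_params(2000, 1500))  # (1500, 1500) -- capped at soft max
```

Passing ``0`` for ``combine_text_under_n_chars`` survives the resolution unchanged, which is what suppresses combining of small chunks.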
5 changes: 3 additions & 2 deletions docs/source/ingest/configs/embedding_config.rst
@@ -10,5 +10,6 @@ the dataset.

Configs
---------------------
* ``api_key``: If an api key is required to generate the embeddings via an api (i.e. OpenAI)
* ``model_name``: The model to use for the embedder.
* ``embedding_provider``: An unstructured embedding provider to use while doing embedding. A few examples: langchain-openai, langchain-huggingface, langchain-aws-bedrock.
* ``embedding_api_key``: If an API key is required to generate the embeddings via an API (e.g. OpenAI)
* ``embedding_model_name``: The model to use for the embedder, if necessary.
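The ``embedding_`` prefix on these config names mirrors the CLI flags (``--embedding-provider`` and so on). As a hedged illustration of that convention — the helper below is hypothetical, not an actual unstructured-ingest API — stripping the prefix yields the embedder's own kwargs:

```python
# Hypothetical helper illustrating the "embedding_" prefix convention above;
# not part of unstructured-ingest itself.
def embedder_kwargs(cli_args: dict) -> dict:
    prefix = "embedding_"
    return {
        key[len(prefix):]: value
        for key, value in cli_args.items()
        if key.startswith(prefix) and value is not None
    }

args = {
    "embedding_provider": "langchain-huggingface",
    "embedding_api_key": None,     # optional, e.g. for OpenAI providers
    "embedding_model_name": None,  # optional
    "num_processes": 2,            # unrelated flag, left out
}
print(embedder_kwargs(args))  # {'provider': 'langchain-huggingface'}
```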
2 changes: 2 additions & 0 deletions docs/source/ingest/destination_connectors.rst
@@ -11,3 +11,5 @@ in our community `Slack. <https://short.unstructured.io/pzw05l7>`_
destination_connectors/azure_cognitive_search
destination_connectors/delta_table
destination_connectors/mongodb
destination_connectors/pinecone
destination_connectors/s3
79 changes: 79 additions & 0 deletions docs/source/ingest/destination_connectors/pinecone.rst
@@ -0,0 +1,79 @@
Pinecone
===========

Batch process all your records using ``unstructured-ingest`` to store structured outputs and embeddings locally on your filesystem and upload those to a Pinecone index.

First you'll need to install the Pinecone dependencies as shown here.

.. code:: shell

pip install "unstructured[pinecone]"

Run Locally
-----------
The upstream connector can be any of the ones supported; for convenience, the sample command below uses the
upstream local connector. This will create new files on your local filesystem.

.. tabs::

.. tab:: Shell

.. code:: shell

unstructured-ingest \
local \
--input-path example-docs/book-war-and-peace-1225p.txt \
--output-dir local-to-pinecone \
--strategy fast \
--chunk-elements \
--embedding-provider <an unstructured embedding provider, e.g. langchain-huggingface> \
--num-processes 2 \
--verbose \
--work-dir "<directory for intermediate outputs to be saved>" \
pinecone \
--api-key <your Pinecone API key here> \
--index-name <your index name here, e.g. ingest-test> \
--environment <your environment name here, e.g. gcp-starter> \
--batch-size <number of elements to be uploaded per batch, e.g. 80> \
--num-processes <number of processes to be used to upload, e.g. 2>

.. tab:: Python

.. code:: python

import os

from unstructured.ingest.interfaces import PartitionConfig, ProcessorConfig, ReadConfig, ChunkingConfig, EmbeddingConfig
from unstructured.ingest.runner import LocalRunner
if __name__ == "__main__":
runner = LocalRunner(
processor_config=ProcessorConfig(
verbose=True,
output_dir="local-output-to-pinecone",
num_processes=2,
),
read_config=ReadConfig(),
partition_config=PartitionConfig(),
chunking_config=ChunkingConfig(
chunk_elements=True
),
embedding_config=EmbeddingConfig(
provider="langchain-huggingface",
),
writer_type="pinecone",
writer_kwargs={
"api_key": os.getenv("PINECONE_API_KEY"),
"index_name": os.getenv("PINECONE_INDEX_NAME"),
"environment": os.getenv("PINECONE_ENVIRONMENT_NAME"),
"batch_size": 80,
"num_processes": 2,
Contributor: This should leverage the same num_processes being set in the ProcessorConfig. Actually not sure if this is causing a duplicate key error in the CLI itself...

Contributor (Author): Assuming it's the same concern, check #1774 (comment)
}
)
runner.run(
input_path="example-docs/fake-memo.pdf",
)


For a full list of the options the CLI accepts, run ``unstructured-ingest <upstream connector> pinecone --help``.

NOTE: Keep in mind that you will need to have all the appropriate extras and dependencies for the file types of the documents contained in your data storage platform if you're running this locally. You can find more information about this in the `installation guide <https://unstructured-io.github.io/unstructured/installing.html>`_.
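The ``batch_size`` option above controls how many elements are sent per upsert call. A minimal sketch of that batching behavior, assuming nothing about the connector's internals — the pure-Python helper below is illustrative; a real run would call ``index.upsert`` once per batch:

```python
# Illustrative sketch (not the connector's actual code) of the batching
# behavior behind --batch-size: vectors are upserted in fixed-size batches.
from typing import Iterator, List, Sequence, Tuple

Vector = Tuple[str, List[float], dict]  # (id, embedding, metadata)

def batches(vectors: Sequence, batch_size: int) -> Iterator[Sequence]:
    # yield consecutive slices of at most batch_size vectors
    for start in range(0, len(vectors), batch_size):
        yield vectors[start:start + batch_size]

# With a real index each batch would go to index.upsert(vectors=batch);
# here we just count what each call would send.
fake_vectors = [(f"id-{i}", [0.0, 1.0], {"text": "..."}) for i in range(200)]
sizes = [len(batch) for batch in batches(fake_vectors, batch_size=80)]
print(sizes)  # [80, 80, 40]
```

Smaller batches trade throughput for smaller request payloads, which matters when each vector carries large flattened metadata.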
73 changes: 73 additions & 0 deletions docs/source/ingest/destination_connectors/s3.rst
@@ -0,0 +1,73 @@
S3
===========

Batch process all your records using ``unstructured-ingest`` to store structured outputs locally on your filesystem and upload those local files to an S3 bucket.

First you'll need to install the S3 dependencies as shown here.

.. code:: shell

pip install "unstructured[s3]"

Run Locally
-----------
The upstream connector can be any of the ones supported; for convenience, the sample command below uses the
upstream local connector. This will create new files on your local filesystem.

.. tabs::

.. tab:: Shell

.. code:: shell

unstructured-ingest \
local \
--input-path example-docs/book-war-and-peace-1225p.txt \
--output-dir local-to-s3 \
--strategy fast \
--chunk-elements \
--embedding-provider <an unstructured embedding provider, e.g. langchain-huggingface> \
--num-processes 2 \
--verbose \
--work-dir "<directory for intermediate outputs to be saved>" \
s3 \
--anonymous \
--remote-url "<your destination path here, e.g. 's3://unstructured/war-and-peace-output'>"

.. tab:: Python

.. code:: python

import os

from unstructured.ingest.interfaces import PartitionConfig, ProcessorConfig, ReadConfig, ChunkingConfig, EmbeddingConfig
from unstructured.ingest.runner import LocalRunner
if __name__ == "__main__":
runner = LocalRunner(
processor_config=ProcessorConfig(
verbose=True,
output_dir="local-output-to-s3",
num_processes=2,
),
read_config=ReadConfig(),
partition_config=PartitionConfig(),
chunking_config=ChunkingConfig(
chunk_elements=True
),
embedding_config=EmbeddingConfig(
provider="langchain-huggingface",
),
writer_type="s3",
writer_kwargs={
"anonymous": True,
"remote_url": "<your destination path here, e.g. 's3://unstructured/war-and-peace-output'>",
}
)
runner.run(
input_path="example-docs/book-war-and-peace-1225p.txt",
)


For a full list of the options the CLI accepts, run ``unstructured-ingest <upstream connector> s3 --help``.

NOTE: Keep in mind that you will need to have all the appropriate extras and dependencies for the file types of the documents contained in your data storage platform if you're running this locally. You can find more information about this in the `installation guide <https://unstructured-io.github.io/unstructured/installing.html>`_.
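The ``remote_url`` value must carry the protocol prefix that fsspec-style destinations like the s3 writer expect. A small illustrative check — the helper below is hypothetical, not part of unstructured-ingest:

```python
# Hypothetical validation helper: fsspec-style destinations address output by
# a protocol-prefixed remote URL such as "s3://bucket/prefix".
def validate_remote_url(url: str, expected_protocol: str = "s3") -> str:
    protocol, sep, path = url.partition("://")
    if not sep or protocol != expected_protocol or not path:
        raise ValueError(f"expected an {expected_protocol}:// URL, got {url!r}")
    return url

print(validate_remote_url("s3://unstructured/war-and-peace-output"))
```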
30 changes: 30 additions & 0 deletions examples/ingest/pinecone/ingest.sh
@@ -0,0 +1,30 @@
#!/usr/bin/env bash

# Processes example-docs/book-war-and-peace-1225p.txt with the local source connector,
# embeds the processed documents, and writes the results to a Pinecone index.

# Structured outputs are stored in local-to-pinecone/

SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
cd "$SCRIPT_DIR"/../../.. || exit 1


# As an example we're using the local source connector,
# however ingesting from any supported source connector is possible.
# shellcheck disable=2094
PYTHONPATH=. ./unstructured/ingest/main.py \
local \
--input-path example-docs/book-war-and-peace-1225p.txt \
--output-dir local-to-pinecone \
--strategy fast \
--chunk-elements \
--embedding-provider "<an unstructured embedding provider, e.g. langchain-huggingface>" \
--num-processes 2 \
--verbose \
--work-dir "<directory for intermediate outputs to be saved>" \
pinecone \
--api-key "<Pinecone API Key to write into a Pinecone index>" \
--index-name "<Pinecone index name, e.g. ingest-test>" \
--environment "<Pinecone environment name, e.g. gcp-starter>" \
--batch-size "<number of elements to be uploaded per batch, e.g. 80>" \
--num-processes "<number of processes to be used to upload, e.g. 2>"
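The ingest tests for this connector give each run a collision-resistant index name and default variables only when they are unset (see the "random index names to avoid test run collisions" and ``:=`` commits above). A hedged sketch of that pattern — not part of this PR's scripts:

```shell
#!/usr/bin/env bash
# Illustrative sketch: derive a collision-resistant index name, defaulting
# each variable only when it is not already set. $RANDOM is used here
# because it is available even where uuidgen is not.
DEST_INDEX_PREFIX=${DEST_INDEX_PREFIX:-"ingest-test"}
RANDOM_SUFFIX=$((RANDOM % 100000))
DEST_INDEX=${DEST_INDEX:-"${DEST_INDEX_PREFIX}-${RANDOM_SUFFIX}"}
echo "$DEST_INDEX"
```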
3 changes: 3 additions & 0 deletions requirements/ingest/pinecone.in
@@ -0,0 +1,3 @@
-c constraints.in
-c base.txt
pinecone-client
56 changes: 56 additions & 0 deletions requirements/ingest/pinecone.txt
@@ -0,0 +1,56 @@
#
# This file is autogenerated by pip-compile with Python 3.10
# by the following command:
#
# pip-compile requirements/ingest-pinecone.in
#
certifi==2023.7.22
# via
# -c requirements/base.txt
# -c requirements/constraints.in
# requests
charset-normalizer==3.3.0
# via
# -c requirements/base.txt
# requests
dnspython==2.4.2
# via pinecone-client
idna==3.4
# via
# -c requirements/base.txt
# requests
loguru==0.7.2
# via pinecone-client
numpy==1.24.4
# via
# -c requirements/base.txt
# -c requirements/constraints.in
# pinecone-client
pinecone-client==2.2.4
# via -r requirements/ingest-pinecone.in
python-dateutil==2.8.2
# via pinecone-client
pyyaml==6.0.1
# via pinecone-client
requests==2.31.0
# via
# -c requirements/base.txt
# pinecone-client
six==1.16.0
# via
# -c requirements/base.txt
# python-dateutil
tqdm==4.66.1
# via
# -c requirements/base.txt
# pinecone-client
typing-extensions==4.8.0
# via
# -c requirements/base.txt
# pinecone-client
urllib3==1.26.18
# via
# -c requirements/base.txt
# -c requirements/constraints.in
# pinecone-client
# requests
29 changes: 15 additions & 14 deletions setup.py
@@ -127,33 +127,34 @@ def load_requirements(file_list: Optional[Union[str, List[str]]] = None) -> List
"tsv": tsv_reqs,
"xlsx": xlsx_reqs,
# Extra requirements for data connectors
"s3": load_requirements("requirements/ingest/s3.in"),
ahmetmeleq marked this conversation as resolved.
"airtable": load_requirements("requirements/ingest/airtable.in"),
"azure": load_requirements("requirements/ingest/azure.in"),
"azure-cognitive-search": load_requirements(
"requirements/ingest/azure-cognitive-search.in",
),
"biomed": load_requirements("requirements/ingest/biomed.in"),
"box": load_requirements("requirements/ingest/box.in"),
"confluence": load_requirements("requirements/ingest/confluence.in"),
"delta-table": load_requirements("requirements/ingest/delta-table.in"),
"discord": load_requirements("requirements/ingest/discord.in"),
"dropbox": load_requirements("requirements/ingest/dropbox.in"),
"elasticsearch": load_requirements("requirements/ingest/elasticsearch.in"),
"gcs": load_requirements("requirements/ingest/gcs.in"),
"github": load_requirements("requirements/ingest/github.in"),
"gitlab": load_requirements("requirements/ingest/gitlab.in"),
"reddit": load_requirements("requirements/ingest/reddit.in"),
"notion": load_requirements("requirements/ingest/notion.in"),
"slack": load_requirements("requirements/ingest/slack.in"),
"wikipedia": load_requirements("requirements/ingest/wikipedia.in"),
"google-drive": load_requirements("requirements/ingest/google-drive.in"),
"gcs": load_requirements("requirements/ingest/gcs.in"),
"elasticsearch": load_requirements("requirements/ingest/elasticsearch.in"),
"dropbox": load_requirements("requirements/ingest/dropbox.in"),
"box": load_requirements("requirements/ingest/box.in"),
"hubspot": load_requirements("requirements/ingest/hubspot.in"),
"jira": load_requirements("requirements/ingest/jira.in"),
"notion": load_requirements("requirements/ingest/notion.in"),
"onedrive": load_requirements("requirements/ingest/onedrive.in"),
"outlook": load_requirements("requirements/ingest/outlook.in"),
"confluence": load_requirements("requirements/ingest/confluence.in"),
"airtable": load_requirements("requirements/ingest/airtable.in"),
"pinecone": load_requirements("requirements/ingest/pinecone.in"),
"reddit": load_requirements("requirements/ingest/reddit.in"),
"s3": load_requirements("requirements/ingest/s3.in"),
"sharepoint": load_requirements("requirements/ingest/sharepoint.in"),
"delta-table": load_requirements("requirements/ingest/delta-table.in"),
"salesforce": load_requirements("requirements/ingest/salesforce.in"),
"jira": load_requirements("requirements/ingest/jira.in"),
"hubspot": load_requirements("requirements/ingest/hubspot.in"),
"slack": load_requirements("requirements/ingest/slack.in"),
"wikipedia": load_requirements("requirements/ingest/wikipedia.in"),
# Legacy extra requirements
"huggingface": load_requirements("requirements/huggingface.in"),
"local-inference": all_doc_reqs,