Add Upstage AI Integration (Document Parse, Embeddings, and Generation) #314

Open: 3 commits to be merged into base branch main


@hunkim hunkim commented Nov 4, 2024

This PR introduces comprehensive integration with Upstage AI services, adding three new components.

Upstage reports strong performance for these services; for details, see https://www.upstage.ai/ and https://www.upstage.ai/products/document-parse.


  1. Document Processing:
  • Added UpstageDocumentParseReader for converting documents to structured HTML format
  • Integrated with Upstage Document AI API for enhanced document parsing
  2. Embedding Support:
  • Added UpstageEmbedder supporting 'embedding-query' and 'embedding-passage' models
  • Implemented vector generation for both documents and queries
  3. Text Generation:
  • Added UpstageGenerator supporting Solar models (solar-pro, solar-mini)
  • Implemented streaming response generation with context handling
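
To illustrate the split between the two embedding models listed above, here is a minimal sketch of how a caller might pick 'embedding-query' for search queries and 'embedding-passage' for document chunks. The helper names and the request body shape (an OpenAI-style "model" plus "input") are assumptions for illustration, not code taken from this PR.

```python
from typing import Dict, List

# Model names as listed in the PR description.
QUERY_MODEL = "embedding-query"
PASSAGE_MODEL = "embedding-passage"


def pick_embedding_model(is_query: bool) -> str:
    """Return the query model for search queries, the passage model for documents."""
    return QUERY_MODEL if is_query else PASSAGE_MODEL


def build_embedding_payload(texts: List[str], is_query: bool) -> Dict[str, object]:
    """Build a request body for an embeddings call.

    Hypothetical helper: the {"model", "input"} shape is an assumption,
    not confirmed by the PR.
    """
    return {"model": pick_embedding_model(is_query), "input": list(texts)}


if __name__ == "__main__":
    print(build_embedding_payload(["What is Verba?"], is_query=True))
    print(build_embedding_payload(["chunk one", "chunk two"], is_query=False))
```

Keeping the choice in one small function means the import path (documents) and the retrieval path (queries) cannot silently use the wrong model.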

Configuration Updates:

  • Added new environment variables:
    • UPSTAGE_API_KEY: For authentication
    • UPSTAGE_BASE_URL: For custom endpoint configuration (optional)
  • Updated README.md to reflect new capabilities and configuration options
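
A minimal sketch of how the two environment variables above could be read, with UPSTAGE_API_KEY required and UPSTAGE_BASE_URL optional. The `UpstageConfig` class, the `load_upstage_config` function, and the default endpoint value are assumptions for illustration; the PR does not state which default URL the components fall back to.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumed default endpoint; the PR does not specify the fallback URL.
DEFAULT_BASE_URL = "https://api.upstage.ai/v1/solar"


@dataclass(frozen=True)
class UpstageConfig:
    api_key: str
    base_url: str


def load_upstage_config(env: Optional[Mapping[str, str]] = None) -> UpstageConfig:
    """Read Upstage settings from a mapping (os.environ by default).

    Hypothetical helper: raises RuntimeError when the API key is missing
    and falls back to DEFAULT_BASE_URL when no custom endpoint is set.
    """
    env = os.environ if env is None else env
    api_key = env.get("UPSTAGE_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("UPSTAGE_API_KEY is not set")
    base_url = env.get("UPSTAGE_BASE_URL", "").strip() or DEFAULT_BASE_URL
    # Drop a trailing slash so paths can be appended consistently.
    return UpstageConfig(api_key=api_key, base_url=base_url.rstrip("/"))
```

Passing the mapping explicitly, rather than always reading `os.environ`, makes the loader easy to exercise in tests without touching the real environment.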

The integration is available in both production and development environments and follows the same API patterns as the existing integrations.

Testing:

  • Verified document parsing functionality with various file formats
  • Tested embedding generation and similarity search
  • Validated streaming text generation with context windows
  • Confirmed proper error handling and rate limiting

Dependencies:

  • No new package dependencies required
  • Uses existing aiohttp/httpx for API communication

Documentation:

  • Updated README.md with new features and configuration options
  • Added inline documentation for all new components

This integration enhances Verba's capabilities with state-of-the-art document processing, embedding, and generation features from Upstage AI.

@weaviate-git-bot

To avoid any confusion in the future about your contribution to Weaviate, we work with a Contributor License Agreement. If you agree, you can simply add a comment to this PR that you agree with the CLA so that we can merge.

beep boop - the Weaviate bot 👋🤖

PS:
Are you already a member of the Weaviate Slack channel?

@hunkim

hunkim commented Nov 5, 2024

If you agree, you can simply add a comment to this PR that you agree with the CLA so that we can merge.

I agree. Thanks!!
