feat(bedrock): implement new data source structure (#668)
* feat(bedrock): add data source implementation and new chunking strategies
Co-authored-by: Alain Krok <[email protected]>
1 parent a686a3e, commit 04e1efb
Showing 69 changed files with 9,920 additions and 639 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,129 @@ | ||
[**@cdklabs/generative-ai-cdk-constructs**](../../../README.md) • **Docs** | ||
|
||
*** | ||
|
||
[@cdklabs/generative-ai-cdk-constructs](../../../README.md) / [bedrock](../README.md) / ChunkingStrategy | ||
|
||
# Class: `abstract` ChunkingStrategy | ||
|
||
## Properties | ||
|
||
### configuration | ||
|
||
> `abstract` **configuration**: `ChunkingConfigurationProperty` | ||
The CloudFormation property representation of this configuration. | ||
|
||
*** | ||
|
||
### DEFAULT | ||
|
||
> `readonly` `static` **DEFAULT**: [`ChunkingStrategy`](ChunkingStrategy.md) | ||
Fixed-size chunking with a default chunk size of 300 tokens and 20% overlap. | ||
|
||
*** | ||
|
||
### FIXED\_SIZE | ||
|
||
> `readonly` `static` **FIXED\_SIZE**: [`ChunkingStrategy`](ChunkingStrategy.md) | ||
Fixed-size chunking with a default chunk size of 300 tokens and 20% overlap. | ||
To adjust these values for your requirements, use the | ||
`ChunkingStrategy.fixedSize(props)` method. | ||
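As a rough illustration of what a 300-token chunk size with 20% overlap implies, the sketch below computes token windows for a document of a given length. It assumes the overlap is taken as a percentage of the chunk size (60 tokens here, so each window starts 240 tokens after the previous one). The source does not specify how Amazon Bedrock places chunk boundaries, and `TokenWindow` and `fixedSizeWindows` are hypothetical names used only for this example.

```typescript
// Half-open token range [start, end) covered by one chunk.
interface TokenWindow {
  start: number;
  end: number;
}

// Hypothetical helper: lays out fixed-size windows with a percentage overlap.
// Assumption: overlap = floor(maxTokens * overlapPercentage / 100).
function fixedSizeWindows(
  totalTokens: number,
  maxTokens = 300,
  overlapPercentage = 20,
): TokenWindow[] {
  if (maxTokens <= 0) {
    throw new RangeError("maxTokens must be positive");
  }
  if (overlapPercentage < 0 || overlapPercentage >= 100) {
    throw new RangeError("overlapPercentage must be in [0, 100)");
  }
  const overlap = Math.floor((maxTokens * overlapPercentage) / 100);
  const stride = maxTokens - overlap; // always >= 1 given the checks above
  const windows: TokenWindow[] = [];
  for (let start = 0; start < totalTokens; start += stride) {
    const end = Math.min(start + maxTokens, totalTokens);
    windows.push({ start, end });
    if (end === totalTokens) break; // the last window reached the end of the document
  }
  return windows;
}

const defaultWindows = fixedSizeWindows(1000);
```

Under these assumptions, a 1,000-token document yields four windows starting at tokens 0, 240, 480, and 720, with the last one ending at token 1,000.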
|
||
*** | ||
|
||
### HIERARCHICAL\_COHERE | ||
|
||
> `readonly` `static` **HIERARCHICAL\_COHERE**: [`ChunkingStrategy`](ChunkingStrategy.md) | ||
Hierarchical chunking with the defaults for Cohere models: | ||
- Overlap tokens: 30 | ||
- Max parent token size: 500 | ||
- Max child token size: 100 | ||
|
||
*** | ||
|
||
### HIERARCHICAL\_TITAN | ||
|
||
> `readonly` `static` **HIERARCHICAL\_TITAN**: [`ChunkingStrategy`](ChunkingStrategy.md) | ||
Hierarchical chunking with the defaults for Titan models: | ||
- Overlap tokens: 60 | ||
- Max parent token size: 1500 | ||
- Max child token size: 300 | ||
|
||
*** | ||
|
||
### NONE | ||
|
||
> `readonly` `static` **NONE**: [`ChunkingStrategy`](ChunkingStrategy.md) | ||
Amazon Bedrock treats each file as one chunk. Suitable for documents that | ||
are already pre-processed or split into chunks. | ||
|
||
*** | ||
|
||
### SEMANTIC | ||
|
||
> `readonly` `static` **SEMANTIC**: [`ChunkingStrategy`](ChunkingStrategy.md) | ||
Semantic chunking with defaults of `bufferSize`: 0, | ||
`breakpointPercentileThreshold`: 95, and `maxTokens`: 300. | ||
To adjust these values for your requirements, use the | ||
`ChunkingStrategy.semantic(props)` method. | ||
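The `breakpointPercentileThreshold` value is easier to read with a small example. The sketch below assumes the commonly described approach to semantic chunking, in which a distance is computed between adjacent sentence groups and a split is placed wherever that distance exceeds the given percentile of all distances. The source does not document Amazon Bedrock's internal algorithm, so this is an illustration only. `percentile` and `findBreakpoints` are hypothetical helpers, and the nearest-rank percentile method is an arbitrary choice.

```typescript
// Nearest-rank percentile of a non-empty list of numbers, with p in (0, 100].
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new RangeError("values must not be empty");
  if (p <= 0 || p > 100) throw new RangeError("p must be in (0, 100]");
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[rank - 1];
}

// Returns each index i where a split would fall between sentence i and i + 1.
// distances[i] is the (assumed) embedding distance between those two sentences.
function findBreakpoints(distances: number[], thresholdPercentile = 95): number[] {
  if (distances.length === 0) return [];
  const cutoff = percentile(distances, thresholdPercentile);
  const breakpoints: number[] = [];
  distances.forEach((d, i) => {
    if (d > cutoff) breakpoints.push(i);
  });
  return breakpoints;
}

// Made-up distances with two obvious topic shifts (0.9 and 0.8).
const distances = [0.1, 0.2, 0.15, 0.9, 0.12, 0.11, 0.8, 0.1, 0.13, 0.14];
const splitsAt80 = findBreakpoints(distances, 80);
const splitsAt95 = findBreakpoints(distances, 95);
```

With these made-up distances, a threshold of 80 places splits at indices 3 and 6, while a threshold of 95 places none, because the cutoff equals the largest distance. In this model, a higher threshold produces fewer, larger chunks.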
|
||
## Methods | ||
|
||
### fixedSize() | ||
|
||
> `static` **fixedSize**(`props`): [`ChunkingStrategy`](ChunkingStrategy.md) | ||
Creates a fixed-size chunking strategy with custom settings. | ||
|
||
#### Parameters | ||
|
||
• **props**: `FixedSizeChunkingConfigurationProperty` | ||
|
||
#### Returns | ||
|
||
[`ChunkingStrategy`](ChunkingStrategy.md) | ||
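The following is a minimal sketch of how a class of this shape (an abstract `configuration` property, static presets, and a static factory) could be structured. It does not reproduce the library's implementation: `ChunkingStrategySketch`, `FixedSizeProps`, and `ChunkingConfiguration` are local stand-ins defined here, and the field names `maxTokens`, `overlapPercentage`, and `chunkingStrategy` are assumed to mirror the CloudFormation property types.

```typescript
// Local stand-ins for the CloudFormation property shapes (assumed, not imported).
interface FixedSizeProps {
  maxTokens: number;
  overlapPercentage: number;
}

interface ChunkingConfiguration {
  chunkingStrategy: string;
  fixedSizeChunkingConfiguration?: FixedSizeProps;
}

// Hypothetical stand-in mirroring the documented class shape.
abstract class ChunkingStrategySketch {
  // Factory: wraps the given values in a concrete, anonymous subclass.
  static fixedSize(props: FixedSizeProps): ChunkingStrategySketch {
    return new (class extends ChunkingStrategySketch {
      readonly configuration: ChunkingConfiguration = {
        chunkingStrategy: "FIXED_SIZE",
        fixedSizeChunkingConfiguration: { ...props },
      };
    })();
  }

  // Preset built through the same factory, using the documented defaults.
  static readonly DEFAULT: ChunkingStrategySketch = ChunkingStrategySketch.fixedSize({
    maxTokens: 300,
    overlapPercentage: 20,
  });

  // Each concrete strategy exposes its property representation.
  abstract readonly configuration: ChunkingConfiguration;
}

// A custom strategy with larger chunks and less overlap than the default.
const custom = ChunkingStrategySketch.fixedSize({ maxTokens: 500, overlapPercentage: 10 });
```

Routing the presets through the same factory keeps them consistent with custom strategies, so callers can treat `DEFAULT` and the result of `fixedSize(props)` interchangeably.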
|
||
*** | ||
|
||
### hierarchical() | ||
|
||
> `static` **hierarchical**(`props`): [`ChunkingStrategy`](ChunkingStrategy.md) | ||
Creates a hierarchical chunking strategy with custom settings. | ||
For custom chunking, the maximum chunk size in tokens depends on the embedding model: | ||
- Amazon Titan Text Embeddings: 8192 | ||
- Cohere Embed models: 512 | ||
|
||
#### Parameters | ||
|
||
• **props**: [`HierarchicalChunkingProps`](../interfaces/HierarchicalChunkingProps.md) | ||
|
||
#### Returns | ||
|
||
[`ChunkingStrategy`](ChunkingStrategy.md) | ||
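Because the maximum chunk size depends on the embedding model, it can help to check custom values before deploying. The sketch below validates hierarchical settings against the two limits listed above. The field names `overlapTokens`, `maxParentTokenSize`, and `maxChildTokenSize` are assumptions based on the preset descriptions in this document, and `validateHierarchical` is a hypothetical helper, not part of the library. The rule that a child chunk must be smaller than its parent is also an assumption.

```typescript
type EmbeddingFamily = "titan" | "cohere";

// Maximum chunk size in tokens per embedding model family, as listed above.
const MAX_CHUNK_TOKENS: Record<EmbeddingFamily, number> = {
  titan: 8192,
  cohere: 512,
};

// Assumed field names for the hierarchical settings.
interface HierarchicalParams {
  overlapTokens: number;
  maxParentTokenSize: number;
  maxChildTokenSize: number;
}

// Hypothetical helper: returns a list of problems, empty when the values look valid.
function validateHierarchical(params: HierarchicalParams, family: EmbeddingFamily): string[] {
  const limit = MAX_CHUNK_TOKENS[family];
  const errors: string[] = [];
  if (params.maxParentTokenSize > limit) {
    errors.push(`maxParentTokenSize ${params.maxParentTokenSize} exceeds the ${family} limit of ${limit}`);
  }
  if (params.maxChildTokenSize > limit) {
    errors.push(`maxChildTokenSize ${params.maxChildTokenSize} exceeds the ${family} limit of ${limit}`);
  }
  // Assumption: a child chunk should be smaller than its parent chunk.
  if (params.maxChildTokenSize >= params.maxParentTokenSize) {
    errors.push("maxChildTokenSize must be smaller than maxParentTokenSize");
  }
  return errors;
}

// The documented presets for each model family.
const titanPreset: HierarchicalParams = { overlapTokens: 60, maxParentTokenSize: 1500, maxChildTokenSize: 300 };
const coherePreset: HierarchicalParams = { overlapTokens: 30, maxParentTokenSize: 500, maxChildTokenSize: 100 };
```

Both documented presets pass for their own model family, while the Titan preset fails for Cohere because its 1,500-token parent size exceeds the 512-token limit.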
|
||
*** | ||
|
||
### semantic() | ||
|
||
> `static` **semantic**(`props`): [`ChunkingStrategy`](ChunkingStrategy.md) | ||
Creates a semantic chunking strategy with custom settings. | ||
For custom chunking, the maximum chunk size in tokens depends on the embedding model: | ||
- Amazon Titan Text Embeddings: 8192 | ||
- Cohere Embed models: 512 | ||
|
||
#### Parameters | ||
|
||
• **props**: `SemanticChunkingConfigurationProperty` | ||
|
||
#### Returns | ||
|
||
[`ChunkingStrategy`](ChunkingStrategy.md) |