Releases · Aleph-Alpha/intelligence-layer-sdk
v0.4.1
Fix missing version bump in the packages
Full Changelog: v0.4.0...v0.4.1
v0.4.0
Breaking Changes
- `Evaluator` methods changed to support asynchronous processing for human eval. To run everything at once, change `evaluator.evaluate()` calls to `evaluator.run_and_evaluate()` (a before/after sketch follows this list).
- An evaluation also now returns an `EvaluationOverview`, with much more information about the output of the evaluation.
- `EmbeddingBasedClassify`: init arguments swapped places, from `labels_with_examples, client` to `client, labels_with_examples`.
- `PromptOutput` for `Instruct` tasks now inherits from `CompleteOutput` to make it easier to use more information about the raw completion response.
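A minimal before/after sketch of the evaluator rename above; `my_evaluator` and `my_dataset` are hypothetical placeholders for objects you already construct, since their setup is not part of these notes:

```python
# Hedged sketch of the v0.3.x -> v0.4.0 evaluator migration.

# Before (v0.3.x): one synchronous call that ran and evaluated the dataset.
# evaluation = my_evaluator.evaluate(my_dataset)

# After (v0.4.0): same one-shot behaviour under the new name, now returning an
# EvaluationOverview with much more information about the evaluation output.
overview = my_evaluator.run_and_evaluate(my_dataset)
print(overview)
```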
New Features
- New `IntelligenceApp` builder to quickly spin up a FastAPI server with your `Task`s (a hedged sketch follows this list)
- Integration with Argilla for human evaluation
- `CompleteOutput` and `PromptOutput` now support getting the `generated_tokens` in the completion for downstream calculations
- Summarization use cases now allow for overriding the default model
- New `RecursiveSummarizer` allows for recursively calling one of the `LongContextSummarize` tasks until certain thresholds are reached
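A hedged sketch of the `IntelligenceApp` builder; the import path, the wrapped `FastAPI` instance, the `register_task` method name, and the route argument are all assumptions rather than details taken from these notes:

```python
# Hedged sketch: exposing an existing Task over HTTP with the new IntelligenceApp builder.
from fastapi import FastAPI

from intelligence_layer.core import IntelligenceApp  # module path is an assumption

fast_api = FastAPI()
app = IntelligenceApp(fast_api)             # wrap a FastAPI application (assumed constructor)
app.register_task(my_task, "/my-task")      # my_task is a placeholder for one of your Tasks
# Serving the app (e.g. via uvicorn) then works as for any FastAPI application.
```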
Fixes
- `LimitedConcurrencyClient`'s `from_token` method now supports a custom API host (a hedged example follows)
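A hedged example of the new custom-host support; the keyword name `host` and the import path are assumptions based on the wording of the note:

```python
# Hedged sketch: pointing LimitedConcurrencyClient.from_token at a non-default API host.
from intelligence_layer.connectors import LimitedConcurrencyClient  # import path is an assumption

client = LimitedConcurrencyClient.from_token(
    token="AA_TOKEN",                      # your Aleph Alpha API token
    host="https://inference.example.com",  # previously only the default host could be used
)
```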
Full Changelog: v0.3.0...v0.4.0
v0.3.0
Breaking Changes
- `Dataset` is now a protocol. `SequenceDataset` replaces the old `Dataset` (a hedged migration sketch follows this list).
- The `ident` attribute on `Example` is now `id`.
- The `calculate_bleu` function is removed; BLEU scores are now calculated via a `BleuGrader`.
- The `calculate_rouge` function is removed; ROUGE scores are now calculated via a `RougeGrader`.
- `ClassifyEvaluator` is now called `SingleLabelClassifyEvaluator`.
- `Evaluator`s now take and return `Iterator`s instead of `Sequence`s to allow for streaming datasets (#106, #108).
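A hedged migration sketch covering the renames above; only the class names and the `ident` to `id` rename come from these notes, while the constructor arguments and the grader method names are assumptions inferred from the old free functions:

```python
# Hedged sketch of the v0.3.0 renames; argument names are assumptions.
example = Example(input="2 + 2 =", expected_output="4", id="example-1")  # attribute was `ident`
dataset = SequenceDataset(name="demo", examples=[example])               # replaces the old Dataset class

bleu_score = BleuGrader().calculate_bleu("4", "4")     # replaces the free calculate_bleu() function
rouge_score = RougeGrader().calculate_rouge("4", "4")  # replaces the free calculate_rouge() function
```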
New Features
- `Evaluator`s now have better handling of dataset processing:
  - Errors are handled per example, so you don't lose the entire run because of one failed task.
  - The dataset run now produces an `EvaluationRunOverview`, generated by an `EvaluationRepository`, which better captures the aggregated runs and traces (#109, #112, #115, #131).
  - A `FileEvaluationRepository` and an `InMemoryEvaluationRepository` are available for storing your evaluation results (a hedged usage sketch follows this list).
- Support passing the `Metadata` field through `DocumentIndexClient` (already supported by the Document Index, now also available in the client) (#105).
- New `MultiLabelClassifyEvaluator` to evaluate classification use cases that support multi-label classification (#129, #133).
- `Evaluator`s can now be called via the CLI (#130).
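A hedged sketch of wiring an evaluation repository into an evaluator; the class names and the `evaluate` call come from these notes (the call is renamed to `run_and_evaluate` in v0.4.0), but whether the repository is passed to the constructor, and the exact argument names, are assumptions:

```python
# Hedged sketch (v0.3.0 era): persisting evaluation results via an EvaluationRepository.
repository = FileEvaluationRepository("./eval-results")  # or InMemoryEvaluationRepository(); path arg is assumed
evaluator = SingleLabelClassifyEvaluator(
    task=classify_task,     # placeholder for your classification Task
    repository=repository,  # where runs, traces and aggregates end up (assumed keyword)
)
overview = evaluator.evaluate(dataset)  # produces an EvaluationRunOverview
```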
Fixes
- Fix issue in `EchoTask` where concurrent execution caused overrides in the `PromptTemplate` (#116)
Full Changelog: v0.2.0...v0.3.0
v0.2.0
Breaking Changes
- `SingleLabelClassify` renamed to `PromptBasedClassify`, with a new `SingleLabelClassifyOutput` (#94, #96) (a hedged before/after sketch follows this list)
- `EmbeddingBasedClassify` now outputs `MultiLabelClassifyOutput` to distinguish between the different types of scores produced (#94, #96)
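A hedged before/after sketch of the rename; the constructor arguments are assumptions (the `EmbeddingBasedClassify` argument order shown here matches the pre-v0.4.0 order described in the notes above):

```python
# Hedged sketch of the v0.2.0 classify renames; constructor arguments are assumptions.

# Before (v0.1.0):
# task = SingleLabelClassify(client)

# After (v0.2.0): same prompt-based behaviour, new name and output type.
task = PromptBasedClassify(client)                            # produces SingleLabelClassifyOutput
multi = EmbeddingBasedClassify(labels_with_examples, client)  # produces MultiLabelClassifyOutput
```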
New Features
- New `LimitedConcurrencyClient` to control how many API requests are made concurrently, regardless of where they are called within the `Task` hierarchy (a hedged sketch follows this list)
- New basic `SingleChunkSummarizeEvaluator` and `LongContextSummarizeEvaluator` that can calculate ROUGE and BLEU scores against a "golden summary" (#90, #91)
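A hedged sketch of the concurrency cap; the wrapped `aleph_alpha_client.Client` and the `max_concurrency` keyword are assumptions, only `LimitedConcurrencyClient` itself comes from these notes:

```python
# Hedged sketch: capping concurrent API calls regardless of Task nesting depth.
from aleph_alpha_client import Client  # the underlying Aleph Alpha client (assumed dependency)

limited_client = LimitedConcurrencyClient(Client(token="AA_TOKEN"), max_concurrency=10)
# Hand limited_client to your Tasks in place of the raw client; however deeply the
# Task hierarchy nests, at most 10 requests hit the API at the same time.
```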
Fixes
- Fix issue with Pydantic 2.5 caused by ambiguous ordering of types in the `PydanticSerializable` type (#95)
- Fix possible deadlock with nested calls to `Task.run_concurrently` (#99)
- Allow `EchoTask` to support models whose tokenizers don't contain `pre_tokenizers` (#98)
- Update documentation for including the package in Dockerfiles (#97)
Full Changelog: v0.1.0...v0.2.0
v0.1.0 - Initial Release
Initial Beta Release