Releases: Aleph-Alpha/intelligence-layer-sdk
v0.11.0
Breaking Changes
- breaking_change: HuggingFaceDatasetRepository now has a parameter caching, which caches the examples of a dataset once loaded. This is True by default and drastically reduces network traffic. To keep the previous, non-caching behavior, set it to False.
- breaking_change: MultipleChunkRetrieverQa no longer takes an insert_chunk_size parameter; it takes an ExpandChunks task instead.
- breaking_change: the issue_classification_user_journey notebook moved to its own repository
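The effect of the caching switch can be illustrated with a minimal sketch. It does not use the SDK: CachingExampleStore and fake_fetch are hypothetical stand-ins, and only the parameter name caching and its default of True come from the note above.

```python
from typing import Callable, Dict, List


class CachingExampleStore:
    """Hypothetical stand-in for a dataset repository with a caching switch."""

    def __init__(self, fetch: Callable[[str], List[str]], caching: bool = True) -> None:
        self._fetch = fetch  # assumed to be an expensive remote call
        self._caching = caching
        self._cache: Dict[str, List[str]] = {}

    def examples(self, dataset_id: str) -> List[str]:
        if not self._caching:
            return self._fetch(dataset_id)  # always hit the remote source
        if dataset_id not in self._cache:
            self._cache[dataset_id] = self._fetch(dataset_id)  # first load only
        return self._cache[dataset_id]


calls: List[str] = []


def fake_fetch(dataset_id: str) -> List[str]:
    calls.append(dataset_id)  # record each simulated network round trip
    return [f"{dataset_id}-example-{i}" for i in range(3)]


cached = CachingExampleStore(fake_fetch)  # caching=True by default
cached.examples("ds1")
cached.examples("ds1")
cached_calls = len(calls)  # fetches triggered with caching on

uncached = CachingExampleStore(fake_fetch, caching=False)
uncached.examples("ds1")
uncached.examples("ds1")
uncached_calls = len(calls) - cached_calls  # fetches triggered with caching off
```

With caching enabled, the second read is served from memory. A caller that relies on seeing remote changes immediately would have to pass caching=False, which is why the new default is listed as breaking.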
New Features
- feature: Llama2InstructModel to support llama-2 models in Aleph Alpha API
- feature: Llama3InstructModel to support llama-3 models in Aleph Alpha API
- feature: ExpandChunks task caches chunked documents by ID
- feature: DocumentIndexClient now supports
  - create_index
  - index_configuration
  - assign_index_to_collection
  - delete_index_from_collection
  - list_assigned_index_names
- feature: DocumentIndexRetriever now supports index_name
- feature: Runner.run_dataset now has a configurable number of workers via max_workers and defaults to the previous value, which is 10.
- feature: In case a BusyError is raised during a complete, the LimitedConcurrencyClient will retry until max_retry_time is reached.
- feature: FileTracer now accepts both a str and a Path as log_file_path
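The retry-until-deadline behavior described for BusyError can be sketched as follows. This is not the LimitedConcurrencyClient implementation: the BusyError class here is a local stand-in, retry_while_busy and flaky_complete are hypothetical, and the exponential backoff is an assumption of the sketch. Only the idea of retrying until max_retry_time is reached comes from the note above.

```python
import time
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


class BusyError(Exception):
    """Local stand-in for the client's busy signal (hypothetical here)."""


def retry_while_busy(call: Callable[[], T], max_retry_time: float, delay: float = 0.01) -> T:
    """Retry `call` on BusyError until `max_retry_time` seconds have elapsed."""
    deadline = time.monotonic() + max_retry_time
    while True:
        try:
            return call()
        except BusyError:
            if time.monotonic() + delay > deadline:
                raise  # time budget exhausted: surface the last BusyError
            time.sleep(delay)
            delay *= 2  # exponential backoff (an assumption of this sketch)


counter: Dict[str, int] = {"attempts": 0}


def flaky_complete() -> str:
    counter["attempts"] += 1
    if counter["attempts"] < 3:
        raise BusyError("model busy")  # first two attempts fail
    return "completion"


result = retry_while_busy(flaky_complete, max_retry_time=1.0)
```

Using time.monotonic for the deadline keeps the budget independent of wall-clock adjustments.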
Fixes
- refactor: rename index parameter in DocumentIndex.search() to index_name
- fix: HuggingFaceRepository no longer is a dataset repository. This also means that HuggingFaceAggregationRepository no longer is a dataset repository.
Full Changelog: v0.10.0...v0.11.0
v0.10.0
Breaking Changes
- breaking change: ExpandChunksOutput now returns ChunkWithStartEndIndices instead of TextChunk
- breaking change: MultipleChunkRetrieverQa's AnswerSource now contains EnrichedChunk instead of just the TextChunk
New Features
Fixes
- fix: ChunkWithIndices now additionally returns end_index
- fix: DocumentPath and CollectionPath are now immutable
v0.9.1
Breaking Changes
- breaking change: MultipleChunkRetrieverQaOutput now returns sources and search_results
New Features
- feature: ExpandChunks task takes a retriever and some search results to expand the chunks to the desired length
Fixes
- fix: ExpectedSearchOutput has only relevant fields and supports generic document-ID rather than just str
- fix: SearchEvaluationLogic explicitly compares documents by ids
- fix: In RecursiveSummarize.do_run, num_generated_tokens is no longer uninitialized. See Issue 743.
- fix: Reverted pydantic to 2.6.* because of FastAPI incompatibility.
Full Changelog: v0.9.0...v0.9.1
v0.9.0
Breaking Changes
- breaking change: Renamed the field chunk of AnswerSource to search_result for multi chunk retriever qa.
- breaking change: The implementation of the HuggingFace repository creation and deletion got moved to HuggingFaceRepository
New Features
- feature: HuggingFaceDataset- & AggregationRepositories now have an explicit create_repository function.
- feature: Add MultipleChunkRetrieverBasedQa, a task that performs better and faster on retriever-QA, especially with longer context models
Full Changelog: v0.8.2...v0.9.0
v0.8.2
New Features
- feature: Add SearchEvaluationLogic and SearchAggregationLogic to evaluate Search-use-cases
- feature: Trace viewer and IL python package are now deployed to artifactory
Fixes
- Documentation
  - fix: Add missing link to issue_classification_user_journey notebook to the tutorials section of README.
  - fix: Confusion matrix in issue_classification_user_journey now has rounded numbers.
Full Changelog: v0.8.1...v0.8.2
v0.8.1
v0.8.0
What's Changed
New Features
- feature: Expose start and end index in DocumentChunk
- feature: Add sorted_scores property to SingleLabelClassifyOutput.
- feature: Error information is printed to the console on failed runs and evaluations.
- feature: The stack trace of a failed run/evaluation is included in the FailedExampleRun/FailedExampleEvaluation object
- feature: The Runner.run_dataset(..) and Evaluator.evaluate_run(..) have an optional flag abort_on_error to stop running/evaluating when an error occurs.
- feature: Added Runner.failed_runs(..) and Evaluator.failed_evaluations(..) to retrieve all failed run / evaluation lineages
- feature: Added .successful_example_outputs(..) and .failed_example_outputs(..) to RunRepository to match the evaluation repository
- feature: Added optional argument to set an id when creating a Dataset via DatasetRepository.create_dataset(..)
- feature: Traces now log exceptions using the ErrorValue type.
- Documentation:
  - feature: Add info on how to run tests in VSCode
  - feature: Add issue_classification_user_journey notebook.
  - feature: Add documentation of newly added data retrieval methods how_to_retrieve_data_for_analysis
  - feature: Add documentation of release workflow
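The interplay between recording failed runs with their stack trace and the abort_on_error flag can be sketched like this. The code is a self-contained illustration rather than the SDK's Runner: FailedRun, run_all and percent_share are hypothetical names, and only the flag name abort_on_error and the idea of storing the stack trace come from the notes above.

```python
import traceback
from dataclasses import dataclass
from typing import Callable, Iterable, List, Union


@dataclass
class FailedRun:
    """Hypothetical record of a failed example, including its stack trace."""

    example: int
    error_message: str


def run_all(
    task: Callable[[int], int],
    examples: Iterable[int],
    abort_on_error: bool = False,
) -> List[Union[int, FailedRun]]:
    outputs: List[Union[int, FailedRun]] = []
    for example in examples:
        try:
            outputs.append(task(example))
        except Exception:
            if abort_on_error:
                raise  # stop the whole run at the first error
            # Otherwise keep going and store the formatted stack trace.
            outputs.append(FailedRun(example, traceback.format_exc()))
    return outputs


def percent_share(x: int) -> int:
    return 100 // x  # raises ZeroDivisionError for x == 0


results = run_all(percent_share, [1, 0, 4])
failed = [r for r in results if isinstance(r, FailedRun)]
```

By default the failing example is recorded and the remaining examples still run; with abort_on_error=True the same input raises at the second example.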
Fixes
- fix: Fix version number in pyproject.toml in IL
- fix: Fix instructions for installing IL via pip.
Full Changelog: v0.7.0...v0.8.0
v0.7.0
Overview
- Refactoring in Evaluation
  - Many changes to Evaluation repository structure and renaming to make the overall handling more intuitive and consistent
- New How-To’s and improved documentation
- Simplified repository access via data selection methods
- Better text highlighting
- Better tracer viewer integration
  - Displaying InMemoryTracer objects in a jupyter notebook will load them into an active trace viewer.
Breaking Changes
- breaking change: FScores are now correctly exposed as FScores and no longer as RougeScores
- breaking change: HuggingFaceAggregationRepository and HuggingFaceDatasetRepository now consistently follow the same folder structure as FileDatasetRepository when creating data sets. This means that datasets will be stored in a folder datasets and additional sub-folders named according to the respective dataset ID.
- breaking change: Split run_repository into file_run_repository and in_memory_run_repository.
- breaking change: Split evaluation_repository into argilla_evaluation_repository, file_evaluation_repository and in_memory_evaluation_repository
- breaking change: Split dataset_repository into file_dataset_repository and in_memory_dataset_repository
- breaking change: Split aggregation_repository into file_aggregation_repository and in_memory_aggregation_repository
- breaking change: Renamed evaluation/run.py to evaluation/run_evaluator.py
- breaking change: Split evaluation/domain and distribute it across aggregation, evaluation, dataset and run packages.
- breaking change: Split evaluation/argilla and distribute it across aggregation and evaluation packages.
- breaking change: Split evaluation into separate dataset, run, evaluation and aggregation packages.
- breaking change: Split evaluation/hugging_face.py into dataset and aggregation repository files in data_storage package.
- breaking change: create_dataset now returns the new Dataset type instead of a dataset ID.
- breaking change: Consistent naming for repository root directories when creating evaluations or aggregations: .../eval → .../evaluations and .../aggregation → aggregations.
- breaking change: Core tasks no longer provide defaults for the applied models.
- breaking change: Methods returning entities from repositories now return the results ordered by their IDs.
- breaking change: Renamed crashed_during_eval_count to crashed_during_evaluation_count in AggregationOverview.
- breaking change: Renamed create_evaluation_dataset to initialize_evaluation in EvaluationRepository.
- breaking change: Renamed to_explanation_response to to_explanation_request in ExplainInput.
- breaking change: Removed TextHighlight::text in favor of TextHighlight::start and TextHighlight::end
- breaking change: Removed IntelligenceApp and IntelligenceStarterApp
- breaking change: RetrieverBasedQa now uses MultiChunkQa instead of generic task SingleChunkQa
- breaking change: EvaluationRepository::failed_example_evaluations no longer abstract
- breaking change: Elo calculation simplified:
  - Payoff from elo package has been removed
  - PayoffMatrix from elo package renamed to MatchOutcome
- breaking change: SingleChunkQa uses logit_bias to promote not answering for German
- breaking change: Remove ChunkOverlap task.
- breaking change: Rename Chunk to TextChunk.
- breaking change: Rename ChunkTask to Chunk.
- breaking change: Rename EchoTask to Echo.
- breaking change: Rename TextHighlightTask to TextHighlight
- breaking change: Rename ChunkOverlapTask to ChunkOverlap
New Features
Aggregation:
- feature: InstructComparisonArgillaAggregationLogic uses full evaluation set instead of sample for aggregation
Documentation
- feature: Added How-To’s (linked in the README):
  - how to define a task
  - how to implement a task
  - how to create a dataset
  - how to run a task on a dataset
  - how to perform aggregation
  - how to evaluate runs
- feature: Restructured and cleaned up README for more conciseness.
- feature: Add illustrations to Concepts.md.
- feature: Added tutorial for adding task to a FastAPI app (linked in README).
- feature: Improved and added various DocStrings.
- feature: Added a README section about the client URL.
- feature: Add python naming convention to README
Classify
- feature: PromptBasedClassify now supports changing of the prompt instruction via the instruction parameter.
- feature: Add default model for PromptBasedClassify
- feature: Add default task for PromptBasedClassify
Evaluation
- feature: All repositories will return a ValueError when trying to access a dataset that does not exist while also trying to access an entry of the dataset. If only the dataset is retrieved, it will return None.
- feature: ArgillaEvaluationRepository now handles failed evaluations.
- feature: Added SingleHuggingfaceDatasetRepository.
- feature: Added HighlightCoverageGrader.
- feature: Added LanguageMatchesGrader.
- feature: Added prettier default printing behavior of repository entities by providing overloads to __str__ and __repr__ methods.
- feature: Added abstract HuggingFace repository base-class.
- feature: Refactoring of HuggingFace repository
- feature: Added HuggingFaceAggregationRepository.
- feature: Added template method to individual repository
- feature: Added Dataset model to dataset repository. This allows storing a short descriptive name for the dataset for easier identification.
- feature: SingleChunkQa internally now uses the same model in TextHighlight by default.
- feature: MeanAccumulator tracks standard deviation and standard error.
- feature: EloCalculator now updates ranking after each match.
- feature: Add data selection methods to repositories:
  - AggregationRepository::aggregation_overviews
  - EvaluationRepository::run_overviews
  - EvaluationRepository::run_overview_ids
  - EvaluationRepository::example_output
  - EvaluationRepository::example_outputs
  - EvaluationRepository::example_output_ids
  - EvaluationRepository::example_trace
  - EvaluationRepository::example_tracer
  - RunRepository::run_overviews
  - RunRepository::run_overview_ids
  - RunRepository::example_output
  - RunRepository::example_outputs
  - RunRepository::example_output_ids
  - RunRepository::example_trace
  - RunRepository::example_tracer
- feature: Evaluator continues in case of no successful outputs
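As an illustration of what tracking standard deviation and standard error alongside a running mean can look like, here is a minimal sketch. RunningMean is a hypothetical class, not the SDK's MeanAccumulator, and the choice of the population standard deviation (dividing by n) is an assumption of this sketch.

```python
import math


class RunningMean:
    """Hypothetical accumulator that keeps only count, sum and sum of squares."""

    def __init__(self) -> None:
        self._n = 0
        self._sum = 0.0
        self._sum_sq = 0.0

    def add(self, value: float) -> None:
        self._n += 1
        self._sum += value
        self._sum_sq += value * value

    def mean(self) -> float:
        return self._sum / self._n if self._n else 0.0

    def standard_deviation(self) -> float:
        if self._n == 0:
            return 0.0
        # Population variance: E[x^2] - (E[x])^2 (assumption of this sketch).
        variance = self._sum_sq / self._n - self.mean() ** 2
        return math.sqrt(max(variance, 0.0))  # clamp tiny negative rounding errors

    def standard_error(self) -> float:
        # Standard error of the mean: standard deviation / sqrt(n).
        return self.standard_deviation() / math.sqrt(self._n) if self._n else 0.0


acc = RunningMean()
for score in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    acc.add(score)
```

For these eight scores the mean is 5.0 and the population standard deviation is 2.0, so the standard error is 2.0 divided by the square root of 8.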
Q & A
- feature: Define default parameters for LongContextQa, SingleChunkQa
- feature: Define default task for RetrieverBasedQa
- feature: Define default model for KeyWordExtract, MultiChunkQa
- feature: Improved focus of highlights in TextHighlight tasks.
- feature: Added filtering for TextHighlight tasks.
- feature: Introduce logit_bias to SingleChunkQa
Summarize
- feature: Added RecursiveSummarizeInput.
- feature: Define defaults for SteerableSingleChunkSummarize, SteerableLongContextSummarize, RecursiveSummarize
Tracer
- feature: Added better trace viewer integration:
  - Added trace storage to trace viewer server
  - Added submit_to_tracer_viewer method to InMemoryTracer
  - UI and navigation improvements for trace viewer
  - Add exception handling for tracers during log entry writing
Others
- feature: The following classes are now exposed:
  - DocumentChunk
  - MultipleChunkQaOutput
  - Subanswer
- feature: Simplified internal imports.
- feature: Streamlining of __init__-parameters of all tasks
  - Sub-tasks are typically exposed as __init__-parameters with sensible defaults.
  - Defaults for non-trivial parameters like models or tasks are defined in __init__ while the default parameter is None.
  - Instead of exposing parameters that are passed on to sub-tasks, the sub-tasks themselves are exposed.
- feature: Update supported models
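The __init__ convention described above (a None default in the signature, the real default constructed inside __init__, and sub-tasks exposed instead of their parameters) might look like the following sketch. EchoModel, Summarize and Report are hypothetical classes invented for illustration and are not part of the SDK.

```python
from typing import Optional


class EchoModel:
    """Hypothetical model stand-in."""

    def __init__(self, name: str = "default-model") -> None:
        self.name = name

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


class Summarize:
    """Hypothetical sub-task with a non-trivial default (a model)."""

    def __init__(self, model: Optional[EchoModel] = None) -> None:
        # The signature default is None; the actual default is built here,
        # so no model object is created at import time or shared by accident.
        self._model = model if model is not None else EchoModel()

    def run(self, text: str) -> str:
        return self._model.complete(f"Summarize: {text}")


class Report:
    """Hypothetical task that exposes its sub-task, not the sub-task's parameters."""

    def __init__(self, summarize: Optional[Summarize] = None) -> None:
        self._summarize = summarize if summarize is not None else Summarize()

    def run(self, text: str) -> str:
        return self._summarize.run(text)


default_output = Report().run("hello")
custom_output = Report(Summarize(EchoModel("other-model"))).run("hello")
```

A caller who wants a different model configures the sub-task and passes it in, so Report does not need a model parameter of its own.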
Fixes
- fix: Fixed exception handling in language detection of LanguageMatchesGrader.
- fix: Fixed a bug that could lead to cut-off highlight ranges in TextHighlight tasks.
- fix: Fixed list_ids methods to use path_to_str
- fix: Disallow traces without end in the trace viewer
- fix: ArgillaClient now correctly uses provided API-URL instead of hard-coded localhost
Full Changelog: v0.6.0...v0.7.0
v0.6.0
Breaking Changes
- breaking change: The evaluation module is moved from core to evaluation.
- breaking change: RetrieverBasedQa task answers now contain document ids in each subanswer.
- breaking change: LongcontextSummarize no longer supports the max_loops parameter.
- breaking change: Rich Model Representation
  - The LLM-based tasks no longer accept client, but rather an AlephAlphaModel, which holds the client. The available model classes are AlephAlphaModel and LuminousControlModel.
  - The AlephAlphaModel is responsible for its prompt format, tokenizers, complete task and explain task. These responsibilities were moved into the model classes.
  - The default client url is now configurable via the environment variable CLIENT_URL.
- breaking change: PromptWithMetadata is removed in favor of RichPrompt. The semantics remain largely unchanged.
- breaking change: The compression-dependent long context summarize classes as well as the few-shot summarize class were removed. Use the better-performing steerable summary classes.
- breaking change: Runner, Evaluator & Aggregation
  - The EvaluationRepository has been split up. There is now a total of four repositories: dataset, run, evaluation and aggregation. These repositories save information from their respective steps.
  - The evaluation and evaluation aggregation have been split and are now provided by the classes Evaluator and Aggregator, respectively. These two classes have no abstract methods. The evaluation and aggregation logic is provided by implementing the abstract methods of the classes EvaluationLogic and AggregationLogic which are passed on to an instance of the Evaluator and Aggregator class, respectively.
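The split between concrete Evaluator/Aggregator classes and abstract logic classes can be sketched as below. The class names mirror the ones in the note, but these are simplified local stand-ins: the method names (do_evaluate, aggregate, evaluate) and their signatures are assumptions of this sketch, and ExactMatchLogic and AccuracyLogic are hypothetical implementations.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class EvaluationLogic(ABC):
    """Stand-in: holds the per-example evaluation rule."""

    @abstractmethod
    def do_evaluate(self, expected: str, output: str) -> bool: ...


class AggregationLogic(ABC):
    """Stand-in: holds the rule for combining evaluations."""

    @abstractmethod
    def aggregate(self, evaluations: List[bool]) -> Dict[str, float]: ...


class ExactMatchLogic(EvaluationLogic):
    def do_evaluate(self, expected: str, output: str) -> bool:
        return expected.strip() == output.strip()


class AccuracyLogic(AggregationLogic):
    def aggregate(self, evaluations: List[bool]) -> Dict[str, float]:
        return {"accuracy": sum(evaluations) / len(evaluations)}


class Evaluator:
    """Concrete class: no abstract methods, the behavior comes from the logic."""

    def __init__(self, logic: EvaluationLogic) -> None:
        self._logic = logic

    def evaluate(self, pairs: List[Tuple[str, str]]) -> List[bool]:
        return [self._logic.do_evaluate(expected, output) for expected, output in pairs]


class Aggregator:
    def __init__(self, logic: AggregationLogic) -> None:
        self._logic = logic

    def aggregate(self, evaluations: List[bool]) -> Dict[str, float]:
        return self._logic.aggregate(evaluations)


pairs = [("a", "a"), ("b", "c"), ("d", "d "), ("e", "e")]
evaluations = Evaluator(ExactMatchLogic()).evaluate(pairs)
overview = Aggregator(AccuracyLogic()).aggregate(evaluations)
```

Because the logic objects are passed in, a different metric only requires a new logic class, while the Evaluator and Aggregator stay unchanged.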
New Features
- Documentation
  - feature: Added an intro to the Intelligence Layer concepts in Concepts.md.
  - feature: Added documentation on how to execute tasks in parallel. See the performance_tips notebook for more information.
- QA
  - feature: RetrieverBasedQa task no longer sources its final answer from all sources, but only the most relevant. This performed better in evaluation.
  - feature: The notebooks for RetrieverBasedQa have been updated to use SingleChunkQa.
  - feature: SingleChunkQa now supports a custom no-answer phrase.
  - feature: MultiChunkQA and LongContextQa allow for more configuration of the used qa-task.
  - feature: Make the distance metric configurable in QdrantInMemoryRetriever.
  - feature: Added list_namespaces to DocumentIndexClient to list all available namespaces in DocumentIndex.
- Evaluation
  - feature: The argilla now supports splitting a dataset for multiple people via the split_dataset function.
  - feature: Utilities for ELO score/ranking calculation
    - The build_tournaments utility function has been added to facilitate the computation of ELO scores when evaluating two models. See InstructComparisonArgillaEvaluator for an example of how it can be used to compute the ELO scores.
  - feature: The Evaluator can run multiple evaluation tasks in parallel.
- Intelligence app
  - feature: IntelligenceApp returns 204 if the output is None
  - feature: Allow registering tasks with a task dependency in IntelligenceApp.
- Others
  - feature: Runner accepts in run_dataset a new parameter num_examples specifying how many of the first n examples should be run.
  - feature: Support None as return type in Task
  - feature: Added a new task: ChunkOverlapTask splits a longer text into overlapping chunks.
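Splitting a longer text into overlapping chunks can be illustrated with a short sketch. It is not the ChunkOverlapTask implementation: overlapping_chunks is a hypothetical helper, and it counts whitespace-separated words, which is an assumption made to keep the example self-contained.

```python
from typing import List


def overlapping_chunks(words: List[str], chunk_size: int, overlap: int) -> List[List[str]]:
    """Split `words` into windows of `chunk_size` that share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # how far each window advances
    chunks: List[List[str]] = []
    for start in range(0, len(words), step):
        chunks.append(words[start : start + chunk_size])
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


text = "one two three four five six seven"
chunks = overlapping_chunks(text.split(), chunk_size=4, overlap=2)
```

Each chunk repeats the last two words of its predecessor, so a sentence cut at a chunk boundary still appears whole in at least one chunk when it is shorter than the overlap.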
Full Changelog: v0.5.1...v0.6.0
v0.5.1
Fix failed tag
Full Changelog: v0.5.0...v0.5.1