v0.3.3: Attention attribution, new aggregation, improved saving/reloading and more
What’s Changed
Attention attribution (#148)
This release introduces a new category of attention attribution methods and adds support for `AttentionAttribution` (id: `attention`). This method attributes the generated outputs using raw attention weights extracted during the forward pass, as done inter alia by Jain and Wallace, 2019. The parameters `heads` and `layers` enable the choice of a single element (a single `int`), a range (a tuple `(start_idx, end_idx)`), or a set of custom valid indices (a list `[idx_1, idx_2, ...]`) for attention heads and model layers, respectively. Multiple heads or layers can be aggregated using one of the default aggregators (e.g. `max`, `average`) or by defining a custom function and passing it to `aggregate_heads_fn` or `aggregate_layers_fn` in the call to `model.attribute()`.
Example of default usage:
```python
import inseq

model = inseq.load_model("facebook/wmt19-en-de", "attention")
out = model.attribute("The developer argued with the designer because her idea cannot be implemented.")
```
The default behavior is set to minimize unnecessary parameter definitions. In the default case above, the result is the average across all attention heads of the final layer.
Example of advanced usage:
```python
import inseq

model = inseq.load_model("facebook/wmt19-en-de", "attention")
out = model.attribute(
    "The developer argued with the designer because her idea cannot be implemented.",
    layers=(0, 5),
    heads=[0, 2, 5, 7],
    aggregate_heads_fn="max",
)
```
In the case above, the outcome is a matrix of the maximum attention weights of heads 0, 2, 5, and 7, after averaging their weights across the first five layers of the model.
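Beyond the built-in `max` and `average` aggregators, a custom aggregator is simply a function that reduces the heads (or layers) dimension of the attention weights. The exact tensor layout that Inseq passes to `aggregate_heads_fn` is an assumption here; the sketch below only illustrates what such a reduction could look like, using a NumPy array as a stand-in for attention weights.

```python
import numpy as np

# Toy attention weights with shape (n_heads, target_len, source_len).
# This shape is an assumption for illustration, not Inseq's internal layout.
attn = np.random.default_rng(0).random((8, 4, 6))

def median_heads(weights):
    """Hypothetical custom aggregator: median over the heads axis."""
    return np.median(weights, axis=0)

agg = median_heads(attn)
print(agg.shape)  # (4, 6)
```

A function like this could then be passed as `aggregate_heads_fn=median_heads` in the `model.attribute()` call, with the caveat that the tensor shape Inseq actually provides may differ from this toy example.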
Other attention methods will be added in upcoming releases (see the summary issue #108).
L2 + Normalize default aggregation (#157)
Starting from this release, the default aggregation used to combine attribution scores at the token level for `GradientFeatureAttributionSequenceOutput` objects is the L2 norm of the tensor over the `hidden_size` dimension, followed by a step-wise normalization of the attributions (all attributions across source and target at every generation step sum to one). This replaces the previous approach, which was a simple sum over the hidden dimension followed by a division by the norm of the step attribution vector. Importantly, since the L2 norm is guaranteed to be positive, the resulting attribution scores will now always be positive (also for `integrated_gradients`).
Motivations:
- Good empirical faithfulness of this aggregation procedure on transformer-based models, as shown by Bastings et al. (2022).
- Improved understanding of the individual contribution of every input to the generated output, by means of positivity and normalization.
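The new aggregation can be sketched in plain NumPy. Shapes and variable names below are illustrative, not Inseq internals, and for simplicity only a source attribution tensor is normalized here, whereas Inseq normalizes across source and target jointly when target attributions are present.

```python
import numpy as np

rng = np.random.default_rng(42)
# Illustrative gradient attributions: (source_len, generated_len, hidden_size).
grads = rng.normal(size=(5, 3, 16))

# Step 1: L2 norm over the hidden_size dimension -> scores are always positive.
scores = np.linalg.norm(grads, ord=2, axis=-1)  # shape (5, 3)

# Step 2: step-wise normalization -> attributions at each generation step sum to one.
scores = scores / scores.sum(axis=0, keepdims=True)

assert (scores >= 0).all()
assert np.allclose(scores.sum(axis=0), 1.0)
```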
Improved saving and reloading of attributions (#157)
When saving attribution outputs, it is now possible to obtain one file per sequence by specifying `split_sequences=True`, and to automatically gzip the generated outputs with `compress=True`.
```python
import inseq

model = inseq.load_model("Helsinki-NLP/opus-mt-en-it", "saliency")
out = model.attribute(["sequence one", "sequence number two"])
# Creates out_0.json.gz, out_1.json.gz
out.save("out.json.gz", split_sequences=True, compress=True)
```
Export attributions for usage with `pandas` (#157)
The new method `FeatureAttributionOutput.get_scores_dicts` allows exporting `source_attributions`, `target_attributions`, and `step_scores` as dictionaries that can be easily loaded into `pd.DataFrame` objects for further analysis (thanks @MoritzLaurer for raising the issue!). Example usage:
```python
import inseq
import pandas as pd

model = inseq.load_model("Helsinki-NLP/opus-mt-en-it", "saliency")
out = model.attribute(
    ["Hello ladies and badgers!", "This is a test input"],
    attribute_target=True,
    step_scores=["probability", "entropy"],
)
# A list of dataframes (one per sequence) corresponding to source matrices in out.show
dfs = [pd.DataFrame(x["source_attributions"]) for x in out.get_scores_dicts()]
# A list of dataframes (one per sequence) corresponding to target matrices in out.show
dfs = [pd.DataFrame(x["target_attributions"]) for x in out.get_scores_dicts()]
# A list of dataframes (one per sequence) with step score ids as rows and generated target tokens as columns
dfs = [pd.DataFrame(x["step_scores"]) for x in out.get_scores_dicts()]
```
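Since one dictionary is returned per sequence, the resulting frames can also be stacked for corpus-level analysis. The nested dictionaries below are synthetic stand-ins for `get_scores_dicts` entries (token names and values are invented); only the `pandas` calls are the point.

```python
import pandas as pd

# Synthetic stand-ins for two entries of out.get_scores_dicts(); the real
# structure nests per-token score mappings in the same dict-of-dicts shape.
dicts = [
    {"source_attributions": {"Ciao": {"Hello": 0.7, "world": 0.3},
                             "mondo": {"Hello": 0.2, "world": 0.8}}},
    {"source_attributions": {"Questo": {"This": 0.9, "test": 0.1}}},
]

dfs = [pd.DataFrame(d["source_attributions"]) for d in dicts]
# Stack all sequences into a single frame indexed by (sequence, token).
combined = pd.concat(dfs, keys=range(len(dfs))).rename_axis(["sequence", "token"])
print(combined.shape)  # (4, 3): 4 source tokens overall, union of 3 target tokens
```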
ruff for style and quality checks (#159)
Starting from this release, Inseq drops `flake8`, `isort`, `pylint`, and `pyupgrade` linting and moves to `ruff` with the corresponding extensions for style and quality checks. This dramatically speeds up build checks (from ~4 minutes to <1 second). Library developers are advised to integrate `ruff` in their automatic checks during coding (a VSCode extension and a PyCharm plugin are available).
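For orientation, a minimal `ruff` setup in `pyproject.toml` might look like the following; the rule selection shown here is illustrative and not Inseq's actual configuration:

```toml
[tool.ruff]
line-length = 100
# E/W: pycodestyle, F: pyflakes, I: isort, UP: pyupgrade
# (covering the roles of the dropped tools)
select = ["E", "W", "F", "I", "UP"]
```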
All Merged PRs
🚀 Features
- `ruff` stylechecking (#159) @gsarti
- Minor fixes to 0.3.2 (#157) @gsarti
- Basic Attention attribution (#148) @lsickert
🔧 Fixes & Refactoring
- Fix build badge (#152) @gsarti
- `ruff` stylechecking (#159) @gsarti
- Minor fixes to 0.3.2 (#157) @gsarti
- Fix conflicting generation args (#155) @gsarti
- Fix issues with pytorch 1.13 on MacOs (#151) @lsickert