# Explainable and Interpretable Machine Learning

Explainable Artificial Intelligence (XAI) is an emerging field that tries to make AI systems more understandable to humans. The goal of XAI, according to DARPA [@gunning2017explainable], is to “produce more explainable models, while maintaining a high level of learning performance (prediction accuracy); and enable human users to understand, appropriately trust, and effectively manage the emerging generation of artificially intelligent partners”. Especially for deep networks in high-stakes scenarios (e.g., autonomous driving or anything health-related), understanding the decision process or the decision itself is crucial to prevent harm.

Multiple terms are strongly related to XAI, most famously "Explainability" and "Interpretability". Both terms are often used interchangeably, with no consensus on definitions existing in the literature.

"Interpretability" in the context of TSInterpret refers to the ability to support user understanding and comprehension of the model's decision-making process and predictions. To provide this understanding of model decisions, "Explainability" algorithms are used. Thereby, it is often the case that multiple explainability algorithms are necessary for the user to understand the decision process.

"Explainability" tries to provide algorithms that give insights into model predictions:

- How does a prediction change depending on feature inputs?
- What features are or are not important for a given prediction?
- What features would you have to change minimally to obtain a new prediction of your choosing?
- How does each feature contribute to a model’s prediction?

Interpretability is the end goal. Explanations and explainability are tools to reach interpretability [@honegger2018shedding].

TSInterpret provides a set of algorithms or methods known as explainers, specifically for time series. Each explainer provides a different kind of insight about a model (i.e., answers a different type of question). The set of algorithms available for a specific model depends on several factors. For instance, some approaches need a gradient to function and can therefore only be applied to models that provide one. A full listing can be found in the section Algorithm Overview.
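
To make the role of an explainer concrete, below is a minimal, purely conceptual sketch of one kind of explanation it could produce: per-timestep importance scores obtained by occluding parts of a time series and checking how the predicted class probability changes. This is not TSInterpret's API; the `predict_proba` callable and the array shapes are assumptions made only for illustration.

```python
# Conceptual occlusion-based attribution for a single multivariate time series.
# `predict_proba` is an assumed stand-in for any trained classifier that maps
# an array of shape (n_samples, n_timesteps, n_features) to class probabilities.
import numpy as np

def occlusion_importance(predict_proba, x, target_class, window=5):
    """Return one importance score per timestep of x (n_timesteps, n_features)."""
    baseline = predict_proba(x[None])[0, target_class]
    importance = np.zeros(x.shape[0])
    for t in range(0, x.shape[0], window):
        x_occluded = x.copy()
        # Replace the window with the per-feature mean as a "neutral" value.
        x_occluded[t:t + window] = x.mean(axis=0)
        score = predict_proba(x_occluded[None])[0, target_class]
        # A large probability drop means the occluded window was important.
        importance[t:t + window] = baseline - score
    return importance
```

Because this only queries the model through its predictions, it would work for any classifier; explainers that need gradients impose stronger requirements on the model, as discussed in the taxonomy below.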

## Application
As machine learning methods have become increasingly complex, with many practitioners applying machine learning and, specifically, deep learning methods, the need to understand the decisions made by models is only increasing.

**Trust**: Explanations can build trust in machine learning systems and increase social acceptance by providing insights into the basis of a decision.

**Debugging and Auditing**: An explanation for an erroneous prediction helps to understand the cause of the error (e.g., by showing the model's focus) and gives a direction for how to fix the system. Further, by computing feature attributions toward a model's prediction, users can check whether those attributions are consistent with their understanding.

**Functionality**: Insights can be used to augment model functionality, for instance by providing information on top of model predictions, such as how to change model inputs to obtain desired outputs.

**Research**: Explainability allows us to understand how and why models make decisions, thereby helping to understand the effects of a particular model or training schema.

## Taxonomy
Explanation methods and techniques for model interpretability can be classified according to different criteria. In this section, we only introduce the criteria most relevant to TSInterpret.

### Post-Hoc vs Intrinsic

Intrinsic Interpretability refers to models that are interpretable by design. This can be achieved by constraining model complexity or by including explanation components in the model design.
Post-Hoc Interpretability refers to explanation methods that are applied after model training and are usually decoupled from the model.

TSInterpret focuses on Post-Hoc Interpretability.

### Model-Specific vs Model-Agnostic

Model-Specific methods are limited to specific model classes and usually rely on specific model internals (e.g., gradients).
Model-Agnostic methods can be applied to any model and rely on analyzing the connection between inputs and outputs. These methods cannot access the model's internal functions.
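
As a contrast to the model-agnostic occlusion sketch above, the following is an illustrative sketch of a model-specific explanation: vanilla gradient saliency. It relies on a model internal (the gradient), so it can only be applied to differentiable models; a PyTorch classifier is assumed here purely for illustration.

```python
# Illustrative model-specific attribution: gradient of the target logit with
# respect to the input. Requires white-box access to a differentiable model.
import torch

def gradient_saliency(model: torch.nn.Module, x: torch.Tensor, target_class: int):
    """x: tensor of shape (n_timesteps, n_features); returns per-input saliency."""
    model.eval()
    x = x.clone().detach().requires_grad_(True)
    logits = model(x.unsqueeze(0))        # assumed output shape: (1, n_classes)
    logits[0, target_class].backward()    # backpropagate the target logit
    return x.grad.abs()                   # |d logit / d input| per timestep and feature
```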

### Results of Explanation Methods

#TODO Input Architecture
- **Feature Attribution methods (FA)** return a per-feature attribution score based on the feature’s contribution to the model’s output.
- **Instance-based methods (IB)** calculate a subset of relevant features that must be present to retain or remove a change in the prediction of a given model.
- **Surrogate Models** train a simpler, interpretable model to mimic the original model (a minimal sketch follows below).
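
The following is a minimal sketch of the surrogate idea under stated assumptions: a shallow decision tree is fitted to mimic the predictions of a black-box time-series classifier (not the true labels), so its rules can be read as an approximate, global description of the model's behaviour. The `black_box_predict` callable and the flattened input representation are assumptions for illustration only.

```python
# Global surrogate sketch: approximate a black-box model with a shallow,
# interpretable decision tree trained on the model's own predictions.
from sklearn.tree import DecisionTreeClassifier, export_text

def fit_surrogate(black_box_predict, X, max_depth=3):
    """X: array of shape (n_samples, n_timesteps * n_features), flattened series."""
    y_model = black_box_predict(X)                    # labels assigned by the black box
    surrogate = DecisionTreeClassifier(max_depth=max_depth).fit(X, y_model)
    fidelity = surrogate.score(X, y_model)            # how faithfully the tree mimics the model
    return surrogate, fidelity

# Example usage (hypothetical names):
#   surrogate, fidelity = fit_surrogate(model_predict, X_flat)
#   print(export_text(surrogate))                     # human-readable decision rules
```

The fidelity score indicates how far the surrogate's rules can be trusted as a description of the original model.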

## Simple Example
<p align="center"><img src="../img/Post-Hoc.png" height=700 width=700 /></p>
Take, for example, a Decision Support System to classify heart rates as depicted in the figure above. While the data scientist knows that the machine learning model can obtain an accuracy of over 90 % in classifying a heart rate as abnormal or normal, the decision process of such a system is still opaque, resulting in uncertainty about how the model arrives at its decisions. Wrong classifications in both directions can have long-lasting effects on a patient relying on the system: if a heart rate is wrongly classified as normal, a patient will not get the necessary medication; if a heart rate is wrongly classified as abnormal, a patient would get medication and endure side effects although their heart might still be healthy.
To make this decision process more transparent, a data scientist might decide to use algorithms for explainable and interpretable machine learning to learn:
1. Which features are important.
2. Which features influence the decision of a model positively or negatively.
3. What a counter-example would look like (a toy sketch of this question follows below).
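
For question (3), the following is a toy sketch only, not one of TSInterpret's counterfactual methods: it interpolates the query heart-rate series toward a reference series of the opposite class until an assumed classifier flips its prediction, which yields a crude counter-example.

```python
# Toy counter-example search: blend the query series toward a reference series
# of the desired class and return the first blend the classifier labels as the
# target class. `predict` is an assumed callable returning class labels.
import numpy as np

def naive_counterfactual(predict, x, reference, target_class, steps=100):
    """x, reference: arrays of shape (n_timesteps, n_features)."""
    for alpha in np.linspace(0.0, 1.0, num=steps):
        candidate = (1 - alpha) * x + alpha * reference
        if predict(candidate[None])[0] == target_class:
            return candidate, alpha       # smallest tested change that flips the label
    return None, None                     # no counter-example found along this path
```

Dedicated counterfactual methods search for sparser and more plausible changes; this sketch only illustrates what kind of answer question (3) asks for.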

#TODO Input from Mails