Commit

joss rev 5
enricgrau committed Oct 31, 2023
1 parent 8c4c381 commit c64ed30
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions paper/paper.md
@@ -43,11 +43,11 @@ Even though there are methods and libraries available for explaining different t

**pudu** is a Python library that helps to make sense of ML models for spectroscopic data by quantifying changes in spectral features and explaining their effect on the target instances. In other words, it perturbs the features in a predictable and deliberate way and evaluates them based on how the final prediction changes. For this, four main methods are included and defined. **Importance** quantifies the relevance of the features according to the changes in the prediction; it is measured as a difference in probability for classification problems or in target value for regression problems. **Speed** quantifies how fast a prediction changes according to perturbations in the features. For this, the Importance is calculated at different perturbation levels, a line is fitted to the obtained values, and its slope, the rate of change of Importance, is extracted as the Speed. **Synergy** indicates how features complement each other in terms of prediction change after perturbations. Finally, **Re-activations** count the units in a Convolutional Neural Network (CNN) whose value, after perturbation, goes above the original activation criteria. The latter is only applicable to CNNs, but the rest can be applied to any other ML problem, including CNNs. To read in more detail how these techniques work, please refer to the [definitions](https://pudu-py.github.io/pudu/definitions.html) in the documentation.

-pudu is versatile as it can analyze classification and regression algorithms for both 1- and 2-dimensional problems, offering plenty of flexibility with parameters and the ability to provide localized explanations by selecting specific areas of interest. To illustrate this, Figure 1 shows two analysis instances using the same `importance` method but with different parameters. Additionally, its other functionalities are shown in examples using scikit-learn [@Pedregosa2011], keras [@chollet2018keras], and localreg [@Marholm2022], which are found in the documentation along with XAI methods including LIME and GradCAM.
+pudu is versatile as it can analyze classification and regression algorithms for both 1- and 2-dimensional problems, offering plenty of flexibility with parameters and the ability to provide localized explanations by selecting specific areas of interest. To illustrate this, \autoref{fig:figure1} shows two analysis instances using the same `importance` method but with different parameters. Additionally, its other functionalities are shown in examples using scikit-learn [@Pedregosa2011], keras [@chollet2018keras], and localreg [@Marholm2022], which are found in the documentation along with XAI methods including LIME and GradCAM.

**pudu** is built in Python 3 [@VanRossum2009] and uses third-party packages including numpy [@Harris2020], matplotlib [@Caswell2021], and keras. It is available on both PyPI and conda, and comes with complete documentation, including quick start, examples, and contribution guidelines. Source code and documentation are available at https://github.com/pudu-py/pudu.

-![Two ways of using the same method 'importance' by A) using a sequential change pattern over all the spectral features and B) selecting peaks of interest. In A), the impact of the peak in the range of 1200-1400 overshadows the impact of the rest. In contrast, in B) only the first four main peaks are selected for analysis, to better visualize their impact on the prediction.](figure1.png)
+![Two ways of using the same method *importance* by A) using a sequential change pattern over all the spectral features and B) selecting peaks of interest. In A), the impact of the peak in the range of 1200-1400 overshadows the impact of the rest. In contrast, in B) only the first four main peaks are selected for analysis, to better visualize their impact on the prediction.\label{fig:figure1}](figure1.png)
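
As a reading aid for the Importance and Speed definitions in the summary paragraph of this hunk, here is a minimal sketch of the idea in plain numpy. It is not pudu's API: the names `toy_model`, `importance`, and `speed`, the multiplicative perturbation, and the linear toy regressor are all assumptions made for illustration only.

```python
import numpy as np


def toy_model(spectrum):
    """Hypothetical stand-in for a trained regressor (not part of pudu)."""
    weights = np.linspace(0.0, 1.0, spectrum.size)
    return float(weights @ spectrum)


def importance(model, spectrum, start, stop, delta):
    """Prediction change after scaling spectrum[start:stop] by (1 + delta).

    The multiplicative perturbation is an assumption of this sketch.
    """
    perturbed = spectrum.copy()
    perturbed[start:stop] *= 1.0 + delta
    return model(perturbed) - model(spectrum)


def speed(model, spectrum, start, stop, deltas):
    """Slope of a line fitted to importance versus perturbation level."""
    values = [importance(model, spectrum, start, stop, d) for d in deltas]
    slope, _intercept = np.polyfit(deltas, values, 1)
    return float(slope)


# Flat toy "spectrum" with 10 features; perturb the last two features.
spectrum = np.ones(10)
imp = importance(toy_model, spectrum, 8, 10, 0.1)
spd = speed(toy_model, spectrum, 8, 10, [0.1, 0.2, 0.3])
print(f"importance={imp:.4f}, speed={spd:.4f}")
```

Because the toy model is linear, the importance grows in proportion to the perturbation level, so the fitted slope equals the summed weight of the perturbed features. A real classifier or CNN would generally give a non-linear response, which is where a fitted slope becomes an approximation.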


# Acknowledgements