Somnium is a library that provides an easy and powerful way of exploring multi-dimensional data sets. It uses the Self-Organising Map algorithm (also known as the Kohonen map).
A Self-Organising Map (SOM hereafter) is a biologically inspired algorithm meant for exploring multi-dimensional, non-linear relations between variables. The SOM was proposed in 1982 by Teuvo Kohonen, a Finnish academician. It is based on the process of task clustering that occurs in our brain, and it is considered a type of neural network. It compresses the information of high-dimensional data into geometric relationships on a low-dimensional representation.
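To make the idea concrete, here is a minimal NumPy sketch of the classic online SOM update. It is not Somnium's implementation: the function names `train_som` and `best_matching_unit`, the linear decay schedules, and the default parameters are all assumptions chosen for illustration.

```python
import numpy as np


def best_matching_unit(codebook, x):
    """Return the (row, col) of the node whose weight vector is closest to x."""
    dist2 = np.sum((codebook - x) ** 2, axis=-1)
    return np.unravel_index(np.argmin(dist2), dist2.shape)


def train_som(data, rows=5, cols=5, n_iter=1000, lr0=0.5, sigma0=None, seed=0):
    """Train a tiny online SOM; returns a codebook of shape (rows, cols, n_features).

    Illustrative sketch only: the schedules and defaults are assumptions.
    """
    rng = np.random.RandomState(seed)
    data = np.asarray(data, dtype=float)
    n_samples, n_features = data.shape

    # Initialise every node with a randomly chosen data sample.
    picks = rng.randint(0, n_samples, size=rows * cols)
    codebook = data[picks].reshape(rows, cols, n_features).copy()

    # Grid coordinates of each node, used to measure distances on the map.
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0

    for t in range(n_iter):
        frac = t / float(n_iter)
        lr = lr0 * (1.0 - frac)                     # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 0.5)     # neighbourhood radius shrinks
        x = data[rng.randint(n_samples)]            # pick one random sample
        bmu = best_matching_unit(codebook, x)
        # Gaussian neighbourhood centred on the BMU, measured on the 2-D grid.
        grid_dist2 = np.sum((grid - np.array(bmu)) ** 2, axis=-1)
        h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
        # Pull the BMU and its grid neighbours towards the sample.
        codebook += lr * h[..., None] * (x - codebook)
    return codebook


if __name__ == "__main__":
    demo_rng = np.random.RandomState(1)
    demo = np.vstack([
        demo_rng.normal(0.0, 0.5, size=(100, 3)),
        demo_rng.normal(10.0, 0.5, size=(100, 3)),
    ])
    som = train_som(demo)
    print("node for a point in blob A:", best_matching_unit(som, np.zeros(3)))
    print("node for a point in blob B:", best_matching_unit(som, np.full(3, 10.0)))
```

Because neighbouring nodes on the grid are updated together, samples that are close in the original space tend to land on nearby nodes, which is what turns high-dimensional structure into a readable 2-D layout.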
Here are just a few of the applications of the SOM algorithm:
- Discover, at a glance, the non-linear relations between the variables of a dataset.
- Micro-segment the instances of a dataset in an easily and visually understandable way.
- Serve as a surrogate model for a black-box model, for explainability purposes.
- Assist feature selection/reduction by finding linearly and non-linearly correlated variables.
Somnium requires:
- Python (>= 3.5)
- NumPy (>= 1.7)
- SciPy (>= 1.1)
- scikit-learn (>= 0.20)
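As a quick sanity check before installing, the minimum versions listed above can be compared against what is available in the current environment. The sketch below is only an illustration: the helper names (`version_tuple`, `meets_minimum`, `check_requirements`) are hypothetical and not part of Somnium, and the version parsing is deliberately simple (it keeps only the leading integer of each dotted component).

```python
import importlib
import re
import sys

# Minimum versions copied from the requirements list; keys are import names.
MINIMUM_VERSIONS = {"numpy": "1.7", "scipy": "1.1", "sklearn": "0.20"}


def version_tuple(version):
    """Turn '1.16.2' or '0.20.dev0' into a tuple of leading integers."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component such as 'dev0'
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, required):
    """Return True if the installed version is at least the required one."""
    a, b = version_tuple(installed), version_tuple(required)
    width = max(len(a), len(b))
    # Pad with zeros so that '1.7' and '1.7.0' compare as equal.
    a = a + (0,) * (width - len(a))
    b = b + (0,) * (width - len(b))
    return a >= b


def check_requirements(minimums=MINIMUM_VERSIONS):
    """Map each import name to 'ok', 'missing' or 'too old (<version>)'."""
    report = {}
    for module_name, required in minimums.items():
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            report[module_name] = "missing"
            continue
        installed = getattr(module, "__version__", "0")
        if meets_minimum(installed, required):
            report[module_name] = "ok"
        else:
            report[module_name] = "too old (%s)" % installed
    return report


if __name__ == "__main__":
    print("python:", "ok" if sys.version_info >= (3, 5) else "too old")
    for name, status in sorted(check_requirements().items()):
        print("%s: %s" % (name, status))
```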
For now, the only supported installation method is through setuptools:
- Clone the repository by pasting the following command in your terminal:
git clone https://github.com/ivallesp/somnium
- Move your current directory to the main folder in the repository:
cd somnium
- Install the package using Python:
python setup.py install
The API is still under development, which means that it will change from time to time. However, the master branch of this repository will always be fully functional. In the future, I plan to write some docs about the library, but for now you can find at least one usage example in the examples folder.
- Integrate the visualization into the SOM API.
- Enhance the visualization API with more OOP patterns.
- Write a documentation page.
- Integrate with Travis.
- Work on other installation methods.
- Write at least one example for each application.
- Research and implement algorithm enhancements.
- Enhance reproducibility. Start by setting a seed.
- Refactor plugins to always return the figure instead of calling `plt.show()`.
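For the reproducibility item, one possible starting point is to route all random draws through a locally seeded generator rather than NumPy's global state. The sketch below assumes a hypothetical helper, `init_codebook`, that is not part of Somnium; it only shows that the same seed yields the same initial weights.

```python
import numpy as np


def init_codebook(n_nodes, n_features, seed=None):
    """Hypothetical helper: draw initial node weights from a seeded generator."""
    # A local RandomState keeps the result independent of np.random's global state.
    rng = np.random.RandomState(seed)
    return rng.uniform(-1.0, 1.0, size=(n_nodes, n_features))


first = init_codebook(150, 4, seed=42)
second = init_codebook(150, 4, seed=42)
other = init_codebook(150, 4, seed=7)

print("same seed, same weights:", np.array_equal(first, second))
print("different seed, same weights:", np.array_equal(first, other))
```

Passing the seed explicitly, instead of calling `np.random.seed` globally, keeps the function free of side effects on other code that uses NumPy's random module.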
- The current visualization engine only runs well under Jupyter notebooks. If you run it from a Python or IPython console, the figures will not render well.
- Maps that are wider than they are tall (e.g. `mapsize=[10, 15]`) are not shown correctly.
All contributions are welcome and appreciated. I don't have time to finish the library soon, so please feel free to open an issue to either propose a contribution or discuss potential new functionality. All contributions should be made through a pull request.
This library has been built using the SOMPY library as a starting point, which is why you may find some similarities in the code.
This library is licensed under the MIT License. Please refer to the LICENSE file in the root of this repository. Copyright (c) 2019 Iván Vallés Pérez