modify README and add run_dl.sh
parisa-zahedi committed Jun 26, 2024
1 parent 4d8d770 commit 14da727
Showing 2 changed files with 52 additions and 7 deletions.
43 changes: 36 additions & 7 deletions bioacoustics/feature_extraction/README.md
# Feature extraction

The scripts in this directory extract acoustic and deep learning features from '.wav' files.
The output is two types of features that serve as input for the [classifiers](../classifier): SVM and CNN.

## Instructions

[Installation instructions](https://github.com/UtrechtUniversity/animal-sounds/tree/documenation_svm#getting-started)

## Feature extraction for Support Vector Machines
We extract several feature sets using:
- a [python version](https://github.com/mystlee/rasta_py) of the [rasta-mat](https://www.ee.columbia.edu/~dpwe/resources/matlab/rastamat/) library.
- an [Automatic Analysis Architecture](https://doi.org/10.5281/zenodo.1216028)

We extend the feature set with the features from an [Automatic Analysis Architecture](https://doi.org/10.5281/zenodo.1216028).

The script results in a feature set of 1140 features per audio frame.
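The per-frame layout can be illustrated with a small framing sketch. This is an illustration only, not the project's code; `frame_signal` is a hypothetical helper, and the 1140 features would be computed from each resulting row:

```python
import numpy as np

def frame_signal(signal, frame_length, hop_length):
    """Split a 1-D signal into frames of frame_length samples (one per row)."""
    n_frames = 1 + max(0, (len(signal) - frame_length) // hop_length)
    return np.stack([
        signal[i * hop_length : i * hop_length + frame_length]
        for i in range(n_frames)
    ])

# 1 second of silence at 48 kHz, 0.5-second frames with no overlap
frames = frame_signal(np.zeros(48000), frame_length=24000, hop_length=24000)
print(frames.shape)  # (2, 24000)
```

Each of the two rows here would yield one 1140-dimensional feature vector.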

### Running the script
Use shell script `run_svm.sh` to start `extract_features_svm.py` from the command line. The following arguments should be specified:
- `--input_dir`; directory where the '.wav' files are located.
- `--output_dir`; directory where the feature files ('.csv') should be stored.
- `--frame_length`; subdivide '.wav' files into frames of this length, in number of samples (at a sample rate of 48000 samples per second, choose e.g. 24000 for 0.5-second frames)

In `./config` the user can specify which features to extract.

### sndfile library
If you get an error about the 'snd_file' dependency on an Ubuntu machine, install the following C library:
```
sudo apt-get install libsndfile-dev
```
## Feature extraction for Convolutional Neural Network (CNN)
To extract audio features for the CNN classifier, '.wav' files are converted to log-mel spectrograms using the [librosa](https://zenodo.org/badge/latestdoi/6309729) library.
Log-mel spectrograms gave the best results in [[1]](#ref); future work could explore alternatives such as log spectrograms and gammatone spectrograms.

In this process, we first apply a Butterworth band-pass filter to keep frequencies between 100 and 2000 Hz. The short-time Fourier transform (STFT) is then applied to the time-domain waveforms to compute spectrograms.
Finally, mel filter banks are applied to the spectrograms, followed by a logarithmic operation, to obtain the log-mel spectrograms.
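The band-pass, STFT, mel filter bank, and log steps described above can be sketched as follows. This is a minimal illustration using `scipy` with a textbook triangular mel filter bank; it is not the project's `extract_features_dl.py`, and the helper names are our own:

```python
import numpy as np
from scipy.signal import butter, sosfilt, stft

def bandpass(signal, sr, low=100.0, high=2000.0, order=5):
    """Butterworth band-pass filter keeping roughly 100-2000 Hz."""
    sos = butter(order, [low, high], btype="bandpass", fs=sr, output="sos")
    return sosfilt(sos, signal)

def mel_filter_bank(sr, n_fft, n_mel):
    """Triangular mel filter bank, shape (n_mel, n_fft // 2 + 1)."""
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    hz_pts = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_mel + 2))
    fb = np.zeros((n_mel, len(fft_freqs)))
    for i in range(n_mel):
        left, center, right = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def log_mel(signal, sr, window_length=750, hop_length=376, n_mel=64):
    """Band-pass -> STFT power spectrogram -> mel filter bank -> log."""
    filtered = bandpass(signal, sr)
    _, _, Z = stft(filtered, fs=sr, nperseg=window_length,
                   noverlap=window_length - hop_length)
    power = np.abs(Z) ** 2                 # (window_length // 2 + 1, n_frames)
    return np.log(mel_filter_bank(sr, window_length, n_mel) @ power + 1e-10)

sr = 48000
tone = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)  # 1 s of a 440 Hz tone
spec = log_mel(tone, sr)
print(spec.shape[0])  # 64 mel bands
```

The `window_length`, `hop_length`, and `n_mel` defaults mirror the values used in `run_dl.sh` below.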

<img src="../../img/melspectrogram.png" width="400" />

### Running the script
Open a command line and run the following command:
```
sh run_dl.sh
```

This command applies `extract_features_dl.py` to the whole dataset. The following arguments should be specified:
- `--input_dir`; directory where the '.wav' files are located.
- `--output_dir`; directory where the feature files ('.pkl') should be stored.
- `--label`; the label of the '.wav' files, i.e. 'chimpanze' or 'background'.
- `--window_length`; subdivide '.wav' files into frames of this length, in number of samples (at our sample rate of 48000 samples per second, we chose 750, i.e. frames of roughly 15 milliseconds).
- `--hop_length`; hop between successive frames, in number of samples (we chose 376).
- `--n_mel`; number of mel bands, i.e. horizontal bars in the spectrogram; in our case 64.
- `--new_img_size`; the number of rows and columns of the log-mel spectrogram that is fed to the CNN as an image; in our case 64 × 64.
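The `--new_img_size` argument can be pictured as resizing each log-mel spectrogram to a square array before it reaches the CNN. The nearest-neighbour helper below is a hypothetical illustration, not the project's resizing code:

```python
import numpy as np

def resize_nearest(img, rows=64, cols=64):
    """Nearest-neighbour resize of a 2-D array to (rows, cols)."""
    r_idx = (np.arange(rows) * img.shape[0] / rows).astype(int)
    c_idx = (np.arange(cols) * img.shape[1] / cols).astype(int)
    return img[np.ix_(r_idx, c_idx)]

spec = np.random.rand(64, 120)   # n_mel x n_frames log-mel spectrogram
img = resize_nearest(spec)
print(img.shape)  # (64, 64)
```

In practice an interpolating resize (e.g. from an image library) would likely be used; the point is only that a variable-width spectrogram becomes a fixed 64 × 64 input.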

## <a name="ref"></a>References
1. K. Palanisamy, D. Singhania, and A. Yao, "Rethinking CNN Models for Audio Classification", 2020. [arXiv preprint](https://arxiv.org/abs/2007.11154), [github](https://github.com/kamalesh0406/Audio-Classification)
16 changes: 16 additions & 0 deletions bioacoustics/feature_extraction/run_dl.sh
#!/bin/bash

# Extract log-mel features for every recorder, once for the chimpanzee
# chunks and once for the background chunks.
DATADIR='/Volumes/science.data.uu.nl/research-zwerts/data/sanaga_test_chunks/'
RECORDERS='A1 A3 A4 A5 A21 A22 A26 A38'
OUTPUTDIR='../../output/features/'

for RECORDER in $RECORDERS
do
    echo "Processing $RECORDER: $DATADIR -> $OUTPUTDIR"
    python3 extract_features_dl.py --input_dir $DATADIR'chimps/'$RECORDER'/*/*.wav' --output_dir $OUTPUTDIR$RECORDER'/'$RECORDER'_chimpanze.pkl' --label 'chimpanze' --window_length 750 --hop_length 376 --n_mel 64 --new_img_size 64 64
    python3 extract_features_dl.py --input_dir $DATADIR'background/'$RECORDER'/*/*.wav' --output_dir $OUTPUTDIR$RECORDER'/'$RECORDER'_background.pkl' --label 'background' --window_length 750 --hop_length 376 --n_mel 64 --new_img_size 64 64
done
