Welcome to the Deep Learning repository by Scramjet. This repository hosts a collection of resources, tutorials, and code samples for running a TensorFlow Sequential model as a Scramjet Sequence.
Scramjet's Transform Hub (STH) offers a streamlined way to convert audio files into spectrograms using Python generators and then train a Convolutional Neural Network (CNN) Sequential model on the generated spectrogram data, all packaged as a Scramjet Sequence.
- Audio to Spectrogram Conversion: Utilize powerful Python generators to efficiently transform audio files into spectrogram images.
- CNN Sequential Model Training: Train a Convolutional Neural Network (CNN) Sequential model on the generated spectrograms for classification, regression, or any other suitable task.
- Efficient Data Handling: Handle large audio datasets effectively using generators, ensuring efficient memory usage and seamless training.
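The generator-based conversion described above can be sketched in a few lines. This is a minimal illustration, assuming 16 kHz mono samples are already loaded as NumPy float arrays; the function names and STFT parameters are hypothetical, not taken from the Sequence code.

```python
import numpy as np

def spectrogram(samples, frame_len=256, hop=128):
    """Compute a log-magnitude spectrogram via a windowed, framed FFT."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(samples) - frame_len) // hop
    frames = np.stack([samples[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(mag + 1e-6)  # small epsilon avoids log(0)

def spectrogram_generator(audio_clips):
    """Yield one spectrogram per clip, keeping memory usage flat."""
    for clip in audio_clips:
        yield spectrogram(clip)
```

Because each spectrogram is yielded lazily, only one clip needs to be in memory at a time, which is what makes large datasets tractable.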
- STH
- AWS S3 credentials
- Python (>=3.6)
Install the Scramjet Transform Hub (STH) locally, or use Scramjet's Cloud Platform environment for the Sequence deployment. For more information on the commands below, check the CLI reference section on Scramjet's website.
This directory demonstrates how to leverage deep learning techniques to recognize audio commands using TensorFlow and Scramjet's STH framework.
Dataset Core words: 'down', 'go', 'left', 'no', 'off', 'on', 'right', 'stop', 'up', 'yes'
Dataset source: https://developer.ibm.com/exchanges/data/all/speech-commands/
Audio format specification: PCM_S16LE Mono 16000Hz
- For this Sequence to run properly on your Linux machine, start the STH in terminal #1 with the following command:
$ DEVELOPMENT=true sth --runtime-adapter=process
NOTE: This Sequence might consume some disk space; clearing out Scramjet's Sequence disk space manually might be required from time to time.
$ sudo rm -r ~/.scramjet_sequences
- To pack and run this Sequence, on terminal #2 of your Linux machine execute the following commands:
# Create a directory __pypackages__ in the same directory as main.py
~/Training-script$ mkdir __pypackages__
# Install dependencies in the __pypackages__ folder.
~/Training-script$ pip3 install -t __pypackages__ -r requirements.txt
# Pack the Training-script folder into a gzip format
~$ si sequence pack Training-script
# Send the Training-script.tar.gz Sequence to Scramjet's Transform Hub; this returns a <Sequence-id> value
~$ si sequence send Training-script.tar.gz --progress
# Start the Sequence with arguments
~$ si seq start - --args=[\"aws_key\",\"aws_secret\",\"aws_bucket\"] # Without spacing between args
# Send the audio files as input
~$ si instance input <Instance-id> local/path/to/multi-label-audio.wav -e -t application/octet-stream
# Return list of S3 Bucket objects as output
~$ si instance output <Instance-id>
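The training step driven by the commands above centers on a CNN Sequential model. A minimal sketch of such a model for the ten command labels is shown below; the input shape and layer sizes are assumptions for illustration, and the actual architecture used by the Sequence may differ.

```python
import tensorflow as tf

NUM_LABELS = 10  # down, go, left, no, off, on, right, stop, up, yes

# Hypothetical input shape: one spectrogram of 124 time frames x 129 bins.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(124, 129, 1)),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.25),
    tf.keras.layers.Dense(NUM_LABELS),  # logits, one per label
])
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
```

Emitting raw logits and using `from_logits=True` in the loss is a common, numerically stable choice for multi-class audio classification.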
This directory contains the code and a pre-trained Keras model necessary for running inference, with the ability to send an audio file as input.
- For this Sequence to run properly on your Linux machine, start the STH in terminal #1 with the following command:
$ DEVELOPMENT=true sth --runtime-adapter=process
NOTE: This Sequence might consume some disk space; clearing out Scramjet's Sequence disk space manually might be required from time to time.
$ sudo rm -r ~/.scramjet_sequences
- To pack and run this Sequence, on terminal #2 of a Linux machine execute the following commands:
# Create a directory __pypackages__ in the same directory as main.py
~/Inference-script$ mkdir __pypackages__
# Install dependencies in the __pypackages__ folder.
~/Inference-script$ pip3 install -t __pypackages__ -r requirements.txt
# Pack the Inference-script folder into a gzip format
~$ si sequence pack Inference-script
# Send the Inference-script.tar.gz Sequence to Scramjet's Transform Hub; this returns a <Sequence-id> value
~$ si sequence send Inference-script.tar.gz --progress
# Start the Sequence
~$ si sequence start <Sequence-id>
# Send the audio file as input
~$ si instance input <Instance-id> local/path/to/audio.wav -e -t application/octet-stream
# Return classification label of audio .wav file as output
~$ si instance output <Instance-id>
The audio .wav file sent as input must contain one of the following ten labels:
'right', 'left', 'no', 'stop', 'down', 'go', 'up', 'yes', 'on', 'off'
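Mapping the model's output back to one of these labels is a simple argmax over the scores. This is an illustrative sketch; the label ordering below is an assumption, not metadata read from the trained model.

```python
import numpy as np

# Assumed label order -- the trained model may order its classes differently.
LABELS = ['down', 'go', 'left', 'no', 'off', 'on', 'right', 'stop', 'up', 'yes']

def decode_prediction(logits):
    """Return the label with the highest score."""
    return LABELS[int(np.argmax(logits))]
```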
Dataset source
https://developer.ibm.com/exchanges/data/all/speech-commands/
Audio conversion to PCM_S16LE Mono 16000Hz
https://convertio.co/opus-wav/
Format : Wave
Duration : <1 second
Format : PCM
Format settings : Little / Signed
Codec ID : 1
Bit rate mode : Constant
Bit rate : 256 kb/s
Channel(s) : 1 channel
Sampling rate : 16.0 kHz
Bit depth : 16 bits
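A quick way to check that an input file matches the PCM_S16LE / mono / 16 kHz specification above is the standard-library `wave` module. This sketch is a local sanity check, not part of the Sequence; the function name is hypothetical.

```python
import wave

def matches_spec(path):
    """Return True if the .wav file is 16-bit PCM, mono, 16000 Hz."""
    with wave.open(path, "rb") as wav:
        return (wav.getnchannels() == 1          # mono
                and wav.getsampwidth() == 2      # 16-bit samples
                and wav.getframerate() == 16000) # 16 kHz
```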
This project is licensed under the MIT License.