The workflow (data cleaning and machine learning) can be found in the Jupyter notebook in the top directory. I used Google Colaboratory to take advantage of its free GPUs and speed up fitting the CNN; all of the data I used were copied from the Walabot-Data directory to my Google Drive.
- We collected ~30 minutes of recordings of 2D image slices of a full .45 caliber ammo clip placed approximately 1 meter from the Walabot recording device. This resulted in files of 10-20 MB with about 400 recorded observations. We repeated the experiment for an empty clip and for a control object; data for all three were recorded on 6/19/2019 and 6/20/2019.
- I reconstructed 3D data cubes from the 2D data using the recorded slice locations. The cubes have dimensions 101x91x12, corresponding to R=np.arange(125,175+0.5,0.5), Phi=np.arange(-90,90+2,2), and Theta=np.arange(-45,45+5,5). Each theta slice of a cube is drawn at random from the recorded slices at that theta; assuming the experiment is static, this helps represent the noise of the Walabot device when recording the same scene (see the reconstruction sketch after this list).
- To prepare for ML, I artificially inflated the dataset by applying small translations to each image slice so the data generalize to other scenarios (see the augmentation sketch after this list). I was not able to get a model to work when translating in the depth dimension, perhaps because the resolution was too poor. I cropped the final data cubes in all three dimensions to avoid detector edge effects.
- I built a 3D convolutional neural network composed of two convolutional layers, a max-pooling layer, a flattening layer, a dense layer, and an output layer that classifies the images into three categories: full clip, empty clip, and control. The data were split 80-20% into training and validation groups. The final model achieved 83% accuracy and an F1 score of 0.87. The model distinguishes a full clip from an empty clip well, but has a difficult time telling the control object apart from the clips (i.e., a high false positive rate). A sketch of the architecture is included after this list.
- The Walabot data collection should be set to record the 3D data directly rather than a sequence of 2D image slices in order to reduce data collection time. This would be useful both for the operational stage and for collecting data for the machine learning stage. Currently, the image slices come from the most reflective slice, which makes it difficult to generalize the output to the less reflective parts of the data cube. I also noticed that all the image slices were scaled to 0-255 for display. If possible, it would be best to record the raw amplitude rather than the normalized output. This may account for our high false positive rate: the glass control object should produce a weaker signal, but it was amplified by how the images are recorded (see the scaling example after this list).
- Data collection should include a variety of object orientations and other types of reflective objects in order to reduce the false positive rate. While the data can be augmented with translations to help generalize, the reflected signal may differ when the ammo clips are oriented face-on rather than edge-on, and it may also drop off with distance from the detector. These are all hypotheses that can be easily tested.
- The Walabot detector settings should be optimized for a wide enough field of view without sacrificing resolution. We experimented with a very high-resolution image, but the field of view was too small to provide context as to which part of the clip was being imaged. These settings should be tuned for the highest clip-detection accuracy while staying within the limits of the on-board computing capacity.
- More of a note to self: if I had more time, I would have built the CNN training around a batch generator rather than loading all the data cubes into memory at once (see the generator sketch after this list). Keras supports this, but given the uniqueness of how the data were collected, a fair amount of customization would be necessary; it will become essential once more data are collected. It may also be worth looking into local GPU hardware for the model fitting, since the final fit with all the data at the best resolution, and potentially more layers, may exceed the memory and wall-time limits of free Google Colaboratory.
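
Below is a minimal sketch of the cube reconstruction described in the third bullet, assuming the recorded 2D slices have already been grouped by theta; the `build_cube` helper and the `slices_by_theta` layout are illustrative, not the exact structures used in the notebook.

```python
import numpy as np

# Coordinate grids from the Walabot arena settings described above.
R = np.arange(125, 175 + 0.5, 0.5)    # radial bins (cm)
Phi = np.arange(-90, 90 + 2, 2)       # azimuthal bins (deg)
Theta = np.arange(-45, 45 + 5, 5)     # elevation bins (deg)

def build_cube(slices_by_theta, rng=None):
    """Assemble one 3D data cube by drawing a random recorded 2D slice
    for each theta position.

    slices_by_theta : list whose i-th entry is an array of shape
                      (n_recordings_i, len(R), len(Phi)) holding every
                      recorded slice at the i-th theta.
    """
    rng = np.random.default_rng(rng)
    cube = np.empty((len(R), len(Phi), len(slices_by_theta)))
    for i, recordings in enumerate(slices_by_theta):
        pick = rng.integers(len(recordings))  # random observation at this theta
        cube[:, :, i] = recordings[pick]
    return cube
```

Because each cube mixes independently recorded slices, repeated sampling from the same recordings yields many slightly different cubes of the same static scene.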
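
A minimal sketch of the translation-based augmentation and edge cropping from the fourth bullet, assuming whole-cube in-plane shifts; the shift range and crop sizes are placeholders, not the values used in the notebook.

```python
import numpy as np

def augment_cube(cube, max_shift=3, crop=(5, 5, 1), rng=None):
    """Return a randomly translated, edge-cropped copy of a data cube.

    The same random in-plane (R, Phi) offset is applied to every theta
    slice; no shift is applied along the depth (theta) axis, which did
    not work well in practice. `crop` gives the number of bins trimmed
    from each face along (R, Phi, Theta) to drop detector edge effects.
    """
    rng = np.random.default_rng(rng)
    dr, dphi = rng.integers(-max_shift, max_shift + 1, size=2)
    shifted = np.roll(cube, shift=(dr, dphi), axis=(0, 1))
    cr, cp, ct = crop
    return shifted[cr:-cr, cp:-cp, ct:-ct]
```

Note that `np.roll` wraps values around the array edges, so the crop also serves to discard the wrapped borders.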
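
The 3D CNN described in the fifth bullet, sketched in Keras; the filter counts, kernel sizes, dense width, optimizer, and epoch/batch settings are guesses rather than the values in the notebook.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_model(input_shape):
    """Sketch of the 3D CNN: two conv layers, a max-pool layer, a
    flattening layer, a dense layer, and a 3-class softmax output."""
    model = keras.Sequential([
        keras.Input(shape=input_shape),  # e.g. (n_R, n_Phi, n_Theta, 1)
        layers.Conv3D(16, kernel_size=3, activation="relu", padding="same"),
        layers.Conv3D(32, kernel_size=3, activation="relu", padding="same"),
        layers.MaxPooling3D(pool_size=2),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(3, activation="softmax"),  # full clip / empty clip / control
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# The 80-20 training-validation split can be handled directly by Keras:
# model.fit(X, y, validation_split=0.2, epochs=20, batch_size=8)
```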
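
To illustrate the scaling concern from the sixth bullet: if each slice is min-max scaled to 0-255 independently (which appears to be what the display output does), the absolute amplitude difference between a strong and a weak reflector is lost. The numbers below are made up purely for illustration.

```python
import numpy as np

def to_uint8(img):
    """Per-image min-max scaling to 0-255, discarding absolute amplitude."""
    img = np.asarray(img, dtype=float)
    span = img.max() - img.min()
    return np.round(255 * (img - img.min()) / (span + 1e-12)).astype(np.uint8)

strong = np.array([0.0, 50.0, 100.0])  # e.g. a bright metal reflection
weak = np.array([0.0, 0.5, 1.0])       # e.g. a faint glass reflection
print(to_uint8(strong), to_uint8(weak))  # both scale to the same 0-255 pattern
```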
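
Finally, a minimal sketch of the kind of batch generator mentioned in the last bullet, assuming each reconstructed cube has been saved to its own .npy file; the `CubeSequence` class, file layout, and batch size are illustrative.

```python
import numpy as np
from tensorflow import keras

class CubeSequence(keras.utils.Sequence):
    """Yields batches of data cubes loaded from disk on demand,
    instead of holding every cube in memory at once."""

    def __init__(self, file_paths, labels, batch_size=8):
        super().__init__()
        self.file_paths = list(file_paths)   # one saved .npy cube per path
        self.labels = np.asarray(labels)     # integer class per cube
        self.batch_size = batch_size

    def __len__(self):
        return int(np.ceil(len(self.file_paths) / self.batch_size))

    def __getitem__(self, idx):
        sl = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
        cubes = np.stack([np.load(p) for p in self.file_paths[sl]])
        return cubes[..., np.newaxis], self.labels[sl]

# model.fit(CubeSequence(train_paths, train_labels), epochs=20)
```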