Doing more of the data processing locally means that privacy-sensitive data needs to be stored or transmitted less often.
Ref
- Scalable Machine Learning with Fully Anonymized Data. Uses feature hashing on the client/sensor side, before sending the data to the server that performs training. The hashing trick is an established way of processing data as part of training a machine learning model. The typical motivation for using the technique is a reduction in memory requirements or the ability to perform stateless feature extraction. While feature hashing is ideally suited to categorical features, it also empirically works well on continuous features.
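
Client-side feature hashing can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the method from the referenced work: the function name `hash_features`, the `n_buckets` and `salt` parameters, the use of SHA-256, and the way continuous values are handled (hash the feature name, add the numeric value) are all choices made here for the example.

```python
import hashlib


def hash_features(record, n_buckets=16, salt=b""):
    """Map a dict of raw features to a fixed-length vector (hashing trick).

    Hypothetical helper for illustration. Only the returned vector would
    leave the device; the raw feature names and values stay local.
    """
    vec = [0.0] * n_buckets
    for name, value in record.items():
        if isinstance(value, str):
            # Categorical feature: hash "name=value" and contribute weight 1.
            key, weight = f"{name}={value}", 1.0
        else:
            # Continuous feature (assumption): hash the name only and
            # contribute the numeric value itself.
            key, weight = name, float(value)
        digest = hashlib.sha256(salt + key.encode("utf-8")).digest()
        # First 8 bytes select the bucket, one further bit selects the sign.
        index = int.from_bytes(digest[:8], "big") % n_buckets
        sign = 1.0 if digest[8] & 1 else -1.0
        vec[index] += sign * weight
    return vec


# Example: the sensor would send only this vector to the training server.
vector = hash_features({"city": "Oslo", "temperature": 2.5}, n_buckets=8)
```

The extraction is stateless: no vocabulary or feature dictionary has to be stored on the device or shared with the server, and the vector length is fixed by `n_buckets` regardless of how many distinct feature values occur.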
Ideas
- In audio processing, could we use a speech detection algorithm to avoid storing samples that contain speech? The remaining data could then be stored/transmitted for quality assurance and/or further data analysis.
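
One possible shape for the speech-filtering idea is sketched below. It assumes the audio is already split into frames and that some speech detector is available as a callable; `drop_speech_frames`, `is_speech`, and `padding` are hypothetical names for this example, and the detector itself is deliberately left as a placeholder.

```python
def drop_speech_frames(frames, is_speech, padding=1):
    """Return (index, frame) pairs for frames considered free of speech.

    `is_speech` is a caller-supplied detector (placeholder, an assumption).
    `padding` also drops that many frames on each side of a detected
    speech frame, as a safety margin against missed word onsets/offsets.
    """
    flags = [bool(is_speech(frame)) for frame in frames]
    kept = []
    for i, frame in enumerate(frames):
        lo = max(0, i - padding)
        hi = min(len(frames), i + padding + 1)
        # Keep the frame only if no frame in its neighbourhood is speech.
        if not any(flags[lo:hi]):
            kept.append((i, frame))
    return kept


# Toy usage: frames are labelled strings, and "S" stands in for speech.
frames = ["a", "b", "S", "c", "d", "e"]
kept = drop_speech_frames(frames, lambda f: f == "S", padding=1)
```

Keeping the original frame indices alongside the retained frames lets later analysis see where gaps were removed, without the removed audio ever being stored or transmitted.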