Storing raw data in non-audible format #976
Replies: 2 comments 3 replies
-
Hi @ahmaeldesoky, that is an interesting question. I'm not familiar with the privacy requirements around "non-audible formats", but if you store the data as spectrograms, it wouldn't be very hard for someone to re-create audio files from them. I wonder whether encrypting the audio files on your computer would satisfy the requirements? They would still be audio files, but they could be password-protected.
-
Hi,

Although this is somewhat off-topic with regard to the specific enquiry, I would like to take its title as an opportunity to raise the following idea.

I have been recording environmental sound 24/7 for bird prediction at one location for a year and a half, and I am preparing to add further recordings via AudioMoth. So I am also looking for a better way to collect the sound data than a folder with tens of thousands of wave files. We should also think about use cases where the meta information goes beyond creator, GPS and time data. Users may want to include time-stamped weather data for better predictions, or for further use of the sound source predictions: recognizing animal species is not an end in itself.

Possible database solution

Of course, you can collect such meta information in tens of thousands of additional text or CSV files. But scientists in other disciplines, who also have to work with large amounts of data combined with metadata, have already created a good solution for this: HDF5 (BSD-like license for general use), https://www.hdfgroup.org/solutions/hdf5/

HDF5 offers some really good advantages for data such as environmental sound data.

Data inside such HDF5 database files:
The result is not a zip file with a flat or deep folder structure of wave files (for example) plus a document explaining how to use the file names and how to interpret the text or CSV files as meta information ... No, the result is one HDF5 file.

Processing of such data:
This is why I decided to collect my audio data in such HDF5 database files. I have created functions to insert audio files into HDF5 files including metadata, as well as functions to restore the original files with name and content, or just to compare them with files in folders if necessary. The metadata contains all the original information, and any further metadata can be added. I am in the process of implementing the corresponding access to this data so that it can be treated like very large audio files spanning weeks or months.

Question to the developers of opensoundscape

Is there general interest among the opensoundscape developers in possibly supporting large sound data sets via HDF5? Maybe I can contribute a version of my library that is customised for opensoundscape. Since I want to use opensoundscape to analyse my data, I will write some functions to bring HDF5 and opensoundscape together anyway.

Please let me know if there is some interest.

Best regards,
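
To illustrate the insert and restore functions mentioned in this comment, here is a minimal sketch using the h5py package. It is not the library described above: the function names (`make_wav_bytes`, `insert_file`, `restore_file`), the dataset path, and the metadata keys and values are all hypothetical. It assumes the original file is stored byte for byte as a uint8 dataset, with metadata attached as HDF5 attributes.

```python
import io
import wave

import h5py
import numpy as np


def make_wav_bytes(n_frames=800, sample_rate=8000):
    """Build a tiny silent mono 16-bit WAV in memory (stand-in for a field recording)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(sample_rate)
        w.writeframes(b"\x00\x00" * n_frames)
    return buf.getvalue()


def insert_file(h5, name, raw, **metadata):
    """Store the untouched file bytes as a uint8 dataset; metadata become attributes."""
    dset = h5.create_dataset(name, data=np.frombuffer(raw, dtype=np.uint8))
    for key, value in metadata.items():
        dset.attrs[key] = value
    return dset


def restore_file(h5, name):
    """Return the original file bytes and a dict of the stored metadata."""
    dset = h5[name]
    return dset[()].tobytes(), dict(dset.attrs)


if __name__ == "__main__":
    raw = make_wav_bytes()
    # In-memory HDF5 file so the demo leaves nothing on disk.
    with h5py.File("demo.h5", "w", driver="core", backing_store=False) as h5:
        # Hypothetical path and metadata values, for illustration only.
        insert_file(
            h5,
            "site_a/2024-05-01_0430.wav",
            raw,
            recorded_at="2024-05-01T04:30:00",
            temperature_c=11.5,
        )
        restored, meta = restore_file(h5, "site_a/2024-05-01_0430.wav")
        print(restored == raw, meta)
```

Storing the raw bytes keeps the restore step lossless, so a restored file can be compared directly with the copy in a folder. Storing decoded samples instead would make slicing across long time spans easier, at the cost of no longer preserving the original file exactly.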
-
Hi all,
I'm new to the package and to bioacoustics in general. My question is simply this: I need to save the raw audio data that I collect in the field in a non-audible format (for privacy and GDPR requirements), but I still need a format that can be used later as input for different ML/CNN models (the pre-trained ones available in bioacoustics-model-zoo, but also custom ones). What would be the best way to handle this? I thought of using spectrogram objects (opensoundscape.spectrogram.Spectrogram), but I'm not sure how practical this would be. Of course, there will always be the limitation that there won't be any ground truth data for validating the model outputs by ear, but I'm trying to optimize as much as possible.
Thanks in advance
Ahmed