Bug: decode function leaks memory #19

antimora · 2020-07-23T05:51:31Z

I am using miniaudio to decode MP3 bytes. I have an ML training task that decodes thousands of MP3 files over many thousands of training iterations. I noticed that memory usage blew up and the OS started swapping.

I ran a simple test and confirmed that miniaudio does have a memory leak.

from pathlib import Path

from miniaudio import SampleFormat, decode

# get mp3 bytes
audio_bytes = Path('common_voice_en_20603299.mp3').read_bytes()

for i in range(10000):
    decoded_audio = decode(audio_bytes,
                           nchannels=1, sample_rate=16000,
                           output_format=SampleFormat.SIGNED32)

Test file: common_voice_en_20603299.zip

10,000 iterations use up about 3 GB of memory (maximum resident set size of 3274532 kB):


[tmp]$ /bin/time -v /usr/bin/python3 /workspaces/ml/tmp/test9.py
        Command being timed: "/usr/bin/python3 /workspaces/ml/tmp/test9.py"
        User time (seconds): 56.05
        System time (seconds): 1.52
        Percent of CPU this job got: 101%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:56.81
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 3274532
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 798090
        Voluntary context switches: 30
        Involuntary context switches: 723
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0
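
The "Maximum resident set size" figure that /bin/time reports can also be sampled from inside the script, which makes it easier to compare functions. Below is a minimal sketch using only the standard library `resource` module, so it assumes a Unix-like system. The helper names `peak_rss_kib` and `peak_growth_kib` are made up for illustration and are not part of miniaudio.

```python
import resource
import sys
from typing import Callable


def peak_rss_kib() -> int:
    """Return this process's peak resident set size in KiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Linux reports ru_maxrss in KiB; macOS reports it in bytes.
    return peak // 1024 if sys.platform == "darwin" else peak


def peak_growth_kib(func: Callable[[], object], iterations: int) -> int:
    """Call func repeatedly and return how much the peak RSS grew, in KiB."""
    before = peak_rss_kib()
    for _ in range(iterations):
        func()
    return peak_rss_kib() - before
```

Passing a lambda that wraps the decode call from the test above should show growth that scales with the iteration count if the call leaks, and roughly constant growth if it does not.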

I did further testing and debugging and concluded that calling lib.ma_decode_memory repeatedly causes the memory leak.

According to:
https://github.com/dr-soft/miniaudio/blob/b80f7f949152f93a0af499b4d6d07b8e60d0e673/extras/miniaudio_split/miniaudio.h#L4120

ma_free probably needs to be called to free the allocated memory, as in this example: mackron/miniaudio#97 (comment)

It may also be a good idea to release frames and memory, which are allocated by the ffi.new calls (see https://cffi.readthedocs.io/en/latest/ref.html#ffi-release-and-the-context-manager).
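
The copy-then-free pattern suggested here can be illustrated without miniaudio. The sketch below is only an analogy: it uses the standard library `ctypes` module with libc's `malloc` and `free` standing in for `ma_decode_memory` and `ma_free`, and the function name `produce_and_copy` is hypothetical. It assumes a POSIX system where libc can be loaded this way.

```python
import ctypes
import ctypes.util

# Load the C runtime. If find_library returns None, CDLL(None) still
# exposes the libc symbols of the running process on POSIX systems.
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.malloc.restype = ctypes.c_void_p
libc.malloc.argtypes = [ctypes.c_size_t]
libc.free.restype = None
libc.free.argtypes = [ctypes.c_void_p]


def produce_and_copy(size: int) -> bytes:
    """Allocate a C buffer, copy its contents into Python, then free it."""
    ptr = libc.malloc(size)
    if not ptr:
        raise MemoryError("malloc failed")
    try:
        # Stand-in for the decoder filling the buffer with samples.
        ctypes.memset(ptr, 0x2A, size)
        # Copy into a Python-owned bytes object before the buffer goes away.
        return ctypes.string_at(ptr, size)
    finally:
        # Without this call, every invocation leaks `size` bytes.
        libc.free(ptr)
```

The `try`/`finally` ensures the C buffer is freed even if the copy raises, which is the same shape as the workaround posted later in this thread.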

Here is code showing that repeated calls to ma_decode_memory leak memory:

from pathlib import Path

from miniaudio import (DecodeError, DitherMode, SampleFormat,
                       _array_proto_from_format, _width_from_format,
                       ffi, lib)

# get mp3 bytes
audio_bytes = Path('common_voice_en_20603299.mp3').read_bytes()

data = audio_bytes
nchannels = 1
sample_rate = 16000
dither = DitherMode.NONE
output_format = SampleFormat.SIGNED16
sample_width = _width_from_format(output_format)
samples = _array_proto_from_format(output_format)
frames = ffi.new("ma_uint64 *")
memory = ffi.new("void **")

decoder_config = lib.ma_decoder_config_init(output_format.value, nchannels, sample_rate)
decoder_config.ditherMode = dither.value

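for i in range(30000):
    # Each call allocates a new output buffer and stores its address in memory[0].
    # Nothing frees that buffer, so memory usage grows with every iteration.
    result = lib.ma_decode_memory(data, len(data), ffi.addressof(decoder_config), frames, memory)
    if result != lib.MA_SUCCESS:
        raise DecodeError("failed to decode data", result)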


antimora · 2020-07-23T19:00:18Z

I found a workaround and am sharing it for those who are in the same boat.

It appears that pyminiaudio's mp3_read_f32 function correctly frees the allocated memory and therefore does not suffer from the leak. However, mp3_read_f32 is currently broken because of #18, which I reported. So here is a re-implementation for those who wish to use pyminiaudio's in-memory MP3 decoding. It includes a small test showing that there is no memory leak:

import array
from pathlib import Path
from typing import Tuple

from miniaudio import DecodeError, ffi, lib

# get mp3 bytes
audio_bytes = Path('common_voice_en_20603299.mp3').read_bytes()

def mp3_read_f32(data: bytes) -> Tuple[array.array, int, int]:
    '''Reads and decodes the whole mp3 audio data. Resulting sample format is 32 bits float.

    Returns the samples, the sample rate, and the number of channels.'''
    config = ffi.new('drmp3_config *')
    num_frames = ffi.new('drmp3_uint64 *')
    memory = lib.drmp3_open_memory_and_read_pcm_frames_f32(data, len(data), config, num_frames, ffi.NULL)
    if not memory:
        raise DecodeError('cannot load/decode data')
    try:
        # Copy the samples into a Python-owned array before the C buffer is freed.
        samples = array.array('f')
        buffer = ffi.buffer(memory, num_frames[0] * config.channels * 4)
        samples.frombytes(buffer)
        return samples, config.sampleRate, config.channels
    finally:
        lib.drmp3_free(memory, ffi.NULL)
        ffi.release(num_frames)  # Release num_frames memory as a precaution.


for i in range(10000):
    decoded_audio, sample_rate, channels = mp3_read_f32(audio_bytes)

Here are OS stats for executing this.

[tmp]$ /bin/time -v /usr/bin/python3 /workspaces/ml/tmp/test10.py

        Command being timed: "/usr/bin/python3 /workspaces/ml/tmp/test10.py"
        User time (seconds): 27.25
        System time (seconds): 0.64
        Percent of CPU this job got: 102%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:27.16
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 196120
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 30339
        Voluntary context switches: 39
        Involuntary context switches: 2387
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

The maximum memory usage is 196.12 MB.

irmen · 2020-10-30T11:54:36Z

Could you perhaps retry with the latest version, 1.36? There have been various updates to the miniaudio library over the past few months.
Or is it really a problem in the Python wrapper module?

antimora · 2020-10-31T06:53:14Z

I am using a workaround and I haven't tried with updated libraries. Were you able to recreate the bug? I have outlined all the steps to recreate it.

irmen · 2020-10-31T12:27:51Z

not sure why you're tagging Cameron here :)

irmen · 2020-10-31T13:31:22Z

Commit 77bd203 introduces the pre-emptive releasing of those cffi resources. As #18 is now fixed, your workaround for that particular function should no longer be necessary. I've checked with 10,000 iterations and memory usage doesn't grow.

I'll investigate the memory use of the decode function later.

irmen · 2020-11-01T03:40:40Z

I believe this is now fixed in the new release. As are the API bugs in the mp3_load functions.

antimora mentioned this issue Jul 23, 2020

🚀 Feature Request: Loading audio data from BytesIO or memory pytorch/audio#800

Closed

irmen closed this as completed in 9e90dab Nov 1, 2020
