this doesn't seem to work on short files, about 3s or under? #87

Closed
5 tasks done
RossArnott opened this issue Nov 22, 2018 · 35 comments

@RossArnott

RossArnott commented Nov 22, 2018

If you want to report a bug, or have a specific question, please make sure to include this information:

  • Your operating system
  • Your Python version / distribution
  • Your ffmpeg version
  • The exact command you were trying to run
  • Any output you get when running the command with the --debug flag
@slhck
Owner

slhck commented Nov 22, 2018

Please provide some details as mentioned in the issue template.

@slhck slhck closed this as completed Nov 22, 2018
@RossArnott
Author

I'm running this command:

ffmpeg-normalize $i -c:a aac -nt ebu -t -5 -f -o processed_audio/$i.m4v

And I find that short files, less than 3 seconds or so, don't get normalized. This may be an artefact of the algorithm needing more samples to work?

ProductName: Mac OS X
ProductVersion: 10.13.6
BuildVersion: 17G65

Python 2.7.10 (default, Oct 6 2017, 22:29:07)
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.31)] on darwin

ffmpeg version 4.0.2 Copyright (c) 2000-2018 the FFmpeg developers

DEBUG LOG:

Ross-MBP:audio_clips rossarnott$ ffmpeg-normalize W5S1-Rest-3-9-Introduction.m4v -c:a aac -nt ebu -t -5 -f --debug -o processed_audio/W5S1-Rest-3-9-Introduction2.m4v
DEBUG: found executable in path: /usr/local/bin/ffmpeg
DEBUG: found executable in path: /usr/local/bin/ffmpeg
DEBUG: Running command: ['/usr/local/bin/ffmpeg', '-filters']
DEBUG: Parsing streams of W5S1-Rest-3-9-Introduction.m4v
DEBUG: Running command: ['/usr/local/bin/ffmpeg', '-i', 'W5S1-Rest-3-9-Introduction.m4v', '-c', 'copy', '-t', '0', '-map', '0', '-f', 'null', '/dev/null']
DEBUG: Stream parsing command output:
DEBUG: ffmpeg version 4.0.2 Copyright (c) 2000-2018 the FFmpeg developers
built with Apple LLVM version 10.0.0 (clang-1000.11.45.2)
configuration: --prefix=/usr/local/Cellar/ffmpeg/4.0.2 --enable-shared --enable-pthreads --enable-version3 --enable-hardcoded-tables --enable-avresample --cc=clang --host-cflags= --host-ldflags= --enable-gpl --enable-libass --enable-libfdk-aac --enable-libfreetype --enable-libmp3lame --enable-libvpx --enable-libx264 --enable-libx265 --enable-libxvid --enable-opencl --enable-videotoolbox --disable-lzma --enable-nonfree
libavutil 56. 14.100 / 56. 14.100
libavcodec 58. 18.100 / 58. 18.100
libavformat 58. 12.100 / 58. 12.100
libavdevice 58. 3.100 / 58. 3.100
libavfilter 7. 16.100 / 7. 16.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 1.100 / 5. 1.100
libswresample 3. 1.100 / 3. 1.100
libpostproc 55. 1.100 / 55. 1.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'W5S1-Rest-3-9-Introduction.m4v':
Metadata:
major_brand : M4V
minor_version : 1
compatible_brands: M4V M4A mp42isom
creation_time : 2018-11-19T19:25:57.000000Z
description : This video is about W5S1 3-9 Section 3 Rest Audio
album_artist : Gabriel Kava
keywords : Week 5,w5 s1 audio
artist : Gabriel Kava
title : W5S1 3-9 Section 3 Rest Audio
Duration: 00:00:02.00, start: 0.000000, bitrate: 108 kb/s
Stream #0:0(eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 99 kb/s (default)
Metadata:
creation_time : 2018-11-19T19:25:57.000000Z
handler_name : Core Media Audio
Output #0, null, to '/dev/null':
Metadata:
major_brand : M4V
minor_version : 1
compatible_brands: M4V M4A mp42isom
title : W5S1 3-9 Section 3 Rest Audio
description : This video is about W5S1 3-9 Section 3 Rest Audio
album_artist : Gabriel Kava
keywords : Week 5,w5 s1 audio
artist : Gabriel Kava
encoder : Lavf58.12.100
Stream #0:0(eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 99 kb/s (default)
Metadata:
creation_time : 2018-11-19T19:25:57.000000Z
handler_name : Core Media Audio
Stream mapping:
Stream #0:0 -> #0:0 (copy)
Press [q] to stop, [?] for help
size=N/A time=00:00:00.00 bitrate=N/A speed= 0x
video:0kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown

DEBUG: Found audio stream at index 0
INFO: Normalizing file W5S1-Rest-3-9-Introduction.m4v (1 of 1)
DEBUG: Running normalization for W5S1-Rest-3-9-Introduction.m4v
DEBUG: Parsing normalization info for W5S1-Rest-3-9-Introduction.m4v
INFO: Running first pass loudnorm filter for stream 0
DEBUG: Running ffmpeg command: ['/usr/local/bin/ffmpeg', '-nostdin', '-y', '-i', 'W5S1-Rest-3-9-Introduction.m4v', '-filter_complex', '[0:0]loudnorm=i=-5.0:lra=7.0:tp=-2.0:offset=0.0:print_format=json', '-vn', '-sn', '-f', 'null', '/dev/null']
DEBUG: Loudnorm first pass command output:
DEBUG: ffmpeg version 4.0.2 Copyright (c) 2000-2018 the FFmpeg developers
built with Apple LLVM version 10.0.0 (clang-1000.11.45.2)
configuration: --prefix=/usr/local/Cellar/ffmpeg/4.0.2 --enable-shared --enable-pthreads --enable-version3 --enable-hardcoded-tables --enable-avresample --cc=clang --host-cflags= --host-ldflags= --enable-gpl --enable-libass --enable-libfdk-aac --enable-libfreetype --enable-libmp3lame --enable-libvpx --enable-libx264 --enable-libx265 --enable-libxvid --enable-opencl --enable-videotoolbox --disable-lzma --enable-nonfree
libavutil 56. 14.100 / 56. 14.100
libavcodec 58. 18.100 / 58. 18.100
libavformat 58. 12.100 / 58. 12.100
libavdevice 58. 3.100 / 58. 3.100
libavfilter 7. 16.100 / 7. 16.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 1.100 / 5. 1.100
libswresample 3. 1.100 / 3. 1.100
libpostproc 55. 1.100 / 55. 1.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'W5S1-Rest-3-9-Introduction.m4v':
Metadata:
major_brand : M4V
minor_version : 1
compatible_brands: M4V M4A mp42isom
creation_time : 2018-11-19T19:25:57.000000Z
description : This video is about W5S1 3-9 Section 3 Rest Audio
album_artist : Gabriel Kava
keywords : Week 5,w5 s1 audio
artist : Gabriel Kava
title : W5S1 3-9 Section 3 Rest Audio
Duration: 00:00:02.00, start: 0.000000, bitrate: 108 kb/s
Stream #0:0(eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 99 kb/s (default)
Metadata:
creation_time : 2018-11-19T19:25:57.000000Z
handler_name : Core Media Audio
Stream mapping:
Stream #0:0 (aac) -> loudnorm
loudnorm -> Stream #0:0 (pcm_s16le)
Output #0, null, to '/dev/null':
Metadata:
major_brand : M4V
minor_version : 1
compatible_brands: M4V M4A mp42isom
title : W5S1 3-9 Section 3 Rest Audio
description : This video is about W5S1 3-9 Section 3 Rest Audio
album_artist : Gabriel Kava
keywords : Week 5,w5 s1 audio
artist : Gabriel Kava
encoder : Lavf58.12.100
Stream #0:0: Audio: pcm_s16le, 192000 Hz, stereo, s16, 6144 kb/s (default)
Metadata:
encoder : Lavc58.18.100 pcm_s16le
size=N/A time=00:00:02.00 bitrate=N/A speed=38.7x
video:0kB audio:1504kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
[Parsed_loudnorm_0 @ 0x7fab4fc095c0]
{
"input_i" : "-22.02",
"input_tp" : "-9.06",
"input_lra" : "0.00",
"input_thresh" : "-32.82",
"output_i" : "-21.55",
"output_tp" : "-8.62",
"output_lra" : "0.00",
"output_thresh" : "-32.36",
"normalization_type" : "linear",
"target_offset" : "16.55"
}
DEBUG: Loudnorm stats parsed: {"input_i": "-22.02", "input_tp": "-9.06", "input_lra": "0.00", "input_thresh": "-32.82", "output_i": "-21.55", "output_tp": "-8.62", "output_lra": "0.00", "output_thresh": "-32.36", "normalization_type": "linear", "target_offset": "16.55"}
INFO: Running second pass for W5S1-Rest-3-9-Introduction.m4v
DEBUG: Running ffmpeg command: ['/usr/local/bin/ffmpeg', '-y', '-nostdin', '-i', 'W5S1-Rest-3-9-Introduction.m4v', '-filter_complex', '[0:0]loudnorm=i=-5.0:lra=7.0:tp=-2.0:offset=0.0:measured_i=-22.02:measured_lra=0.0:measured_tp=-9.06:measured_thresh=-32.82:linear=true:print_format=json[norm0]', '-map_metadata', '0', '-map_chapters', '0', '-c:v', 'copy', '-map', '[norm0]', '-c:a', 'aac', '-c:s', 'copy', '/var/folders/rp/0cqd1c012p7g9jf3mc7nbrvc0000gn/T/qcv46tfu.m4v']
DEBUG: Moving temporary file from /var/folders/rp/0cqd1c012p7g9jf3mc7nbrvc0000gn/T/qcv46tfu.m4v to processed_audio/W5S1-Rest-3-9-Introduction2.m4v
DEBUG: Normalization finished
INFO: Normalized file written to processed_audio/W5S1-Rest-3-9-Introduction2.m4v
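Independently of the duration question, the second-pass command above asks loudnorm for `linear=true`. FFmpeg's loudnorm documentation says linear mode reverts to dynamic mode if the target LRA is lower than the source LRA, or if the change in integrated loudness would produce a true peak above the TP target. A minimal Python sketch of that documented condition (hypothetical helper, not FFmpeg's actual code), fed with the values from this log, suggests that a -5 LUFS target cannot be reached linearly for this clip:

```python
def linear_mode_possible(target_i, target_lra, target_tp,
                         measured_i, measured_lra, measured_tp):
    """Sketch of the conditions documented for loudnorm's linear mode (hypothetical helper)."""
    gain_db = target_i - measured_i        # the single gain linear mode would apply
    peak_after_db = measured_tp + gain_db  # true peak after applying that gain
    return measured_lra <= target_lra and peak_after_db <= target_tp


# Values from the first pass above: target i=-5, lra=7, tp=-2.
print(linear_mode_possible(-5.0, 7.0, -2.0, -22.02, 0.0, -9.06))   # False: peak would be about +7.96 dBTP
# The same measurements with a gentler target, e.g. -16 LUFS.
print(linear_mode_possible(-16.0, 7.0, -2.0, -22.02, 0.0, -9.06))  # True: peak would be about -3.04 dBTP
```

Whether this contributed to the unchanged output here is not established in the thread; the sketch only shows that the requested gain and the true-peak ceiling conflict.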

@slhck slhck reopened this Nov 22, 2018
@slhck
Owner

slhck commented Nov 22, 2018

In the log it says that an output file was written. Is this file the same as the input file, or silent, or...?

It could be that the EBU-type normalization requires more input, but I'd have to check.

@RossArnott
Author

Thanks for the quick response! The output file is the same amplitude as the input file for the example given. The longer files get normalised, which in this case is mostly making them significantly louder. The short files don't get changed. I'm batch processing dozens of files and I end up with a few (the short ones, it seems) at significantly different volume levels.

@slhck
Owner

slhck commented Nov 22, 2018

OK, thanks for clarifying this. I'll see if there's a way to tune the parameters to make it work for small files. If not I'll have to at least print a warning.

@michaelcrossland

michaelcrossland commented Nov 22, 2018 via email

@slhck
Owner

slhck commented Nov 22, 2018

Can you please point to a reference for your claim that ffmpeg requires at least 30 seconds of audio material to be able to normalize a file?

@RossArnott
Author

Fair enough. Empirically it looks like ffmpeg-normalize does actually work on files of about 4s or longer, but that's not exactly a scientific test and it could be luck or depend on the audio content.
I'll need to figure out a way to try and automatically flag these files that are not being adjusted correctly.
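One rough way to flag such files automatically, sketched in Python under the assumption that you capture ffmpeg's stderr for each input: parse the `Duration:` line ffmpeg prints (as in the debug log above) and compare it with the roughly 3-second threshold discussed in this issue. The helper names are hypothetical, and `ffprobe` would be a more robust way to read durations.

```python
import re

# "Duration: HH:MM:SS.xx" as printed by ffmpeg for each input (see the debug log above).
DURATION_RE = re.compile(r"Duration:\s*(\d+):(\d{2}):(\d{2}(?:\.\d+)?)")


def parse_duration(stderr_text):
    """Return the first input duration in seconds, or None (e.g. for 'Duration: N/A')."""
    match = DURATION_RE.search(stderr_text)
    if match is None:
        return None
    hours, minutes, seconds = match.groups()
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def too_short_for_ebu(stderr_text, min_seconds=3.0):
    """Flag inputs below the ~3 s threshold discussed in this issue."""
    duration = parse_duration(stderr_text)
    return duration is not None and duration < min_seconds


line = "  Duration: 00:00:02.00, start: 0.000000, bitrate: 108 kb/s"
print(parse_duration(line), too_short_for_ebu(line))  # 2.0 True
```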

@slhck
Owner

slhck commented Nov 22, 2018

You can use the option to print the statistics (-p/--print-stats) and inspect the loudness before and after. But that's not a proper solution either. I'll see what I can do.
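Inspecting the statistics can be partly automated. A minimal sketch, assuming you capture ffmpeg's stderr from a loudnorm pass run with `print_format=json` (the JSON layout matches the debug log earlier in this thread): extract the last loudnorm block, convert the numbers, and flag files whose `output_i` is far from the target. The helper names and the 1 LU tolerance are assumptions.

```python
import json
import re

# JSON block that loudnorm prints after its "[Parsed_loudnorm_N @ 0x...]" tag.
LOUDNORM_JSON = re.compile(r"\[Parsed_loudnorm_\d+ @ [^\]]+\]\s*(\{.*?\})", re.S)


def parse_loudnorm_stats(stderr_text):
    """Return the last loudnorm stats block, with numeric values converted to floats."""
    blocks = LOUDNORM_JSON.findall(stderr_text)
    if not blocks:
        raise ValueError("no loudnorm JSON block found")
    stats = {}
    for key, value in json.loads(blocks[-1]).items():
        try:
            stats[key] = float(value)
        except ValueError:
            stats[key] = value  # e.g. "normalization_type": "linear"
    return stats


def missed_target(stats, target_i, tolerance_lu=1.0):
    """Hypothetical check: output integrated loudness is far from the requested target."""
    return abs(stats["output_i"] - target_i) > tolerance_lu


sample = """[Parsed_loudnorm_0 @ 0x7fab4fc095c0]
{
    "input_i" : "-22.02",
    "output_i" : "-21.55",
    "normalization_type" : "linear",
    "target_offset" : "16.55"
}"""

stats = parse_loudnorm_stats(sample)
print(stats["output_i"], missed_target(stats, target_i=-5.0))  # -21.55 True
```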

@kylophone

I've known about this for a while but haven't had time to fix it. I should really just fix the ffmpeg filter. Can you leave this open and assign to me?

@slhck
Owner

slhck commented Nov 26, 2018

I guess that would be way more efficient than me digging through your code. Thanks!

@slhck
Owner

slhck commented Nov 26, 2018

Seems I can't assign you, unless you're a collaborator or the OP. I'll leave it open and assign to me in the meantime.

@slhck slhck self-assigned this Nov 26, 2018
@slhck slhck added the bug label Nov 26, 2018
@NiloCK

NiloCK commented Apr 24, 2019

@kylophone Hoping for some input before putting shovels in the ground:

Is file-length the only issue here? Do you expect a scripted solution that pads a file with 5 seconds of blank audio, runs loudnorm, and then strips the blank audio to work?

@kylophone

kylophone commented Apr 24, 2019

The problem has to do with the definition of Integrated Loudness in BS 1770 / EBU R128. IL by definition needs at least 3 seconds. I haven't had a chance to look but padding with silence should work, I think.
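For context on why silence padding is expected to work: BS.1770 (which EBU R128 builds on) computes integrated loudness from 400 ms blocks with 75 % overlap, discards blocks below an absolute gate of -70 LUFS, then discards blocks more than 10 LU below the loudness of the remaining ones. Digital silence falls under the absolute gate, so padding mostly adds blocks that are thrown away. The pure-Python sketch below illustrates this. It is a simplification that skips the K-weighting pre-filter and handles mono only, so its numbers are not true LUFS readings.

```python
import math


def integrated_loudness(samples, rate):
    """Simplified BS.1770-style gated loudness for a mono signal (no K-weighting)."""
    block = int(0.4 * rate)  # 400 ms gating blocks
    hop = block // 4         # 75 % overlap
    powers = [sum(x * x for x in samples[s:s + block]) / block
              for s in range(0, len(samples) - block + 1, hop)]

    def lufs(power):
        return -0.691 + 10 * math.log10(power) if power > 0 else -math.inf

    # Absolute gate at -70 LUFS: digital silence (-inf) is dropped here.
    gated = [p for p in powers if lufs(p) > -70.0]
    if not gated:
        return -math.inf
    # Relative gate: drop blocks more than 10 LU below the absolute-gated loudness.
    relative = lufs(sum(gated) / len(gated)) - 10.0
    gated = [p for p in gated if lufs(p) > relative]
    return lufs(sum(gated) / len(gated))


rate = 8000
tone = [0.5 * math.sin(2 * math.pi * 1000 * n / rate) for n in range(2 * rate)]  # 2 s tone
padded = [0.0] * (10 * rate) + tone  # 10 s of leading silence, as in the workaround below

print(round(integrated_loudness(tone, rate), 2))    # about -9.72
print(round(integrated_loudness(padded, rate), 2))  # about -10.06
```

The two results differ only because of the few blocks that straddle the boundary between silence and signal (about 0.3 dB here).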

@NiloCK

NiloCK commented Apr 25, 2019

That's as much of a go-ahead as I need. It'll be at least a couple of weeks before I try this, but I'll report back. Thanks.

@NiloCK

NiloCK commented May 3, 2019

For anyone interested, the steps outlined in my issue:

ffmpeg -i input -af "adelay=10000|10000" enlarged

Pads the audio with ten seconds of silence at the beginning. Necessary because of this bug

ffmpeg -i enlarged -af loudnorm=I=-16:TP=-1.5:LRA=11:print_format=json -f null -

Gets loudness data from the file.

ffmpeg -i enlarged -af loudnorm=I=-16:TP=-1.5:LRA=11:measured_I=-XXX:measured_LRA=XXX:measured_TP=-XXX:measured_thresh=-XXX:offset=0.58:linear=true:print_format=summary -ar 48k paddedNormalized

Feeds the loudness data back into the normalization algorithm for better results.

ffmpeg -i paddedNormalized -ss 00:00:10.000 -acodec copy normalized

Removes the 10 seconds of silence

Works just fine. This could be added to this library as a workaround for the upstream bug.
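The four steps can be scripted. A minimal Python sketch that only builds the ffmpeg argument lists (helper names are hypothetical; the defaults are copied from the commands above). You would run each list with `subprocess.run(cmd, check=True)` and parse the JSON printed by step 2 before building steps 3 and 4. Two assumptions to state loudly: ffmpeg infers the output format from the file extension, so names like `enlarged.wav` rather than `enlarged` are needed in practice, and `offset` is taken from the first pass's `target_offset`.

```python
def measure_commands(src, padded, pad_seconds=10, channels=2,
                     target_i=-16.0, target_tp=-1.5, target_lra=11.0):
    """Steps 1-2: pad with leading silence, then run a loudnorm measurement pass."""
    delays = "|".join([str(pad_seconds * 1000)] * channels)  # adelay: one delay (ms) per channel
    targets = f"I={target_i}:TP={target_tp}:LRA={target_lra}"
    return [
        ["ffmpeg", "-i", src, "-af", f"adelay={delays}", padded],
        ["ffmpeg", "-i", padded, "-af", f"loudnorm={targets}:print_format=json", "-f", "null", "-"],
    ]


def finish_commands(padded, normalized_padded, dst, stats, pad_seconds=10,
                    target_i=-16.0, target_tp=-1.5, target_lra=11.0):
    """Steps 3-4: second pass with the measured values, then cut the padding off again.

    `stats` holds first-pass JSON values (input_i, input_lra, input_tp, input_thresh,
    target_offset); using target_offset as `offset` is an assumption.
    """
    targets = f"I={target_i}:TP={target_tp}:LRA={target_lra}"
    loudnorm = (
        f"loudnorm={targets}"
        f":measured_I={stats['input_i']}:measured_LRA={stats['input_lra']}"
        f":measured_TP={stats['input_tp']}:measured_thresh={stats['input_thresh']}"
        f":offset={stats['target_offset']}:linear=true:print_format=summary"
    )
    return [
        # loudnorm outputs 192 kHz (see the debug log above), so resample back to 48 kHz.
        ["ffmpeg", "-i", padded, "-af", loudnorm, "-ar", "48k", normalized_padded],
        ["ffmpeg", "-i", normalized_padded, "-ss", str(pad_seconds), "-acodec", "copy", dst],
    ]


steps = measure_commands("input.wav", "enlarged.wav")
print(steps[0])  # ['ffmpeg', '-i', 'input.wav', '-af', 'adelay=10000|10000', 'enlarged.wav']
```

The final `-acodec copy` trim is kept as in the original steps; later comments in this thread discuss its accuracy.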

@slhck
Owner

slhck commented May 3, 2019

Thanks for sharing this. I have to admit, I'm not in favor of adding functionality to automatically pad and truncate the audio streams. That always bears potential for issues with audio-video sync. I'd rather just provide a warning when the audio stream is < 3 s and link to a FAQ entry.

@csestili

The warning pointing to this issue appears even when using RMS normalization (-nt rms). If I understand the explanation above correctly, there is no minimum length requirement for RMS normalization; is that correct? If so, could this warning please be removed for RMS normalization? Thank you!

@slhck
Owner

slhck commented Mar 14, 2020

@csestili True, this should only affect EBU-type normalization. I fixed the message in 69ac934, v1.15.7 available now.

@5tan

5tan commented Aug 17, 2020

The method presented by @NiloCK works well, except that it can clip the file a little bit.

I am not an ffmpeg expert, but after some experiments I have concluded that the following step

# Removes the 10 seconds of silence
ffmpeg -i paddedNormalized -ss 00:00:10.000 -acodec copy normalized

only works with an accuracy of 2048 samples.

Thus, for my 16 kHz sound files, I use 16 seconds of padding (16 s = 256k samples, being the LCD of 16k and 2048) to avoid clipping.
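This observation can be turned into a quick arithmetic check. Assuming, as reported above, that a stream-copy cut can only land on 2048-sample boundaries (an empirical value from this comment, not a documented constant), the sketch below computes how far a padding length falls from such a boundary: 10 s at 16 kHz is off by 256 samples (16 ms), while 16 s lines up exactly. The helper name is hypothetical.

```python
def trim_misalignment(pad_seconds, sample_rate, granularity=2048):
    """Samples between the padding length and the previous `granularity` boundary."""
    pad_samples = round(pad_seconds * sample_rate)
    return pad_samples % granularity


for seconds in (10, 16):
    off = trim_misalignment(seconds, 16000)
    print(f"{seconds} s at 16 kHz: {off} samples off ({1000 * off / 16000:.0f} ms)")
```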

@dotancohen

> Thanks for sharing this. I have to admit, I'm not in favor of adding functionality to automatically pad and truncate the audio streams. That always bears potential for issues with audio-video sync. I'd rather just provide a warning when the audio stream is < 3 s and link to a FAQ entry.

That would make this otherwise wonderful tool useless for shorter clips, which, as evidenced by the existence of this bug, people need. One use case for shorter audio clips is normalizing single spoken words for language learning, as seen here. In this use case, audio normalization is important, but sync is not.

Therefore, I suggest implementing a warning that audio may not stay in sync after normalization, while still enabling the pad-then-truncate approach.

@slhck
Owner

slhck commented Oct 6, 2020

@dotancohen Thanks for your feedback. I'm not against such a feature per se, it's just that it is a bit of additional work and may lead to files out of sync, so it needs to be well-tested. I'll look into how to implement it, but I can't give you an ETA on it, unfortunately.

auricgoldfinger pushed a commit to auricgoldfinger/audio-normalize that referenced this issue Dec 15, 2020
@GabArl

GabArl commented May 11, 2021

@5tan
There are no issues with clipping if, in your last line, you change -acodec copy to -acodec %codec_name%,
after getting %codec_name% from:
FOR /F "tokens=*" %%C IN ('"ffprobe -i "%input_file%" -select_streams a:0 -show_entries stream=codec_name -hide_banner -v quiet -of csv=p=0"') DO ( SET codec_name=%%C)

This way the audio is re-encoded with its original codec instead of being stream-copied (I have not yet been able to understand the exact difference 👍)
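A likely explanation, based on general ffmpeg behavior rather than anything verified in this thread: with `-acodec copy` ffmpeg does not decode, so it can only drop whole packets when cutting, whereas re-encoding decodes first and can cut at the requested sample. A Python sketch of the same idea as the batch snippet above, with hypothetical helper names: probe the codec, then re-encode with it instead of copying.

```python
import subprocess


def probe_codec_cmd(path):
    """ffprobe query from the batch snippet: codec name of the first audio stream."""
    return ["ffprobe", "-v", "quiet", "-select_streams", "a:0",
            "-show_entries", "stream=codec_name", "-of", "csv=p=0", path]


def probe_codec(path):
    """Run ffprobe and return e.g. 'aac' or 'pcm_s16le' (requires ffprobe on PATH)."""
    result = subprocess.run(probe_codec_cmd(path), capture_output=True, text=True, check=True)
    return result.stdout.strip()


def trim_cmd(src, dst, pad_seconds, codec=None):
    """Cut the leading padding; codec=None stream-copies, a codec name re-encodes."""
    return ["ffmpeg", "-i", src, "-ss", str(pad_seconds), "-acodec", codec or "copy", dst]


# Usage (needs real files):
# cmd = trim_cmd("paddedNormalized.m4a", "normalized.m4a", 10, codec=probe_codec("input.m4a"))
# subprocess.run(cmd, check=True)
```

Re-encoding a lossy codec such as AAC costs one more generation of quality, while for PCM (WAV) it is lossless. The sketch also assumes ffmpeg has an encoder for the probed codec name.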


Yes, when using the approach with -acodec copy, a block size of 2048 samples does work for the accuracy of the padding time (t) with 16-bit audio at any sample rate (Fs), using the formula t = LCM(Fs, size) / Fs (not LCD!). However, it no longer worked for me once I dealt with 24-bit files. And keep in mind that 44100 Hz, for example, results in a pad time of 512 seconds...
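The formula is easy to tabulate. A small Python sketch (hypothetical helper) using the identity LCM(Fs, size) / Fs = size / gcd(Fs, size), which reproduces the 16 s figure for 16 kHz and the 512 s figure for 44.1 kHz quoted above:

```python
import math


def pad_seconds_for(sample_rate, size=2048):
    """Smallest whole number of seconds whose sample count is a multiple of `size`.

    Equivalent to math.lcm(sample_rate, size) // sample_rate (math.lcm needs Python 3.9+).
    """
    return size // math.gcd(sample_rate, size)


for fs in (16000, 22050, 44100, 48000):
    print(fs, "Hz ->", pad_seconds_for(fs), "s")
```

This only covers the 2048-sample, whole-second case; it says nothing about the 24-bit files, which the comment reports did not work.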

Honestly, I did not fully understand what was going on, but I have a table if someone wants to experiment with it more 😄
I was not able to figure out the math for 24-bit; all my values became ridiculously high and did not work in any way.


@NiloCK

NiloCK commented Aug 17, 2021

> @dotancohen Thanks for your feedback. I'm not against such a feature per se, it's just that it is a bit of additional work and may lead to files out of sync, so it needs to be well-tested.

Another potentially useful distinction is that there are no sync issues with pure audio files (i.e., non-video files). From this thread, it looks like most people running into this bug are normalizing single spoken words, which is much more likely to be audio than video.

Clipping issues notwithstanding (thanks to everyone who pointed this out), I think a "better" fix for THIS utility might be to keep throwing that error for <3s video files, but do a pad-and-truncate hack on audio files and spit out a warning. Would you consider a PR that adds this behavior?

Heck, Vine doesn't even exist anymore!

(although, honestly, some ex-Vine content processing people are exactly the ones who have a fully-baked solution to this problem!)
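The proposed behavior amounts to a small decision rule. A hypothetical sketch (names and return values invented for illustration, not ffmpeg-normalize's API), using the 3-second threshold from this thread:

```python
def short_file_strategy(duration_s, has_video, min_seconds=3.0):
    """Hypothetical policy: only pad-and-trim when there is no video to keep in sync."""
    if duration_s >= min_seconds:
        return "normalize"     # long enough: regular two-pass EBU normalization
    if has_video:
        return "warn"          # keep the warning; padding could break A/V sync
    return "pad_and_trim"      # audio-only: pad with silence, normalize, trim, and warn


print(short_file_strategy(2.0, has_video=False))  # pad_and_trim
print(short_file_strategy(2.0, has_video=True))   # warn
```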

@dotancohen

> I think a "better" fix for THIS utility might be to keep throwing that error for <3s video files, but do a pad-and-truncate hack on audio files and spit out a warning.

I agree that this is the best solution, given the use cases stated above.

@slhck slhck reopened this Aug 17, 2021
@slhck
Owner

slhck commented Aug 17, 2021

Yes, that seems like a useful solution. It should apply to audio-only files then, which would make the processing easier.

@richardpl

Can this be reproduced reliably somehow? I could not find any input sample to test.
Also, I pushed some patches for the loudnorm filter to FFmpeg master; I guess one of them should fix this bug.

@slhck
Owner

slhck commented Nov 9, 2022

Thanks! I guess in particular this one: FFmpeg/FFmpeg@36572a0

I will leave this open until I get time to test that. I will leave the warning in until this fix lands in a specific ffmpeg version.

@richardpl

Maybe, maybe not. There is also a fix for LRA being reported as 0.0 for short audio, but I will look into posting it too.

@homocomputeris

Can the current warning for <3s files be ignored somehow? Or maybe it's been solved?

@slhck
Copy link
Owner

slhck commented Apr 24, 2023

This fix should be in FFmpeg v6.0 or higher. I will close this issue for now.
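A version-gated warning, in the spirit of keeping the warning until the fix lands in a specific ffmpeg release, could look like this hypothetical sketch. It parses the first line of `ffmpeg -version` (the format shown further down in this thread) and warns unless the version is known to be 6.0 or later. Treating unparseable versions, such as git snapshot builds, as "warn" is a design assumption.

```python
import re


def ffmpeg_version(version_output):
    """Return (major, minor) from 'ffmpeg version X.Y...' output, or None if unparseable.

    Git snapshot builds print something like 'ffmpeg version N-...' and yield None here.
    """
    match = re.match(r"ffmpeg version n?(\d+)\.(\d+)", version_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def needs_short_file_warning(version_output, fixed_in=(6, 0)):
    """Hypothetical gate: warn unless ffmpeg is known to be at least `fixed_in`."""
    version = ffmpeg_version(version_output)
    return version is None or version < fixed_in


print(needs_short_file_warning("ffmpeg version 4.0.2 Copyright (c) 2000-2018 the FFmpeg developers"))  # True
print(needs_short_file_warning("ffmpeg version 6.0 Copyright (c) 2000-2023 the FFmpeg developers"))    # False
```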

@slhck slhck closed this as completed Apr 24, 2023
@dailylama

If it's fixed, then why does it show a warning redirecting here?

$ ffmpeg -version
ffmpeg version 6.0 Copyright (c) 2000-2023 the FFmpeg developers

$ ffmpeg-normalize 1.wav
WARNING: Audio stream has a duration of less than 3 seconds. Normalization may not work. See https://github.com/slhck/ffmpeg-normalize/issues/87 for more info.

slhck added a commit that referenced this issue Jul 22, 2023
@slhck
Owner

slhck commented Jul 22, 2023

Because I forgot to remove it. It no longer shows a warning now.

@lorenblue

Hey there, sorry, but I just want to clarify: I am trying to normalize short (< 3 s) spoken-word audio to around -14 LUFS. Is this supported or not? Cheers, thanks.

@slhck
Owner

slhck commented Feb 26, 2024

This should work better now. Just make sure to use a recent ffmpeg version.
