Handle media files that have an audio stream whose duration is shorter than format=duration #75

Merged on Jul 11, 2024 (4 commits)

@rodrigomorales1 rodrigomorales1 commented Jul 9, 2024

We currently request format=duration from ffprobe and use it as the duration of the audio. However, format=duration seems to return the duration of the longest stream in the media file. Some media files have a video stream that is longer than the audio stream (see Experiment 1 below). For such files, format=duration returns the duration of the video stream, which is not what we want. subed-waveform.el creates waveforms from an audio stream, so we should use the duration of the audio stream rather than that of the video stream.

subed-config.el defines the variable subed-video-extensions (link). For most of those extensions, the duration of the audio stream is stored in the field stream=duration. For *.mkv and *.webm files, however, it is stored in the field stream_tags=duration. See Experiment 2 below.

To sum up, the changes in this pull request make sure that we correctly get the duration of the audio stream and store it in subed-waveform-file-duration-ms-cache.
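
The field selection described above can be sketched as a value-based fallback: use stream=duration when ffprobe reports a usable value, otherwise use stream_tags=duration. This is only a minimal shell illustration, not the code in this pull request, which special-cases the *.mkv and *.webm extensions. The name pick_audio_duration is hypothetical and does not exist in subed.

```shell
#!/usr/bin/env bash
# pick_audio_duration is a hypothetical helper, not a function from subed.
# $1: value reported for stream=duration
# $2: value reported for stream_tags=duration
# Prints the first usable value; returns 1 if neither is usable.
pick_audio_duration() {
  local stream_duration=$1 tag_duration=$2
  if [ -n "$stream_duration" ] && [ "$stream_duration" != "N/A" ]; then
    printf '%s\n' "$stream_duration"
  elif [ -n "$tag_duration" ]; then
    printf '%s\n' "$tag_duration"
  else
    return 1
  fi
}

pick_audio_duration "3.000000" ""               # mp4-style values: prints 3.000000
pick_audio_duration "N/A" "00:00:03.003000000"  # mkv-style values: prints 00:00:03.003000000
```

Checking the reported value instead of the file extension would also cover containers that were not tested here, at the cost of relying on ffprobe printing N/A or an empty string when a field is missing.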

I confirmed that the tests pass.

$ cd /path/to/my-fork
$ make test-only && echo Exit code: $?
(...omitted lines...)
  Get duration in milliseconds of a file with 1 video and 1 audio stream
    extension .mkv (213.45ms)
    extension .mp4 (216.40ms)
    extension .webm (575.85ms)
    extension .avi (192.48ms)
    extension .ts (183.97ms)
    extension .ogv (260.71ms)

Ran 717 specs, 0 failed, in 3.61s.
Error in kill-emacs-hook (subed-mpv-kill): (error "Process mock client process does not exist")
Exit code: 0

PS: I also added lexical-binding: t to the test files, because buttercup was showing some errors. A user opened an issue about this; see #74.

Experiment 1: File with an audio stream that is shorter than the video stream

The following command creates a .mp4 file that has 1 audio stream and 1 video stream. The audio stream is shorter than the video stream.

$ ffmpeg -v error -y \
     -f lavfi -i 'sine=frequency=1000:duration=3' \
     -f lavfi -i 'testsrc=size=100x100:duration=5' \
     /tmp/a.mp4

We can use ffprobe to display the information on the format and the streams.

As the output below shows, the duration field under format equals the duration of the video stream (5 seconds), not that of the audio stream (3 seconds).

$ ffprobe -v error -print_format json -show_streams -show_format /tmp/a.mp4
{
    "streams": [
        {
            "index": 0,
            "codec_name": "h264",
            "codec_long_name": "H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10",
            "profile": "High 4:4:4 Predictive",
            "codec_type": "video",
            "codec_tag_string": "avc1",
            "codec_tag": "0x31637661",
            "width": 100,
            "height": 100,
            "coded_width": 100,
            "coded_height": 100,
            "closed_captions": 0,
            "has_b_frames": 2,
            "sample_aspect_ratio": "1:1",
            "display_aspect_ratio": "1:1",
            "pix_fmt": "yuv444p",
            "level": 10,
            "chroma_location": "left",
            "refs": 1,
            "is_avc": "true",
            "nal_length_size": "4",
            "r_frame_rate": "25/1",
            "avg_frame_rate": "25/1",
            "time_base": "1/12800",
            "start_pts": 0,
            "start_time": "0.000000",
            "duration_ts": 64000,
            "duration": "5.000000",
            "bit_rate": "23996",
            "bits_per_raw_sample": "8",
            "nb_frames": "125",
            "disposition": {
                "default": 1,
                "dub": 0,
                "original": 0,
                "comment": 0,
                "lyrics": 0,
                "karaoke": 0,
                "forced": 0,
                "hearing_impaired": 0,
                "visual_impaired": 0,
                "clean_effects": 0,
                "attached_pic": 0,
                "timed_thumbnails": 0
            },
            "tags": {
                "language": "und",
                "handler_name": "VideoHandler",
                "vendor_id": "[0][0][0][0]"
            }
        },
        {
            "index": 1,
            "codec_name": "aac",
            "codec_long_name": "AAC (Advanced Audio Coding)",
            "profile": "LC",
            "codec_type": "audio",
            "codec_tag_string": "mp4a",
            "codec_tag": "0x6134706d",
            "sample_fmt": "fltp",
            "sample_rate": "44100",
            "channels": 1,
            "channel_layout": "mono",
            "bits_per_sample": 0,
            "r_frame_rate": "0/0",
            "avg_frame_rate": "0/0",
            "time_base": "1/44100",
            "start_pts": 0,
            "start_time": "0.000000",
            "duration_ts": 132300,
            "duration": "3.000000",
            "bit_rate": "70114",
            "nb_frames": "131",
            "disposition": {
                "default": 1,
                "dub": 0,
                "original": 0,
                "comment": 0,
                "lyrics": 0,
                "karaoke": 0,
                "forced": 0,
                "hearing_impaired": 0,
                "visual_impaired": 0,
                "clean_effects": 0,
                "attached_pic": 0,
                "timed_thumbnails": 0
            },
            "tags": {
                "language": "und",
                "handler_name": "SoundHandler",
                "vendor_id": "[0][0][0][0]"
            }
        }
    ],
    "format": {
        "filename": "/tmp/a.mp4",
        "nb_streams": 2,
        "nb_programs": 0,
        "format_name": "mov,mp4,m4a,3gp,3g2,mj2",
        "format_long_name": "QuickTime / MOV",
        "start_time": "0.000000",
        "duration": "5.000000",
        "size": "44887",
        "bit_rate": "71819",
        "probe_score": 100,
        "tags": {
            "major_brand": "isom",
            "minor_version": "512",
            "compatible_brands": "isomiso2avc1mp41",
            "encoder": "Lavf58.76.100"
        }
    }
}

Experiment 2: ffprobe field where the duration of the audio stream is stored for different media file types

In "Experiment 1", we saw that we shouldn't consider format=duration as the duration of the audio stream. Instead, we should directly use the duration of the audio stream.

The script shown below creates 6 files (1 file for each of the following extensions: *.mkv, *.mp4, *.webm, *.avi, *.ts, *.ogv). Each of the files has 1 video stream with 5 seconds duration and 1 audio stream with 3 seconds duration.

We then use ffprobe to query the duration of the audio stream using the fields stream=duration and stream_tags=duration. In the output, we can see that for *.mkv and *.webm files, the duration is not stored in stream=duration, but instead in stream_tags=duration. In the introduced changes, we create a special case for *.mkv and *.webm files (link to relevant part).

We can also see that *.ts is the only extension that shows two durations for the audio stream. See Experiment 3 below.

# Create one test file per extension: 3 s of audio, 5 s of video.
video_extensions=(mkv mp4 webm avi ts ogv)
for extension in "${video_extensions[@]}"
do
  ffmpeg -v error -y \
         -f lavfi -i 'sine=frequency=1000:duration=3' \
         -f lavfi -i 'testsrc=size=100x100:duration=5' \
         "/tmp/a.$extension"
done
# Print both candidate duration fields for the audio stream of each file.
for extension in "${video_extensions[@]}"
do
  echo "$extension"
  echo "  stream=duration: $(ffprobe -v error -select_streams a -print_format default=nokey=1:noprint_wrappers=1 -show_entries stream=duration "/tmp/a.$extension")"
  echo "  stream_tags=duration: $(ffprobe -v error -select_streams a -print_format default=nokey=1:noprint_wrappers=1 -show_entries stream_tags=duration "/tmp/a.$extension")"
done
mkv
  stream=duration: N/A
  stream_tags=duration: 00:00:03.003000000
mp4
  stream=duration: 3.000000
  stream_tags=duration: 
webm
  stream=duration: N/A
  stream_tags=duration: 00:00:03.008000000
avi
  stream=duration: 3.030204
  stream_tags=duration: 
ts
  stream=duration: 3.004089
3.004089
  stream_tags=duration: 
ogv
  stream=duration: 3.000000
  stream_tags=duration: 
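
The two fields use different notations: stream=duration is plain seconds, while stream_tags=duration is HH:MM:SS with a fractional part. Since the cache variable stores milliseconds, both forms have to be normalized at some point. The following is a minimal sketch of such a conversion in shell, assuming rounding to the nearest millisecond; duration_to_ms is a hypothetical helper, and the pull request's Emacs Lisp code may convert or round differently.

```shell
#!/usr/bin/env bash
# duration_to_ms is a hypothetical helper, not part of subed or ffprobe.
# Accepts plain seconds ("3.030204") or HH:MM:SS.fraction
# ("00:00:03.008000000") and prints whole milliseconds, rounded.
duration_to_ms() {
  awk -v d="$1" 'BEGIN {
    n = split(d, part, ":")
    if (n == 3) {
      # HH:MM:SS.fraction form, as seen for *.mkv and *.webm
      seconds = part[1] * 3600 + part[2] * 60 + part[3]
    } else {
      # plain seconds form, as seen for the other extensions
      seconds = d + 0
    }
    printf "%.0f\n", seconds * 1000
  }'
}

duration_to_ms "3.030204"            # avi value from the output above: prints 3030
duration_to_ms "00:00:03.008000000"  # webm value from the output above: prints 3008
```

The sketch does not validate its input, so a value such as N/A would be converted to 0 rather than rejected.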

Experiment 3: Output by ffprobe when providing a *.ts file

In "Experiment 2", we saw that ffprobe returned two durations for the extension *.ts even though we selected only the audio stream with the flag -select_streams a.

The commands below create one *.ts file with a single audio stream and a single video stream, and then print its JSON description. In the JSON output, the duration field is reported only once for the audio stream.

In the introduced changes, we query and parse the entire JSON that is returned when using the flags -show_streams and -show_format, just as shown in the output below (link to relevant part). Therefore, we don't need any special handling for the duration that ffprobe returns for *.ts files.

ffmpeg \
  -v error -y \
  -f lavfi -i 'sine=frequency=1000:duration=3' \
  -f lavfi -i 'testsrc=size=100x100:duration=5' \
  /tmp/a.ts

ffprobe -v error -print_format json -show_streams -show_format /tmp/a.ts
{
    "streams": [
        {
            "index": 0,
            "codec_name": "mpeg2video",
            "codec_long_name": "MPEG-2 video",
            "profile": "Main",
            "codec_type": "video",
            "codec_tag_string": "[2][0][0][0]",
            "codec_tag": "0x0002",
            "width": 100,
            "height": 100,
            "coded_width": 0,
            "coded_height": 0,
            "closed_captions": 0,
            "has_b_frames": 1,
            "sample_aspect_ratio": "1:1",
            "display_aspect_ratio": "1:1",
            "pix_fmt": "yuv420p",
            "level": 8,
            "color_range": "tv",
            "chroma_location": "left",
            "field_order": "progressive",
            "refs": 1,
            "id": "0x100",
            "r_frame_rate": "25/1",
            "avg_frame_rate": "25/1",
            "time_base": "1/90000",
            "start_pts": 129600,
            "start_time": "1.440000",
            "duration_ts": 450000,
            "duration": "5.000000",
            "disposition": {
                "default": 0,
                "dub": 0,
                "original": 0,
                "comment": 0,
                "lyrics": 0,
                "karaoke": 0,
                "forced": 0,
                "hearing_impaired": 0,
                "visual_impaired": 0,
                "clean_effects": 0,
                "attached_pic": 0,
                "timed_thumbnails": 0
            },
            "side_data_list": [
                {
                    "side_data_type": "CPB properties"
                }
            ]
        },
        {
            "index": 1,
            "codec_name": "mp2",
            "codec_long_name": "MP2 (MPEG audio layer 2)",
            "codec_type": "audio",
            "codec_tag_string": "[3][0][0][0]",
            "codec_tag": "0x0003",
            "sample_fmt": "fltp",
            "sample_rate": "44100",
            "channels": 1,
            "channel_layout": "mono",
            "bits_per_sample": 0,
            "id": "0x101",
            "r_frame_rate": "0/0",
            "avg_frame_rate": "0/0",
            "time_base": "1/90000",
            "start_pts": 128618,
            "start_time": "1.429089",
            "duration_ts": 270368,
            "duration": "3.004089",
            "bit_rate": "384000",
            "disposition": {
                "default": 0,
                "dub": 0,
                "original": 0,
                "comment": 0,
                "lyrics": 0,
                "karaoke": 0,
                "forced": 0,
                "hearing_impaired": 0,
                "visual_impaired": 0,
                "clean_effects": 0,
                "attached_pic": 0,
                "timed_thumbnails": 0
            }
        }
    ],
    "format": {
        "filename": "/tmp/a.ts",
        "nb_streams": 2,
        "nb_programs": 1,
        "format_name": "mpegts",
        "format_long_name": "MPEG-TS (MPEG-2 Transport Stream)",
        "start_time": "1.429089",
        "duration": "5.010911",
        "size": "302868",
        "bit_rate": "483533",
        "probe_score": 50
    }
}
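
The numbers in this output also show why format=duration is not a safe stand-in for the audio duration. For this file, the format duration (5.010911) is consistent with the span from the earliest stream start to the latest stream end, which is slightly longer than the video stream alone. The sketch below reproduces that arithmetic from the start_time and duration values above. It is only a cross-check of this one output, not a description of how ffprobe computes the field in general, and span_seconds is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# span_seconds is a hypothetical helper used only for this cross-check.
# Each argument is a "start_time duration" pair for one stream.
# Prints (latest end) - (earliest start) with six decimals.
span_seconds() {
  printf '%s\n' "$@" | awk '
    {
      start = $1 + 0
      end = $1 + $2
      if (NR == 1 || start < min_start) min_start = start
      if (NR == 1 || end > max_end) max_end = end
    }
    END { printf "%.6f\n", max_end - min_start }'
}

# Video: start 1.440000, duration 5.000000; audio: start 1.429089, duration 3.004089.
span_seconds "1.440000 5.000000" "1.429089 3.004089"  # prints 5.010911
```

The same calculation with the /tmp/a.mp4 values from Experiment 1 (both streams start at 0) gives 5.000000, which matches the format duration reported there.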

@sachac
Owner

sachac commented Jul 11, 2024

Nice. Thanks for handling that case and adding tests! The tests run on my system, and waveforms look like they still display properly.

sachac added a commit that referenced this pull request Jul 11, 2024
subed-waveform should now handle the case where
the stop time + subed-waveform-preview-msecs-after
might extend past the end of the file.

* subed/subed-waveform.el (subed-waveform-ffprobe-executable): New.
(subed-waveform-file-duration-ms-cache): New.
(subed-waveform-ffprobe-duration-ms): New,
calculates duration.
(subed-waveform-file-duration-ms): New function
for caching the duration.
(subed-waveform-clear-file-duration-ms-cache): New.
(subed-mpv): Add advice around
subed-mpv-play-from-file for now;
ideally change this to a hook later on.
(subed-waveform--image-parameters): Move to a separate
function for easier testing.
(subed-waveform--make-overlay): Do the
calculations in subed-waveform--image-parameters.
(subed-waveform--update-bars): Use the actual stop
time if needed.
* tests/test-subed-waveform.el: New.
* Set lexical-binding: t in tests/* files

Thanks to rodrigomorales1 and rndusr for bug
reports and pull requests!

Related:
- #68
- #75
- #74
@sachac
Owner

sachac commented Jul 11, 2024

Great! I merged this into main, so we can get rid of the branch. Thanks so much!
