impd should automatically choose right internal subs #7

Open
asakura42 opened this issue Oct 14, 2023 · 13 comments

Comments

@asakura42
Member

Long story short. I have a video file with a bunch of internal subs:

impd probe output:

Index  Language  Title                                        Type
0      unknown   Www.SeiresHD.Com                             video
1      spa       unknown                                      audio
2      eng       unknown                                      audio
3      spa       Spanish - (Caption/Normal Size Char)         subtitle
4      eng       English - (Closed Caption/Normal Size Char)  subtitle
5      unknown   unknown                                      subtitle

When I add a video to my collection, impd condenses it using subtitle track 5, which is the track for songs and other sounds. I think impd should choose internal subs based on:

  1. Target language
  2. Size of subtitles

So the largest subtitle track in the target language should be chosen for condensing. What do you think?

@tatsumoto-ren
Member

Edit your config file and add the following lines:

langs=spa
prefer_internal_subs=yes

@asakura42
Member Author

My config:

langs=spanish,spa,esp,lat,cas
prefer_internal_subs=yes
video_dir=/dev/null
bitrate=32k
recent_threshold=10
padding=0.5
line_skip_pattern="^♪〜$|^〜♪$"
filename_skip_pattern="NCOP|NCED"
extract_audio_add_args=()

@asakura42
Member Author

Try it yourself with any file from this folder: https://mega.nz/folder/oW8ihKCZ#sHuu63kset-BAn-XqFa7Nw

Condensing doesn't work, though. I guess that's because of the bmp fonts, but it's not critical.

@tatsumoto-ren
Member

I guess you need to manually set what tracks you want because the tracks are incorrectly named.

@asakura42
Member Author

language: spa

Where are they incorrectly named?

@asakura42
Member Author

asakura42 commented Oct 15, 2023

For example, here it chooses the Forzados subtitle track (index 3) when it should choose track 4:

Index  Language  Title     Type
0      unknown   unknown   video
1      spa       unknown   audio
2      eng       unknown   audio
3      spa       Forzados  subtitle
4      spa       unknown   subtitle
5      eng       unknown   subtitle

Can you add smth to detect the largest target-language sub track?

@tatsumoto-ren
Member

impd chooses the first track that:

  • is not a song, caption, commentary, etc.
  • matches the preferred language

impd/impd

Line 111 in 48535fb

guess_track_priority() {
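
The first-match rule described above can be sketched roughly as follows. This is only an illustration of the rule as stated in this comment, not the actual body of guess_track_priority: the function name first_matching_track, the "index,language,title" input format, and the list of excluded title keywords are all assumptions made for the example.

```shell
# Hypothetical helper, not impd's code.
# Input (stdin): one "index,language,title" line per subtitle track.
# Argument: comma-separated list of preferred languages, e.g. "spanish,spa".
# Prints the index of the first acceptable track; fails if there is none.
first_matching_track() (
    langs=",$1,"
    while IFS=, read -r idx lang title; do
        # Skip tracks whose language is not in the preferred list.
        case "$langs" in *",$lang,"*) ;; *) continue ;; esac
        # Skip titles that look like songs, signs, captions, or commentary
        # (assumed keyword list).
        lower=$(printf '%s' "$title" | tr '[:upper:]' '[:lower:]')
        case "$lower" in *song*|*sign*|*caption*|*comment*) continue ;; esac
        printf '%s\n' "$idx"
        exit 0
    done
    exit 1
)
```

Fed the second probe table from this thread (3 spa Forzados, 4 spa unknown, 5 eng unknown) with "spa" as the preferred language, this sketch prints 3. Under a pure first-match rule, the Forzados track wins simply because it comes first and its title is not on the excluded list.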

@tatsumoto-ren
Member

> Can you add smth to detect the largest target-language sub track?

Based on the number of symbols used? If so, that is a good idea but I'm not sure if it's easy to do.

@asakura42
Member Author

@tatsumoto-ren
You can use smth like:

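function subs() {
    # Print the stream index of the largest Spanish subtitle track in "$1".
    local movie="$1" name idx lang
    name="$(basename "${movie%.*}")"   # file name without directory or extension
    mkdir -p /tmp/impd_subs
    # ffprobe lists the subtitle streams as "index,language" lines.
    while IFS=, read -r idx lang; do
        # Progress goes to stderr so that stdout carries only the final index.
        echo "Extracting ${lang} subtitle #${idx} from ${movie}" >&2
        ffmpeg -nostdin -y -hide_banner -loglevel quiet -i "${movie}" -map 0:"${idx}" "/tmp/impd_subs/${name}_${lang}_${idx}.srt"
    done < <(ffprobe -loglevel error -select_streams s -show_entries stream=index:stream_tags=language -of csv=p=0 "${movie}")
    # Sort this movie's Spanish tracks numerically by line count; keep the largest.
    wc --total=never -l /tmp/impd_subs/"${name}"_spa_*.srt | sort -nr | awk -F_ '{print $NF}' | awk -F. '{print $1}' | head -n1
}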

This outputs the number of the largest track.

(Main snippet found here: https://gist.github.com/kowalcj0/ae0bdc43018e2718fb75290079b8839a)

@asakura42
Member Author

Or much simpler:

while IFS=',' read -r idx lang; do printf '%s ' "$idx" && ffmpeg -nostdin -hide_banner -loglevel quiet -i "la_directora_S01E02.mkv" -map 0:"$idx" -f srt - | wc -l; done < <(ffprobe -loglevel error -select_streams s -show_entries stream=index:stream_tags=language -of csv=p=0 "la_directora_S01E02.mkv" | grep ",spa") | sort -nrk2,2 | head -n1 | awk '{print $1}'
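
The selection step at the end of that pipeline can be checked on its own, without ffmpeg. The sketch below wraps it in a function named pick_largest (a name made up for this example) and assumes the input is "index line_count" pairs, which is what the loop above emits.

```shell
# Hypothetical wrapper around the tail of the one-liner.
# Input (stdin): "index line_count" pairs, one per line.
# Prints the index with the highest line count.
pick_largest() {
    sort -nrk2,2 | head -n1 | awk '{print $1}'
}
```

For example, printf '3 40\n4 900\n5 7\n' | pick_largest prints 4, because the sort is numeric on the second column and descending. The line counts in this example are invented.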

@tatsumoto-ren
Member

> This outputs the number of the largest track.

How fast does it work for a typical episode?

@asakura42
Member Author

For a 379 MB mkv file, the output of time for this snippet on my old laptop is: 0.32s user 0.34s system 108% cpu 0.607 total

@tatsumoto-ren
Member

Alright, if it's not too slow (need to test on anime specifically), you can submit the PR. But you also need to think about the following:

  • only apply this method to subtitle tracks; audio tracks can be autoselected using the current method only.
  • filter out (or give lower priority to) commentary tracks, since they contain garbage yet can be longer than normal subtitle tracks
  • filter out all other garbage tracks (songs, signs, comments), though they are likely to be shorter than the normal subtitle tracks.
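
A minimal sketch of how these constraints might combine with the size-based idea is shown below. It is not a proposed implementation for impd. The function name largest_clean_track, the "index,language,line_count,title" input format, and the keyword list are assumptions for illustration. The line counts would have to come from something like the ffmpeg and wc snippets earlier in the thread, which already restrict themselves to subtitle streams via ffprobe's -select_streams s.

```shell
# Hypothetical helper, not impd's code.
# Input (stdin): "index,language,line_count,title" per subtitle track.
# Argument: comma-separated list of target languages, e.g. "spanish,spa".
# Prints the index of the largest track that is in a target language and
# whose title does not look like a song, sign, or commentary track.
largest_clean_track() {
    awk -F, -v langs="$1" '
        BEGIN { n = split(langs, a, ","); for (i = 1; i <= n; i++) want[a[i]] = 1 }
        !($2 in want) { next }                      # wrong language
        {
            title = tolower($0)
            sub(/^[^,]*,[^,]*,[^,]*,/, "", title)   # drop the first three fields
        }
        title ~ /song|sign|comment/ { next }        # assumed garbage keywords
        ($3 + 0) > max { max = $3 + 0; best = $1 }  # keep the largest so far
        END { if (best != "") print best; else exit 1 }
    '
}
```

With made-up line counts, an input of "3,spa,40,Forzados", "4,spa,900,unknown", "5,eng,950,unknown" and "6,spa,1500,Commentary" yields 4 for the target language "spa": track 5 is in the wrong language, track 6 is excluded by its title even though it is the longest, and track 3 is simply shorter. Lowering the priority of commentary tracks instead of dropping them would need a scoring step that this sketch does not attempt.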
