use 'v' flag for regex subtitle parsing #556

Merged: 1 commit merged into killergerbah:main on Oct 28, 2024


artjomsR
Contributor

We had a brief discussion about this on Discord, so I wanted to open this PR so it won't be forgotten.

My current regex looks like this: `^[#\-\(\.\)\s\p{Lu}]+$|-?\[.+\]|[♪♬#~〜]+|(.*)\n+(?!(?:[\p{L}\s]+:\s*)?-)(.*)`, which is a combination of several existing regexes from the README file. I've been running with this flag for 3+ months now and haven't noticed any issues.
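As a rough illustration of what the combined pattern above matches, here is a minimal TypeScript sketch. The helper name `matchesFilter`, the sample subtitle lines, and compiling with the `v` flag alone are assumptions made for illustration; how the extension actually applies the filter is not shown in this description. It needs a runtime that supports the `v` flag.

```typescript
// Combined pattern quoted in the PR description.
// String.raw keeps the backslashes exactly as written.
const combinedSource: string = String.raw`^[#\-\(\.\)\s\p{Lu}]+$|-?\[.+\]|[♪♬#~〜]+|(.*)\n+(?!(?:[\p{L}\s]+:\s*)?-)(.*)`;

// Assumption: only the 'v' flag is set (no 'g' or 'm').
const combined: RegExp = new RegExp(combinedSource, 'v');

// Hypothetical helper: does the filter match anywhere in this subtitle text?
function matchesFilter(text: string): boolean {
    return combined.test(text);
}

// Made-up sample subtitles, one per alternative plus two that should be left alone.
const samples: string[] = [
    'PLAYFUL MUSIC', // all caps, no brackets
    '-[GASPS]',      // bracketed sound cue with a leading dash
    '♪♪',            // music symbols
    'Hello there',   // ordinary single-line dialogue
    'Hello\nthere',  // one sentence split over two lines
    '- Hi\n- Bye',   // two speakers, each line starting with a dash
];
const matched: boolean[] = samples.map(matchesFilter);

// With "$1 $2" as the replacement text, a split sentence is joined into one line.
const joined: string = 'Hello\nthere'.replace(combined, '$1 $2');

console.log(matched); // [ true, true, true, false, true, false ]
console.log(joined);  // Hello there
```

The last sample is left untouched because the negative lookahead refuses to join a line break that is followed by a dash, so two-speaker dialogue keeps its line structure.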

@@ -197,7 +197,7 @@ Useful examples of regular expressions:
 - `(.*)\n+(?!-)(.*)` : Some subtitles are split in several lines and this regex forces them into a single line. For this filter to work, you must also put `$1 $2` in the "Subtitle regex filter text replacement" field.
 - **NB**: When using this regex pattern in combination with other patterns (using the `|` operator, see below), place this pattern at the end. This ensures that all other regex transformations are applied first, and then the results are finally combined into a single line.
 - `-?\[.*\]` : Remove indications enclosed by square brackets that sound or music that is playing (e.g. "**\[PLAYFUL MUSIC]**" or "**\-[GASPS]**")
-- `^[-\(\)\.\sA-ZAÂÃÀÇÉÊÍÓÔÕÚÑ]+$` : As an alternative to the above, filter out descriptions written in capital letters, but without the square brackets (e.g. "**PLAYFUL MUSIC**"). If your language has additional letters with diacritics, you feel free to add them to this list.
+- `^[\-\(\)\.\s\p{Lu}]+$` : As an alternative to the above, filter out descriptions written in capital letters, but without the square brackets (e.g. "**PLAYFUL MUSIC**"). If your language has additional letters with diacritics, you feel free to add them to this list.
Contributor Author

@artjomsR artjomsR Oct 26, 2024

The missing backslash here is an existing "bug": it needs to be `\-` and not `-`, as the dash does not represent a range and should be an escaped character instead (to capture lines such as `- GASPS`).
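For what it's worth, the escape also appears to be required rather than cosmetic once the `v` flag is on: as far as I understand it, `-` is a reserved syntax character inside a `v`-mode character class, so an unescaped literal dash is rejected when the pattern is compiled. A small sketch to check this, where `compiles` is a hypothetical helper and the old README pattern is shortened to use `\p{Lu}`:

```typescript
// Hypothetical helper: true if the pattern compiles with the given flags.
function compiles(pattern: string, flags: string): boolean {
    try {
        new RegExp(pattern, flags);
        return true;
    } catch {
        return false;
    }
}

// Same character class, with and without the backslash before the dash.
const unescapedDash: string = String.raw`^[-\(\)\.\s\p{Lu}]+$`;
const escapedDash: string = String.raw`^[\-\(\)\.\s\p{Lu}]+$`;

const unescapedWithU: boolean = compiles(unescapedDash, 'u'); // leading dash is a literal in 'u' mode
const unescapedWithV: boolean = compiles(unescapedDash, 'v'); // rejected in 'v' mode
const escapedWithV: boolean = compiles(escapedDash, 'v');

// The escaped form still matches a line that starts with a dash.
const matchesDashLine: boolean = new RegExp(escapedDash, 'v').test('- GASPS');

console.log({ unescapedWithU, unescapedWithV, escapedWithV, matchesDashLine });
```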

@killergerbah
Owner

Thanks! I didn't realize the v flag was so powerful.

@killergerbah killergerbah merged commit 2286c90 into killergerbah:main Oct 28, 2024
1 check passed
@killergerbah killergerbah added this to the Extension v1.6.0 milestone Oct 28, 2024
2 participants