[xhamster] Initials RegEx changed #26526

Merged
dstftw merged 8 commits into ytdl-org:master from the xhamster_initials_script branch
Sep 6, 2020

Conversation

TheRealDude2
Contributor

@TheRealDude2 TheRealDude2 commented Sep 4, 2020

Please follow the guide below

  • You will be asked some questions, please read them carefully and answer honestly
  • Put an x into all the boxes [ ] relevant to your pull request (like that [x])
  • Use the Preview tab to see how your pull request will actually look

Before submitting a pull request make sure you have:

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

  • Bug fix
  • Improvement
  • New extractor
  • New feature

Description of your pull request and other information

I noticed another problem with the RegEx for the site initials after patch #26254. It looks as if the web server has problems delivering the line completely. In addition, the character ; now appears more often in the line.

I have tested a few lines:

  • original version
    r'window.initials\s*=\s*({.+?})\s*;', webpage, 'initials',
  • minimal change
    r'window.initials\s*=\s*({.+?})\s*;<', webpage, 'initials',  # minimal change
  • full <script> tag parse
    r"<script\sid='initials-script'>window.initials\s=\s*({.+?})\s*;</script>", webpage, 'initials',
  • parse until script end tag
    r'window.initials\s*=\s*({.+?})\s*;\s*</script>', webpage, 'initials',

I prefer the last one. I think it is the most robust version against changes to the line on the website.
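To illustrate why anchoring on the script end tag helps, here is a minimal sketch that compares the original pattern with the last one. The two patterns are copied from the list above. The `WEBPAGE` fragment and the `extract_initials` helper are made up for this example and are not taken from the real site or from youtube-dl. The fragment assumes that `};` can show up inside a string value of the JSON.

```python
import json
import re

# Hypothetical page fragment (not real site output): the JSON contains
# the sequence "};" inside a string value.
WEBPAGE = (
    "<script id='initials-script'>"
    'window.initials = {"snippet": "if (x) {y()};", "id": 1};'
    "</script>"
)

# Patterns copied from the list above.
ORIGINAL = r'window.initials\s*=\s*({.+?})\s*;'
UNTIL_SCRIPT_END = r'window.initials\s*=\s*({.+?})\s*;\s*</script>'


def extract_initials(pattern, webpage):
    """Hypothetical helper: return the captured JSON text, or None."""
    match = re.search(pattern, webpage)
    return match.group(1) if match else None


def is_valid_json(text):
    """Return True if text parses as JSON."""
    try:
        json.loads(text)
    except ValueError:
        return False
    return True


truncated = extract_initials(ORIGINAL, WEBPAGE)
complete = extract_initials(UNTIL_SCRIPT_END, WEBPAGE)

print('original pattern gives valid JSON:', is_valid_json(truncated))
print('until </script> gives valid JSON:', is_valid_json(complete))
```

With this input the non-greedy group in the original pattern stops at the first `}` that is followed by `;`, which lies inside the string value, so the captured text is cut off. Requiring `</script>` after the `;` forces the match to run to the end of the assignment.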

Closes #26353

@TheRealDude2 TheRealDude2 changed the title from "Xhamster initials script" to "[xhamster] - initials parsing problem" Sep 4, 2020
@TheRealDude2 TheRealDude2 changed the title from "[xhamster] - initials parsing problem" to "[xhamster] Initials RegEx changed" Sep 4, 2020
@TheRealDude2 TheRealDude2 mentioned this pull request Sep 5, 2020
@dstftw dstftw merged commit 62ae19f into ytdl-org:master Sep 6, 2020
@TheRealDude2 TheRealDude2 deleted the xhamster_initials_script branch September 6, 2020 05:57
Successfully merging this pull request may close these issues.

[xhamster] No video format found again with patch #26254 and possible workaround