support video platform #27

Open
rom1504 opened this issue Dec 5, 2022 · 10 comments

Comments

@rom1504
Owner

rom1504 commented Dec 5, 2022

https://ytdl-org.github.io/youtube-dl/supportedsites.html

@rom1504 rom1504 closed this as completed Dec 5, 2022
@rom1504 rom1504 reopened this Dec 5, 2022
@rom1504
Owner Author

rom1504 commented Dec 6, 2022

can use yt-dlp _match_valid_url

@rom1504
Owner Author

rom1504 commented Dec 30, 2022

from yt_dlp.extractor import gen_extractor_classes, GenericIE

def is_supported(url):
    for ie in gen_extractor_classes():
        if ie != GenericIE and ie.suitable(url):
            return True
    return False

is_supported("https://www.youtube.com/watch?v=i_xBWhJB6VM")
is_supported("https://tv.naver.com/v/31992728/list/67096")
is_supported("https://static1.bigstockphoto.com/thumbs/2/3/2/large2/23261459.jpg")

advised by yt-dlp maintainer
however may miss GenericIE urls like "direct manifest URLs, webpages with youtube embeds etc"

@rom1504
Owner Author

rom1504 commented Dec 30, 2022

sadly this seems too slow, will need something more approximate

@rom1504
Owner Author

rom1504 commented Dec 30, 2022

also, it actually seems to have high recall but low precision:
it catches a lot of platform links that could contain videos but do not

@rom1504
Owner Author

rom1504 commented Dec 30, 2022

better idea: collect a bunch of positive and negative links, and build regexes or a very cheap predictor to know which are good
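A minimal sketch of what such a "very cheap predictor" could look like, assuming the positive and negative links are already available as two lists. The helper names (`url_key`, `fit`, `predict`), the choice of key (host plus first path segment) and the 0.5 threshold are all illustrative assumptions, not something decided in this issue:

```python
from collections import Counter
from urllib.parse import urlsplit


def url_key(url):
    """Reduce a URL to (host without a leading 'www.', first path segment)."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    segments = [s for s in parts.path.split("/") if s]
    first = segments[0] if segments else ""
    return host, first


def fit(positive_urls, negative_urls):
    """Count how often each key appears among good and bad links."""
    good = Counter(url_key(u) for u in positive_urls)
    bad = Counter(url_key(u) for u in negative_urls)
    return good, bad


def predict(url, good, bad, threshold=0.5):
    """True when the key was seen before and mostly belonged to good links."""
    key = url_key(url)
    total = good[key] + bad[key]
    if total == 0:
        return False  # unseen key: conservatively reject
    return good[key] / total >= threshold


# Made-up example links, for illustration only.
positive = [
    "https://www.dailymotion.com/video/x8abc",
    "https://www.dailymotion.com/video/x8def",
]
negative = [
    "https://www.dailymotion.com/signin",
    "https://example.com/video/1",
]
good, bad = fit(positive, negative)
print(predict("https://dailymotion.com/video/zzz", good, bad))
```

This is only a dictionary lookup per URL, so it is cheap, but it will not generalize to URL shapes where the first path segment is the video id itself (e.g. `vimeo.com/<id>`); those would still need a regex.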

@rom1504
Owner Author

rom1504 commented Apr 2, 2023

Best way to do this

  1. Run cc2dataset on a few shards, either without a filter or with a very broad filter based on the yt-dlp filters (e.g. "add video platform" #36)
  2. Run yt-dlp / video2dataset on the result; that gives working and non-working links
  3. Use the result as a test set to build a "platform from url" classifier
  4. Use that classifier in cc2dataset to get many links from many platforms
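Step 3 could be checked with something like the sketch below, which scores any candidate classifier against the working / non-working links from step 2. It assumes the results have been flattened into `(url, worked)` pairs; that format, the function name `precision_recall`, and the toy predictor are hypothetical and not the actual output of yt-dlp or video2dataset:

```python
def precision_recall(predict, labelled):
    """Score a URL predictor against (url, worked) pairs.

    `worked` is True when the download actually succeeded.
    """
    tp = fp = fn = 0
    for url, worked in labelled:
        guess = predict(url)
        if guess and worked:
            tp += 1  # predicted video, and it downloaded
        elif guess and not worked:
            fp += 1  # predicted video, but it did not download
        elif not guess and worked:
            fn += 1  # missed a working video link
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy predictor and made-up links, for illustration only.
def toy_predict(url):
    return "/video/" in url


labelled = [
    ("https://a.example/video/1", True),
    ("https://a.example/video/2", False),
    ("https://a.example/about", False),
    ("https://b.example/clip/3", True),
]
print(precision_recall(toy_predict, labelled))
```

Tracking both numbers matters here because the extractor-based check earlier in the thread was reported as high recall but low precision.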

@rom1504
Owner Author

rom1504 commented Nov 4, 2023

limited version for 3 platforms (but one that works):

import re

# Raw strings avoid invalid-escape warnings for "\?"; dots in host names are
# escaped so that "." only matches a literal dot, not any character.

def is_dailymotion_video(url):
  if re.match(r'^https?://www\.dailymotion\.com/video/.+$', url):
    return True

  return False

def is_vimeo_video(url):
  # Plain video page: numeric id only
  if re.match(r'^https?://vimeo\.com/[0-9]+$', url):
    return True
  # Embedded player: numeric id, optionally followed by query parameters
  if re.match(r'^https?://player\.vimeo\.com/video/[0-9]+.*$', url):
    return True

  return False

def is_youtube_video(url):
  if re.match(r'^https?://(www\.)?youtube\.com/watch\?v=.+$', url):
    return True
  if re.match(r'^https?://(www\.)?youtube\.com/v/.+$', url):
    return True
  if re.match(r'^https?://(www\.)?youtube\.com/embed/.+$', url):
    return True
  if re.match(r'^https?://(www\.)?youtu\.be/.+$', url):
    return True

  return False
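If a single "platform from url" entry point is wanted, the same URL shapes can be put in a table and compiled once. This is only a sketch: `PLATFORM_PATTERNS`, `COMPILED` and `platform_from_url` are hypothetical names, and the patterns are the ones from the three functions above with the literal dots escaped:

```python
import re

# Same URL shapes as the three per-platform functions, one list per platform.
PLATFORM_PATTERNS = {
    "dailymotion": [
        r"^https?://www\.dailymotion\.com/video/.+$",
    ],
    "vimeo": [
        r"^https?://vimeo\.com/[0-9]+$",
        r"^https?://player\.vimeo\.com/video/[0-9]+.*$",
    ],
    "youtube": [
        r"^https?://(www\.)?youtube\.com/watch\?v=.+$",
        r"^https?://(www\.)?youtube\.com/v/.+$",
        r"^https?://(www\.)?youtube\.com/embed/.+$",
        r"^https?://(www\.)?youtu\.be/.+$",
    ],
}

# Compile once so that filtering many URLs does not re-parse the patterns.
COMPILED = {
    name: [re.compile(p) for p in patterns]
    for name, patterns in PLATFORM_PATTERNS.items()
}


def platform_from_url(url):
    """Return the platform name for a recognised video URL, else None."""
    for name, patterns in COMPILED.items():
        if any(p.match(url) for p in patterns):
            return name
    return None


for u in [
    "https://www.youtube.com/watch?v=i_xBWhJB6VM",
    "https://vimeo.com/123456",
    "https://static1.bigstockphoto.com/thumbs/2/3/2/large2/23261459.jpg",
]:
    print(u, platform_from_url(u))
```

Adding a platform then only means adding an entry to the table, which keeps the filter cheap enough to run over many links.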
