support video platform #27

rom1504 · 2022-12-05T00:59:15Z

https://ytdl-org.github.io/youtube-dl/supportedsites.html

rom1504 · 2022-12-05T01:50:54Z

related : https://github.com/iejMac/video2dataset

rom1504 · 2022-12-06T22:58:32Z

can use yt-dlp _match_valid_url

rom1504 · 2022-12-30T01:32:09Z

from yt_dlp.extractor import gen_extractor_classes, GenericIE

def is_supported(url):
    for ie in gen_extractor_classes():
        if ie != GenericIE and ie.suitable(url):
            return True
    return False

is_supported("https://www.youtube.com/watch?v=i_xBWhJB6VM")
is_supported("https://tv.naver.com/v/31992728/list/67096")
is_supported("https://static1.bigstockphoto.com/thumbs/2/3/2/large2/23261459.jpg")

this approach was advised by a yt-dlp maintainer
however it may miss urls that only GenericIE handles, like "direct manifest URLs, webpages with youtube embeds etc"

rom1504 · 2022-12-30T02:02:43Z

sadly this seems too slow, will need something more approximate

rom1504 · 2022-12-30T02:18:10Z

but actually it also seems to have high recall but low precision:
it catches a lot of platform links that could contain videos but do not

rom1504 · 2022-12-30T03:03:23Z

better idea: collect a bunch of positive and negative links, and build regexes or a very cheap predictor to know which are good
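
A minimal sketch of what such a "very cheap predictor" could look like, assuming the labeled links are available as (url, is_video) pairs. The names url_key, fit and predict are hypothetical, and apart from the YouTube link quoted earlier in this thread the sample URLs are made up for illustration. It only counts votes per (host, first path segment), so it is a baseline rather than a tuned classifier.

```python
from collections import Counter
from urllib.parse import urlsplit


def url_key(url):
    """Reduce a URL to (host without 'www.', first path segment)."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    segments = [s for s in parts.path.split("/") if s]
    first = segments[0] if segments else ""
    # Collapse purely numeric ids (e.g. vimeo.com/12345) into one bucket.
    if first.isdigit():
        first = "<num>"
    return host, first


def fit(labeled_urls):
    """Count positive and negative examples per key."""
    pos, neg = Counter(), Counter()
    for url, is_video in labeled_urls:
        (pos if is_video else neg)[url_key(url)] += 1
    return pos, neg


def predict(model, url):
    """Predict True when positives outnumber negatives for the URL's key."""
    pos, neg = model
    key = url_key(url)
    return pos[key] > neg[key]


# Hypothetical labeled sample (assumption: labels come from a manual check).
train = [
    ("https://www.youtube.com/watch?v=i_xBWhJB6VM", True),
    ("https://www.youtube.com/channel/abc", False),
    ("https://www.dailymotion.com/video/x8abc", True),
    ("https://www.dailymotion.com/signin", False),
    ("https://vimeo.com/12345", True),
    ("https://vimeo.com/about", False),
]
model = fit(train)
```

Unknown keys fall back to False, which favours precision over recall; that trade-off is a design choice of this sketch, not something decided in the thread.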

rom1504 · 2023-04-02T17:38:57Z

Best way to do this:

Run cc2dataset on a few shards, either without a filter or with a very broad filter based on yt-dlp filters (eg add video platform #36)
Run yt-dlp / video2dataset on the result, which gives working and non-working links
Use the result as a test set to build a "platform from url" classifier
Use that classifier in cc2dataset to get many links from many platforms
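
The test-set step above can be sketched as a precision/recall check. This assumes the yt-dlp / video2dataset run has already been reduced to (url, works) pairs. The helper name precision_recall, the stand-in classifier and the sample outcomes are hypothetical, not output from a real run.

```python
def precision_recall(classifier, test_set):
    """Score a url -> bool classifier against (url, works) pairs."""
    tp = fp = fn = 0
    for url, works in test_set:
        predicted = classifier(url)
        if predicted and works:
            tp += 1
        elif predicted and not works:
            fp += 1
        elif works:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical outcomes: True means the download step produced a video.
test_set = [
    ("https://www.youtube.com/watch?v=i_xBWhJB6VM", True),
    ("https://www.youtube.com/watch?v=gone", False),
    ("https://vimeo.com/123", True),
    ("https://example.com/page", False),
]


def naive_classifier(url):
    """Stand-in classifier used only to exercise the metric."""
    return "youtube.com/watch" in url


precision, recall = precision_recall(naive_classifier, test_set)
```

Tracking both numbers matters here because the yt-dlp based check earlier in the thread had high recall but low precision.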

rom1504 · 2023-10-13T17:29:17Z

https://gist.github.com/rom1504/f1f8fd253def49ce02a990229d7bf09d some work on this

rom1504 · 2023-10-18T14:44:27Z

https://github.com/v2fly/domain-list-community/tree/master/data might be interesting

rom1504 · 2023-11-04T21:59:53Z

limited version for 3 platforms (but which works) :

import re

# Raw strings avoid invalid-escape warnings; dots are escaped so they only match a literal "."
def is_dailymotion_video(url):
  if re.match(r'^https?://www\.dailymotion\.com/video/.+$', url):
    return True

  return False

def is_vimeo_video(url):
  if re.match(r'^https?://vimeo\.com/[0-9]+$', url):
    return True
  if re.match(r'^https?://player\.vimeo\.com/video/[0-9]+.*$', url):
    return True

  return False

def is_youtube_video(url):
  if re.match(r'^https?://(www\.)?youtube\.com/watch\?v=.+$', url):
    return True
  if re.match(r'^https?://(www\.)?youtube\.com/v/.+$', url):
    return True
  if re.match(r'^https?://(www\.)?youtube\.com/embed/.+$', url):
    return True
  if re.match(r'^https?://(www\.)?youtu\.be/.+$', url):
    return True

  return False
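
One possible way to turn these per-platform checks into a single "platform from url" lookup is a table of compiled patterns. The names PLATFORM_PATTERNS and platform_from_url are hypothetical. The patterns are the ones from this comment, written as raw strings with literal dots escaped, which is an assumption about the intent.

```python
import re

# Same three platforms as above; each maps to a list of URL patterns.
PLATFORM_PATTERNS = {
    "dailymotion": [
        r"^https?://www\.dailymotion\.com/video/.+$",
    ],
    "vimeo": [
        r"^https?://vimeo\.com/[0-9]+$",
        r"^https?://player\.vimeo\.com/video/[0-9]+.*$",
    ],
    "youtube": [
        r"^https?://(www\.)?youtube\.com/watch\?v=.+$",
        r"^https?://(www\.)?youtube\.com/v/.+$",
        r"^https?://(www\.)?youtube\.com/embed/.+$",
        r"^https?://(www\.)?youtu\.be/.+$",
    ],
}

# Compile once so repeated lookups do not re-parse the patterns.
COMPILED = {
    name: [re.compile(p) for p in patterns]
    for name, patterns in PLATFORM_PATTERNS.items()
}


def platform_from_url(url):
    """Return the platform name for a video URL, or None if nothing matches."""
    for name, patterns in COMPILED.items():
        if any(p.match(url) for p in patterns):
            return name
    return None
```

Adding a platform then means adding one entry to the table rather than writing a new function.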

rom1504 closed this as completed Dec 5, 2022

rom1504 reopened this Dec 5, 2022
