[TikTok] Support Sigi-type pages, etc #30479

dirkf · 2022-01-07T13:00:58Z

Please follow the guide below

Before submitting a pull request make sure you have:

Searched the bugtracker for similar pull requests
Read adding new extractor tutorial
Read youtube-dl coding conventions and adjusted the code to meet them
Covered the code with tests (note that PRs without tests will be REJECTED)
Checked the code with flake8

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

I am the original author of this code and I am willing to release it under Unlicense
Exception: this PR subsumes PR #30224 (fix tiktok when logged in), whose author also affirmed this.
I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

Bug fix
Improvement
New extractor
New feature

Description of your pull request and other information

TikTok (TT) has switched its framework (possibly only partially) from NextJS to Sigi, and as a result the persisted-state JSON sent in the page has changed. Instead of a <script> element with id __NEXT_DATA__, we get one with id sigi_persisted_state, containing JSON with a slightly different structure.
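To make the difference between the two page formats concrete, here is a minimal standalone sketch (not the PR's code) that looks for the state JSON under either element id. It assumes the JSON is the whole content of the <script> element; the id list (which also includes SIGI_STATE, the id used in the later review diff) and the function name find_state_json are illustrative assumptions.

```python
import json
import re

# Element ids to try, in order. __NEXT_DATA__ and sigi_persisted_state are named
# in the PR description; SIGI_STATE appears in the later review diff.
STATE_SCRIPT_IDS = ('__NEXT_DATA__', 'SIGI_STATE', 'sigi_persisted_state')


def find_state_json(html):
    """Return (script_id, parsed JSON) for the first state <script> found.

    Assumes the element's whole content is a JSON document; returns
    (None, None) if no known id is present.
    """
    for script_id in STATE_SCRIPT_IDS:
        m = re.search(
            r'<script[^>]*\bid\s*=\s*(["\']?)%s\1[^>]*>(.*?)</script>' % re.escape(script_id),
            html, re.DOTALL)
        if m:
            return script_id, json.loads(m.group(2))
    return None, None


page = '<script id="__NEXT_DATA__" type="application/json">{"props": {}}</script>'
print(find_state_json(page))  # ('__NEXT_DATA__', {'props': {}})
```

Trying the ids in a fixed order keeps one code path for both frameworks; the caller then branches on the returned id, since the JSON structure differs between them.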

This PR handles both page formats. It builds on PR #30224 and on this patch, which extracts more metadata.

Also, extraction could fail with a timeout (Error 60 in Windows, SSLError('The read operation timed out',) in Linux) or a connection reset (Error 54 in Windows), due to some weird blocking by whatever fronts TikTok's pages (apparently Akamai). To download the page for parsing, some cookie has to be sent, and one way to get it is to make a preliminary request to the site. The extractor used to fetch https://www.tiktok.com/ before doing anything else. In yt-dlp, the code fetches the webpage itself twice, with a comment that you get 403 otherwise. This PR copies that tactic, but instead of fetching the whole page with a GET request it only sends a HEAD request: if the server returns the page, rather than an error with a Set-Cookie header, the body doesn't actually have to be downloaded.
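A minimal standalone sketch of the HEAD-request cookie priming, using only the Python 3 standard library rather than youtube-dl's own request helpers (the PR's actual code is not reproduced here). The function names build_head_request and prime_cookies, and the Mozilla/5.0 UA default, are assumptions for illustration.

```python
import http.cookiejar
import urllib.error
import urllib.request


def build_head_request(url, user_agent='Mozilla/5.0'):
    """A HEAD request asks only for the response headers, so no page body is sent."""
    return urllib.request.Request(url, method='HEAD', headers={'User-Agent': user_agent})


def prime_cookies(url, jar=None, timeout=20):
    """Send a HEAD request so that the server can set session cookies in `jar`.

    The returned jar can then be reused by an opener for the real GET request.
    """
    if jar is None:
        jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    try:
        opener.open(build_head_request(url), timeout=timeout).close()
    except urllib.error.HTTPError as err:
        # HTTPCookieProcessor runs before HTTPErrorProcessor, so Set-Cookie
        # headers on an error response (e.g. 403) are already stored in the jar
        err.close()
    return jar
```

For example, jar = prime_cookies('https://www.tiktok.com/') followed by a GET through an opener built on the same jar. Catching HTTPError matters here because, as described above, the useful cookie may arrive on an error response; whether TikTok's front end still accepts this sequence is not verified by the sketch.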

Probably resolves #28741
Resolves #30251
Resolves #30432
Resolves #30439
Resolves #30445
Resolves #30454
Resolves #30470

Finally, the non-working TikTokUserIE has been resurrected, for accessing all the videos of a specific user.

Resolves #30174.

dirkf · 2022-01-07T15:26:37Z

Patching hints, depending on your installation type (substitute PR number 30479 and file youtube_dl/extractor/tiktok.py as appropriate):

hessijames79 · 2022-01-18T19:24:30Z

Hi!
Your patch worked for several days, but now I am encountering new problems (with the "vanilla" youtube-dl as well): #30538

Patrick

afterdelight · 2022-05-02T15:04:25Z

When will this be merged?

afterdelight · 2022-05-05T23:27:52Z

youtube_dl/extractor/tiktok.py

+        state = self._parse_json(
+            get_element_by_id('SIGI_STATE', html)
+            or self._search_regex(
+                r'''(?s)<script\s[^>]*?\bid\s*=\s*(?P<q>"|'|\b)sigi-persisted-data(?P=q)[^>]*>[^=]*=\s*(?P<json>{.+?})\s*(?:;[^<]+)?</script''',


Can @dirkf review this?
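The fallback regex in the diff above expects the JSON to be assigned inside the script (text, then '=', then the object, optionally followed by more ';'-separated statements) rather than being the script's whole content. A standalone sketch applying the same pattern with Python's re module to a made-up sample; the sample markup and the function name parse_sigi_persisted_data are hypothetical, not captured from TikTok or taken from the PR.

```python
import json
import re

# Pattern from the reviewed diff: the JSON follows the first '=' in the script
# and may be followed by further ';'-separated statements.
SIGI_PERSISTED_RE = (
    r'''(?s)<script\s[^>]*?\bid\s*=\s*(?P<q>"|'|\b)sigi-persisted-data(?P=q)'''
    r'''[^>]*>[^=]*=\s*(?P<json>{.+?})\s*(?:;[^<]+)?</script''')


def parse_sigi_persisted_data(html):
    """Return the parsed JSON from a sigi-persisted-data script, or None."""
    m = re.search(SIGI_PERSISTED_RE, html)
    return json.loads(m.group('json')) if m else None


sample = ('<script id="sigi-persisted-data">'
          'window[\'SIGI_STATE\']={"AppContext": {"lang": "en"}};'
          'window[\'SIGI_RETRY\']={"x": 1}</script>')
print(parse_sigi_persisted_data(sample))  # {'AppContext': {'lang': 'en'}}
```

The lazy {.+?} only stops at a '}' that is followed by an optional ';...' tail and then </script, which is why the nested braces in the sample are captured whole.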

pukkandan · 2022-06-14T19:01:02Z

youtube_dl/extractor/tiktok.py

+
+        page_props = self._get_SIGI_STATE(user_id, webpage)
+        user_data = try_get(page_props, lambda x: x['UserModule']['users'], dict)
+        if user_data:


It should be

if not user_data: raise ExtractorError(...) ...

If the extractor returns None, youtube-dl will just silently exit. See yt-dlp/yt-dlp#3776 (comment)

Originally there was some fallback code that would run if not user_data. Don't we get an ExtractorError anyway if an IE returns a None info_dict? (No, apparently not!)
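The pattern suggested in the review, sketched as a standalone function. ExtractorError here is a local stand-in for youtube-dl's class so the sketch runs on its own, and get_user_data is a hypothetical helper name, not code from the PR.

```python
class ExtractorError(Exception):
    """Stand-in for youtube_dl.utils.ExtractorError so that this sketch runs alone."""


def get_user_data(page_props, user_id):
    """Return page_props['UserModule']['users'], or raise if it is missing."""
    # Roughly the same lookup as try_get(page_props, lambda x: x['UserModule']['users'], dict)
    try:
        user_data = page_props['UserModule']['users']
    except (KeyError, TypeError):
        user_data = None
    if not isinstance(user_data, dict) or not user_data:
        # Raise instead of returning None: per the review, a None result
        # makes youtube-dl exit silently
        raise ExtractorError('Unable to extract user data for %s' % user_id)
    return user_data
```

Raising at the point where the data is found to be missing gives the user an error message naming the failing step, instead of a silent exit with nothing downloaded.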

pukkandan · 2022-06-14T19:01:14Z

youtube_dl/extractor/tiktok.py

+            if result:
+                result['display_id'] = user_id
+                return result


dirkf · 2022-06-14T20:32:04Z

As observed in yt-dlp/yt-dlp#3776 (comment), the user pages currently redirect to a captcha more or less regardless of what we do with cookies and UAs.

In a browser with JS disabled, the UA set to Mozilla/5.0 and the TT cookies cleared, a request to a user page gets the captcha page; reloading with the cookies it provides then opens the desired page. This doesn't happen with the extractor, even with a delay between the two fetches.

bvoq · 2022-12-26T00:20:42Z

It looks like every issue is about this. When will this get merged?

OwenMelbz · 2023-08-04T09:29:57Z

Do we think this will see the light of day? :D Was hoping to be able to use it for a little fun project!

Thanks

kashif-umair · 2023-08-04T19:57:36Z

I think this is also outdated now. There is no sigi_persisted_state in the returned HTML.

wranai and others added 4 commits January 7, 2022 03:58

fix tiktok when logged in

b4eb012

tiktok now shows metadata in a diff format when logged in

Add further improvements and metadata extraction for SIGI-type pages

37f0157

See ytdl-org#30251 (comment)

Fix timeout on attempt to set up session cookies

09c0980

Resurrect TikTokUserIE

b428754

dirkf mentioned this pull request Jan 7, 2022: Tiktok not get video url sometimes #30251 (Open)

dirkf mentioned this pull request Jan 7, 2022: TikTok Error #30445 (Open)

dirkf mentioned this pull request Jan 31, 2022: any solution for tiktok #30580 (Closed)

dirkf marked this pull request as draft February 22, 2022 20:34

dirkf mentioned this pull request Apr 24, 2022: TikTok (tiktok.com) broken #30893 (Open)

dirkf linked an issue Apr 24, 2022 that may be closed by this pull request: TikTok (tiktok.com) broken #30893 (Open)

dirkf force-pushed the df-wranai-tiktok-patch branch 2 times, most recently from adec287 to 99a2b7c, on April 25, 2022 14:19

Fix TT blocking

2f65e20

Add TikTokVM Partial fix for TikTokUser

dirkf force-pushed the df-wranai-tiktok-patch branch from 99a2b7c to 2f65e20 on April 25, 2022 14:25

dirkf mentioned this pull request Apr 26, 2022: Can't download tiktok video yt-dlp/yt-dlp#3551 (Closed)

sulyi mentioned this pull request May 3, 2022: Tiktok sigi yt-dlp/yt-dlp#3624 (Closed)

afterdelight reviewed May 5, 2022

dirkf mentioned this pull request May 18, 2022: [tiktok:user] Failed to parse JSON yt-dlp/yt-dlp#3776 (Closed)

SuperSonicHub1 mentioned this pull request Jun 1, 2022: Inability to process request SuperSonicHub1/TikTok-RSS#7 (Closed)

pukkandan reviewed Jun 14, 2022

pukkandan added a commit to yt-dlp/yt-dlp that referenced this pull request Jun 17, 2022

[extractor/tiktok] Extract SIGI_STATE

a39a7ba

Based on #3624, ytdl-org/youtube-dl#30479 Closes #3551 Authored by dirkf, sulyi, pukkandan

dirkf mentioned this pull request Aug 9, 2022: TikTok Videos won't download #31150 (Closed)

dirkf linked an issue Aug 9, 2022 that may be closed by this pull request: TikTok Videos won't download #31150 (Closed)

dirkf force-pushed the master branch from 01bf89e to 4c6fba3 on August 26, 2022 07:51

dirkf mentioned this pull request Apr 9, 2023: tiktok #32021 (Closed)

dirkf mentioned this pull request Jul 24, 2023: TikTok videos #32468 (Open)

dirkf closed this Aug 5, 2023

dirkf added the "defunct PR" label (source branch is not accessible) on Oct 2, 2023
