-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add API versioning #162
Comments
Here is an example of YouTube data structure change providing more data going from an element to an array, how should we deal with that?
Any hybrid like keeping current data structure and adding a field with the (remaining) elements of the array seems stupid. Note that we don't have any control on YouTube UI data returned that mean any data can be removed at any point in time, so the stability provided by versioning isn't as guaranteed as usually. In fact passing from an element to the array was a bit predictable, even if I don't much agree with keeping TODO: don't forget to update accordingly my Stack Overflow answer once have done the necessary changes. Note that I'm in favor of per endpoint versioning as we would like to support in real time YouTube UI changes so how would this would look like while still making sense in the URL? TODO: should verify that yt-dlp having this feature is still working fine. Note that could return an array if we are aware of an array otherwise just the element but then it's up to the API user to manage both cases which isn't wanted, even if not doing so means that he isn't aware if there is a single element in the array if it's because we are sure that there is no other one (can we even be sure about it?). If we were able to know what YouTube UI we are facing then we could denote it in our response. Forcing some given YouTube UI nodes having specific YouTube versions could help but this is an unwanted solution as it may evolve more quickly than the YouTube UI itself. Should treat this specific issue with versioning as it's a kind of textbook case as it's the most appreciated feature. Previous messages to this one show that it's not just about multiple most-replayed segments but YouTube is rolling out a new update on its servers, so this issue is now very important. Assuming they have a linear update process on YouTube servers, having a field to figure out the remote server version would be nice, instead of on a per feature basis adapt our parsing. curl -s 'https://www.youtube.com/watch?v=o8NPllzkFhE' > video.json then open |
Following this comment, as I had once the licenses returned while all the other times not, the question was do a given YouTube server (identified by its IP) has a fixed version such that we can rely on it to have a given data structure. The answer is no. Then a question could be to always have correct structure parsing to be able to versioning the returned data structure, this question could be answered by comparing, preferably whole HTML, data structure returned between two different data structures to find a data structure version identifier. Could use this Stack Overflow answer to compare two JSON as a first step for instance (HTML comparison being a latter one). I am looking for a tool as Potential solution: find a cookie like the one previously used to get a channel shorts before it was vastly deployed on YouTube servers. Could try to find such a
clear && curl -s -v -H 'Cookie: __Secure-YEC=CgtuNjFmZlJlR0Qxcyjp3P-aBg==' 'https://www.youtube.com/watch?v=Ehoe35hTbuY' | grep 'WMG' returns While above is a manual try, having a web-browser with previous data structure and able to copy the whole cURL URL would help. import requests
url = 'https://www.youtube.com/watch?v=Ehoe35hTbuY'
requestIndex = 0
while True:
response = requests.get(url, stream = True)
ip = response.raw._fp.fp.raw._sock.getpeername()[0]
print(requestIndex, ip, 'Shape of You' in response.text)
text = response.text
if 'SonyATV' in text:
print('Found!')
index = text.index('SonyATV')
print(text[index - 200:index + 200])
break
requestIndex += 1
from seleniumwire import webdriver
from selenium.webdriver.firefox.options import Options
options = Options()
#options.add_argument('-headless')
options.set_preference('permissions.default.image', 2)
browser = webdriver.Firefox(options = options)
requestIndex = 0
while True:
browser.get('https://www.youtube.com/watch?v=Ehoe35hTbuY')
browsePageSource = browser.page_source
print(requestIndex, 'Shape of You' in browsePageSource)
if 'SonyATV' in browsePageSource:
print('Found!')
index = browsePageSource.index('SonyATV')
print(browsePageSource[index - 200:index + 200])
break
requestIndex += 1
browser.close()
Then it is stuck but after several tens of executions I have not found the licenses... Should verify above with something less likely to happen randomly (especially if make multiple requests) such as At least import requests
url = 'https://www.youtube.com/watch?v=Ehoe35hTbuY'
requestIndex = 0
while True:
response = requests.get(url, stream = True)
ip = response.raw._fp.fp.raw._sock.getpeername()[0]
print(requestIndex, ip, 'Shape of You' in response.text)
if ip != '2a00:1450:4007:80c::200e':
print('Different IP!')
break
requestIndex += 1
Reached by accumulation 192 with Selenium without finding licenses... So being able to craft a following cURL request from cURL working one seems to make the most sense. Understanding all response fields from the verbose curl could help. Maybe could also have more chance with phone mobile or desktop view of YouTube. No more luck. Raw note below: $ clear && curl -s -v 'https://www.youtube.com/watch?v=Ehoe35hTbuY' > a.json
$ clear && curl -s -v 'https://www.youtube.com/watch?v=Ehoe35hTbuY' > b.json && ./getJSONPathFromKey.py b.json | grep 'WMG' Unable to reproduce on any of my 7 YouTube operational API instances. Let us try to force the opposite protocol used by default on instance proposing it (only 2 instances, not enabling to reproduce). clear && curl -s -v 'https://www.youtube.com/watch?v=Ehoe35hTbuY' | grep 'WMG' Returns same hash multiple times in a row: $ nslookup www.youtube.com | grep Address | grep -v '127.0.0.' | cut -d' ' -f2 | sort | sha256sum So let us try all possible addresses exhaustively.
Note that For unknown reason this file containing After another time (with the same YouTube server IP) does not have |
Tried hand crafted: clear && curl -s -v 'https://www.youtube.com/watch?v=Ehoe35hTbuY' -H 'Cookie: __Secure-BUCKET=CPsB; YSC=skqAukUZG7c; __Secure-YEC=CgtGcWc1SmhiQk9FTSid2qqqBjIICgJGUhICEgA%3D; VISITOR_PRIVACY_METADATA=CgJGUhICEgA%3D; VISITOR_INFO1_LIVE=; CONSENT=PENDING+159' | grep 'SonyATV' minimized it to: $ minimizeCURL curl.txt 'SonyATV'
242
Removing headers
Removing URL parameters
Removing cookies
219 still fine
202 still fine
161 still fine
140 still fine
119 still fine
Removing raw data
curl 'https://www.youtube.com/watch?v=Ehoe35hTbuY' -H 'Cookie: __Secure-YEC=CgtGcWc1SmhiQk9FTSid2qqqBjIICgJGUhICEgA%3D' |
This shows that possibly storing |
Note that I do not know how much time this patch will work, as it seems to rely on a previous YouTube UI. Long investigation available [here](#162 (comment)).
Cf https://docs.github.com/en/rest/overview/api-versions
Related to
CITATION.cff
.The text was updated successfully, but these errors were encountered: