Investigate user made Innertube documentation #190
Being able to build such base64-encoded Protobuf messages ourselves could remove the need to fetch some webpages just to obtain a continuation token, for instance (such a token is sometimes required for some features, even before considering pagination exposed to the end-user developer). |
https://www.protobufpal.com looks interesting.
syntax = "proto3";
message uint32Message
{
uint32 exampleInte = 1;
}
message uint32StringUint32MessageInstance
{
uint32 uint32Instance = 1;
string stringInstance = 15;
uint32Message uint32MessageInstance = 104;
}
Result: {
"uint32Instance": 1,
"stringInstance": "PT:CGQ",
"uint32MessageInstance": {
"exampleInte": 0
}
}
syntax = "proto3";
message threeStringsMessage
{
string string0 = 2;
string string1 = 3;
string string2 = 35;
}
message threeStringsMessageMessage
{
threeStringsMessage threeStringsMessage0 = 80226972;
}
Result: {
"threeStringsMessage0": {
"string0": "VLUUCv1Pd24oPErw5S7zJWltnQ",
"string1": "CAF6BlBUOkNHUcIGAggA",
"string2": "UUCv1Pd24oPErw5S7zJWltnQ"
}
}
https://www.youtube.com/watch?v=mWdFMNQBcjs
Face
which is the continuation token for the second page. While the first is correctly treated:
Note that these tokens come from a private browsing session: they are 12 characters shorter than the ones from my account (which still trigger this error), and they do not leak anything here. Let us check whether it is a CyberChef issue:
echo 'Eg0SC21XZEZNTlFCY2pzGAYyJSIRIgttV2RGTU5RQmNqczABeAJCEGNvbW1lbnRzLXNlY3Rpb24=' | base64 -d | protoc --decode_raw
with the second page token I get:
Hence it is a CyberChef issue that I reported here. With the third page token:
I get:
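For reference, the `base64 -d | protoc --decode_raw` check used above can be approximated in a few lines of Python, which rules out both CyberChef and protoc as the source of an error. This is only a minimal sketch of the Protobuf wire format (varint, fixed-width and length-delimited fields, no groups). The helper names `read_varint` and `decode_raw` are mine, and nested messages have to be decoded explicitly because the raw format cannot tell a string from a sub-message.

```python
import base64


def read_varint(buf, pos):
    """Read a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_raw(buf):
    """Decode one message level into a list of (field_number, wire_type, value)."""
    fields = []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 7
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # fixed 64-bit, kept as raw bytes
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:  # length-delimited: string, bytes or sub-message
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:  # fixed 32-bit, kept as raw bytes
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields.append((number, wire_type, value))
    return fields


# First page token quoted above.
token = "Eg0SC21XZEZNTlFCY2pzGAYyJSIRIgttV2RGTU5RQmNqczABeAJCEGNvbW1lbnRzLXNlY3Rpb24="
top = decode_raw(base64.b64decode(token))
for number, wire_type, value in top:
    print(number, wire_type, value)

# Length-delimited values that are sub-messages are decoded one level at a time.
print(decode_raw(top[2][2]))
```

On this token the top level has fields 2, 3 and 6; decoding field 2 again yields the video id, and field 6 ends with `comments-section`.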
Trying to forge:
syntax = "proto3";
message message0
{
string string0 = 2;
}
message message2
{
string string0 = 4;
uint32 uint320 = 6;
uint32 uint321 = 15;
}
message message1
{
string string0 = 1;
message2 message2Instance = 4;
uint32 uint320 = 5;
string string1 = 8;
}
message completeMessage
{
message0 message0Instance = 2;
uint32 uint320 = 3;
message1 message1Instance = 6;
}
{
"2": {
"2": "mWdFMNQBcjs"
},
"3": 6,
"6": {
"1": "get_newest_first--CggIgAQVF7fROBIFCIcgGAASBQiIIBgAEgUIiSAYABIFCJ0gGAEYACIOCgwI85ivnQYQoN3ElAI",
"4": {
"4": "mWdFMNQBcjs",
"6": 1,
"15": 1
},
"5": 20,
"8": "comments-section"
}
}
If this causes issues, we can try with intermediate structures.
{
"message0Instance": {
"string0": ""
},
"uint320": 0,
"message1Instance": {
"string0": "",
"message2Instance": {
"string0": "",
"uint320": 0,
"uint321": 0
},
"uint320": 0,
"string1": ""
}
}
Filled it like so:
{
"message0Instance": {
"string0": "mWdFMNQBcjs"
},
"uint320": 6,
"message1Instance": {
"string0": "get_newest_first--CggIgAQVF7fROBIFCIcgGAASBQiIIBgAEgUIiSAYABIFCJ0gGAEYACIOCgwI85ivnQYQoN3ElAI",
"message2Instance": {
"string0": "mWdFMNQBcjs",
"uint320": 1,
"uint321": 1
},
"uint320": 20,
"string1": "comments-section"
}
}
The difference with the above is unclear, but it now achieves encoding:
In fact, it seems that the JSON key names have to match the field names of the .proto schema for the encoding to work.
curl -s https://www.youtube.com/youtubei/v1/next -H 'Content-Type: application/json' --data-raw '{"context": {"client": {"clientName": "WEB", "clientVersion": "2.20231214.06.00"}}, "continuation": "Eg0SC21XZEZNTlFCY2pzGAYyhgEKXWdldF9uZXdlc3RfZmlyc3QtLUNnZ0lnQVFWRjdmUk9CSUZDSWNnR0FBU0JRaUlJQmdBRWdVSWlTQVlBQklGQ0owZ0dBRVlBQ0lPQ2d3STg1aXZuUVlRb04zRWxBSSIRIgttV2RGTU5RQmNqczABeAEoFEIQY29tbWVudHMtc2VjdGlvbg%3D%3D"}' | jq .onResponseReceivedEndpoints[0].appendContinuationItemsAction.continuationItems[].commentThreadRenderer.comment.commentRenderer.contentText.runs[0].text
Modifying |
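The forged comments continuation token can also be built without protobufpal or a generated class, by writing the wire format by hand. Below is a minimal Python sketch for the `completeMessage` schema above. The function names are hypothetical, the parameter names simply mirror the placeholder field names of the schema (the meaning of the fields is not known), and unlike a proto3 encoder this sketch does not omit zero-valued fields.

```python
import base64


def varint(n):
    """Encode a non-negative integer as a Protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def uint_field(number, value):
    """Tag (wire type 0) followed by the varint value."""
    return varint(number << 3) + varint(value)


def bytes_field(number, payload):
    """Tag (wire type 2), length, then the string or sub-message bytes."""
    if isinstance(payload, str):
        payload = payload.encode()
    return varint(number << 3 | 2) + varint(len(payload)) + payload


def comments_continuation(video_id, string0, uint320=20):
    # message2: fields 4, 6 and 15 of the schema above
    message2 = bytes_field(4, video_id) + uint_field(6, 1) + uint_field(15, 1)
    # message1: fields 1, 4, 5 and 8
    message1 = (bytes_field(1, string0) + bytes_field(4, message2)
                + uint_field(5, uint320) + bytes_field(8, "comments-section"))
    # completeMessage: fields 2, 3 and 6
    complete = (bytes_field(2, bytes_field(2, video_id)) + uint_field(3, 6)
                + bytes_field(6, message1))
    return base64.b64encode(complete).decode()


forged = comments_continuation(
    "mWdFMNQBcjs",
    "get_newest_first--CggIgAQVF7fROBIFCIcgGAASBQiIIBgAEgUIiSAYABIFCJ0gGAEYACIOCgwI85ivnQYQoN3ElAI",
)
print(forged)
```

With the values filled in above, the output should be the token used in the curl request, with the trailing `%3D%3D` written as `==`.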
Let us investigate the YouTube Data API v3 page token (as we can rely more on YouTube Data API v3 and its
https://yt.lemnoslife.com/noKey/commentThreads?part=snippet&videoId=2aamcJeIvEg&maxResults=2
echo 'Z2V0X25ld2VzdF9maXJzdC0tQ2dnSWdBUVZGN2ZST0JJRkNKMGdHQUVTQlFpSklCZ0FFZ1VJaUNBWUFCSUZDSWNnR0FBWUFDSU9DZ3dJeTZ1SnJBWVFnS3ZNemdF' | base64 -d
echo 'CggIgAQVF7fROBIFCJ0gGAESBQiJIBgAEgUIiCAYABIFCIcgGAAYACIOCgwIy6uJrAYQgKvMzgE=' | base64 -d | protoc --decode_raw
Between these protobufs:
have its components order changed. Only:
change. However,
Hence, working with the YouTube UI seems more appropriate for the moment. Note that there are no interesting results with the DuckDuckGo and Google search engines for:
Same for
Have different
$ curl -s 'https://yt.lemnoslife.com/noKey/commentThreads?part=snippet&videoId=2aamcJeIvEg&maxResults=2' | jq -r .nextPageToken | base64 -d | sed 's/get_newest_first--//g' | base64 -d | protoc --decode_raw
It indeed seems that only the order of
for i in {0..100}; do curl -s 'https://yt.lemnoslife.com/noKey/commentThreads?part=snippet&videoId=2aamcJeIvEg&maxResults=2' | jq -r .nextPageToken >> nextPageTokens.txt; done
cat nextPageTokens.txt | sort | uniq | wc -l
As get
Maybe the field
Is it always the same value for the same request? Yes.
curl -s 'https://yt.lemnoslife.com/noKey/commentThreads?part=snippet&videoId=2aamcJeIvEg&maxResults=2' | jq -r .nextPageToken | base64 -d | sed 's/get_newest_first--//g' | base64 -d | protoc --decode_raw | tail -n 3 | head -n 1 # How to get the value by its path? |
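About getting a value by its path: one option is to decode the token in Python and walk the field numbers instead of relying on `tail` and `head`. The sketch below is a minimal illustration under a few assumptions: the helper names (`decode_page_token`, `get_path`) are hypothetical, the inner base64 is assumed to use the standard alphabet with its padding stripped, and every field on the path except the last one is assumed to be a sub-message.

```python
import base64

PREFIX = "get_newest_first--"


def read_varint(buf, pos):
    """Read a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_raw(buf):
    """Decode one message level into (field_number, value) pairs."""
    fields = []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 7
        if wire_type == 0:
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type in (1, 5):  # fixed 64-bit / 32-bit, kept as raw bytes
            size = 8 if wire_type == 1 else 4
            value, pos = buf[pos:pos + size], pos + size
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields.append((number, value))
    return fields


def decode_page_token(page_token):
    """Undo base64, strip the textual prefix, undo the inner base64."""
    outer = base64.b64decode(page_token).decode()
    if not outer.startswith(PREFIX):
        raise ValueError("unexpected page token prefix")
    inner = outer[len(PREFIX):]
    inner += "=" * (-len(inner) % 4)  # restore the stripped padding
    return base64.b64decode(inner)


def get_path(buf, path):
    """Follow field numbers through nested messages; return every match."""
    values = [buf]
    for number in path:
        values = [v for parent in values
                  for n, v in decode_raw(parent) if n == number]
    return values


# Page token quoted earlier in this comment.
PAGE_TOKEN = "Z2V0X25ld2VzdF9maXJzdC0tQ2dnSWdBUVZGN2ZST0JJRkNKMGdHQUVTQlFpSklCZ0FFZ1VJaUNBWUFCSUZDSWNnR0FBWUFDSU9DZ3dJeTZ1SnJBWVFnS3ZNemdF"
data = decode_page_token(PAGE_TOKEN)
print(get_path(data, [2, 1]))     # first field of each repeated sub-message, in order
print(get_path(data, [4, 1, 2]))  # last varint of the token
```

Here the path `[4, 1, 2]` selects the same value as the `tail -n 3 | head -n 1` pipeline, and `[2, 1]` lists the repeated sub-messages in the order they appear, which makes it easy to compare that order between two tokens.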
bin(733394000) # '0b101011101101101011010001010000'
len(bin(733394000)) # 32, counting the '0b' prefix, so 30 significant bits
>>> bin(955170000)
'0b111000111011101011110011010000'
>>> bin(100)
'0b1100100'
>>> bin(233215000)
'0b1101111001101001010000011000'
>>> bin(99)
'0b1100011' |
Just a copy of what I wrote there:
|
Let us verify the theoretical approach by picking a random video in a playlist: Playlist id: After scrolling past the first 100 videos returned with the initial HTML, I got the
$ echo '4qmFsgJtEhpWTFVVV2VnMlBrYXRlNjlORmRCZXVSRlRBdxo0Q0FGNkkxQlVPa05IVVdsRlJHc3lUbXBKTlUxVVNYbFJhbEV5VWxWR1IxRlZVVzlCVmtGQ5oCGFVVV2VnMlBrYXRlNjlORmRCZXVSRlRBdw==' | base64 -d | protoc --decode_raw
echo 'CAF6I1BUOkNHUWlFRGsyTmpJNU1USXlRalEyUlVGR1FVUW9BVkFC' | base64 -d | protoc --decode_raw
echo 'CGQiEDk2NjI5MTIyQjQ2RUFGQUQoAVAB' | base64 -d | protoc --decode_raw
syntax = "proto3";
message message
{
uint32 uint320 = 1;
string string0 = 4;
uint32 uint321 = 5;
uint32 uint322 = 10;
}
With:
{
"uint320": 100,
"string0": "96629122B46EAFAD",
"uint321": 1,
"uint322": 1
}
I found correctly:
Hence let us use:
{
"uint320": 142,
"string0": "96629122B46EAFAD",
"uint321": 1,
"uint322": 1
}
syntax = "proto3";
message message
{
uint32 uint320 = 1;
string string0 = 15;
}
{
"uint320": 1,
"string0": "PT:CI4BIhA5NjYyOTEyMkI0NkVBRkFEKAFQAQ=="
}
syntax = "proto3";
message message0
{
string string0 = 2;
string string1 = 3;
string string2 = 35;
}
message message1
{
message0 message0 = 80226972;
}
{
"message0": {
"string0": "VLUUWeg2Pkate69NFdBeuRFTAw",
"string1": "CAF6J1BUOkNJNEJJaEE1TmpZeU9URXlNa0kwTmtWQlJrRkVLQUZRQVE9PQ==",
"string2": "UUWeg2Pkate69NFdBeuRFTAw"
}
}
When using it in:
curl https://www.youtube.com/youtubei/v1/browse -H 'Content-Type: application/json' --data-raw '{"context": {"client": {"clientName": "WEB", "clientVersion": "2.20231219.04.00"}}, "continuation": "4qmFsgJ1EhpWTFVVV2VnMlBrYXRlNjlORmRCZXVSRlRBdxo8Q0FGNkoxQlVPa05KTkVKSmFFRTFUbXBaZVU5VVJYbE5hMGt3VG10V1FsSnJSa1ZMUVVaUlFWRTlQUT09mgIYVVVXZWcyUGthdGU2OU5GZEJldVJGVEF3"}'
The first result is:
Hence the theoretical approach works. Note that it is 143 and not 142: we get 143 when counting from 1, whereas 142 is the same position counted from 0. We could proceed with YouTube Data API v3 to reduce errors, as it seems to only be the |
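The three encoding steps above (inner message, `PT:` wrapper, outer message under field 80226972) can be chained in a short Python sketch that writes the wire format directly. Function and parameter names are hypothetical and simply mirror the placeholder field names of the schemas above. The `VL` prefix, the constant values 1 and the field numbers are copied from the decoded tokens, without any claim about what they mean.

```python
import base64


def varint(n):
    """Encode a non-negative integer as a Protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def uint_field(number, value):
    """Tag (wire type 0) followed by the varint value."""
    return varint(number << 3) + varint(value)


def bytes_field(number, payload):
    """Tag (wire type 2), length, then the string or sub-message bytes."""
    if isinstance(payload, str):
        payload = payload.encode()
    return varint(number << 3 | 2) + varint(len(payload)) + payload


def pt_string(index, string0):
    """Innermost message (fields 1, 4, 5, 10), base64-encoded after 'PT:'."""
    inner = (uint_field(1, index) + bytes_field(4, string0)
             + uint_field(5, 1) + uint_field(10, 1))
    return "PT:" + base64.b64encode(inner).decode()


def playlist_continuation(playlist_id, index, string0):
    # Middle message: field 1 set to 1, field 15 holding the 'PT:' string.
    middle = uint_field(1, 1) + bytes_field(15, pt_string(index, string0))
    # Outer message: fields 2, 3 and 35, wrapped in field 80226972.
    message0 = (bytes_field(2, "VL" + playlist_id)
                + bytes_field(3, base64.b64encode(middle).decode())
                + bytes_field(35, playlist_id))
    return base64.b64encode(bytes_field(80226972, message0)).decode()


token = playlist_continuation("UUWeg2Pkate69NFdBeuRFTAw", 142, "96629122B46EAFAD")
print(token)
```

With index 142 this should reproduce the continuation sent to `youtubei/v1/browse` above, so changing only `index` would be the way to jump to an arbitrary position, assuming the other fields can indeed stay constant.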
To get clear PHP code, I tried leveraging protobuf-php/protobuf, but I faced some issues (Benjamin_Loison/protobuf/issues and Benjamin_Loison/protobuf-plugin/issues). In addition, this library may be overkill. No interesting DuckDuckGo and Google results for
grep -rw 'function serializeToString'
grep -r 'function serialize'
So no Base64 it seems. |
<?php
require_once __DIR__ . '/vendor/autoload.php';
include_once 'generated/BlogPost.php';
include_once 'generated/GPBMetadata/BlogPost.php';
$blogSpot = new \BlogPost();
$blogSpot
->setTitle('Mon super billet');
// the content serialized to binary
$binary = $blogSpot->serializeToString();
echo base64_encode($binary);
syntax = "proto3";
message BlogPost {
string title = 2;
}
protoc --php_out=./generated --proto_path=src $(find src -name '*.proto')
works as wanted. Many base64 Protobuf messages are retrieved and sent; constructing them ourselves, to understand and simplify both the messages and the process, looks more appropriate. Trying to simplify
curl -s 'http://localhost/YouTube-operational-API/channels?part=shorts&handle=@WHO' | jq '.items[0].shorts[].videoId'
curl -s "http://localhost/YouTube-operational-API/channels?part=shorts&handle=@WHO&pageToken=`curl -s 'http://localhost/YouTube-operational-API/channels?part=shorts&handle=@WHO' | jq -r '.items[0].nextPageToken'`" | jq '.items[0].shorts[].videoId'
echo -n 'CgtqajNudnAwZzBBOCiv7JuwBjIOCgJGUhIIEgQSAgsMIEU=' | base64 -d | protoc --decode_raw
|
menmob/innertube-documentation/wiki/Decoding-Protobuf-Objects
menmob/innertube-documentation#1
May be able to simplify: https://stackoverflow.com/a/70013529
After manual review, I only found the two occurrences returned by the command below:
Should use inner data and encode them instead of using an encoded blackbox.
YouTube-operational-API/search.php
Line 118 in 5a81929
YouTube-operational-API/search.php
Line 144 in 5a81929
Requested help on Discord, again.