Investigate user made Innertube documentation #190

Open
Benjamin-Loison opened this issue Aug 16, 2023 · 8 comments
Labels
enhancement New feature or request help wanted Extra attention is needed low priority Nice to have feature. quick A task that should take less than two hours to complete.

Comments

@Benjamin-Loison
Owner

Benjamin-Loison commented Aug 16, 2023

menmob/innertube-documentation/wiki/Decoding-Protobuf-Objects

menmob/innertube-documentation#1

May be able to simplify: https://stackoverflow.com/a/70013529

After manual review, I only found the two occurrences returned by the command below:

grep -r '\S\S=\S' --exclude-dir={tools,.git} --exclude=index.php | grep -vE 'checkRegex|UTF-8' | grep '='

We should encode the inner data ourselves instead of using a pre-encoded black box.

$typeBase64 = $order === 'relevance' ? '' : 'EgIQAQ==';

$orderBase64 = 'EgZ2aWRlb3MYASAAMAE=';
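As a minimal sketch of encoding the inner data instead of hard-coding the blobs: both values above decode to a few varint and length-delimited Protobuf fields, which a handful of lines can rebuild. Python is used here only for illustration (the project itself is PHP), the helper names are hypothetical, and the meaning of each field number is left open; only the resulting bytes are checked against the two strings above.

```python
import base64


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a Protobuf varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def varint_field(number: int, value: int) -> bytes:
    """Field with wire type 0 (varint)."""
    return encode_varint(number << 3 | 0) + encode_varint(value)


def bytes_field(number: int, payload) -> bytes:
    """Field with wire type 2 (length-delimited); accepts str or bytes."""
    if isinstance(payload, str):
        payload = payload.encode()
    return encode_varint(number << 3 | 2) + encode_varint(len(payload)) + payload


# 'EgIQAQ==' is field 2 holding a nested message whose field 2 is 1.
type_blob = bytes_field(2, varint_field(2, 1))

# 'EgZ2aWRlb3MYASAAMAE=' is field 2 = "videos", field 3 = 1, field 4 = 0, field 6 = 1.
order_blob = (
    bytes_field(2, "videos")
    + varint_field(3, 1)
    + varint_field(4, 0)
    + varint_field(6, 1)
)

print(base64.b64encode(type_blob).decode())
print(base64.b64encode(order_blob).decode())
```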

Requested help on Discord, again.

@Benjamin-Loison Benjamin-Loison added enhancement New feature or request low priority Nice to have feature. quick A task that should take less than two hours to complete. labels Aug 16, 2023
@Benjamin-Loison Benjamin-Loison added the help wanted Extra attention is needed label Aug 22, 2023
@Benjamin-Loison
Owner Author

Benjamin-Loison commented Aug 22, 2023

Being able to build such base64 values ourselves would possibly remove the need to fetch some webpages just to obtain a continuation token (which some features require internally, even before considering the pagination exposed to end-user developers).

@Benjamin-Loison
Owner Author

Benjamin-Loison commented Dec 20, 2023

https://www.protobufpal.com looks interesting.

syntax = "proto3";

message uint32Message
{
  uint32 exampleInte = 1;
}

message uint32StringUint32MessageInstance
{
  uint32 uint32Instance = 1;
  string stringInstance = 15;
  uint32Message uint32MessageInstance = 104;
}
CAF6BlBUOkNHUcIGAggA

Result:

{
  "uint32Instance": 1,
  "stringInstance": "PT:CGQ",
  "uint32MessageInstance": {
    "exampleInte": 0
  }
}
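The same decoding can be reproduced without any schema, in the spirit of `protoc --decode_raw`. The following is only a sketch with hypothetical helper names: it handles the wire types seen in this thread (varint, length-delimited, fixed32) and leaves length-delimited values as raw bytes, since Protobuf does not say whether they are strings or nested messages.

```python
import base64


def read_varint(data: bytes, pos: int):
    """Read a varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_raw(data: bytes):
    """Return the top-level fields as a list of (field number, value)."""
    fields = []
    pos = 0
    while pos < len(data):
        key, pos = read_varint(data, pos)
        number, wire_type = key >> 3, key & 7
        if wire_type == 0:  # varint
            value, pos = read_varint(data, pos)
        elif wire_type == 2:  # length-delimited: string, bytes or nested message
            length, pos = read_varint(data, pos)
            value = data[pos:pos + length]
            pos += length
        elif wire_type == 5:  # fixed32, little-endian
            value = int.from_bytes(data[pos:pos + 4], "little")
            pos += 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields.append((number, value))
    return fields


top_level = decode_raw(base64.b64decode("CAF6BlBUOkNHUcIGAggA"))
print(top_level)
# Field 104 is itself a message, so decode it one level deeper.
print(decode_raw(top_level[2][1]))
```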

syntax = "proto3";

message threeStringsMessage
{
  string string0 = 2;
  string string1 = 3;
  string string2 = 35;
}

message threeStringsMessageMessage
{
  threeStringsMessage threeStringsMessage0 = 80226972;
}
4qmFsgJNEhpWTFVVQ3YxUGQyNG9QRXJ3NVM3ekpXbHRuURoUQ0FGNkJsQlVPa05IVWNJR0FnZ0GaAhhVVUN2MVBkMjRvUEVydzVTN3pKV2x0blE=

Result:

{
  "threeStringsMessage0": {
    "string0": "VLUUCv1Pd24oPErw5S7zJWltnQ",
    "string1": "CAF6BlBUOkNHUcIGAggA",
    "string2": "UUCv1Pd24oPErw5S7zJWltnQ"
  }
}

https://www.youtube.com/watch?v=mWdFMNQBcjs

Sort by Newest first (not necessary, but it eases comprehension thanks to the comments themselves).
Note that this performs a new request, as the Sort by button loads together with the first comments. Also note that Newest first leads to shorter page tokens.

I face Error: Exhausted Buffer on:

Eg0SC21XZEZNTlFCY2pzGAYyhgEKXWdldF9uZXdlc3RfZmlyc3QtLUNnZ0lnQVFWRjdmUk9CSUZDSWNnR0FBU0JRaUlJQmdBRWdVSWlTQVlBQklGQ0owZ0dBRVlBQ0lPQ2d3STg1aXZuUVlRb04zRWxBSSIRIgttV2RGTU5RQmNqczABeAEoFEIQY29tbWVudHMtc2VjdGlvbg%3D%3D

which is the continuation token for the second page.

While the first is correctly treated:

Eg0SC21XZEZNTlFCY2pzGAYyJSIRIgttV2RGTU5RQmNqczABeAJCEGNvbW1lbnRzLXNlY3Rpb24%3D

Note that these tokens come from a private browsing session, as they are shorter there (12 characters shorter; the tokens from my account still trigger this error) and they do not leak anything here.

Let us assume it is a CyberChef issue:

echo 'Eg0SC21XZEZNTlFCY2pzGAYyJSIRIgttV2RGTU5RQmNqczABeAJCEGNvbW1lbnRzLXNlY3Rpb24=' | base64 -d | protoc --decode_raw
2 {
  2: "mWdFMNQBcjs"
}
3: 6
6 {
  4 {
    4: "mWdFMNQBcjs"
    6: 1
    15: 2
  }
  8: "comments-section"
}

with the second page token I get:

2 {
  2: "mWdFMNQBcjs"
}
3: 6
6 {
  1: "get_newest_first--CggIgAQVF7fROBIFCIcgGAASBQiIIBgAEgUIiSAYABIFCJ0gGAEYACIOCgwI85ivnQYQoN3ElAI"
  4 {
    4: "mWdFMNQBcjs"
    6: 1
    15: 1
  }
  5: 20
  8: "comments-section"
}
CggIgAQVF7fROBIFCIcgGAASBQiIIBgAEgUIiSAYABIFCJ0gGAEYACIOCgwI85ivnQYQoN3ElAI

Whether I keep it as is or append one or two =, base64 reports base64: invalid input and, when the output is passed to protoc, I get Failed to parse input.. Note that I also tried these commands with the third page token, leading to the same results. Hence I can consider it as a black box and try to reuse it. I will also have to try reducing field values, or even removing fields, to simplify requests.
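As a cross-check, here is a sketch of a padding-tolerant decoder (hypothetical helper name). It undoes the %3D URL encoding, recomputes the padding from the length and accepts the URL-safe alphabet. The inner string above is 75 characters long, so standard base64 would expect exactly one = of padding; this does not explain the failure reported above, it only gives a second way to test it.

```python
import base64
from urllib.parse import unquote


def decode_token(token: str) -> bytes:
    """Decode a possibly URL-encoded, possibly unpadded base64 token."""
    token = unquote(token)            # %3D -> =
    token = token.rstrip("=")         # drop whatever padding is present
    token += "=" * (-len(token) % 4)  # restore the padding base64 expects
    return base64.urlsafe_b64decode(token)


first_page = decode_token(
    "Eg0SC21XZEZNTlFCY2pzGAYyJSIRIgttV2RGTU5RQmNqczABeAJCEGNvbW1lbnRzLXNlY3Rpb24%3D"
)
inner = decode_token(
    "CggIgAQVF7fROBIFCIcgGAASBQiIIBgAEgUIiSAYABIFCJ0gGAEYACIOCgwI85ivnQYQoN3ElAI"
)
print(first_page)
print(inner.hex())
```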
Now, with Newest first, I no longer have this issue, but unfortunately I get the same structure as what YouTube Data API v3 gives us below:

1 {
  1: 512
  2: 0x38d1b717
}
2 {
  1: 4103
  3: 0
}
2 {
  1: 4104
  3: 0
}
2 {
  1: 4105
  3: 0
}
2 {
  1: 4125
  3: 1
}
3: 0
4 {
  1 {
    1: 1672203379
    2: 579940000
  }
}

Hence it is a CyberChef issue that I reported here.

With the third page token:

Eg0SC21XZEZNTlFCY2pzGAYyhgEKXWdldF9uZXdlc3RfZmlyc3QtLUNnZ0lnQVFWRjdmUk9CSUZDSWdnR0FBU0JRaWRJQmdCRWdVSWlTQVlBQklGQ0ljZ0dBQVlBQ0lPQ2d3STY3ZXVuUVlRc0lTRzF3SSIRIgttV2RGTU5RQmNqczABeAEoKEIQY29tbWVudHMtc2VjdGlvbg%3D%3D

I get:

2 {
  2: "mWdFMNQBcjs"
}
3: 6
6 {
  1: "get_newest_first--CggIgAQVF7fROBIFCIggGAASBQidIBgBEgUIiSAYABIFCIcgGAAYACIOCgwI67eunQYQsISG1wI"
  4 {
    4: "mWdFMNQBcjs"
    6: 1
    15: 1
  }
  5: 40
  8: "comments-section"
}
1 {
  1: 512
  2: 0x38d1b717
}
2 {
  1: 4104
  3: 0
}
2 {
  1: 4125
  3: 1
}
2 {
  1: 4105
  3: 0
}
2 {
  1: 4103
  3: 0
}
3: 0
4 {
  1 {
    1: 1672190955
    2: 719422000
  }
}

Trying to forge:

syntax = "proto3";

message message0
{
  string string0 = 2;
}

message message2
{
  string string0 = 4;
  uint32 uint320 = 6;
  uint32 uint321 = 15;
}

message message1
{
  string string0 = 1;
  message2 message2Instance = 4;
  uint32 uint320 = 5;
  string string1 = 8;
}

message completeMessage
{
  message0 message0Instance = 2;
  uint32 uint320 = 3;
  message1 message1Instance = 6;
}
2 {
  2: "mWdFMNQBcjs"
}
3: 6
6 {
  1: "get_newest_first--CggIgAQVF7fROBIFCIcgGAASBQiIIBgAEgUIiSAYABIFCJ0gGAEYACIOCgwI85ivnQYQoN3ElAI"
  4 {
    4: "mWdFMNQBcjs"
    6: 1
    15: 1
  }
  5: 20
  8: "comments-section"
}
{
  "2": {
    "2": "mWdFMNQBcjs"
  },
  "3": 6,
  "6": {
    "1": "get_newest_first--CggIgAQVF7fROBIFCIcgGAASBQiIIBgAEgUIiSAYABIFCJ0gGAEYACIOCgwI85ivnQYQoN3ElAI",
    "4": {
      "4": "mWdFMNQBcjs",
      "6": 1,
      "15": 1
    },
    "5": 20,
    "8": "comments-section"
  }
}

If I face issues, I can try with intermediate structures.

Create Template result:

{
  "message0Instance": {
    "string0": ""
  },
  "uint320": 0,
  "message1Instance": {
    "string0": "",
    "message2Instance": {
      "string0": "",
      "uint320": 0,
      "uint321": 0
    },
    "uint320": 0,
    "string1": ""
  }
}

Filled it like so:

{
  "message0Instance": {
    "string0": "mWdFMNQBcjs"
  },
  "uint320": 6,
  "message1Instance": {
    "string0": "get_newest_first--CggIgAQVF7fROBIFCIcgGAASBQiIIBgAEgUIiSAYABIFCJ0gGAEYACIOCgwI85ivnQYQoN3ElAI",
    "message2Instance": {
      "string0": "mWdFMNQBcjs",
      "uint320": 1,
      "uint321": 1
    },
    "uint320": 20,
    "string1": "comments-section"
  }
}

The difference with the above is unclear, but the encoding now succeeds:

Eg0SC21XZEZNTlFCY2pzGAYyhgEKXWdldF9uZXdlc3RfZmlyc3QtLUNnZ0lnQVFWRjdmUk9CSUZDSWNnR0FBU0JRaUlJQmdBRWdVSWlTQVlBQklGQ0owZ0dBRVlBQ0lPQ2d3STg1aXZuUVlRb04zRWxBSSIRIgttV2RGTU5RQmNqczABeAEoFEIQY29tbWVudHMtc2VjdGlvbg==

In fact, it seems that the key names have to match those of the schema for the encoding to work.
Note that if I replace = with %3D, I get back the page token mentioned above, hence the encoding is correct.

curl -s https://www.youtube.com/youtubei/v1/next -H 'Content-Type: application/json' --data-raw '{"context": {"client": {"clientName": "WEB", "clientVersion": "2.20231214.06.00"}}, "continuation": "Eg0SC21XZEZNTlFCY2pzGAYyhgEKXWdldF9uZXdlc3RfZmlyc3QtLUNnZ0lnQVFWRjdmUk9CSUZDSWNnR0FBU0JRaUlJQmdBRWdVSWlTQVlBQklGQ0owZ0dBRVlBQ0lPQ2d3STg1aXZuUVlRb04zRWxBSSIRIgttV2RGTU5RQmNqczABeAEoFEIQY29tbWVudHMtc2VjdGlvbg%3D%3D"}' | jq .onResponseReceivedEndpoints[0].appendContinuationItemsAction.continuationItems[].commentThreadRenderer.comment.commentRenderer.contentText.runs[0].text
"Twenty-four"
"Twenty-three"
"Twenty-two"
"Twenty-one"
"Twenty"
"Nineteen"
"Eighteen"
"Seventeen"
"Sixteen"
"Fifteen"
"Fourteen"
"Thirteen"
"Twelve"
"Eleven"
"Ten"
"Nine"
"Eight"
"Seven"
"Six"
"Five"
null

Replacing 20 with 21 or 40 leads to the same result, hence understanding the YouTube Data API v3 page token investigated below seems mandatory.
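For scripting such experiments, the curl and jq pipeline above can be sketched as follows. The endpoint, body and JSON path come from the command above; the function names are hypothetical and `fetch_comment_texts` is an untested convenience wrapper that performs the network request.

```python
import json
import urllib.request

NEXT_URL = "https://www.youtube.com/youtubei/v1/next"


def build_next_request(continuation: str, client_version: str = "2.20231214.06.00"):
    """Build the same POST request as the curl command above."""
    body = {
        "context": {"client": {"clientName": "WEB", "clientVersion": client_version}},
        "continuation": continuation,
    }
    return urllib.request.Request(
        NEXT_URL,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )


def comment_texts(response: dict):
    """Mirror the jq path: one text per item, None where the path is missing."""
    action = response["onResponseReceivedEndpoints"][0]["appendContinuationItemsAction"]
    texts = []
    for item in action["continuationItems"]:
        try:
            renderer = item["commentThreadRenderer"]["comment"]["commentRenderer"]
            texts.append(renderer["contentText"]["runs"][0]["text"])
        except (KeyError, IndexError, TypeError):
            texts.append(None)  # e.g. the trailing continuation item
    return texts


def fetch_comment_texts(continuation: str):
    """Send the request and extract the comment texts (requires network access)."""
    with urllib.request.urlopen(build_next_request(continuation)) as reply:
        return comment_texts(json.load(reply))
```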

@Benjamin-Loison
Owner Author

Benjamin-Loison commented Dec 20, 2023

Let us investigate the YouTube Data API v3 page token (as we can rely more on YouTube Data API v3 and its pageTokens look smaller) and how to build such pageTokens (then we will try forging arbitrary tokens):

https://yt.lemnoslife.com/noKey/commentThreads?part=snippet&videoId=2aamcJeIvEg&maxResults=2

UgxiJgRY2RSHNWi4ZrJ4AaABAg Ten
UgxglBKUbWwv4BYIiZt4AaABAg Nine
Z2V0X25ld2VzdF9maXJzdC0tQ2dnSWdBUVZGN2ZST0JJRkNKMGdHQUVTQlFpSklCZ0FFZ1VJaUNBWUFCSUZDSWNnR0FBWUFDSU9DZ3dJeTZ1SnJBWVFnS3ZNemdF
echo 'Z2V0X25ld2VzdF9maXJzdC0tQ2dnSWdBUVZGN2ZST0JJRkNKMGdHQUVTQlFpSklCZ0FFZ1VJaUNBWUFCSUZDSWNnR0FBWUFDSU9DZ3dJeTZ1SnJBWVFnS3ZNemdF' | base64 -d
get_newest_first--CggIgAQVF7fROBIFCJ0gGAESBQiJIBgAEgUIiCAYABIFCIcgGAAYACIOCgwIy6uJrAYQgKvMzgE
echo 'CggIgAQVF7fROBIFCJ0gGAESBQiJIBgAEgUIiCAYABIFCIcgGAAYACIOCgwIy6uJrAYQgKvMzgE=' | base64 -d | protoc --decode_raw
1 {
  1: 512
  2: 0x38d1b717
}
2 {
  1: 4125
  3: 1
}
2 {
  1: 4105
  3: 0
}
2 {
  1: 4104
  3: 0
}
2 {
  1: 4103
  3: 0
}
3: 0
4 {
  1 {
    1: 1703040459
    2: 433264000
  }
}
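The two decoding layers used above (base64, then the get_newest_first-- prefix, then base64 again) can be sketched in one helper. The function name is hypothetical, and the padding is recomputed from the length because the inner string is stored without it.

```python
import base64


def split_page_token(page_token: str):
    """Return (prefix, inner base64 string, inner Protobuf bytes) of a pageToken."""
    padded = page_token + "=" * (-len(page_token) % 4)
    outer = base64.urlsafe_b64decode(padded).decode("ascii")
    # The outer layer is plain text: "<prefix>--<unpadded base64>".
    prefix, _, inner_b64 = outer.partition("--")
    inner = base64.urlsafe_b64decode(inner_b64 + "=" * (-len(inner_b64) % 4))
    return prefix, inner_b64, inner


prefix, inner_b64, inner = split_page_token(
    "Z2V0X25ld2VzdF9maXJzdC0tQ2dnSWdBUVZGN2ZST0JJRkNKMGdHQUVTQlFpSklCZ0FFZ1VJ"
    "aUNBWUFCSUZDSWNnR0FBWUFDSU9DZ3dJeTZ1SnJBWVFnS3ZNemdF"
)
print(prefix)
print(inner_b64)
print(inner.hex())
```

The resulting bytes are what is piped into protoc --decode_raw above.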

https://yt.lemnoslife.com/noKey/commentThreads?part=snippet&videoId=2aamcJeIvEg&maxResults=2&pageToken=Z2V0X25ld2VzdF9maXJzdC0tQ2dnSWdBUVZGN2ZST0JJRkNKMGdHQUVTQlFpSklCZ0FFZ1VJaUNBWUFCSUZDSWNnR0FBWUFDSU9DZ3dJeTZ1SnJBWVFnS3ZNemdF

Ugwj6vF-f8FHODekFxV4AaABAg Eight
UgznmWtnKV8QDGwNlVx4AaABAg Seven
Z2V0X25ld2VzdF9maXJzdC0tQ2dnSWdBUVZGN2ZST0JJRkNJa2dHQUFTQlFpSElCZ0FFZ1VJaUNBWUFCSUZDSjBnR0FFWUFDSU9DZ3dJdDZ1SnJBWVEwT2phM1FJ
get_newest_first--CggIgAQVF7fROBIFCIkgGAASBQiHIBgAEgUIiCAYABIFCJ0gGAEYACIOCgwIt6uJrAYQ0Oja3QI
1 {
  1: 512
  2: 0x38d1b717
}
2 {
  1: 4105
  3: 0
}
2 {
  1: 4103
  3: 0
}
2 {
  1: 4104
  3: 0
}
2 {
  1: 4125
  3: 1
}
3: 0
4 {
  1 {
    1: 1703040439
    2: 733394000
  }
}

https://yt.lemnoslife.com/noKey/commentThreads?part=snippet&videoId=2aamcJeIvEg&maxResults=2&pageToken=Z2V0X25ld2VzdF9maXJzdC0tQ2dnSWdBUVZGN2ZST0JJRkNJa2dHQUFTQlFpSElCZ0FFZ1VJaUNBWUFCSUZDSjBnR0FFWUFDSU9DZ3dJdDZ1SnJBWVEwT2phM1FJ

Ugz8lfM20h_gbo4geJx4AaABAg Six
Ugx4R530Lp2vWKfUTl94AaABAg Five
Z2V0X25ld2VzdF9maXJzdC0tQ2dnSWdBUVZGN2ZST0JJRkNJZ2dHQUFTQlFpSElCZ0FFZ1VJaVNBWUFCSUZDSjBnR0FFWUFDSU9DZ3dJc3F1SnJBWVFvTDZVM3dJ
get_newest_first--CggIgAQVF7fROBIFCIggGAASBQiHIBgAEgUIiSAYABIFCJ0gGAEYACIOCgwIsquJrAYQoL6U3wI
1 {
  1: 512
  2: 0x38d1b717
}
2 {
  1: 4104
  3: 0
}
2 {
  1: 4103
  3: 0
}
2 {
  1: 4105
  3: 0
}
2 {
  1: 4125
  3: 1
}
3: 0
4 {
  1 {
    1: 1703040434
    2: 736436000
  }
}

Between these protobufs:

2 {
  1: 4125
  3: 1
}
2 {
  1: 4105
  3: 0
}
2 {
  1: 4104
  3: 0
}
2 {
  1: 4103
  3: 0
}

only the order of these components changes.

Only:

4 {
  1 {
    1: 1703040439
    2: 733394000
  }
}

changes. However, how the second field (733394000 here) changes is unclear, while the first field just seems to be a Unix timestamp.
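One hypothesis worth testing, stated loudly as an assumption and not as a confirmed fact: every value observed for this second field in this thread is below 10^9 and a multiple of 1000, which is what the nanoseconds part of a seconds/nanoseconds pair with microsecond precision would look like. A quick sketch of that check (the function name is hypothetical):

```python
from datetime import datetime, timedelta, timezone

# Values of field 4/1/2 observed in the tokens decoded in this thread.
OBSERVED = [433264000, 733394000, 736436000, 630346000,
            579940000, 719422000, 182626000]


def as_datetime(seconds: int, nanos: int) -> datetime:
    """Interpret (field 4/1/1, field 4/1/2) as seconds and nanoseconds. ASSUMPTION."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(
        microseconds=nanos // 1000
    )


# Both properties hold for every observed value.
print(all(value < 10**9 and value % 1000 == 0 for value in OBSERVED))
print(as_datetime(1703040459, 433264000).isoformat())
```

If this reading is right, the pair would be a single timestamp rather than a result range, but the values above are consistent with it without proving it.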

Hence, working with YouTube UI seems more appropriate for the moment.

Note that there are no interesting results with DuckDuckGo and Google search engines for:

  • 433264000
  • 733394000
  • 736436000
  • 630346000

Same for get_newest_first and get_ranked_streams (when sort by relevance).


Running the same request twice gives different nextPageTokens:

$ curl -s 'https://yt.lemnoslife.com/noKey/commentThreads?part=snippet&videoId=2aamcJeIvEg&maxResults=2' | jq -r .nextPageToken | base64 -d | sed 's/get_newest_first--//g' | base64 -d | protoc --decode_raw
base64: invalid input
1 {
  1: 512
  2: 0x38d1b717
}
2 {
  1: 4105
  3: 0
}
2 {
  1: 4104
  3: 0
}
2 {
  1: 4103
  3: 0
}
2 {
  1: 4125
  3: 1
}
3: 0
4 {
  1 {
    1: 1703040459
    2: 433264000
  }
}

It indeed seems that only the order of the 4 keys differs:

for i in {0..100}; do curl -s 'https://yt.lemnoslife.com/noKey/commentThreads?part=snippet&videoId=2aamcJeIvEg&maxResults=2' | jq -r .nextPageToken >> nextPageTokens.txt; done
cat nextPageTokens.txt | sort | uniq | wc -l

As I get 24 unique tokens, and 24 = 4! = 4 * 3 * 2 * 1.
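The count can be double-checked with a short sketch: four distinct entries give 4! orderings, and sorting the entries maps all of them to one canonical form, which is a convenient way to compare tokens that differ only by this shuffling. The helper name is hypothetical.

```python
from itertools import permutations

# Values of field 2/1 seen in the decoded tokens for this video.
ENTRY_IDS = (4103, 4104, 4105, 4125)


def canonical(order):
    """Order-insensitive form of the repeated field 2 entries."""
    return tuple(sorted(order))


orderings = set(permutations(ENTRY_IDS))
print(len(orderings))                          # 24
print(len({canonical(o) for o in orderings}))  # 1
```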

Maybe the field 4/1/2 is something like the range of the results retrieved.

Is it always the same value for the same request? Yes.

curl -s 'https://yt.lemnoslife.com/noKey/commentThreads?part=snippet&videoId=2aamcJeIvEg&maxResults=2' | jq -r .nextPageToken | base64 -d | sed 's/get_newest_first--//g' | base64 -d | protoc --decode_raw | tail -n 3 | head -n 1 # How to get the value by its path?
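Regarding the question in the comment above, one way to get a value by its path without relying on tail and head is to walk the raw Protobuf fields directly. The following is a sketch with hypothetical helper names; it only handles the wire types seen in this thread and returns the first matching field at each level.

```python
import base64


def read_varint(data: bytes, pos: int):
    """Read a varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def iter_fields(data: bytes):
    """Yield (field number, value) for varint, length-delimited and fixed32 fields."""
    pos = 0
    while pos < len(data):
        key, pos = read_varint(data, pos)
        number, wire_type = key >> 3, key & 7
        if wire_type == 0:
            value, pos = read_varint(data, pos)
        elif wire_type == 2:
            length, pos = read_varint(data, pos)
            value, pos = data[pos:pos + length], pos + length
        elif wire_type == 5:
            value, pos = int.from_bytes(data[pos:pos + 4], "little"), pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        yield number, value


def get_path(data: bytes, path):
    """Follow field numbers such as (4, 1, 2) through nested messages."""
    number, rest = path[0], path[1:]
    for field_number, value in iter_fields(data):
        if field_number == number:
            return get_path(value, rest) if rest else value
    raise KeyError(path)


inner = base64.b64decode(
    "CggIgAQVF7fROBIFCJ0gGAESBQiJIBgAEgUIiCAYABIFCIcgGAAYACIOCgwIy6uJrAYQgKvMzgE="
)
print(get_path(inner, (4, 1, 1)))
print(get_path(inner, (4, 1, 2)))
```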

@Benjamin-Loison
Owner Author

bin(733394000) # '0b101011101101101011010001010000'
len(bin(733394000)) # 32, including the '0b' prefix, hence 30 bits

>>> bin(955170000)
'0b111000111011101011110011010000'

>>> bin(100)
'0b1100100'

>>> bin(233215000)
'0b1101111001101001010000011000'

>>> bin(99)
'0b1100011'

@Benjamin-Loison
Owner Author

Benjamin-Loison commented Dec 21, 2023

Just a copy of what I wrote there:

I investigated this quite deeply but still have not managed to do so. Let me explain my question below:

When considering the nextPageToken provided by YouTube Data API v3 CommentThreads: list endpoint for video id tnTPaLOaHz8 with maxResults=96 I get:

"nextPageToken": "Z2V0X25ld2VzdF9maXJzdC0tQ2dnSWdBUVZGN2ZST0JJRkNJZ2dHQUFTQlFpZElCZ0JFZ1VJaVNBWUFCSUZDSWNnR0FBU0JRaWVJQmdBR0FBaURRb0xDTVR0akt3R0VORE5pbGM="
echo 'Z2V0X25ld2VzdF9maXJzdC0tQ2dnSWdBUVZGN2ZST0JJRkNJZ2dHQUFTQlFpZElCZ0JFZ1VJaVNBWUFCSUZDSWNnR0FBU0JRaWVJQmdBR0FBaURRb0xDTVR0akt3R0VORE5pbGM=' | base64 -d

gives:

get_newest_first--CggIgAQVF7fROBIFCIggGAASBQidIBgBEgUIiSAYABIFCIcgGAASBQieIBgAGAAiDQoLCMTtjKwGENDNilc

Hence:

echo 'CggIgAQVF7fROBIFCIggGAASBQidIBgBEgUIiSAYABIFCIcgGAASBQieIBgAGAAiDQoLCMTtjKwGENDNilc=' | base64 -d | protoc --decode_raw

results in:

1 {
  1: 512
  2: 0x38d1b717
}
2 {
  1: 4104
  3: 0
}
2 {
  1: 4125
  3: 1
}
2 {
  1: 4105
  3: 0
}
2 {
  1: 4103
  3: 0
}
2 {
  1: 4126
  3: 0
}
3: 0
4 {
  1 {
    1: 1703098052
    2: 182626000
  }
}

Note that I had to append = so that base64 does not output base64: invalid input, which would otherwise make protoc output Failed to parse input..

From my tests to get a random comment (denoted by an index) of a given video, 182626000 seems to be the value to change to force the wanted random comment index. While I know that protobuf is ambiguous I would really love to get a meaningful description of what stands behind this 182626000 to force the value that I want. Do you have any idea how to proceed?

My goal is to avoid a linear complexity in the number of comments of the video. For instance for a video with 100,000 comments I randomly generate 42,327 but would like to get the 42,327th comment without having to go through 424 pages of 100 results.
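The cost being avoided can be made explicit with a small sketch (hypothetical function name): with sequential pagination, the number of requests grows with the rank of the wanted comment.

```python
def locate(comment_rank: int, page_size: int = 100):
    """Return (page number, 0-based offset in that page) for a 1-based rank."""
    page = (comment_rank - 1) // page_size + 1
    offset = (comment_rank - 1) % page_size
    return page, offset


print(locate(42327))  # (424, 26)
```

Reaching the 42,327th comment therefore takes 424 sequential requests of 100 results, whereas a forged token would take a single one.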

@Benjamin-Loison
Owner Author

Benjamin-Loison commented Dec 21, 2023

Let us verify the theoretical approach by picking a random video in a playlist:

Playlist id: UUWeg2Pkate69NFdBeuRFTAw

After scrolling past the first 100 videos returned with the initial HTML, I got the continuation token:

4qmFsgJtEhpWTFVVV2VnMlBrYXRlNjlORmRCZXVSRlRBdxo0Q0FGNkkxQlVPa05IVVdsRlJHc3lUbXBKTlUxVVNYbFJhbEV5VWxWR1IxRlZVVzlCVmtGQ5oCGFVVV2VnMlBrYXRlNjlORmRCZXVSRlRBdw%3D%3D
$ echo '4qmFsgJtEhpWTFVVV2VnMlBrYXRlNjlORmRCZXVSRlRBdxo0Q0FGNkkxQlVPa05IVVdsRlJHc3lUbXBKTlUxVVNYbFJhbEV5VWxWR1IxRlZVVzlCVmtGQ5oCGFVVV2VnMlBrYXRlNjlORmRCZXVSRlRBdw==' | base64 -d | protoc --decode_raw
80226972 {
  2: "VLUUWeg2Pkate69NFdBeuRFTAw"
  3: "CAF6I1BUOkNHUWlFRGsyTmpJNU1USXlRalEyUlVGR1FVUW9BVkFC"
  35: "UUWeg2Pkate69NFdBeuRFTAw"
}
echo 'CAF6I1BUOkNHUWlFRGsyTmpJNU1USXlRalEyUlVGR1FVUW9BVkFC' | base64 -d | protoc --decode_raw
1: 1
15: "PT:CGQiEDk2NjI5MTIyQjQ2RUFGQUQoAVAB"
echo 'CGQiEDk2NjI5MTIyQjQ2RUFGQUQoAVAB' | base64 -d | protoc --decode_raw
1: 100
4: "96629122B46EAFAD"
5: 1
10: 1
syntax = "proto3";

message message
{
  uint32 uint320 = 1;
  string string0 = 4;
  uint32 uint321 = 5;
  uint32 uint322 = 10;
}

With:

{
  "uint320": 100,
  "string0": "96629122B46EAFAD",
  "uint321": 1,
  "uint322": 1
}

I correctly get back:

CGQiEDk2NjI5MTIyQjQ2RUFGQUQoAVAB

Hence let us use:

{
  "uint320": 142,
  "string0": "96629122B46EAFAD",
  "uint321": 1,
  "uint322": 1
}
CI4BIhA5NjYyOTEyMkI0NkVBRkFEKAFQAQ==
syntax = "proto3";

message message
{
  uint32 uint320 = 1;
  string string0 = 15;
}
{
  "uint320": 1,
  "string0": "PT:CI4BIhA5NjYyOTEyMkI0NkVBRkFEKAFQAQ=="
}
CAF6J1BUOkNJNEJJaEE1TmpZeU9URXlNa0kwTmtWQlJrRkVLQUZRQVE9PQ==
syntax = "proto3";

message message0
{
  string string0 = 2;
  string string1 = 3;
  string string2 = 35;
}

message message1
{
  message0 message0 = 80226972;
}
{
  "message0": {
    "string0": "VLUUWeg2Pkate69NFdBeuRFTAw",
    "string1": "CAF6J1BUOkNJNEJJaEE1TmpZeU9URXlNa0kwTmtWQlJrRkVLQUZRQVE9PQ==",
    "string2": "UUWeg2Pkate69NFdBeuRFTAw"
  }
}
4qmFsgJ1EhpWTFVVV2VnMlBrYXRlNjlORmRCZXVSRlRBdxo8Q0FGNkoxQlVPa05KTkVKSmFFRTFUbXBaZVU5VVJYbE5hMGt3VG10V1FsSnJSa1ZMUVVaUlFWRTlQUT09mgIYVVVXZWcyUGthdGU2OU5GZEJldVJGVEF3

When using it in:

curl https://www.youtube.com/youtubei/v1/browse -H 'Content-Type: application/json' --data-raw '{"context": {"client": {"clientName": "WEB", "clientVersion": "2.20231219.04.00"}}, "continuation": "4qmFsgJ1EhpWTFVVV2VnMlBrYXRlNjlORmRCZXVSRlRBdxo8Q0FGNkoxQlVPa05KTkVKSmFFRTFUbXBaZVU5VVJYbE5hMGt3VG10V1FsSnJSa1ZMUVVaUlFWRTlQUT09mgIYVVVXZWcyUGthdGU2OU5GZEJldVJGVEF3"}'

The first result is:

[Screenshot: Screen Shot 2023-12-21 at 02 42 11]

Hence the theoretical approach works. Note that the first result is the 143rd video and not the 142nd: the forged value 142 is an index counted from 0, which corresponds to position 143 when counting from 1.
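The three manual encoding steps above can be chained in one function. This is only a sketch: the helper names are hypothetical, the field numbers are the ones observed above, and the string 96629122B46EAFAD is reused as an opaque value whose meaning is unknown. It reproduces the forged token for 142 given above.

```python
import base64


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a Protobuf varint."""
    out = bytearray()
    while value > 0x7F:
        out.append(value & 0x7F | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def varint_field(number: int, value: int) -> bytes:
    return encode_varint(number << 3) + encode_varint(value)


def string_field(number: int, text: str) -> bytes:
    payload = text.encode()
    return encode_varint(number << 3 | 2) + encode_varint(len(payload)) + payload


def b64(data: bytes) -> str:
    return base64.b64encode(data).decode()


def playlist_continuation(playlist_id: str, index: int, opaque: str = "96629122B46EAFAD") -> str:
    """Forge a browse continuation token starting at the given 0-based index."""
    # Innermost layer: the index plus values copied from an observed token.
    inner = (varint_field(1, index) + string_field(4, opaque)
             + varint_field(5, 1) + varint_field(10, 1))
    # Middle layer: field 15 carries "PT:" followed by the inner base64.
    middle = varint_field(1, 1) + string_field(15, "PT:" + b64(inner))
    # Outer layer: everything is wrapped in field 80226972.
    body = (string_field(2, "VL" + playlist_id) + string_field(3, b64(middle))
            + string_field(35, playlist_id))
    outer = encode_varint(80226972 << 3 | 2) + encode_varint(len(body)) + body
    return b64(outer)


print(playlist_continuation("UUWeg2Pkate69NFdBeuRFTAw", 142))
```

Whether the token stays valid for other playlists or other opaque values has not been checked here.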

I could proceed with YouTube Data API v3 to reduce errors, as it seems to only involve the PT part.

@Benjamin-Loison
Owner Author

Benjamin-Loison commented Mar 12, 2024

To get clear PHP code I tried leveraging protobuf-php/protobuf, but I face some issues (Benjamin_Loison/protobuf/issues and Benjamin_Loison/protobuf-plugin/issues). In addition, this library may be overkill.

No interesting DuckDuckGo and Google results for "blackboxprotobuf" "php".

grep -rw 'function serializeToString'
vendor/google/protobuf/src/Google/Protobuf/Internal/Message.php:    public function serializeToString()
grep -r 'function serialize'
vendor/google/protobuf/src/Google/Protobuf/Internal/Message.php:    private function serializeSingularFieldToStream($field, &$output)
vendor/google/protobuf/src/Google/Protobuf/Internal/Message.php:    private function serializeRepeatedFieldToStream($field, &$output)
vendor/google/protobuf/src/Google/Protobuf/Internal/Message.php:    private function serializeMapFieldToStream($field, $output)
vendor/google/protobuf/src/Google/Protobuf/Internal/Message.php:    private function serializeFieldToStream(&$output, $field)
vendor/google/protobuf/src/Google/Protobuf/Internal/Message.php:    private function serializeFieldToJsonStream(&$output, $field)
vendor/google/protobuf/src/Google/Protobuf/Internal/Message.php:    public function serializeToStream(&$output)
vendor/google/protobuf/src/Google/Protobuf/Internal/Message.php:    public function serializeToJsonStream(&$output)
vendor/google/protobuf/src/Google/Protobuf/Internal/Message.php:    public function serializeToString()
vendor/google/protobuf/src/Google/Protobuf/Internal/Message.php:    public function serializeToJsonString()
vendor/google/protobuf/src/Google/Protobuf/Internal/GPBWire.php:    public static function serializeFieldToStream(
vendor/google/protobuf/src/Google/Protobuf/Internal/GPBJsonWire.php:    public static function serializeFieldToStream(
vendor/google/protobuf/src/Google/Protobuf/Internal/GPBJsonWire.php:    public static function serializeFieldValueToStream(
vendor/google/protobuf/src/Google/Protobuf/Internal/GPBJsonWire.php:    private static function serializeSingularFieldValueToStream(

So it seems there is no built-in Base64 serialization.

@Benjamin-Loison
Owner Author

Benjamin-Loison commented Mar 29, 2024

<?php

require_once __DIR__ . '/vendor/autoload.php';

include_once 'generated/BlogPost.php';
include_once 'generated/GPBMetadata/BlogPost.php';

$blogPost = new \BlogPost();
$blogPost
    ->setTitle('Mon super billet');

// The content serialized to binary
$binary = $blogPost->serializeToString();
echo base64_encode($binary);

src/BlogPost.proto:

syntax = "proto3";

message BlogPost {
  string title = 2;
}
protoc --php_out=./generated --proto_path=src $(find src -name '*.proto')

works as wanted.

Many base64 Protobuf values are retrieved and sent; constructing them ourselves, to understand and simplify both them and the process, looks more appropriate.

Trying to simplify visitorData of channels?part=shorts:

curl -s 'http://localhost/YouTube-operational-API/channels?part=shorts&handle=@WHO' | jq '.items[0].shorts[].videoId'
curl -s "http://localhost/YouTube-operational-API/channels?part=shorts&handle=@WHO&pageToken=`curl -s 'http://localhost/YouTube-operational-API/channels?part=shorts&handle=@WHO' | jq -r '.items[0].nextPageToken'`" | jq '.items[0].shorts[].videoId'
echo -n 'CgtqajNudnAwZzBBOCiv7JuwBjIOCgJGUhIIEgQSAgsMIEU=' | base64 -d | protoc --decode_raw
1: "jj3nvp0g0A8"
5: 1711732271
6 {
  1: "FR"
  2 {
    2 {
      2 {
        1 {
        }
      }
    }
    4: 69
  }
}

In fact, visitorData looks useless in this case.
