Use Protobuf in my Stack Overflow answer to retrieve video captions #262

Closed
Benjamin-Loison opened this issue Apr 8, 2024 · 2 comments
Labels
enhancement New feature or request medium priority A high-priority issue noticeable by the user but he can still work around it. quick A task that should take less than two hours to complete.


Benjamin-Loison commented Apr 8, 2024

The Stack Overflow answer concerned is 70013529.

As requested on Discord.

Concerning the first part: I may have simplified the parameter by removing characters one by one, so that it is no longer correctly encoded (which is why I am not able to formalize the associated encoding), yet Google servers still decode it correctly.

Knowing which parameters are actually necessary would help to minimize requests. Related to #256.

To verify correctness, one can use:

jq '.actions[0].updateEngagementPanelAction.content.transcriptRenderer.content.transcriptSearchPanelRenderer.body.transcriptSegmentListRenderer.initialSegments[].transcriptSegmentRenderer.snippet.runs[0].text'
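For checking the response from Python rather than the shell, the same path can be walked on the parsed JSON. This is only a sketch: the function name `transcript_texts` is hypothetical, and it assumes the response has the structure implied by the jq filter above. Unlike jq, which prints `null` for segments lacking `transcriptSegmentRenderer`, this version skips them.

```python
def transcript_texts(response):
    """Return the first text run of every initial transcript segment."""
    segments = (
        response['actions'][0]['updateEngagementPanelAction']['content']
        ['transcriptRenderer']['content']['transcriptSearchPanelRenderer']
        ['body']['transcriptSegmentListRenderer']['initialSegments']
    )
    texts = []
    for segment in segments:
        renderer = segment.get('transcriptSegmentRenderer')
        if renderer is None:
            # Not a plain transcript segment: skip it (jq would print null here).
            continue
        texts.append(renderer['snippet']['runs'][0]['text'])
    return texts

# Usage, with `response` being the parsed JSON of a get_transcript request:
# for text in transcript_texts(response):
#     print(text)
```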

Related to Benjamin_Loison/protobuf_google/issues/1.

As blackboxprotobuf does not have this issue, let us switch to Python.

@Benjamin-Loison Benjamin-Loison added enhancement New feature or request medium priority A high-priority issue noticeable by the user but he can still work around it. quick A task that should take less than two hours to complete. labels Apr 8, 2024

Benjamin-Loison commented Apr 8, 2024

import requests
import blackboxprotobuf
import base64

def getBase64Protobuf(message, typedef):
    data = blackboxprotobuf.encode_message(message, typedef)
    return base64.b64encode(data).decode('ascii')

requestsParams = {
    'key': 'AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8'
}
url = 'https://www.youtube.com/youtubei/v1/get_transcript'
headers = {
    'Content-Type': 'application/json'
}

message = {
    # Video id.
    "1": "lo0X2ZdElQ4",
    # Base64 of an inner Protobuf message; its field 2 holds the language code 'ru'.
    "2": "CgASAnJ1GgA="
}

typedef = {
    "1": {
        "type": "string"
    },
    "2": {
        "type": "string"
    }
}

params = getBase64Protobuf(message, typedef)

data = {
    'context': {
        'client': {
            'clientName': 'WEB',
            'clientVersion': '2.20240313.05.00'
        }
    },
    'params': params
}

# Send the request and check that an expected Russian caption word is in the response.
response = requests.post(url, params = requestsParams, headers = headers, json = data).json()
print('развитием' in str(response))
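To make explicit what such a `params` value contains, here is a standard-library-only sketch of the Protobuf wire format for a message made of two string fields. The helper names (`encode_varint`, `encode_string_field`, `build_params`) are hypothetical, and whether blackboxprotobuf emits exactly the same bytes has not been checked here; the sketch only follows the generic Protobuf encoding rules (tag varint, length varint, payload).

```python
import base64

def encode_varint(value):
    """Encode a non-negative integer as a Protobuf varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def encode_string_field(field_number, text):
    """Encode one length-delimited (wire type 2) string field."""
    payload = text.encode('utf-8')
    tag = (field_number << 3) | 2
    return encode_varint(tag) + encode_varint(len(payload)) + payload

def build_params(video_id, inner):
    """Base64 of a message with field 1 = video id and field 2 = inner parameter."""
    raw = encode_string_field(1, video_id) + encode_string_field(2, inner)
    return base64.b64encode(raw).decode('ascii')

params = build_params('lo0X2ZdElQ4', 'CgASAnJ1GgA=')
print(params)
```

Called with the video id and inner value found in the `params` of the curl command of the later comment, `build_params` reproduces the beginning of that `params` string, which suggests that the first two fields are plain length-delimited strings.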


Benjamin-Loison commented Apr 10, 2024

Concerning the first part, as requested on Discord:

curl https://www.youtube.com/youtubei/v1/get_transcript -H 'Content-Type: application/json' --data-raw '{"context": {"client": {"clientName": "WEB", "clientVersion": "2.20240408.05.00"}}, "params": "CgtxUVY2Z3V2Rm1NMBISQ2dOaGMzSVNBbVZ1R2dBJTNEGAEqM2VuZ2FnZW1lbnQtcGFuZWwtc2VhcmNoYWJsZS10cmFuc2NyaXB0LXNlYXJjaC1wYW5lbDABOAFAAQ%3D%3D"}'

In Python, with blackboxprotobuf:

import requests
import blackboxprotobuf
import base64

def getBase64Protobuf(message, typedef):
    data = blackboxprotobuf.encode_message(message, typedef)
    return base64.b64encode(data).decode('ascii')

# Inner message: caption track kind and language code.
message = {
    '1': 'asr',
    '2': 'en',
}

typedef = {
    '1': {
        'type': 'string'
    },
    '2': {
        'type': 'string'
    },
}

two = getBase64Protobuf(message, typedef)

# Outer message: video id and the Base64-encoded inner message.
message = {
    '1': 'qQV6guvFmM0',
    '2': two,
}

typedef = {
    '1': {
        'type': 'string'
    },
    '2': {
        'type': 'string'
    },
}

params = getBase64Protobuf(message, typedef)

url = 'https://www.youtube.com/youtubei/v1/get_transcript'
headers = {
    'Content-Type': 'application/json'
}
data = {
    'context': {
        'client': {
            'clientName': 'WEB',
            'clientVersion': '2.20240313'
        }
    },
    'params': params
}

# Send the request and check that an expected caption sentence is in the response.
response = requests.post(url, headers = headers, json = data).json()
print('this is Will Smith and I\'m waiting for' in str(response))
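To see which fields the `params` value of the curl command carries beyond the two sent by the Python script, it can be decoded with the standard library alone. This is a rough sketch: the names `read_varint`, `parse_fields` and `decode_params` are hypothetical, the parser only handles the varint and length-delimited wire types, and it assumes the value is percent-encoded standard Base64, which holds for this particular string.

```python
import base64
from urllib.parse import unquote

def read_varint(buf, pos):
    """Read a varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

def parse_fields(buf):
    """Return the top-level (field number, value) pairs of a Protobuf message."""
    fields = []
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        number, wire_type = tag >> 3, tag & 7
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f'unsupported wire type {wire_type}')
        fields.append((number, value))
    return fields

def decode_params(params):
    """Percent-decode, Base64-decode and parse a params value."""
    return parse_fields(base64.b64decode(unquote(params)))

CURL_PARAMS = 'CgtxUVY2Z3V2Rm1NMBISQ2dOaGMzSVNBbVZ1R2dBJTNEGAEqM2VuZ2FnZW1lbnQtcGFuZWwtc2VhcmNoYWJsZS10cmFuc2NyaXB0LXNlYXJjaC1wYW5lbDABOAFAAQ%3D%3D'

outer = decode_params(CURL_PARAMS)
# Field 2 of the outer message is itself a percent-encoded Base64 message.
inner = decode_params(dict(outer)[2].decode('ascii'))
print(outer)
print(inner)
```

The outer message holds the video id in field 1, the nested parameter in field 2, an engagement panel identifier in field 5 and integer flags in fields 3, 6, 7 and 8; the inner message holds `asr`, `en` and an empty field 3. The Python script above sends only fields 1 and 2 at both levels.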
