Extend minimizeCURL.py to minimize Protobuf #256
Comments
Let us try to proceed by hand first for a web-scraping usage:
https://www.youtube.com/playlist?list=UUWeg2Pkate69NFdBeuRFTAw (with a private Firefox window)
minimizeCURL curl.sh 'XrruklOv8X0'
Note that minimizing on field name
curl https://www.youtube.com/youtubei/v1/browse -H 'Content-Type: application/json' --data-raw '{"context": {"client": {"clientName": "WEB", "clientVersion": "2.20240313.05.00"}}, "continuation": "4qmFsgLmARIaVkxVVVdlZzJQa2F0ZTY5TkZkQmV1UkZUQXcarAFDQUY2ZlZCVU9rTkhVV2xGUkU1RVQwUlpNRTlWVlhoT1ZGVXlUMFJyTVZFd1VXOUJWV3BvYkdaTFpEaFFkVVZCTVVGQ1YycG5hVkV5YUc5V2JGcFhXa2Q0WVdWcmNGSlpWRXBIVFVad1ZWZFVWbFZoTVhCeVZWY3hWMDFXVm5KWGJGWlNWMGRPVkZKRlJuQk5WRTAxWlZoYVEyRkZVblpOYlZFeVYwVkdia2xumgIYVVVXZWcyUGthdGU2OU5GZEJldVJGVEF3"}'
Python script using hardcoded
|
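To look inside such a `continuation` value without any third-party tool, a minimal wire-format reader can be sketched with the standard library only. This is a sketch under assumptions, not the decoder used in this issue: `read_varint`, `parse_fields` and `decode_token` are hypothetical helper names, the token is assumed to be URL-safe Base64 of a Protobuf message, and the group wire types (3 and 4) are not handled.

```python
import base64


def read_varint(buf, pos):
    """Read a Protobuf varint from buf starting at pos; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_fields(buf):
    """Return a flat list of (field_number, wire_type, value) tuples."""
    pos = 0
    fields = []
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 7
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # fixed 64-bit, kept as raw bytes
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:  # length-delimited: string, bytes or nested message
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:  # fixed 32-bit, kept as raw bytes
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f'unsupported wire type {wire_type}')
        fields.append((field_number, wire_type, value))
    return fields


def decode_token(token):
    """Base64-decode a URL-safe token (padding optional) and parse its top-level fields."""
    padded = token + '=' * (-len(token) % 4)
    return parse_fields(base64.urlsafe_b64decode(padded))
```

A length-delimited value can be passed to `parse_fields` again to inspect a nested message, which is how the nesting can be explored by hand.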
Let us proceed by hand for a YouTube Data API v3 usage now:
https://www.youtube.com/playlist?list=UUWeg2Pkate69NFdBeuRFTAw
Retrieve first YouTube Data API v3
|
Can avoid multiple |
If you get a Base64 decoding error, ignore it and Protobuf-decode anyway thanks to
In fact:
requires
It is unclear how to minimize the following, as even with
|
Note that |
Even when considering double Base64 encoding, I am unable to make progress. |
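For the Base64 decoding errors and the double-encoding hypothesis mentioned above, a tolerant decoder that peels layers one by one could look like the following. It is only a sketch: `lenient_b64decode` and `peel_base64_layers` are hypothetical names, and treating every ASCII result as a possible further layer is an assumption that can produce false positives.

```python
import base64
import binascii
import urllib.parse


def lenient_b64decode(text):
    """Decode Base64 that may be URL-quoted, URL-safe and missing its padding."""
    text = urllib.parse.unquote(text)
    # Map the URL-safe alphabet back to the standard one.
    text = text.replace('-', '+').replace('_', '/')
    # Recompute the padding instead of trusting the one provided.
    text = text.rstrip('=')
    text += '=' * (-len(text) % 4)
    return base64.b64decode(text)


def peel_base64_layers(text, max_layers=5):
    """Return the successive decodings of text, outermost layer first."""
    layers = []
    current = text
    for _ in range(max_layers):
        try:
            raw = lenient_b64decode(current)
        except binascii.Error:
            break
        layers.append(raw)
        try:
            # Only an ASCII result can itself be another Base64 layer.
            current = raw.decode('ascii')
        except UnicodeDecodeError:
            break
    return layers
```

The last element of the returned list is the candidate to hand over to a Protobuf decoder.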
Could integrate to |
Maybe first having an explicit request generator that does not rely on a Base64 black box would be a good start. |
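One possible shape for such an explicit generator is sketched below with the standard library only. The names `encode_varint`, `encode_message` and `build_token` are hypothetical, and only varint and length-delimited fields are supported. The field numbers in the usage example (80226972, 2 and 35) are what hand-decoding the `continuation` from the first comment seems to give, so treat them as assumptions. The third field of that token is left out, hence the generated value is an illustration and not a known-working request.

```python
import base64


def encode_varint(value):
    """Encode a non-negative integer as a Protobuf varint."""
    if value < 0:
        raise ValueError('negative varints are not handled in this sketch')
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_message(fields):
    """Encode a list of (field_number, value) pairs.

    An int becomes a varint field; str, bytes and nested lists become
    length-delimited fields.
    """
    out = bytearray()
    for number, value in fields:
        if isinstance(value, int):
            out += encode_varint(number << 3 | 0) + encode_varint(value)
            continue
        if isinstance(value, list):
            payload = encode_message(value)
        elif isinstance(value, str):
            payload = value.encode('utf-8')
        else:
            payload = value
        out += encode_varint(number << 3 | 2) + encode_varint(len(payload)) + payload
    return bytes(out)


def build_token(fields):
    """Serialize fields and return them as URL-safe Base64."""
    return base64.urlsafe_b64encode(encode_message(fields)).decode('ascii')


# Usage with assumed field numbers (see the paragraph above).
playlist_id = 'UUWeg2Pkate69NFdBeuRFTAw'
token = build_token([(80226972, [(2, 'VL' + playlist_id), (35, playlist_id)])])
```

With the structure spelled out like this, removing or changing a field is a one-line edit instead of a Base64 manipulation.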
minimizeCURL curl.sh 'ADHD Test'
Python script with hardcoded
|
Python script:
import base64
import binascii
import copy
import json
import urllib.parse as ul

import blackboxprotobuf
import requests

# `URL`, `PARAMS`, `HEADERS`, `DATA`, `NEEDLE` and `isDataOnlyContainingShorts` are not defined in this excerpt.

def getBase64Protobuf(message, typedef):
    data = blackboxprotobuf.encode_message(message, typedef)
    return base64.b64encode(data).decode('ascii')

def isRequestStillFine(httpMethod, url, params, headers, data, needle):
    data = httpMethod(url, params = params, headers = headers, json = data).json()
    dataStr = json.dumps(data, indent = 4)
    #print(dataStr)
    return isDataOnlyContainingShorts(data)

def isRequestStillFineExplicit(httpMethod, url, params, headers, data, dataPath, message, typedef, needle):
    setDataFromPath(data, dataPath, getBase64Protobuf(message, typedef))
    return isRequestStillFine(httpMethod, url, params, headers, data, needle)

# Will need to proceed recursively
def minimizeProtobuf(httpMethod, url, params, headers, data, dataPath, needle, messages = None, typedefs = None):
    # Avoid sharing mutable default lists between calls.
    messages = [] if messages is None else messages
    typedefs = [] if typedefs is None else typedefs
    print(dataPath)
    print(json.dumps(data, indent = 4))
    entry = base64.b64decode(ul.unquote_plus(getDataFromPath(data, dataPath)), altchars = '-_')
    message, typedef = blackboxprotobuf.decode_message(entry)
    #print(json.dumps(message, indent = 4))
    print(json.dumps(typedef, indent = 4))

    # Based on [YouTube-operational-API/blob/11566147f4d54b8d8d8481709fd5bf6b1329f4de/tools/minimizeCURL.py](https://github.com/Benjamin-Loison/YouTube-operational-API/blob/11566147f4d54b8d8d8481709fd5bf6b1329f4de/tools/minimizeCURL.py) `isJson`.
    def getPaths(d):
        if isinstance(d, dict):
            for key, value in d.items():
                yield f'/{key}'
                yield from (f'/{key}{p}' for p in getPaths(value))

    # If a single unknown entry is necessary, then this algorithm seems to most efficiently go from parents to children if necessary to remove other entries. Hence, it seems to proceed in a linear number of HTTPS requests and not a quadratic one.
    # Try until no more change to remove unnecessary entries. If assume a logical behavior as just mentioned, would not a single loop iteration be enough? Not with current design, see (1).
    while True:
        changedSomething = False
        # Note that the path goes from parents to children if necessary which is quite a wanted behavior to quickly remove useless chunks.
        paths = getPaths(message)
        # For all entries, copy current `rawData` and try to remove an entry.
        for path in paths:
            # Copy current `rawData`.
            messageCopy = copy.deepcopy(message)
            # Remove an entry.
            # Pay attention that integer keys here are .
            entry = messageCopy
            pathParts = path[1:].split('/')
            for pathPart in pathParts[:-1]:
                entry = entry[pathPart]
            lastPathPart = pathParts[-1]
            del entry[lastPathPart]
            # Test if the removed entry was necessary.
            # (1) If it was unnecessary, then reconsider paths excluding possible children paths of this unnecessary entry, ensuring optimized complexity it seems.
            if isRequestStillFineExplicit(httpMethod, url, params, headers, data, dataPath, messageCopy, typedef, needle):
                print(len(json.dumps(data)), 'still fine')
                changedSomething = True
                message = messageCopy
                break
            # If it was necessary, we consider possible children paths of this necessary entry and other paths.
        # If a loop iteration considering all paths, does not change anything, then the request cannot be minimized further.
        if not changedSomething:
            break
    # Maybe minimize `typedef` once have minimized `message`. Especially as `field_order` can be removed if only know that do not need other entries.
    # However, can postpone implementing such minimization, as minimizing `typedef` once have minimized `message` is quick.
    messages += [message]
    typedefs += [typedef]
    paths = getPaths(message)
    for path in paths:
        leaf = getDataFromPath(message, path)
        # To avoid intermediary nodes.
        if type(leaf) is str:
            try:
                base64.b64decode(ul.unquote_plus(leaf))
                print(path)
                setDataFromPath(message, path, f'_{pathPart}')
                messagesRecursive, typedefsRecursive = minimizeProtobuf(HTTP_METHOD, URL, PARAMS, HEADERS, leaf, dataPath + path, NEEDLE)
                messages += messagesRecursive
                typedefs += typedefsRecursive
            except binascii.Error:
                pass
    return messages, typedefs

def getDataFromPath(data, path):
    pathParts = path[1:].split('/')
    for pathPart in pathParts:
        data = data[pathPart]
    return data

def setDataFromPath(data, path, value):
    pathParts = path[1:].split('/')
    for pathPart in pathParts[:-1]:
        data = data[pathPart]
    lastPathPart = pathParts[-1]
    data[lastPathPart] = value

HTTP_METHOD = requests.post
DATA_PATH = '/continuation'
messages, typedefs = minimizeProtobuf(HTTP_METHOD, URL, PARAMS, HEADERS, DATA, DATA_PATH, NEEDLE)
print(json.dumps(messages, indent = 4))
#print(json.dumps(typedef, indent = 4))
|
Similar to the `isJson` part of YouTube-operational-API/tools/minimizeCURL.py (lines 151 to 228 in 0e4168e).
Commenting this code seems to make sense.
|
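The removal loop of the script above can be tried offline by replacing the HTTPS request with a plain predicate. The following is a simplified sketch and not the script itself: `get_paths`, `delete_path` and `minimize` are hypothetical names, paths are tuples instead of `/`-joined strings, and `is_still_fine` stands for whatever check decides that the response is still the wanted one.

```python
import copy


def get_paths(node, prefix=()):
    """Yield every key path of a nested dict, parents before children."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield prefix + (key,)
            yield from get_paths(value, prefix + (key,))


def delete_path(node, path):
    """Delete the entry designated by path, in place."""
    for key in path[:-1]:
        node = node[key]
    del node[path[-1]]


def minimize(message, is_still_fine):
    """Greedily remove entries as long as is_still_fine keeps accepting the result."""
    message = copy.deepcopy(message)
    changed = True
    while changed:
        changed = False
        for path in list(get_paths(message)):
            candidate = copy.deepcopy(message)
            delete_path(candidate, path)
            if is_still_fine(candidate):
                # The entry was unnecessary: keep the smaller message and
                # restart from fresh paths, since children may have vanished.
                message = candidate
                changed = True
                break
    return message


# Usage: only the nested entry 2/1 is required by this toy predicate.
example = {'1': 'a', '2': {'1': 'x', '2': 'y'}, '3': 5}
minimal = minimize(example, lambda m: m.get('2', {}).get('1') == 'x')
```

Because parents are tried before children, a whole useless sub-message is removed with a single check instead of one check per nested entry.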
Not simplified Protobuf
|
Simplified Protobuf
|
PARAMS = {
    'prettyPrint': 'false',
}
can be useful to check that the first item is the given one. |
Note that there are 2 paths, one inside |
It does not seem possible to easily simplify by hand recursively by calling the minimizer again with different arguments. |
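To at least list where a recursive pass would have to look, the string leaves of a decoded message that strictly Base64-decode can be collected first. This is a heuristic sketch with hypothetical names (`iter_leaves`, `looks_like_base64`, `base64_leaf_paths`). It only walks dicts, the `min_length` threshold is an arbitrary assumption, and a plain alphanumeric identifier may pass the check too, so each hit still has to be confirmed by trying to Protobuf-decode it.

```python
import base64
import binascii


def iter_leaves(node, path=''):
    """Yield (path, value) for every non-dict value of a nested dict."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from iter_leaves(value, f'{path}/{key}')
    else:
        yield path, node


def looks_like_base64(text, min_length=8):
    """Return True if text strictly decodes as (URL-safe) Base64."""
    if len(text) < min_length:
        return False
    normalized = text.replace('-', '+').replace('_', '/').rstrip('=')
    normalized += '=' * (-len(normalized) % 4)
    try:
        base64.b64decode(normalized, validate=True)
    except binascii.Error:
        return False
    return True


def base64_leaf_paths(message):
    """Return the paths of string leaves that are Base64 candidates."""
    return [
        path
        for path, leaf in iter_leaves(message)
        if isinstance(leaf, str) and looks_like_base64(leaf)
    ]
```

Each returned path is a candidate `dataPath` suffix for another minimization round.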
Related to #190, #69, #255 and Benjamin-Loison/cpython/issues/16.