Support for dynamic url #25
Comments
Hello, the issue you are experiencing is caused by multiple pypdl instances writing to the console at the same time and overwriting each other's progress bars. The downloads themselves should be working properly as long as you provide a different path for each download. pypdl should work well with ThreadPoolExecutor, since pypdl also uses ThreadPoolExecutor under the hood and we are just wrapping around it. To fix the progress bar issue, we could disable the progress bar of each pypdl instance.

import json

from seleniumbase import Driver
from pypdl import PypdlFactory


def get_download_streams(url):
    # load the page and scan the browser's performance log for an MP4 response
    driver = Driver(uc=True, log_cdp_events=True, devtools=True)
    driver.get(url)
    logs = driver.get_log("performance")
    for log in logs:
        log = json.loads(log["message"])
        if log["message"]["method"] == "Network.responseReceived":
            if log["message"]["params"]["response"]["mimeType"] == "video/mp4":
                stream_url = log["message"]["params"]["response"]["url"]
                return stream_url


tasks = []
# files will be saved to a folder called downloads (assuming it is already created)
# links is assumed to be your own list of page URLs
for link in links:
    # bind link as a default argument so each task resolves its own URL
    tasks.append((lambda link=link: get_download_streams(link), {"file_path": "downloads/"}))

# create a factory with 4 workers
factory = PypdlFactory(4)
x = factory.start(tasks)
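A note on building callables in a loop like this: Python closures bind variables late, so a plain `lambda: f(link)` looks up `link` when it is called, and every task would end up resolving the last link. Binding the value as a default argument (`lambda link=link: ...`) avoids that. Here is a small self-contained sketch of the difference; `resolve` is a stand-in function and the URLs are placeholders, neither comes from pypdl.

```python
def resolve(link):
    # Stand-in for a resolver such as get_download_streams (hypothetical here).
    return f"resolved:{link}"


links = ["https://example.com/a", "https://example.com/b"]  # placeholder URLs

# Late binding: every lambda reads `link` at call time, after the loop ended.
late = [lambda: resolve(link) for link in links]

# Default-argument binding: each lambda captures the value it was created with.
bound = [lambda link=link: resolve(link) for link in links]

late_results = [task() for task in late]
bound_results = [task() for task in bound]

print(late_results)
print(bound_results)
```

The same default-argument trick applies to any callable created inside a `for` loop and executed later, which is what happens when a task list is handed to a worker pool.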
Thanks for the detailed response.
Did it fix the issue?
Yeah, all good, thanks again.
Hi, first of all, awesome library, it makes downloading so much simpler.
Quick question: is there any way to use Pypdl with ThreadPoolExecutor for concurrency?
I see the provided option of passing a list of predefined tasks/links to PypdlFactory. What if that information is dynamic, so I cannot have a static list of URLs beforehand? I'm fetching the URLs I need 4 at a time using ThreadPoolExecutor. The function fetches the 4 download URLs concurrently, and then I'm using Pypdl to download all 4 files at the same time. It's kind of working, but the progress output of the multiple downloads flashes one on top of the other every second or less.
It's not a major thing, I was just curious whether there is a way to make it work more "cute" with ThreadPoolExecutor.
Cheers again for the awesome library :)
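For reference, the workflow described in this post (resolve each URL, then download it, a few at a time) could be sketched roughly as below. This is only an illustration: `run_batch`, `fetch_and_download`, and the two stub functions are made-up names, and the downloader is passed in as a plain callable so the sketch runs without pypdl or a browser.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_and_download(page, fetch_url, download):
    """Resolve one page to a stream URL, then download it."""
    url = fetch_url(page)
    if url is None:
        # Nothing to download for this page.
        return page, None
    return page, download(url)


def run_batch(pages, fetch_url, download, max_workers=4):
    """Run fetch + download for every page, max_workers at a time."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(fetch_and_download, page, fetch_url, download)
            for page in pages
        ]
        for future in as_completed(futures):
            page, outcome = future.result()
            results[page] = outcome
    return results


# Stubs so the sketch is runnable on its own (both are hypothetical).
def fake_fetch(page):
    return f"https://example.com/{page}.mp4"


def fake_download(url):
    # Pretend the file was saved and return its path.
    return "downloads/" + url.rsplit("/", 1)[-1]


summary = run_batch(["a", "b", "c", "d"], fake_fetch, fake_download)
print(summary)
```

In real use, `download` would be a small wrapper that creates one pypdl downloader per call, writes each file to its own path, and turns off the progress output so the workers do not overwrite each other on the console. I believe the relevant option is `display=False` on `start`, but treat that as an assumption and check it against the pypdl version you have installed.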