Downloader is failing due to recent rate-limiting update by Fansly #148
Comments
I tried the executable and the most recent 0.4.2 Python version of fansly-downloader, and it still has the same rate-limiting problem as before.
So it appears that just after I initially bypassed the first introduction of rate limiting by switching back to the old Fansly API endpoint for timeline downloads, they noticed it and adjusted their website code to apply the rate limiting to that endpoint too. This change happened just a few hours after I released fansly-downloader version 0.4.1-post1, which makes me think they're now actively looking through this downloader's commit history and counter-patching my changes 🤣 Anyways, can you guys try out this branch of version 0.4.2 and see if it solves the rate-limiting issue again? Within that branch, fansly-downloader is just artificially slowed down to avoid hitting the rate limit. I'm on vacation for a few weeks, chilling on the beach, so I don't have access to a Python environment (or a PC), and I won't go to great lengths to change that. Additionally, I noticed they're introducing more variables/tokens for each request to the API endpoints to further validate the requests, which their backend has to handle. If they've already added logging to see which requests are not sending these new tokens, they can already tell which requests came from third-party code like fansly-downloader (as of version 0.4.2 these tokens are still not replicated). It's also very possible that the rate limiting is only applied when these tokens are not sent, because last time I checked, scrolling around on their website still instantly loads all media content, which means no rate limiting is applied there. That would require further testing, which I don't currently have time for.
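For anyone curious what the artificial slow-down in that branch amounts to, here is a minimal sketch of the idea, assuming a `get_timeline_page(cursor)` callable that wraps one timeline API request and returns parsed JSON with `posts` entries carrying an `id` field (hypothetical names; the actual code in fansly-downloader is structured differently):

```python
import random
import time

def fetch_timeline_slowly(get_timeline_page, start_cursor=None,
                          min_delay=5.0, max_delay=6.0):
    """Page through a creator's timeline with a random pause between
    requests so the client stays under the rate limit."""
    cursor = start_cursor
    while True:
        page = get_timeline_page(cursor)                  # one timeline API request
        posts = page.get("posts", [])
        if not posts:                                     # empty page: done or rate-limited
            break
        yield posts
        cursor = posts[-1]["id"]                          # continue from the oldest post returned
        time.sleep(random.uniform(min_delay, max_delay))  # artificial slow-down
```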
Strangely, I don't always hit this rate-limit issue. Sometimes it goes all the way through, and sometimes I get this: `WARNING | 12:29 || Low amount of Pictures scraped. Creators total Pictures: 1683 | Downloaded: 300` Sometimes it downloads only 10 items, and sometimes it downloads thousands. Is it possible to slow down even further on our side (by allowing parameter-level rate limiting)?
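A sketch of what exposing that as a parameter could look like, using hypothetical `--min-delay`/`--max-delay` flags (fansly-downloader's real configuration options may be named differently or live in its config file):

```python
import argparse

parser = argparse.ArgumentParser(description="Timeline download with a tunable request delay")
# Hypothetical flags; the real downloader may read these values from its config file instead.
parser.add_argument("--min-delay", type=float, default=5.0,
                    help="minimum seconds to sleep between timeline requests")
parser.add_argument("--max-delay", type=float, default=6.0,
                    help="maximum seconds to sleep between timeline requests")
args = parser.parse_args()
print(f"Sleeping between {args.min_delay}s and {args.max_delay}s per timeline request")
```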
If it helps, I am using the forked (0.4.2) version.
Another data point: after the failure I noted above, I tried a different creator, and it's been scraping for a while now (successfully). We'll see what the final count is when it's done; I will update back once it's complete. Update: the new run (for a different creator) ended successfully: `Finished Normal type, download of 2911 pictures & 461 videos! Declined duplicates: 30` So I'm puzzled as to why some creator scrapes are throttled and others are not (especially when those that aren't sometimes have way more content).
Can you try out this branch and let me know if it reliably passes the rate limit every time?
Done. Tested multiple times. It does not pass the rate limit every time. At least, there are some creators where it fails every time, and some where it passes 100% of the time. I'm not entirely sure why.
Looks like the most efficient way to handle this would be a function that, before starting timeline downloads, measures whether a rate limit even exists for a specific creator and dynamically adjusts the wait time depending on the result. It would be cool if someone contributed that; otherwise I'll write it myself when I return from my vacation in a few weeks. For now you might as well just raise this sleep timer from 5, 6 to whatever reliably passes the rate limit every time, e.g. 7, 8.
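A rough sketch of that probing idea, assuming a `get_timeline_page(creator_id, cursor)` callable that returns parsed JSON with `posts` and `next_cursor` fields (illustrative names, not the downloader's actual internals):

```python
import time

def calibrate_delay(get_timeline_page, creator_id,
                    candidate_delays=(5, 10, 20, 40, 80)):
    """Probe a creator's timeline with increasing delays until two
    consecutive requests both return posts, then keep that delay."""
    for delay in candidate_delays:
        first = get_timeline_page(creator_id, None)         # first page of the timeline
        time.sleep(delay)
        second = get_timeline_page(creator_id, first.get("next_cursor"))
        if first.get("posts") and second.get("posts"):
            return delay                                    # this wait time passed the rate limit
        time.sleep(delay)                                   # back off before the next probe
    return candidate_delays[-1]                             # fall back to the slowest setting
```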
Thanks for the tip! I'll play around with the sleep timer and report back on my findings.
Could you solve the problem? I can't tell from reading your comments.
The author says they will work on it when they return, and they are also putting out a call for contributors to help solve it and write the code.
I set my sleep timer to 105, 108 and it started working on an account that previously did not scrape much. It probably doesn't need to be that high, but it's definitely an issue with the sleep timer. Edit: 72, 75 worked but 52, 55 did not.
I did something similar. I have it set to 120, 240 right now, and it's working (consistently) on all the ones I shared above that failed previously. Obviously it's taking forever, and not every one required 120, so I'm not sure why some do and some don't.
It is slow as hell, but yes, it works ok-ish with 72, 75. Thank you.
I finally encountered a creator that I cannot scrape with 120, 122. I'm doubling the numbers now to see if that helps (and yes, it'll take ages and ages).
Confirmed: I have an example of a creator where, no matter how high I set the delay, it still fails.
Same here. I can only scrape back to January 2023; everything older fails.
It seems to have nothing to do with the age of the posts. I've had some that don't pull anything before August 2023, some that don't pull anything before yesterday... and then some that pull 100% successfully. This is repeatable, so it appears to be creator-specific. Very confusing to me.
I always receive the error that there is no media on the current cursor. I don't know what to change anymore XD
I think I created a workaround for the rate limiting. I used the sleep function created above and added retry attempts after each sleep. If the program fails to pull posts from a timeline, it waits X seconds and then tries to pull the same timeline again. It usually takes 5-8 attempts, but it can sometimes take more. After it successfully pulls posts from the timeline, the number of retry attempts resets. I created a pull request with these changes, but I'm not sure what the process is for reviewing them. It's definitely not a perfect fix, but it seems to push through the rate limiting most of the time.
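For readers who can't open the branch, the workaround boils down to something like this sketch, again assuming a hypothetical `get_timeline_page(cursor)` callable rather than the downloader's real function:

```python
import random
import time

def fetch_with_retries(get_timeline_page, start_cursor=None,
                       max_attempts=20, min_wait=5, max_wait=20):
    """Retry each timeline page until posts come back, sleeping between
    attempts; the attempt counter resets after every successful page."""
    cursor = start_cursor
    while True:
        for _ in range(max_attempts):
            page = get_timeline_page(cursor)
            posts = page.get("posts", [])
            if posts:
                break                                       # success; counter resets on the next page
            time.sleep(random.uniform(min_wait, max_wait))  # rate-limited: wait, retry the same cursor
        else:
            return                                          # gave up after max_attempts empty responses
        yield posts
        cursor = posts[-1]["id"]                            # advance to the next timeline page
```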
Can you please upload the part where you made these changes?
Sure thing, I think you can check it out on my branch here. This is my first time trying to fork a branch on GitHub, so please let me know if you can't get to it. There's also a pull request with the changes I made. As a side note, I played around with increasing the number of attempts and the timer: 20 attempts at 5-20 second intervals is slow, but it was able to go through a page with content going back to late 2021 in a few hours.
After reading what you guys said, I need to correct myself. Considering some of you need 70+ second wait timers, it would be more beneficial to just replicate whatever the Fansly website is doing, as it obviously allows instantly scrolling around and loading media. As I was pointing out before, I've seen some newly introduced identifier/auth tokens that the timeline requests now carry; if I had to take a wild guess, they probably introduced a requirement for a JavaScript step which, within a real browser, creates those tokens for each timeline request before it is sent, and that way the rate limiting is just not applied at all. Replicating that with Python and specific third-party libraries will most likely get rid of the need to wait so long between requests. Fansly devs, if you read this: I would be down to keep static 5-second timers between requests, but anything above that forces me into a proper replication, which will in return load up your servers with requests again. Down for a gentleman's agreement that works for both sides? Keep in mind that even if I ceased service of this tool, someone else would re-create it (in fact, there are already multiple people actively maintaining scrapers for Fansly), so even for you it would be profitable to stick with this. It's an average case of don't blame the player, blame the game 🫣
What about an incremental backoff timer based on the retry attempts? I.e., assuming the initial value is 1s for attempt 1, use 2s for attempt 2, 4s for attempt 3, 8s for attempt 4, etc. If someone set it to 5s, it would go 5s/10s/20s/40s/etc.
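A small sketch of that exponential backoff, with the wait doubling after every empty response (`request_page` is a stand-in for one timeline request, not an existing function in the downloader):

```python
import time

def retry_with_backoff(request_page, base_delay=5.0, max_attempts=8):
    """Retry a single timeline request with a doubling wait between
    failures: base_delay, then 2x, 4x, 8x, ..."""
    delay = base_delay
    for _ in range(max_attempts):
        page = request_page()
        if page.get("posts"):        # got content, stop retrying
            return page
        time.sleep(delay)            # empty response: wait longer next time
        delay *= 2
    return None                      # still rate-limited after all attempts
```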
Bug Description
For some creators I try to download from, the program fails to recognize posts after the first set it finds. It will also fail to find even the first set of posts if a previous successful run happened recently. It doesn't seem to happen with all creators, or with downloads using the `download_mode=Single` param, however.

Expected behavior
All Timeline posts from a creator should download.
Actual behavior
Only the first set of posts found was downloaded.
Environment Information
Additional context
I had issues with the Windows executable, so I tried the latest Python script to see if the problem was solved there, and it was not. Adding some debug lines to print the request output shows that the first request is successful and the timeline_cursor is correctly updated to the last entry, but the second request still returns with all fields present but empty. Adding an extra delay between each request seems to fix the issue.
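A minimal sketch of that debugging approach, with `get_timeline_page(cursor)` standing in for the downloader's timeline request (hypothetical name, not the project's real function):

```python
import json
import time

def debug_fetch(get_timeline_page, cursor=None, delay=6.0):
    """Print each raw timeline response while paging, and pause between
    requests; an empty but well-formed response suggests rate limiting."""
    while True:
        page = get_timeline_page(cursor)
        print(json.dumps(page, indent=2)[:500])  # debug: inspect the raw response
        posts = page.get("posts", [])
        if not posts:                            # fields present but empty -> rate-limited
            print(f"Empty page at cursor {cursor}; stopping.")
            break
        cursor = posts[-1]["id"]                 # cursor advances to the last entry returned
        time.sleep(delay)                        # the extra delay that avoids empty responses
```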