(Feature Request): Dislike prediction through view and like ratio #99
Comments
The prediction will be really inaccurate.
That's the same technique used in this extension. Check the implementation:
That's using something called
Fair enough. I agree with you; in that case it'll be inaccurate.
Please see my comment in another thread, where I explore the possibility of using different video metrics to estimate dislikes instead of a one-size-fits-all equation, which may or may not be accurate. Even a dynamic model might still give inaccurate estimates, but it might be the best shot, and it could get close enough to warn end users that a video is suspected of having a high number of dislikes.
I've been gathering a dataset of the like/dislike ratios of over 1 million YouTube videos. If anybody would like to use it as a benchmark for predicting dislikes, let me know and I'll send it!
My data shows that predicting the dislike ratio from the like/view ratio isn't very accurate, BUT it's still a decent metric. The best correlation I could get between the two is about 0.45 (when using logs and such), and videos with a like/(like+dislike) ratio below 70% could usually be detected from the like/view ratio.
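A rough sketch of this kind of correlation check, for anyone who wants to reproduce it on their own data. It assumes each video is a hypothetical `(views, likes, dislikes)` tuple with nonzero views and likes, and it assumes "using logs" means taking the log of the like/view ratio. The exact transform behind the 0.45 figure isn't stated, so the number you get may differ.

```python
import math

# Hypothetical record format: (views, likes, dislikes) for one video.
Video = tuple[int, int, int]


def like_ratio(likes: int, dislikes: int) -> float:
    """Fraction of votes that are likes: likes / (likes + dislikes)."""
    return likes / (likes + dislikes)


def log_like_view(views: int, likes: int) -> float:
    """Log of the like/view ratio; the log spreads out heavily skewed ratios."""
    return math.log(likes / views)


def pearson(xs: list[float], ys: list[float]) -> float:
    """Plain Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def correlation(videos: list[Video]) -> float:
    """Correlation between log(like/view) and the like/(like+dislike) ratio."""
    xs = [log_like_view(views, likes) for views, likes, _ in videos]
    ys = [like_ratio(likes, dislikes) for _, likes, dislikes in videos]
    return pearson(xs, ys)
```

A positive result means videos with more likes per view also tend to have a higher like share, which is the relationship the estimate relies on.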
It would be nice to know whether the ratio gets better when it's only applied to videos with a certain number of views. I'm thinking of only using the ratio on videos with, let's say, 30,000 or more views, since at some point the law of large numbers comes into play and makes the like count more reliable. The result might also improve when comments are taken into account. One doesn't need to save the comments themselves, since they stay on the video; one only needs to save the date when the likes/dislikes were recorded to know whether a comment was posted before or after that. But this might also make the result worse, because comments can express approval or disapproval, and putting bad data into the prediction would make it worse. A more sophisticated solution could use machine learning to take as many data points as possible into account. The ML program could look for keywords like "love it", "awesome", or "interesting" in the comments, and maybe find other data points that affect the like/dislike ratio, such as video length or keywords in the title; for example, "Corona" videos seem to get a lot of dislikes on YouTube lately.
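A minimal sketch of the comment-timing and keyword ideas, assuming comments are stored as hypothetical `(posted_at, text)` pairs and that a plain substring match is good enough. The keyword lists are only the examples mentioned above; a real model would need a much larger, learned feature set.

```python
from datetime import datetime

# Hypothetical storage format: (time the comment was posted, comment text)
Comment = tuple[datetime, str]

# Example keywords taken from the discussion; a real model would learn many more.
COMMENT_KEYWORDS = ("love it", "awesome", "interesting")
TITLE_KEYWORDS = ("corona",)


def comments_before(comments: list[Comment], snapshot: datetime) -> list[str]:
    """Keep only comments posted before the like/dislike counts were recorded."""
    return [text for posted_at, text in comments if posted_at < snapshot]


def keyword_features(comments: list[str], title: str) -> dict[str, float]:
    """Turn comments and title into simple numeric features for an ML model."""
    lowered = [c.lower() for c in comments]
    n = max(len(lowered), 1)  # avoid dividing by zero for videos without comments
    features: dict[str, float] = {}
    for kw in COMMENT_KEYWORDS:
        # Share of comments that mention the keyword (plain substring match)
        features[f"comment:{kw}"] = sum(kw in c for c in lowered) / n
    lowered_title = title.lower()
    for kw in TITLE_KEYWORDS:
        # 1.0 if the title mentions the topic, else 0.0
        features[f"title:{kw}"] = 1.0 if kw in lowered_title else 0.0
    return features
```

The resulting feature dictionary could then be combined with views and likes as input to whatever ML model is chosen.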
@RyannDaGreat Your initial results are promising: at least something can be done reliably! I'd be interested in having a look at this dataset; if you could share it, I'd greatly appreciate it!
@ChristophGeske I was thinking along these lines as well: there may be ranges of views where the ratios correlate better than in others, and some view ranges where it's just not possible and another method would be needed.
I'm personally hesitant to spend too much time on this route. Many forum commenters seem to like the idea of someone commenting "dislike" and other people liking that comment to show dislikes. I imagine these comments could be deleted, and their use might not be consistent enough. It also means there may be a huge shift in how users comment on the site, so any model trained now would not hold up well over time as users change how they comment on videos. Additionally, since there would be no easy way to monitor the deletion of negative comments, a "like estimator" based on comments could be manipulated by whoever controls the deletion of negative comments. That reason alone makes me nervous about building a model that relies on comments at all.
@tvelk Sure! I'll post a Google Drive link soon, or perhaps a GitHub repo, and I'll put it in this thread once I do. And yes, your hypothesis is right: I only included videos with more than 1,000 likes+dislikes in the analysis, because anything below that had a really weird distribution (it forms a really strange shape in a scatterplot). I'll post my results soon.
But in some ways, I think it could be misleading, for example for "The Matrix Awakens: An Unreal Engine 5 Experience" by Unreal Engine (https://www.youtube.com/watch?v=WU0gvPcc3jQ). Maybe it needs a little tuning using not just like and view count data but other data as well, such as video category, tags, and where the video came from.
Extension or Userscript?
Extension
Request or suggest a new feature!
This extension can stay for now, but it shouldn't rely on only one source forever.
Even after dislike data disappears from the YouTube API, we still have other data, the view count and the like count, that could be used to predict the dislike count.
Ways to implement this!
Use the ratio of likes to views to predict the dislike count.
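A minimal sketch of what this could look like, assuming a simple linear fit of the like ratio against log(likes/views), trained on videos whose dislike counts are already known (for example an archived dataset like the one mentioned in the comments). The model form and the `floor` clamp are assumptions for illustration, not how the extension actually works. Once a like ratio r = likes/(likes+dislikes) is predicted, the dislike count follows from dislikes = likes * (1 - r) / r.

```python
import math


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx  # fails if all videos share the same like/view ratio
    return my - b * mx, b


def train(videos: list[tuple[int, int, int]]) -> tuple[float, float]:
    """Fit like ratio vs log(likes/views) on (views, likes, dislikes) tuples."""
    xs = [math.log(likes / views) for views, likes, _ in videos]
    ys = [likes / (likes + dislikes) for _, likes, dislikes in videos]
    return fit_line(xs, ys)


def predict_dislikes(model: tuple[float, float], views: int, likes: int,
                     floor: float = 0.01) -> int:
    """Estimate dislikes for a video where only views and likes are public."""
    a, b = model
    r = a + b * math.log(likes / views)
    r = min(max(r, floor), 1.0)  # keep the predicted like ratio in [floor, 1]
    # r = likes / (likes + dislikes)  =>  dislikes = likes * (1 - r) / r
    return round(likes * (1 - r) / r)
```

A linear fit is the simplest choice; the comments above suggest it would likely need to be combined with view-range buckets or extra features to be useful.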
Can you work on this?