[proposal] adopt measures against bots #6226

Closed
iapicca opened this issue Nov 27, 2022 · 24 comments

Comments

@iapicca

iapicca commented Nov 27, 2022

Use case

Whether this was the intended goal or not,
in my experience the "like" count is often taken into account
when considering which package to adopt.

Obtaining "likes" by unfair means (possibly bots)
could sway a larger portion of the audience towards a given package
regardless of its merits.

Proposal

It would be useful to adopt measures to prevent or identify unfairly obtained likes:

  • show the names of the accounts who like the package
  • stricter sign up procedure (captcha...)
  • hide likes of users that liked only packages of a single author
example

getX looks suspicious to me

Screenshot 2022-11-27 at 11 22 26
@iapicca iapicca changed the title [proposal] adopt measures against fake likes [proposal] adopt measures against bots Nov 27, 2022
@isoos
Collaborator

isoos commented Nov 27, 2022

Obtaining "likes" by unfair means (possibly bots) could sway a larger portion of the audience towards a given package regardless of its merits.

I'm curious: do you see any sign of bot likes?

stricter sign up procedure (captcha...)

We only have Google accounts. Signing up for them is a considerable effort already, and I think Google accounts have much better bot detection and protection than we can implement as a small team.

hide likes of users that liked only packages of a single author

I think we could consider weighting likes (e.g. by the pattern or by age), but in practice I think it is always the X vs 2X vs 10X differences that really count, and in those a few "rogue" likes are unlikely to matter.
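
As a rough illustration of what weighting likes "by age" could look like, here is a minimal sketch. It assumes "age" means the age of the like itself (it could equally mean account age), and it uses exponential decay with a one-year half-life. The function name, the half-life, and the decay scheme are assumptions made for illustration; nothing here reflects how pub.dev actually counts likes.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical parameter: a like counts half as much after this many days.
HALF_LIFE_DAYS = 365.0


def weighted_like_count(like_timestamps, now, half_life_days=HALF_LIFE_DAYS):
    """Sum the likes, each decayed exponentially by its age in days."""
    total = 0.0
    for liked_at in like_timestamps:
        # Clamp at zero so a timestamp in the future never weighs more than 1.
        age_days = max((now - liked_at).total_seconds() / 86400.0, 0.0)
        total += 0.5 ** (age_days / half_life_days)
    return total


# Usage: one fresh like plus one like that is exactly one half-life old.
now = datetime(2022, 11, 27, tzinfo=timezone.utc)
sample_likes = [now, now - timedelta(days=365)]
print(weighted_like_count(sample_likes, now))
```

With a scheme like this, a package whose likes are mostly old would score lower than its raw count suggests, while the X vs 2X vs 10X gaps between actively liked packages would largely remain.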

@iapicca
Author

iapicca commented Nov 27, 2022

hide likes of users that liked only packages of a single author

I think we could consider weighting likes (e.g. by the pattern or by age), [...]

I think this would be a good start.
What about this one?

  • show the names of the accounts who like the package

[...] but in practice I think it is always the X vs 2X vs 10X differences that really count, and in those a few "rogue" likes are unlikely to matter.

I think we are already above 5X; see below.

I'm curious: do you see any sign of bot likes?

example

getX looks suspicious to me

Screenshot 2022-11-27 at 11 22 26

Do you really believe that getX has 5 times the likes of bloc or firebase
and 30% more than provider?
Doesn't this look suspicious at all?

@isoos
Collaborator

isoos commented Nov 28, 2022

Do you really believe that getX has 5 times the likes of bloc or firebase and 30% more than provider?

Some sampling of the mentioned packages and other popular ones (order by likes):

  • get: ~11k likes, 1.3k forks, 7.8k stars
  • provider: ~7.7k likes, 0.46k forks, 4.5k stars
  • http: ~5.5k likes, 0.3k forks, 0.9k stars
  • flutter_native_splash: ~4.8k likes, 0.14k forks, 0.9k stars
  • dio: ~4.7k likes, 1.3k forks, 11.1k stars
  • firebase_core: ~2.3k likes, 3.6k forks, 7.6k stars (monorepo)
  • bloc: ~2k likes, 3k forks, 9.8k stars

Order by forks: firebase_*, bloc, dio + get, provider, http, flutter_native_splash.
Order by stars: dio, bloc, firebase_*, get, provider, http, flutter_native_splash.

It seems to me that package:get could very well be used (and liked and forked and starred) that much. If there is bot activity, it is not self-evident from these numbers (or they also managed to replicate it on GitHub).

(Note: it may not be the same users who like/fork/star the package, and they may do it in a different part of their use/understanding of the ecosystem. E.g. they may have learned Dart/Flutter from a tutorial that used get, and they hit the like button right away. There is no malice in that.)

@iapicca
Author

iapicca commented Nov 28, 2022

I feel that we are discussing whether my example is good or not, rather than addressing my feature request.

I'm not campaigning about whether a specific package is "boosted" by fake likes (there is already Twitter for that).
I agree that the like count of the package I picked as an example may be totally legitimate.
I don't want to talk about that package; I'd rather focus on measures against bots in a generic sense.

I think that the first point should be a default for transparency anyway

  • show the names of the accounts who like the package

I understand that it is inconvenient to change the signup procedure... fair enough

  • stricter sign up procedure (captcha...)

We only have Google accounts. Signing up for them is a considerable effort already, and I think Google accounts have much better bot detection and protection than we can implement as a small team.

I think I probably didn't express myself correctly

  • hide likes of users that liked only packages of a single author

I think we could consider weighting likes (e.g. by the pattern or by age)

I have 2 points to make here

  • an account that likes only packages of a specific author is at the very least "biased", if not outright "fake"
  • a like should be based on real-world experience, not on "sympathy" for a user or organization; it's hard to believe that every single positive experience comes from packages and plugins of the same organization or author

reframing the 3rd point of my proposal:
flag profiles that like only packages of a single author
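
To make the reframed third point concrete, here is a minimal sketch of such a flag. The function name, the `min_likes` threshold, and the sample accounts, packages, and publishers are all hypothetical; this is an illustration of the heuristic, not pub.dev code.

```python
def single_publisher_accounts(likes_by_account, publisher_of, min_likes=3):
    """Return accounts whose likes all go to packages of one publisher.

    min_likes is a hypothetical threshold so that an account with only
    one or two likes is not flagged on too little evidence.
    """
    flagged = []
    for account, packages in likes_by_account.items():
        publishers = {publisher_of[package] for package in packages}
        if len(packages) >= min_likes and len(publishers) == 1:
            flagged.append(account)
    return sorted(flagged)


# Made-up sample data for illustration only.
publisher_of = {
    "a1": "pub_a", "a2": "pub_a", "a3": "pub_a",
    "b1": "pub_b", "c1": "pub_c",
}
likes_by_account = {
    "alice": ["a1", "a2", "a3"],  # three likes, all for pub_a
    "bob": ["a1", "b1", "c1"],    # likes spread over three publishers
    "carol": ["a1"],              # too few likes to judge
}
print(single_publisher_accounts(likes_by_account, publisher_of))
```

The output of such a check would be a list of profiles to review, not proof that any of them is fake.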

@isoos
Collaborator

isoos commented Nov 28, 2022

show the names of the accounts who like the package
I understand that it is inconvenient to change the signup procedure... fair enough

It is not only that, but also a privacy question: we would need an opt-in approval from the users, and the feature wouldn't really provide them much value. If you are worried about bots, they could just not opt-in and you wouldn't be any wiser...

a like should be based on real-world experience, not on "sympathy" for a user or organization; it's hard to believe that every single positive experience comes from packages and plugins of the same organization or author

I wouldn't try to second-guess the intent behind a like: we only see a click event, and don't know whether it was a quick moment decision or a long-term elaborate one. In an ideal world it would matter, but in practice we have no control or insight into it.

flag profiles that like only packages of a single author

If we implemented it, and anybody cared to set up bot voting, they would realize what's happening, and soon enough look at this thread or figure out this countermeasure from the open source code. They would start to modify the bot to like some other packages too, maybe even randomize a bit in time and activity. We would be in an even worse position.

Another angle: why would we discount likes from users that may be infrequent visitors to the site, and only liked packages from the same publisher? They may have clicked at a time when the publisher was different, but there was some consolidation in development efforts. We need better patterns if we want to take negative measures on this one.

It is not clear to me that there is any ongoing malice with the likes, and until then, I think our limited efforts are better spent on other features and improving the site. In contrast: when there is a spam package being uploaded, we do take steps to remove it (and also prevent further uploads from the same account). But the case must be clear, not just a vague hunch.

@jonasfj
Member

jonasfj commented Nov 30, 2022

I feel that we are discussing whether my example is good or not, rather than addressing my feature request.

Yeah, let's avoid discussion of individual cases.

It's not my impression that there is widespread use of bots for likes on pub.dev; nor that this would be an urgent problem.
If it does become an issue, I think we might want to focus the effort on minimizing bot accounts in general.

And I think we should be careful here. It's very hard to see if a package is useful. And being "useful" is very subjective.
Some authors are really good at outreach, tutorials, getting people started. And that might be "useful" to some people.


We do undertake some effort to minimize bots; in particular, we block accounts uploading spam. There are other efforts we can undertake, but those will probably not be a subject for public discussion.

None of the solutions to mitigate bots are perfect and they all have downsides. Even the best spam filter occasionally throws away legitimate emails. Hence, employing more measures against suspected bots must be weighed against the negative implications of doing so.
So if possible, I'd much rather avoid aggressively employing imperfect bot mitigation systems.

I think we should close this for now. We're not planning any action at the moment. And if we need to employ mitigation systems I don't think we can debate them publicly.

@jonasfj jonasfj closed this as completed Nov 30, 2022
@iapicca
Author

iapicca commented Nov 30, 2022

I think we should close this for now. We're not planning any action at the moment. And if we need to employ mitigation systems I don't think we can debate them publicly.

I understand that,
thank you both @jonasfj and @isoos for addressing the issue

@iapicca
Author

iapicca commented Jun 9, 2023

@jonasfj
I think this could mitigate the issue

@rydmike

rydmike commented Mar 10, 2024

This is good, as the comments further above should conclude the debate about artificial boosting of, or tampering with, "Likes" on pub. The statements above basically say there has not been any detection of such tampering.

This is good new information, since I have been hearing about suspected Likes tampering on pub for at least 4 years in the Flutter community. I always said that if that is the case, evidence should be presented; I never saw any.
Plus, the statements in this issue now make it clear that such tampering has not been detected, so that should finally resolve the suspicion and debate.

Thanks, this is excellent news 👍

@jonasfj
Member

jonasfj commented Mar 11, 2024

@rydmike I don't think we know for certain that "artificial "Likes" boosting" isn't taking place 🤣
But I don't have an impression that it's widespread, or that it affects many packages. I haven't seen any evidence, but I'm also not sure what such evidence would even look like.

Regardless, it would take a non-trivial amount of work to orchestrate many Google bot accounts. I personally think most package authors would get further focusing on writing a good package, with solid documentation, tutorials, videos and such.

@iapicca
Author

iapicca commented Oct 13, 2024

[...] I think this could mitigate the issue

@rydmike @jonasfj
I feel that this PR

would indirectly help "sniffing" packages boosted by bots
cc @szakarias

@iapicca
Author

iapicca commented Nov 19, 2024

I cross-referenced the like and download counts using the experimental flag;
the numbers for getx seem fishy, don't they?

@jonasfj @isoos could you please consider re-opening this issue?
(thank you @szakarias for making this possible)

package    likes    downloads
bloc       2.9k     2.5M
riverpod   3.4k     2M
provider   10.3k    3.75M
getx       14.8k    604k
BLOC screenshot Screenshot 2024-11-19 at 18 16 52
RIVERPOD screenshot Screenshot 2024-11-19 at 18 16 42
PROVIDER screenshot Screenshot 2024-11-19 at 18 16 27
GETX screenshot Screenshot 2024-11-19 at 18 15 55
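
To make the discrepancy explicit, here is a small sketch that turns the four rows above into likes per 1,000 downloads. The helper names are made up for illustration. The time window behind the download figures is not stated in this thread, so the ratios are only comparable with each other.

```python
def parse_count(text):
    """Convert an abbreviated count such as '2.9k' or '3.75M' to a number."""
    suffixes = {"k": 1_000, "M": 1_000_000}
    if text[-1] in suffixes:
        return float(text[:-1]) * suffixes[text[-1]]
    return float(text)


def likes_per_1k_downloads(likes, downloads):
    """Likes per 1,000 downloads, from abbreviated count strings."""
    return parse_count(likes) / parse_count(downloads) * 1_000


# (likes, downloads) exactly as listed in the comment above.
STATS = {
    "bloc": ("2.9k", "2.5M"),
    "riverpod": ("3.4k", "2M"),
    "provider": ("10.3k", "3.75M"),
    "getx": ("14.8k", "604k"),
}

for name, (likes, downloads) in STATS.items():
    print(f"{name}: {likes_per_1k_downloads(likes, downloads):.2f}")
```

This gives roughly 1.2 for bloc, 1.7 for riverpod, 2.7 for provider and 24.5 for getx. The ratio shows how far apart the two metrics are for getx; on its own it does not show what causes the gap.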

cc @jonataslaw

@Tienisto

Tienisto commented Nov 19, 2024

I don't use getx and likely never will, but it's difficult to tell if the likes from the getx package are fake given its natural growth.
I do think that people are more likely to just "like" the package because they are hyped by the flashy readme.
It's interesting to discuss when and why people like a package, since this is an emotional process. Showing the download numbers, as seen in the new experimental feature, is a good step towards providing a more technical tool for rating a package that is free from emotion.

Edit: Another argument against bots in getx is that it is becoming obsolete because the last stable release was 14 months ago. Likely, a lot of developers migrated away from getx, leaving the like count as something historical.

See https://pubstats.dev/packages/get,flutter_riverpod

Screenshot 2024-11-19 at 18 07 06

@iapicca
Author

iapicca commented Nov 19, 2024

I don't use getx and likely never will, but it's difficult to tell if the likes from the getx package are fake given its natural growth. I do think that people are more likely to just "like" the package because they are hyped by the flashy readme. It's interesting to discuss when and why people like a package, since this is an emotional process. Showing the download numbers, as seen in the new experimental feature, is a good step towards providing a more technical tool for rating a package that is free from emotion.

See https://pubstats.dev/packages/get,flutter_riverpod

@Tienisto I think many devs (including in teams I worked with)
used to pick one package over another "also" according to likes,
since a "widely adopted" package has in theory more chances to succeed and be maintained longer
(I know it's not always the case, RIP hive)

I think it's not just "hype" and a "flashy readme"
but an attempt to get people (and companies) to invest in a project;
the huge likes/downloads discrepancy
could be caused by artificially boosting the likes

I just wish this phenomenon were investigated... that's all I ask

@Tienisto

I also take the like count into consideration, especially when the popularity metric is kind of abstract (99% vs 100%).
Maybe the like count isn't a good metric in the first place. crates.io, npm, and nuget do not have this metric at all.

@iapicca
Author

iapicca commented Nov 19, 2024

Removing the "like" feature sounds like a good idea to me

@isoos
Collaborator

isoos commented Nov 19, 2024

I'm curious: what do you think of GitHub stars? Likes are essentially a very similar feature; even the first line of the GitHub documentation says "Starring makes it easy to find a repository or topic again later." and likes here do the same.

I'm not convinced by the data shown here that this is clearly bot or fraudulent activity. If anything, the referenced data seems to suggest that there is no clear correlation (or rather regression function) between the download and like counts, and with that, we should treat them with separate usefulness (likes being a historical accumulation of goodwill towards the package).

If you have seen video bloggers saying (or begging) "Like and subscribe" you should know that they broadcast it because it works. If a package has an outreach like that, it may just get more likes here.

@iapicca
Author

iapicca commented Nov 19, 2024

@isoos if that's the intended use of "like" then I'd rather have it removed.
As mentioned above,
in real life, real people and real companies used to refer to the like count;
if it is intended to represent a "social feature" rather than a quality indicator
then I don't see much value in it

I'm not convinced by the data shown here that this is clearly bot or fraudulent activity.

that makes one of us

[...] we should treat them with separate usefulness

I completely agree; what about making that clear to package adopters?

@jonataslaw

While I appreciate the points raised, I believe it's important to emphasize that no single metric can provide a comprehensive view of a package's popularity or quality. Metrics like likes, downloads, GitHub stars, and the number of open-source projects using a package all offer valuable insights when considered together.

That said, the comment by @iapicca seems to reflect a strong personal preference rather than an objective assessment of the available data. For instance, while download/likes counts have their limitations, they still hold relevance when combined with other indicators, such as:

  • GitHub stars: These showcase community interest and approval.
  • Open-source adoption: The number of projects actively using a package demonstrates its real-world applicability.

Take, for example, a comparison between Riverpod and GetX:

  • Riverpod has 67,000 projects depending on it on GitHub.
  • In contrast, GetX is used by over 198,000 open-source projects, highlighting its extensive adoption.

This data demonstrates that analyzing multiple metrics can provide a more nuanced and balanced view, rather than relying on a single parameter.

For these reasons, I encourage focusing on a holistic evaluation rather than dismissing certain metrics outright. I hope this perspective fosters a more constructive discussion moving forward.

@bigbott

bigbott commented Nov 22, 2024

GetX has fewer downloads because it is less popular among enterprises, and the number of downloads is not affected by automated builds.

There is another metric available on GitHub -- the number of repositories that use a particular repository. By this number, GetX is comparable with BLoC and Riverpod.

People use GetX because it is simple and has a lot of shortcuts.

Software development is the art of balancing KISS and SOLID, and for some people (including myself), GetX just gets it right.

If you want your app more SOLID, it can be done with or without GetX, but, please, stop being so emotional about software frameworks that you personally don't use and don't know.

@yang-lile

  • Riverpod has 67,000 projects depending on it on GitHub.
  • In contrast, GetX is used by over 198,000 open-source projects, highlighting its extensive adoption.

Where does this data come from? @jonataslaw

@bigbott

bigbott commented Nov 22, 2024

Where does this data come from?

The GitHub repository page, on the right side. @yang-lile

@yang-lile

So, bloc is referenced by over 200,000 open source projects, but you don't mention it? @jonataslaw. And we all know that riverpod is younger.

@jonasfj
Member

jonasfj commented Nov 22, 2024

To be clear, it's not exactly impossible to fake the download count 🙈 🙈 🙈

Certainly, packages often used by app developers who have active CI systems running are going to get a HUGE download count boost.


I think that in general, we should be extremely careful to derive anything from a huge download count or large number of likes.

I'm actually not sure it says much whether a package has 8M downloads or 80k downloads. All it tells us is that there is a non-trivial number of active users or a non-trivial amount of activity from a sizable set of active users.

It's sort of the same with likes.

I think the numbers only speak volumes when they are very low. That said, there are lots of quality packages with few downloads and few likes.


If we want good signals of quality I think we do have one: https://docs.flutter.dev/packages-and-plugins/favorites

Of course, it's not easy to scale flutter favorites to cover all high quality packages.
