Ongoing strategies for spam #2982

ewdurbin · 2018-02-19T13:30:38Z

Based on information received from the team behind npm, the spam attackers involved in our latest flurry are sophisticated and relentless.

Indeed, our initial round of cleanup included 78 spam user accounts, each operating from its own IP address.

We've added some functionality to the admin side to stop these in their tracks and give us time to assess, but we should develop more operational processes moving forward.

I propose the following approach:

Automated Spam classification for all incoming Projects and Releases

Feed the interesting parts of the uploaded metadata for classification by a spam classification model. This should NOT be something that occurs synchronously during the upload, but rather its results should be stored for review by administrators.
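A minimal sketch of the "classify out of band, store for review" flow described above. Everything here is hypothetical: `handle_upload`, `run_classifier_once`, the in-memory queue, and the word-list `toy_score` are stand-ins for illustration, not Warehouse code or a real model.

```python
from dataclasses import dataclass
from queue import Queue


@dataclass
class Verdict:
    """One stored classification result awaiting administrator review."""
    project: str
    score: float
    reviewed: bool = False


def toy_score(metadata):
    """Hypothetical stand-in for a real model: fraction of suspicious words."""
    text = f"{metadata.get('summary', '')} {metadata.get('description', '')}"
    words = text.lower().split()
    if not words:
        return 0.0
    suspicious = {"casino", "viagra", "free-movie"}  # assumed word list
    return sum(1 for w in words if w in suspicious) / len(words)


pending = Queue()   # uploads waiting to be classified
review_table = []   # stored verdicts for administrators


def handle_upload(metadata):
    """Accept the upload immediately; classification happens later."""
    pending.put(metadata)
    return "accepted"


def run_classifier_once():
    """Drain the queue out of band and store results for review."""
    while not pending.empty():
        metadata = pending.get()
        review_table.append(Verdict(metadata["name"], toy_score(metadata)))
```

The upload path only enqueues, so a slow or failing classifier cannot block or reject an upload; administrators read from `review_table` whenever they choose.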

Admin interface for review and training of Spam classification results

PyPI Administrators should have a location to review uploads classified as spam. This should allow for the administrators to report back to the model if a given upload was a false positive. It should also allow for administrators to quickly delete true spam.

Community crowdsourced classification of spam

Allow Logged In Users to report spam found on PyPI. This gives us a view of false negative classification. These reports should be rate-limited in order to prevent abuse.
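One way the rate limiting could look is a per-user sliding window. This is only a sketch: the `ReportRateLimiter` class, the limit of 5 reports, and the one-hour window are assumptions chosen for illustration, not values from this proposal.

```python
import time
from collections import defaultdict, deque


class ReportRateLimiter:
    """Allow at most `max_reports` per user within `window_seconds`."""

    def __init__(self, max_reports=5, window_seconds=3600.0):
        self.max_reports = max_reports
        self.window_seconds = window_seconds
        self._history = defaultdict(deque)  # user_id -> report timestamps

    def allow(self, user_id, now=None):
        """Return True and record the report if the user is under the limit."""
        now = time.monotonic() if now is None else now
        history = self._history[user_id]
        # Forget reports that have aged out of the window.
        while history and now - history[0] >= self.window_seconds:
            history.popleft()
        if len(history) >= self.max_reports:
            return False
        history.append(now)
        return True
```

Rejected attempts are not recorded, so a user who keeps retrying is not locked out for longer than the window.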

Admin interface for review of User Spam reports

PyPI Administrators should have a location to review User reports of Spam. This should allow for the administrators to report back to the model if a given upload was a false negative. It should also allow for administrators to quickly delete true spam.

Additionally, it should allow for administrators to mark reports as invalid. We may want to keep track of a "reputation" for reporters as well. Reports from users with consistently high or consistently low reputation can then be weighted accordingly.
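As a rough illustration of reporter reputation, the sketch below scores each reporter by a smoothed fraction of their past reports that administrators marked valid, then sums those scores over the reporters of a project. The `Reporter` class, the smoothing constants, and the idea of summing are all assumptions, not a settled design.

```python
from dataclasses import dataclass


@dataclass
class Reporter:
    valid: int = 0    # past reports administrators confirmed as spam
    invalid: int = 0  # past reports administrators marked invalid

    def reputation(self):
        """Smoothed fraction of valid reports; a new reporter starts at 0.5."""
        return (self.valid + 1) / (self.valid + self.invalid + 2)


def weighted_report_score(reporters):
    """Sum of reputations of everyone who reported a given project."""
    return sum(r.reputation() for r in reporters)
```

With this weighting, a single report from a reporter with a long valid history counts for more than several reports from accounts whose reports are usually marked invalid.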

rth · 2018-02-20T23:34:37Z

Thanks for your work on handling this incident!

Automated Spam classification for all incoming Projects and Releases

Not an actual classification, but in this notebook I tried to quickly extract links from package descriptions and match them against a blacklist of domain names to see if this would produce anything useful. It turns out it mostly produces false positives so far. Actual classification should work better...
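For readers who have not opened the notebook, the general idea can be sketched roughly as follows. This is not the notebook's code: the regular expression, the function names, and the example domains are assumptions made for illustration.

```python
import re
from urllib.parse import urlparse

# Assumed pattern: http(s) URLs up to whitespace or a closing delimiter.
URL_RE = re.compile(r"https?://[^\s<>\"')\]]+")


def extract_domains(description):
    """Return the lowercased hostnames of all links in a description."""
    domains = set()
    for url in URL_RE.findall(description):
        try:
            host = urlparse(url).hostname
        except ValueError:  # malformed URL, e.g. a broken IPv6 literal
            continue
        if host:
            domains.add(host.lower())
    return domains


def flagged_domains(description, blocklist):
    """Return hostnames that match the blocklist, or a subdomain of it."""
    hits = set()
    for host in extract_domains(description):
        parts = host.split(".")
        # Check the host and each parent domain, stopping before the bare TLD.
        for i in range(max(1, len(parts) - 1)):
            if ".".join(parts[i:]) in blocklist:
                hits.add(host)
                break
    return hits
```

Matching on domain alone is coarse, which is consistent with the false positives mentioned above: a legitimate project that merely links to a listed domain is flagged just like spam.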

Community crowdsourced classification of spam

Beyond spam labeling, if you are able to provide a dataset with the metadata of packages that were removed as spam (a dataset of valid packages is easier to come by), I think some people in the Python community might be interested in building an ML classifier to automate the detection. This could give you a second evaluation with respect to any solution you implement internally at PyPI.

brainwane · 2018-03-06T14:33:46Z

For reference: in the Warehouse developers' meeting a few weeks ago we agreed that we'll open a nice-to-have issue for a "report spam" button for logged-in users, once #2991 is finished and merged.

brainwane · 2019-06-21T02:37:33Z

Per conversation today:

Automated Spam classification for all incoming Projects and Releases

Feed the interesting parts of the uploaded metadata for classification by a spam classification model. This should NOT be something that occurs synchronously during the upload, but rather its results should be stored for review by administrators.

Work toward #194 may help this.

Admin interface for review and training of Spam classification results

#6062 and #4011 might help this.

Community crowdsourced classification of spam

Allow Logged In Users to report spam found on PyPI. This gives us a view of false negative classification. These reports should be rate-limited in order to prevent abuse.

#3231 and #3896 would help with this.

Admin interface for review of User Spam reports

Again, #6062 and #4011 might help this, plus #2976 and #3218.

di · 2022-08-12T15:15:24Z

One last thing I'd like to add here: we have some one-off scripts that scan for spammy behaviors. It'd be nice to integrate them into the Admin UI, and have some mechanism to send admins reports, as well as some mechanism for users to mark/report packages as spam/malware.

ewdurbin mentioned this issue Feb 19, 2018

Spam Classification #2991

Closed

brainwane mentioned this issue Feb 21, 2018

Possible spamming of package namespace #2859

Closed

ewdurbin mentioned this issue May 4, 2018

User report mechanism for projects that damage other packages, don't adhere to guidelines, or are malicious #3896

Open

di added the feature request label Mar 21, 2019

brainwane added this to the Package signing & detection/verification milestone Jun 19, 2019

brainwane modified the milestones: Package signing & detection/verification, Post Legacy Shutdown Jun 21, 2019

xmunoz mentioned this issue Apr 26, 2021

User support ticket system psf/fundable-packaging-improvements#34

Open

di added meta Meta issues (rollouts, etc) and removed feature request labels Mar 8, 2022

miketheman added the malware-detection Issues related to automated malware detection. label Sep 15, 2023
