Detect packages being published with typo'ish names #4998
Comments
This is more or less the same as #2268, so I'm going to close this as a duplicate, but thanks for the feature request!
@aaronlelevier thanks for taking a look at my issue. I'll check the existing issue then. Thanks.
I'm actually going to reopen this, because I think it would be useful to have this issue (about typosquatting prevention/detection before/during upload) distinct from #2268 (which is about notifications, alerts, a "packages with similar names" widget, etc.). Thanks @aaronlelevier!
Today I discussed this idea (checking for typosquatting before upload) with @dstufft and @ewdurbin. It would be pretty hard to do this without lots of false positives. Donald mentioned a person at Netflix whose approach was to remove the dashes from popular project names and register the resulting strings. We could broaden our current normalization rules to cover more scenarios, though there will be existing collisions, including with that preemptive registration project. In any case, this kind of checking ought to be built as part of a pipeline where automated systems run checks and then flag packages/projects for deletion, review, or approval by PyPI admins.
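To make the normalization idea concrete, here is a minimal sketch. `pep503_normalize` follows the name normalization described in PEP 503; `strip_separators` and `find_collisions` are hypothetical helpers made up for illustration, not Warehouse code, and the "drop separators entirely" rule is only an assumption about what a broader rule might look like.

```python
import re


def pep503_normalize(name: str) -> str:
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def strip_separators(name: str) -> str:
    """Hypothetical broader rule: drop runs of '-', '_' and '.' entirely."""
    return re.sub(r"[-_.]+", "", name).lower()


def find_collisions(candidate: str, existing: list[str]) -> list[str]:
    """Return existing names that collide with `candidate` only under the
    broader rule, i.e. names the PEP 503 rule alone would not catch."""
    key = strip_separators(candidate)
    current = pep503_normalize(candidate)
    return sorted(
        name
        for name in existing
        if strip_separators(name) == key and pep503_normalize(name) != current
    )


# Illustrative names only.
print(find_collisions("pythondateutil", ["python-dateutil", "requests"]))
```

Running the same check across the existing index would be one way to estimate how many collisions a broader rule would create before adopting it.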
Per conversation last week: We'll be addressing this problem during upcoming work on automated detection of malicious uploads/typosquatting. First we'll need to develop good tools to detect and flag the pytosquatting/typosquatting, then we'll add tools in that pipeline for PyPI to automatically prevent/reject publication of packages that hit a certain "hey, that looks dodgy" score.
PR #7377 has been merged. If someone wants to contribute such a malware check, the documentation on how to do so is here: https://warehouse.pypa.io/development/malware-checks/
From pypi/support#526 (comment):
Are we aware of this paper from March 2020 which investigated typosquatting on PyPI? Or have the authors reached out? I've only skimmed the abstract, and haven't looked at the tool they say they developed there, but it seemed like an interesting read. (I'm finding this issue after seeing #9527).
What's the problem this feature will solve?
Prevent malicious packages being published with typo'ish names
Describe the solution you'd like
I'd like to propose an algorithm that blocks malicious packages from being published when their names closely resemble those of well-known packages.
Recently there were articles about 12 malicious packages found on PyPI. Several of them had names very close to Django, and as an avid Django user, this got my attention.
The algorithm could combine Levenshtein distance with other input features, such as the number of similar file names and the number of similar lines of code compared to the legitimate package with a similar name. If there is a close resemblance, the package could either be blocked from publication until a human reviews it, or be blocked permanently.
The algorithm could also be a lot more sophisticated, along the lines of Android's approach, which uses machine learning to detect malicious apps and measures over 700 features, I believe.
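As a rough sketch of just the name-similarity part, the following computes plain Levenshtein distance and lists popular names within a small distance of a candidate. The `POPULAR` tuple, the `max_distance` threshold, and the function `similar_popular_names` are assumptions made up for illustration; a real check would need a curated list of popular projects and the other features mentioned above to keep false positives down.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete char_a
                    current[j - 1] + 1,      # insert char_b
                    previous[j - 1] + cost,  # substitute, or match at no cost
                )
            )
        previous = current
    return previous[-1]


# Hypothetical list of well-known project names.
POPULAR = ("django", "requests", "numpy")


def similar_popular_names(candidate, popular=POPULAR, max_distance=2):
    """Return (popular_name, distance) pairs close to, but not equal to, `candidate`."""
    name = candidate.lower()
    hits = []
    for known in popular:
        distance = levenshtein(name, known)
        if 0 < distance <= max_distance:
            hits.append((known, distance))
    return sorted(hits, key=lambda hit: hit[1])


# Illustrative candidate name.
print(similar_popular_names("djanga"))
```

Plain Levenshtein counts a swap of two adjacent letters as two edits, so the threshold matters: a value of 1 would miss transposed names, while larger values flag more legitimate projects.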
I am just proposing something of this nature if it hasn't already been proposed.
Additional context
Here is the article link that I am referencing:
https://www.zdnet.com/article/twelve-malicious-python-libraries-found-and-removed-from-pypi/
Thanks,
Aaron