Feature request: Automatically uninstall malicious packages taken down from PyPI #5777

Open
di opened this issue Sep 12, 2018 · 20 comments
Labels
type: feature request Request for a new feature type: security Has potential security implications

Comments

@di
Member

di commented Sep 12, 2018

What's the problem this feature will solve?
PyPI occasionally gets malicious packages uploaded to it. PyPI administrators remove the packages as quickly as possible, but sometimes users still install these packages before they are taken down, and the packages remain in the user's environment.

Describe the solution you'd like
At runtime, pip queries PyPI for a list of malicious packages that have been taken down from PyPI:

  • if it doesn't find any of them in the local environment, it does nothing;
  • if it finds a malicious package has been installed, it uninstalls it automatically.
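The check described above could be sketched roughly as follows. This is only an illustration: the PyPI endpoint does not exist yet, so the flagged names are passed in as a plain list, and the helper names (`normalize`, `installed_project_names`, `find_flagged`) are made up for this example. It reports matches rather than uninstalling anything.

```python
import re
from importlib import metadata


def normalize(name):
    """Normalize a project name following the PEP 503 rule."""
    return re.sub(r"[-_.]+", "-", name).lower()


def installed_project_names():
    """Return normalized names of distributions visible to this interpreter."""
    names = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:
            names.add(normalize(name))
    return names


def find_flagged(installed, flagged):
    """Return the installed names that also appear in the flagged list, sorted."""
    flagged_normalized = {normalize(name) for name in flagged}
    return sorted({normalize(name) for name in installed} & flagged_normalized)


if __name__ == "__main__":
    # Hypothetical flagged list; a real one would come from the proposed API.
    print(find_flagged(installed_project_names(), ["reqeusts"]))
```

Normalizing both sides matters because the installed metadata and the published list may spell the same project differently (`My.Pkg` and `my-pkg`).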

Additional context
The necessary API doesn't currently exist on PyPI, but if this feature is accepted, it would be trivial to implement.

@pfmoore
Member

pfmoore commented Sep 12, 2018

My immediate thought is that I'd prefer not to silently uninstall anything, but rather to let the user know what's happened and ask permission to uninstall the malicious software.

Also, how would we confirm that package FOO on the user's PC is actually the malicious FOO from PyPI and not (say) some entirely local package that they developed themselves and isn't on PyPI?

@di
Member Author

di commented Sep 12, 2018

My immediate thought is that I'd prefer not to silently uninstall anything, but rather to let the user know what's happened and ask permission to uninstall the malicious software.

Agreed, this should definitely make it known to the user that they had malicious software installed, in case they need to take further steps to mitigate the problem.

Also, how would we confirm that package FOO on the user's PC is actually the malicious FOO from PyPI and not (say) some entirely local package that they developed themselves and isn't on PyPI?

I think this is unlikely (most malicious packages are typo squats on real packages) but possible and would need to be addressed in some way.

If the uninstall isn't happening automatically, then the prompt could be a one-time thing: if the user decides to leave the package installed, pip won't warn about it again.

@pradyunsg pradyunsg added type: security Has potential security implications type: feature request Request for a new feature labels Sep 18, 2018
@pradyunsg
Member

@dstufft Thoughts?

@hugovk
Contributor

hugovk commented Sep 18, 2018

Some hypothetical questions:

  1. Once a malicious package has been removed from PyPI, is that name forever flagged as bad?

  2. Or can it be at some point flagged as safe? A use case could be a typo-squatted name is given to the rightful owner.

  3. If it is marked as safe, would pip then stop asking to uninstall?

  4. And how about if the user had installed the malicious version, but now the name is marked good, would pip uninstall the bad one?

@di
Member Author

di commented Sep 18, 2018

Answers:

  1. Once a malicious package has been removed from PyPI, is that name forever flagged as bad?

Yes.

  2. Or can it be at some point flagged as safe? A use case could be a typo-squatted name is given to the rightful owner.

Nope, it is permanently unavailable. We don't release typo squats to the "proper" owners. It would be a pain to manage 1 real package plus 5-10 typos of the name simultaneously. It's easier if the typo just never works.

  3. If it is marked as safe, would pip then stop asking to uninstall?

They won't be marked as "safe".

  4. And how about if the user had installed the malicious version, but now the name is marked good, would pip uninstall the bad one?

See above.

@RonnyPfannschmidt
Contributor

@pradyunsg How about having pip check print them?

By the way, how big is the list currently? I'm wondering whether it would be reasonable to just download it compressed.

@di
Member Author

di commented Sep 19, 2018

@RonnyPfannschmidt About 200 project names.

(To be clear, I'm suggesting that the hypothetical API would be a single endpoint that returns all "bad" project names)

@RonnyPfannschmidt
Contributor

@di I believe it's reasonable to provide a .json.bz2 containing all of those names and to download it in a cacheable manner.

@di
Member Author

di commented Sep 19, 2018

I don't really think it even needs to be compressed. It doesn't change very often, so as long as pip can conditionally GET it, it should be fine as plain JSON.
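A conditional GET of such a list might look like the sketch below, using only the standard library. The URL and the response format (a JSON array of names) are assumptions, since the endpoint is hypothetical. The `opener` parameter exists only so the function can be exercised without a network.

```python
import json
import urllib.error
import urllib.request


def build_request(url, etag=None):
    """Build a GET request that sends If-None-Match when an ETag is cached."""
    request = urllib.request.Request(url)
    if etag:
        request.add_header("If-None-Match", etag)
    return request


def fetch_if_changed(url, etag=None, opener=urllib.request.urlopen):
    """Return (names, etag); names is None when the server answers 304."""
    try:
        with opener(build_request(url, etag)) as response:
            names = json.loads(response.read().decode("utf-8"))
            return names, response.headers.get("ETag")
    except urllib.error.HTTPError as error:
        if error.code == 304:  # Not Modified: the cached copy is still current
            return None, etag
        raise
```

With `urllib`, a 304 response surfaces as an `HTTPError`, which is why the "not modified" case is handled in the `except` branch. A client would store the returned ETag next to its cached copy and pass it back on the next run.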

@RonnyPfannschmidt
Contributor

@di I'm simply going to assume it will grow to thousands of text entries in the years to come ^^ - but a transfer-encoding may be enough to save those bytes.

@dstufft
Member

dstufft commented Sep 19, 2018

I'm thinking that actively malicious packages are a special case of the general case of "packages with security issues". After all, there is not a lot of difference between a good package that accidentally allows something malicious to happen and a bad package that purposely allows that same thing to happen; in both cases the bad thing happens.

So with that in mind, I think a far better framework is something like what npm has implemented in npm audit, which is effectively a generic listing of versions of software that have security issues, and which people can run against their code base to get a report. It also has an npm audit fix command, which will attempt any automatic remediation that can occur (in this case, uninstalling the malicious package).

The generic thing is a bit more work, but I think it is far, far more useful than a one-off feature.
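To make the audit idea concrete, here is a minimal sketch of matching installed versions against a list of advisories. Nothing here reflects an existing pip or PyPI API: the `Advisory` record, its field names, and the exact-version matching are all assumptions chosen to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Advisory:
    """One hypothetical advisory record; the field names are assumptions."""
    project: str
    affected_versions: frozenset  # exact versions, to keep the sketch simple
    summary: str
    malicious: bool = False


def audit(installed, advisories):
    """Return (advisory, installed_version) pairs for affected installs.

    `installed` maps project name -> installed version string.
    """
    findings = []
    for advisory in advisories:
        version = installed.get(advisory.project)
        if version in advisory.affected_versions:
            findings.append((advisory, version))
    return findings


def plan_fix(findings):
    """Name the projects an automatic fix could uninstall: malicious ones only."""
    return sorted({advisory.project for advisory, _ in findings if advisory.malicious})
```

Separating `audit` from `plan_fix` mirrors the report/remediate split described above: the report covers every finding, while automatic removal is limited to the malicious special case.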

@pfmoore
Member

pfmoore commented Sep 19, 2018

There's a need for care here on the server side (I say "server" rather than "PyPI" - see below for why). Once we start extending the reasons why we'd blacklist packages, we risk getting into a position of becoming curators, and PyPI as a curated system is a whole different thing. Having said that, (a) that's a problem for PyPI to wrestle with, not for pip, and (b) I'm not trying to suggest that "having a security flaw" is something we need to debate over.

I agree that having an audit/fix solution rather than an automatic removal is better. Sure, there's a risk that someone doesn't audit their system, but the consenting adults principle applies here. I do not want pip to try to make it so users don't have to think about issues like this. We should give them the tools and the information, but their choices are their own to make.

The question still remains (I asked it above and @di noted it but said he thought it would be rare) which is that for this to work, we need to track where packages come from. We can't really have PyPI being the authority that says "this name is forbidden". Consider as an example a package that has a security vulnerability and gets blacklisted. A company needs the functionality in that package, and creates a fixed version which they host on their local package index. Just because PyPI says that package is blacklisted can't be a reason for blacklisting the local version. So we need to be able to say "did this installed package come from the index that is reporting it as blacklisted?"

Also, should we allow other indexes to publish blacklists? If not, why not? At a minimum we'd have to allow testpypi to do so, so people can test things (which reminds me, who maintains the blacklist - will we have some "fake" blacklist items set up for testing?). And why not test things locally? But if we do, people will try to use it for broader reasons than revoking malicious packages. Having it not affect packages sourced from anywhere other than the index the blacklist came from reduces the scope creep here dramatically (consider someone trying to publish a local blacklist of GPL packages, because their corporate licensing doesn't let them use GPL code - if we don't let a local blacklist stop PyPI packages being used, we can avoid having to think about the implications of scenarios like that because it simply won't work). Limiting the blacklist feature to "only PyPI" avoids a lot of this complexity of course (but introduces the "how do we test the feature" question...)
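The scoping rule argued for here (a blacklist only affects packages that came from the index publishing it) can be sketched in a few lines. The data shapes and the internal index URL are hypothetical; pip does not currently record which index a package was installed from.

```python
def blocked_installs(installed_from, blocklists):
    """Return installed names blocked by the index they were installed from.

    installed_from: project name -> URL of the index it was installed from
    blocklists:     index URL -> set of project names that index has blocked
    """
    return sorted(
        name
        for name, index_url in installed_from.items()
        if name in blocklists.get(index_url, set())
    )


# Hypothetical data: "foo" is blocked on PyPI, but the installed copy came
# from a company index, so it is left alone.
installed_from = {
    "foo": "https://pkgs.example.internal/simple",
    "baar": "https://pypi.org/simple",
}
blocklists = {"https://pypi.org/simple": {"foo", "baar"}}
print(blocked_installs(installed_from, blocklists))
```

Under this rule a local blacklist of, say, GPL packages would have no effect on packages installed from PyPI, which is the scope limit described above.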

@dstufft
Member

dstufft commented Sep 19, 2018

There's a need for care here on the server side (I say "server" rather than "PyPI" - see below for why). Once we start extending the reasons why we'd blacklist packages, we risk getting into a position of becoming curators, and PyPI as a curated system is a whole different thing. Having said that, (a) that's a problem for PyPI to wrestle with, not for pip, and (b) I'm not trying to suggest that "having a security flaw" is something we need to debate over.

Given there's a server side and a client side here, and IMO we should have this standardized, we probably should at a minimum discuss this on distutils-sig, if not produce a PEP for it.

Sure, there's a risk that someone doesn't audit their system, but the consenting adults principle applies here.

We could even automatically run a pip audit on install, and print out a message like "hey we detected 7 security issues, run pip audit for more information". That doesn't automatically do anything, but it does provide information as part of the install that will hopefully lead people to investigate more.

The question still remains (I asked it above and @di noted it but said he thought it would be rare) which is that for this to work, we need to track where packages come from. We can't really have PyPI being the authority that says "this name is forbidden". Consider as an example a package that has a security vulnerability and gets blacklisted. A company needs the functionality in that package, and creates a fixed version which they host on their local package index. Just because PyPI says that package is blacklisted can't be a reason for blacklisting the local version. So we need to be able to say "did this installed package come from the index that is reporting it as blacklisted?"

Yeah, for this to work we will need to start tracking the provenance of packages, which I don't think is really a big deal; it'd just be more metadata in the installation DB to specify the repository that a package came from. Alternatively, we could track a unique hash of the sdist or something similar, and tag vulnerability reports to specific hashes. There are a few ways we could take it, and the choice will largely depend on the design of the server-side API, but I 100% agree that the feature needs to be scoped more specifically than "anything named foo is bad".
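As a rough illustration of the hash-based alternative, the sketch below records a SHA-256 digest of the installed artifact alongside its source index and matches reports against digests instead of names. The `InstallRecord` type and both helper functions are invented for this example and do not correspond to anything in pip's installation database.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class InstallRecord:
    """Hypothetical provenance metadata stored at install time."""
    name: str
    index_url: str
    sha256: str


def record_install(name, index_url, artifact_bytes):
    """Create a provenance record from the bytes of the installed artifact."""
    digest = hashlib.sha256(artifact_bytes).hexdigest()
    return InstallRecord(name, index_url, digest)


def affected_records(records, reported_hashes):
    """Return the records whose artifact hash appears in a vulnerability report."""
    return [record for record in records if record.sha256 in reported_hashes]
```

Because the match is on the artifact digest, a patched package with the same name hosted on a local index would not be flagged.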

Also, should we allow other indexes to publish blacklists?

Yes. PyPI should not be special other than the fact it's the default.

@pfalcon

This comment has been minimized.

@brainwane
Contributor

We now have some more features on the Warehouse side that are relevant here - we're about to get yanking, we have the start of some malware detection, and there's an event log as the foundation of notifications. So this may be more possible soon, once information's available in the Warehouse API to read.

@brainwane
Contributor

Now that PEP 592 is accepted and implemented (pypi/warehouse#5837), if you are interested in working on this feature, take a look at the yanking feature and the "yanked" field in PyPI's JSON API.

@di
Member Author

di commented Apr 23, 2020

To be clear: yanked releases should not be considered malicious releases and should not be automatically uninstalled. If PyPI starts exposing packages removed for being malicious/typosquats, it'd be via an entirely new API, not the existing project/release JSON API (since the project/releases won't exist anymore once they're removed).
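For readers who want to see what the "yanked" field looks like in practice, here is a small sketch that reads it from a release payload. It assumes the per-file entries under `urls` carry `yanked` and `yanked_reason` keys; the sample payload is made up for illustration and is not real PyPI data. In line with the comment above, the result is informational only and is not a malware signal.

```python
import json

# Made-up sample shaped like a per-release JSON response (illustrative only).
SAMPLE_PAYLOAD = json.loads("""
{
  "info": {"name": "example-project", "version": "1.0"},
  "urls": [
    {"filename": "example_project-1.0.tar.gz", "yanked": true,
     "yanked_reason": "broken metadata"},
    {"filename": "example_project-1.0-py3-none-any.whl", "yanked": false,
     "yanked_reason": null}
  ]
}
""")


def yanked_files(release_payload):
    """Return (filename, reason) for each file marked yanked in the payload."""
    return [
        (entry["filename"], entry.get("yanked_reason"))
        for entry in release_payload.get("urls", [])
        if entry.get("yanked")
    ]


print(yanked_files(SAMPLE_PAYLOAD))
```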

@brainwane
Contributor

Whoops. Sorry for the error and thanks for the correction.

@AkechiShiro

Has there been any new progress on this interesting feature since 2020?


9 participants