Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Moving off lockfile #4766

Closed
pradyunsg opened this issue Oct 4, 2017 · 18 comments · Fixed by #7023
Closed

Moving off lockfile #4766

pradyunsg opened this issue Oct 4, 2017 · 18 comments · Fixed by #7023
Labels
auto-locked Outdated issues that have been locked by automation help wanted For requesting inputs from other members of the community project: vendored dependency Related to a vendored dependency type: maintenance Related to Development and Maintenance Processes

Comments

@pradyunsg
Copy link
Member

pip currently vendors lockfile but it is deprecated as on Oct 2017.

pip should start using some other package -- lockfile's README suggests fasteners.

@pradyunsg
Copy link
Member Author

There's https://pypi.org/project/filelock, which has a very similar API and seems to be active.

@pradyunsg pradyunsg changed the title Move off of lockfile Moving off lockfile Dec 16, 2017
@pradyunsg pradyunsg added this to the Internal Cleansing milestone Aug 12, 2018
@uranusjr
Copy link
Member

Another possibility: https://pypi.org/project/portalocker/

@chrahunt
Copy link
Member

chrahunt commented Jul 14, 2019

Just a few considerations:

  1. Backwards compatibility. Any change in this area would not guard against concurrent access by different pip versions, unless the existing behavior is also preserved. Is there any way around this?
  2. We must refactor to lock using a different file than we write into. This is in general a good practice, but absolutely required if the underlying library is using fcntl.lockf that depends on the "Advisory Record Locking" described in fcntl(2). This for example is unsafe, since opening and closing the same file would release the lock. I know this applies to fasteners, and filelock explicitly states that users should not use the same file as a lockfile.
  3. NFS compatibility. I don't think it's uncommon to have home directories shared across multiple hosts via NFS, and the default cache dir on Unixes is ~/.cache.

@pradyunsg pradyunsg added the help wanted For requesting inputs from other members of the community label Aug 6, 2019
@pradyunsg
Copy link
Member Author

@chrahunt Would you be able to make some time to work on this?

@chrahunt
Copy link
Member

chrahunt commented Aug 7, 2019

Sure. To start, the current situation is:

  1. lockfile is deprecated and several issues have been raised on this issue tracker related to it
  2. pip internally uses lockfile to guard its version check file
  3. cachecontrol, which we use for caching https responses, uses lockfile - specifically as an optional dependency when using FileCache. Replacing lockfile is tracked in 'lockfile' is deprecated. psf/cachecontrol#109 with a pending PR [WIP/PoC] Attempt to use the fasteners library psf/cachecontrol#114.

@chrahunt
Copy link
Member

chrahunt commented Aug 7, 2019

I think in both of these situations we can probably avoid using file locking altogether, in favor of creating and writing to a unique file (using e.g. uuid.uuid4()), removing the target file if it exists, then os.renameing the unique file to the target file, tracing a warning on any errors but otherwise continuing.

Pros:

  • much less complicated
  • backwards compatible with previous pip versions (On Unix any pip writing to a file that is removed will still be writing to the old file, on Windows the deletion will fail but we can simply warn and continue)
  • since currently only writes are guarded and not reads, technically another process trying to read the file as it's being written may read a partial file - this approach would avoid that

Cons:

  • Because FileCache tries to import lockfile in its constructor I think we should not derive from that class but have our own implementation which will be code in our codebase instead of cachecontrol.
  • Does not improve the situation for anyone else (but we could suggest this approach if appropriate)

@adamchainz
Copy link

Could also move the data to SQLite which guarantees transactional semantics on all platforms. Though maybe Pip tries to work in environments where Python has been compiled without SQLite?

@chrahunt
Copy link
Member

That's a good point. We would probably want to avoid dependencies on optional modules as much as possible, otherwise we end up raising the bar for a semi-functional Python installation. It could also be optional, but if we have a fallback then we're maintaining two approaches.

@pradyunsg
Copy link
Member Author

We would probably want to avoid dependencies on optional modules as much as possible

Yep -- Ideally, pip only depends on things it knows will be on end user systems. pip should work on the bare-bones CPython IMO.

@uranusjr
Copy link
Member

Does not improve the situation for anyone else (but we could suggest this approach if appropriate)

What does “anyone else” mean here? Other users of lockfile? I am not quite following this particular point.

@chrahunt
Copy link
Member

Sorry, other cachecontrol users.

@uranusjr
Copy link
Member

I see, that makes much more sense.

In that case I’d say a double barrel approach would work: pip can implement its own logic (or maybe maintain a small library) to replace lockfile, so it can have the best possible compatibility. The SQLite approach can be pursued by cachecontrol if they so inclined.

@pradyunsg
Copy link
Member Author

Before rolling out our own library, I'd suggest to look and check if there's any existing library that has this functionality.

I'm not against a dedicated library for this but I don't want us redoing work that's already been done.

@pfmoore
Copy link
Member

pfmoore commented Aug 12, 2019

pip can implement its own logic (or maybe maintain a small library)

I'd rather split it out into a separate library that we then vendor back in. I don't see any point in making functionality like that internal to pip, and conversely if we split it out, we can hopefully find other contributors to help maintain it.

But as @pradyunsg says, let's make sure we're not reinventing the wheel first. There's been 3 libraries mentioned so far in this thread, and presumably all other users of lockfile have had to face this question. Pip has some special constraints, but I'd be surprised if we were so special that none of the other options work for us...

@chrahunt
Copy link
Member

To give us something concrete to discuss over, I have submitted #6879 which uses the strategy described here. It depends on another pending PR so it contains several commits unrelated to this issue but you can see the specific relevant commit here.

As stated above - instead of

with lock:
    with open(path, 'w') as f:
        f.write(text)

we do

with open(tmpfile, 'w') as f:
    f.write(text)

replace(tmpfile, path)

which should be safer than the previous approach. We can do the same thing for the HTTP cache without much more work.

@pradyunsg
Copy link
Member Author

Let's discuss over at #6879.

@chrahunt
Copy link
Member

All done. :)

@pradyunsg
Copy link
Member Author

Hurrah! Thanks for all your work on this @chrahunt!

@lock lock bot added the auto-locked Outdated issues that have been locked by automation label Oct 19, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Oct 19, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
auto-locked Outdated issues that have been locked by automation help wanted For requesting inputs from other members of the community project: vendored dependency Related to a vendored dependency type: maintenance Related to Development and Maintenance Processes
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants