Add machine readable download progress option #12084

joeyballentine · 2023-06-13T02:53:01Z

Closes #11508 and obsoletes #11172

TL;DR: If you call pip in a subprocess, it is impossible to read its progress output, because rich disables the progress bar by default when it is not writing to a terminal (as in a subprocess with piped output). I had previously added an option to force the progress bar, but it was suggested that I add a machine-readable option instead. This PR adds a new progress_bar option that prints the download progress to stdout and bypasses rich entirely, so a program that controls pip can read stdout and parse the download progress.
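For context, the controlling program's side of this looks roughly like the sketch below: it launches a child process and handles each stdout line as it arrives. The helper name `stream_lines` is hypothetical, and the sketch is generic `subprocess` usage rather than anything pip provides. Lines only show up promptly if the child flushes its output, which is exactly what rich's non-terminal behaviour prevents for the progress bar.

```python
import subprocess
import sys
from typing import Iterator, List


def stream_lines(cmd: List[str]) -> Iterator[str]:
    """Yield each line a child process writes to stdout, as it arrives.

    Lines only arrive promptly if the child flushes its own stdout;
    nothing on this (reading) side can force that.
    """
    with subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so warnings are not lost
        text=True,
    ) as proc:
        assert proc.stdout is not None
        for line in proc.stdout:
            yield line.rstrip("\n")
        proc.wait()
```

A hypothetical use would be `for line in stream_lines([sys.executable, "-m", "pip", "download", "some-package"]): ...`, where `some-package` is a placeholder.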

I'm open to suggestions on how this should be formatted. Here is how it currently looks:

joeyballentine · 2023-06-13T04:52:04Z

I don't think the Windows CI failure is my fault. Not sure what's going on there.

RunDevelopment · 2023-06-13T10:03:36Z

I think using a common data format for the progress data would be good. Right now, devs would have to implement a custom parser. This is both tedious and error-prone. Using a common data format also means that we can more easily change the output in the future, because we don't have to worry about breaking any custom parsers.

I would propose using JSON. Example:

PROGRESS:{"file":"name.whl","current":1234,"total":87654321}

This would allow devs to simply look for the "PROGRESS:" prefix and then use any old JSON parser to get the data. We would have to guarantee that our JSON doesn't span multiple lines (making it closer to JSONL), but I think that's okay.
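A minimal sketch of the consumer side of this proposal, assuming the `PROGRESS:` prefix and the one-object-per-line rule described above (the function name is hypothetical). Every other line is treated as ordinary pip output and passed through.

```python
import json
from typing import Any, Dict, Optional

PREFIX = "PROGRESS:"  # prefix from the proposal above


def parse_progress(line: str) -> Optional[Dict[str, Any]]:
    """Return the decoded progress record, or None for any other output line.

    Raises ValueError (json.JSONDecodeError is a subclass) on a malformed record.
    """
    if not line.startswith(PREFIX):
        return None  # ordinary pip output; leave it to the caller
    payload = json.loads(line[len(PREFIX):])
    if not isinstance(payload, dict):
        raise ValueError(f"expected a JSON object after {PREFIX!r}")
    return payload
```

Because `total` may be unknown (for example when the server sends no length), a caller computing a percentage would need to handle a `null` total.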

I also think that the --progress-bar option for this should be json instead of the generic machine-readable.

Another idea would be to support arbitrary format strings, similar to format strings supported by git. Users could say --progress-bar=format:<fmt-str>.
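To make the documentation cost concrete, here is a hypothetical renderer for `--progress-bar=format:<fmt-str>` built on Python's `str.format_map`. The placeholder names are assumptions taken from the JSON example. Every key becomes public API, and `str.format` syntax also permits format specs such as `{current:>12}` and attribute or index lookups such as `{file.upper}`, all of which would need documenting or forbidding.

```python
from typing import Dict, Union

# Hypothetical placeholder set; each key would become a documented contract.
Fields = Dict[str, Union[str, int, None]]


def render(fmt: str, fields: Fields) -> str:
    """Expand {placeholder} fields in a user-supplied format string."""
    try:
        return fmt.format_map(fields)
    except KeyError as exc:
        # An unknown placeholder is a user error, not a crash.
        raise ValueError(f"unknown placeholder {exc.args[0]!r}") from None
```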

While flexible, the issue with this approach is documentation. We would have to document all features and placeholders of the format syntax.

pfmoore · 2023-06-13T10:45:58Z

Another idea would be to support arbitrary format strings

Let's not over-engineer this. There's been (as far as I know) only one person (the OP here) who has ever requested this. It doesn't warrant a complex solution with all of the associated maintenance overheads.

As I said here I still sort of feel this is something that we should look to rich to support - at the very least, it's worth asking them if it's something they would consider.

joeyballentine · 2023-06-13T13:17:17Z

I've opened a discussion in the rich repo here: Textualize/rich#3000

I know I'm the only person in the world who wants this, but it is currently blocking me again, and it's really frustrating to be unable to do something that seems so simple on the surface.

RunDevelopment · 2023-06-13T13:36:09Z

It doesn't warrant a complex solution with all of the associated maintenance overheads.

True.

I still sort of feel this is something that we should look to rich to support

I agree and disagree.

I agree in the sense that rich should have an option to force terminal mode. In fact, it does have that option, but it's only accessible to library users (here pip) but not to end users (there is no env variable we could set). See #11172 for that.

However, even if rich supported this (via some mechanism), it wouldn't be with the explicit purpose of being machine-readable. AFAIK, rich doesn't guarantee a specific, strictly-defined text format for its progress output. So rich may choose to change the progress format at some point and thereby break our code. This is far from ideal.

Frankly, I don't think it's rich's job to provide an API for progress.

joeyballentine · 2023-06-13T13:44:49Z

Response on the rich discussion:

I'd certainly consider it. It feels like there should be hooks for this, and it should be left up to the developer to decide on the format.

Sounds like pip will have to do some of the implementation itself regardless.

joeyballentine · 2023-06-13T13:48:56Z

Updated JSON version:

joeyballentine · 2023-06-13T17:02:36Z

Alternatively if you guys don't want to merge this, is there a recommended way I could import pip's internals to either manually invoke the dependency resolver/downloader and/or hook into it to do this same kind of thing just for myself?

I know you don't recommend people use pip's internals and would rather people invoke it via CLI in a subprocess but that's just out of the question for me without this change.

pfmoore · 2023-06-13T18:11:51Z

No, there isn't any sort of recommended way of using pip's internals. You're basically on your own: you can do it, but it's not supported, and you'll have to work out how yourself.

To be clear, I'm not against this - I'm mostly just rather indifferent to it. The code looks clean, and simple enough. It needs tests, and it needs documentation that describes exactly what the output format is (as it'll be a commitment we'll have to support from now on). It also probably shouldn't use print() but should integrate with our logging machinery somehow. Or maybe it shouldn't? I honestly don't know. But I don't want to suggest that if you add those, I'll be any more in favour - it's not that which makes the difference for me, it's simply that it really doesn't seem that useful to be able to read the progress like this. It's still only download progress, there's nothing equivalent to track build progress or time to resolve, etc - so I don't personally see that it's much more useful than a "still working..." spinner. But that's just me, I guess.

joeyballentine · 2023-06-13T18:29:09Z

This really just comes down to large downloads appearing to do nothing. I don't want my users with slow internet connections to see nothing happen for 5 hours while PyTorch's 3 GB wheel downloads, and then give up because, to them, it just looks frozen. Even with fast internet it still takes quite some time, and there's no easy way to track this when running pip in a subprocess.

You'd probably cringe if you saw the disaster that is my current workaround. I start a pip install, regex-match the wheel or URL that pip logs as downloading, and cancel the install. Then I use the PyPI API to grab a list of all the wheels, find the one matching what pip said it was downloading, download it myself (so I can track progress), and finally tell pip to install that wheel. Separately, I also tried specifying my own download directory and using file watchers to track download progress, but that was less reliable.

Right now I'm rewriting my code so that the backend handles the pip installs itself (because as-is the frontend can't invoke pip installs on remote backends) and now I'm either tasked with reimplementing that same workaround in python or doing something better (which I'd much rather do).

Honestly it's starting to look like the most painless solution is just for me to include a build of my forked pip with my program and just use that instead of trying to integrate this officially.

I'm willing to do the documentation and whatnot required to get this PR in, but I'm just getting the feeling that it isn't worth it, especially since this will probably take time to be released anyway and I'm trying to do this rewrite right now.

RunDevelopment · 2023-06-13T19:21:12Z

so I don't personally see that it's much more useful than a "still working..." spinner.

Joey isn't exaggerating. In my hometown, downloading PyTorch takes 2 hours. Simply showing "Still downloading" with no progress indication and not even the total download size is not viable.

pfmoore

Also, as noted in the discussion, this needs tests and documentation.

For documentation, I mean a section in the user guide explaining the intended use of this feature, the format of the output data, and how to detect it. It also needs to discuss the fact that (apart from the output this feature adds) pip doesn't typically guarantee that its output is stable, and so any code parsing pip's output is inherently tied to particular pip versions.

src/pip/_internal/cli/progress_bars.py

src/pip/_internal/network/download.py

pfmoore · 2023-06-13T20:47:24Z

Thanks - that sucks, it's easy for me with a fast connection to forget not everyone has that advantage.

You've persuaded me. As long as none of the other pip committers object and we get the docs and tests sorted, I'm willing to support this PR. I've added a review.

src/pip/_internal/cli/progress_bars.py

docs/html/user_guide.rst

src/pip/_internal/cli/progress_bars.py

pfmoore · 2023-06-14T14:42:36Z

Gonna be honest I'm not sure what to do for the tests here. It seems like to get it to actually do anything progress bar related I have to download a real package, which probably isn't ideal. Also, when it works, it heavily spams the test output. Also, I've realized I can't properly test the error case since the tests run pip in a subprocess.

I think you'd be better off trying to unit-test this, with monkeypatching to control whether the code detects stdout as a console or not. I'm definitely not comfortable with adding a download of a large external package to the test suite. The tests are already way too slow, and adding a deliberately slow test just to trigger a feature that is itself fast isn't really acceptable.
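A sketch of the monkeypatching approach, using the standard library's `unittest.mock`. The selector `choose_renderer` is hypothetical, not pip's real function, and it assumes the draft behaviour of refusing JSON mode on a terminal; the point is only that replacing `sys.stdout` for the duration of the call lets a test flip the TTY check either way.

```python
import sys
from unittest import mock


def choose_renderer(mode: str) -> str:
    """Hypothetical selector for a --progress-bar mode."""
    if mode == "json" and sys.stdout.isatty():
        # Assumed rule from the PR draft: JSON mode is for pipes, not terminals.
        raise ValueError("json progress output is not intended for a terminal")
    return mode


def test_json_rejected_on_tty() -> None:
    with mock.patch.object(sys, "stdout") as fake_stdout:
        fake_stdout.isatty.return_value = True  # pretend we're on a console
        try:
            choose_renderer("json")
        except ValueError:
            pass
        else:
            raise AssertionError("expected ValueError on a TTY")


def test_json_allowed_on_pipe() -> None:
    with mock.patch.object(sys, "stdout") as fake_stdout:
        fake_stdout.isatty.return_value = False  # pretend stdout is a pipe
        assert choose_renderer("json") == "json"
```

Under pytest, `monkeypatch.setattr` achieves the same thing with automatic undo.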

src/pip/_internal/cli/progress_bars.py

joeyballentine · 2023-06-15T05:18:05Z

In regards to the rate limiting: While looking at the spinners.py file I noticed that it already implements rate limiting via a nice little helper class, so I decided to just reuse that here since it is already the way pip is intended to function. The default value for timing here is also reused.
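The rate-limiting idea can be sketched as below. This is a from-scratch illustration, not pip's helper from spinners.py, whose names and details may differ. The clock is injectable so the limiter can be unit-tested without sleeping.

```python
import time
from typing import Callable


class MinIntervalLimiter:
    """Allow an action at most once per `min_interval` seconds (illustrative)."""

    def __init__(
        self, min_interval: float, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self._min_interval = min_interval
        self._clock = clock
        self._last = float("-inf")  # so the very first check is always ready

    def ready(self) -> bool:
        return self._clock() - self._last >= self._min_interval

    def reset(self) -> None:
        self._last = self._clock()
```

A progress loop would emit a record only when `ready()` is true and then call `reset()`. It would likely still emit a final record unconditionally, so the consumer always sees completion.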

As for using sys.stdout.write + sys.stdout.flush, it just doesn't work. What does work, though, is using print with flush=True. I can't personally explain why, but sys.stdout.write seems to dislike subprocesses the same way rich does. Using the -u flag makes it work. Sorry, I didn't see that part until after I pushed this.

I also decided to change the format slightly. Instead of being all caps, progress is now displayed as Progress: {... to more closely match pip's other output.
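A sketch of the producer side under these choices: a `Progress: ` prefix, a single-line JSON record, and `print(..., flush=True)`. The field names `file`, `current` and `total` are assumptions taken from the earlier JSON example, and `emit_progress` is a hypothetical name. `json.dumps` without `indent` never emits a raw newline, because newlines inside strings are escaped, so each record stays on one line.

```python
import json
import sys
from typing import Optional, TextIO


def emit_progress(
    current: int, total: Optional[int], file: str, stream: Optional[TextIO] = None
) -> None:
    """Write one 'Progress: {...}' line and flush it immediately."""
    # Look up sys.stdout at call time so redirection and test capture still work.
    out = stream if stream is not None else sys.stdout
    # No indent: the record is guaranteed to be a single line.
    record = json.dumps({"file": file, "current": current, "total": total})
    # A piped stdout is block-buffered, so flush after every record.
    print(f"Progress: {record}", file=out, flush=True)
```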

As for unit tests, I still have to work on those.

joeyballentine · 2023-06-16T00:00:34Z

I'm not sure I'm going to be able to get tests for this working. I got the test suite running locally and experimented a lot with one test that was supposed to capture the error state (warning the user that JSON mode is not for TTYs), but it only ever seemed to log the "Downloading" text. Based on some logging I did, I'm pretty sure that's because it's triggering the machine-readable progress code (isatty() returns False, so that error isn't hit), and the output isn't captured either because we write to stdout directly instead of using the logger, or because it's not doing a real download.

I'd appreciate any suggestions, as I'm not really sure what to do about this.

pfmoore

As regards tests, I'm honestly not sure. Yes, this will be hard to test, which is unfortunate, but that's true of a lot of pip features. I'd start with a unit test of pip._internal.network.download._prepare_download, that passed a dummy Response object that behaved like a big, slow, download, and then iterate over the returned chunks, capturing the data pip writes to stdout and confirming that it contains the data you expect. You can do this both for dummy responses that have a length, and ones that don't. For the "is this a TTY" check, unit test get_download_progress_renderer and monkeypatch sys.stdout.isatty to give the result you want. Then check you get the expected return value.
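A sketch of the first suggested test. It does not call pip's real `_prepare_download`: `FakeResponse` and `iter_with_progress` are hypothetical stand-ins for a slow HTTP response and pip's chunk wrapper, and neither reproduces pip's or requests' actual API. It covers both cases mentioned above, a response with a known length and one without. `contextlib.redirect_stdout` captures direct `print()` output that a logger-based capture would miss; in pytest, the `capsys` fixture does the same job.

```python
import contextlib
import io
import json
from typing import Any, Dict, Iterable, Iterator, List, Optional


class FakeResponse:
    """Hypothetical stand-in for a big download: yields fixed-size chunks."""

    def __init__(self, size: int, chunk: int, with_length: bool) -> None:
        self.size = size
        self.chunk = chunk
        # Some servers omit Content-Length, so "total" must be optional.
        self.headers = {"content-length": str(size)} if with_length else {}

    def iter_chunks(self) -> Iterator[bytes]:
        for start in range(0, self.size, self.chunk):
            yield b"x" * min(self.chunk, self.size - start)


def iter_with_progress(
    chunks: Iterable[bytes], total: Optional[int], name: str
) -> Iterator[bytes]:
    """Hypothetical wrapper: pass chunks through, printing one JSON line each."""
    done = 0
    for data in chunks:
        done += len(data)
        record = {"file": name, "current": done, "total": total}
        print("Progress: " + json.dumps(record), flush=True)
        yield data


def captured_records(with_length: bool) -> List[Dict[str, Any]]:
    """Run a fake download and return the progress records it printed."""
    resp = FakeResponse(size=10_000, chunk=4_096, with_length=with_length)
    length = resp.headers.get("content-length")
    total = int(length) if length is not None else None
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):  # catches the direct print() calls
        body = b"".join(iter_with_progress(resp.iter_chunks(), total, "pkg.whl"))
    assert len(body) == 10_000  # the payload itself passes through unchanged
    return [
        json.loads(line[len("Progress: "):])
        for line in buf.getvalue().splitlines()
    ]
```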

Once you have those two, you should have a good enough feel for how to test this that you can add whatever else feels necessary (if anything).

docs/html/user_guide.rst

src/pip/_internal/cli/progress_bars.py

joeyballentine · 2023-06-24T01:37:19Z

Sorry for the delay. In the time since the last reply I actually worked on my planned implementation using a build of my pip fork. It actually didn't even need the -u flag to work as expected, so I'm not sure what the discrepancy between that and my previous tests was.

Anyway, I'm working on the suggested changes now. The tests still might take me a while to get right, but I'll give them another go.


arenasys · 2024-03-19T03:46:05Z

This would be useful to me also.

Limiting this to non-TTY output is an unnecessary complication. I can't think of a situation where the caller explicitly asks for --progress-bar=json and doesn't want JSON progress output.

joeyballentine · 2024-03-19T11:36:53Z

@arenasys Yeah, I agree with that.

Unfortunately, I've given up on this PR. I just ended up bundling my pip fork with these changes with my project and have been happily using this feature myself for the past year.

After some time, I now think this (machine-parseable progress) is something that should really be done throughout all of pip, not just for the download progress bar. I don't want to do that personally, though, as this codebase is massive and confusing. And since I couldn't figure out how to write the tests for just this feature, I don't think I'd be able to do it for the whole codebase.

arenasys · 2024-03-19T20:11:56Z

Progress reports are the one thing completely missing from pip's output when it goes through pipes, which makes pip very user-unfriendly for cases like GUIs that use it and want to show the status to the user. As for testing, I would think it should have the same kind of testing as the current spinners and progress bars (are there any?). I'll open another PR.

joeyballentine · 2024-03-19T20:28:07Z

I don't think there are any tests for those, as the progress bars are all handled by rich (the progress bar package).

And regarding machine-readable progress everywhere, I mainly meant that I dislike the inconsistency. In my program, I had to have multiple ways of parsing messages depending on whether it was a standard pip message or a JSON download progress line. And pip doesn't have install progress at all (it just says "installing packages..." and that's it).

I agree this is probably the most important part, though, since pip straight up doesn't output anything otherwise. I just think that, as a general feature, I'd prefer an overhaul of how pip logs, to make parsing simpler and more useful overall.

joeyballentine · 2024-03-27T05:00:10Z

Closing as #12586 implements this better.

joeyballentine added 2 commits June 12, 2023 22:47

Add machine readable download progress option

d7a76d3

Adds news file

13c0431

psf-chronographer bot added the bot:chronographer:provided label Jun 13, 2023

joeyballentine added 4 commits June 12, 2023 23:05

Fix mypy error

0045fd8

lint

5f7ab61

line length

5127329

add newline

813aac6

use json

98742ea

joeyballentine added 2 commits June 13, 2023 09:51

fix import sorting

9080d82

format

70ae242

pfmoore requested changes Jun 13, 2023


joeyballentine added 7 commits June 13, 2023 17:28

PR suggestions

13c11ee

use logger

d0d96d9

add docs to user guide

f4ca354

add test

dfc4d98

fix test, maybe

d1d6a69

maybe this time

5f7782a

attempting just using a network package

71f3444

RunDevelopment reviewed Jun 14, 2023

src/pip/_internal/cli/progress_bars.py (outdated)

docs/html/user_guide.rst

pfmoore requested changes Jun 14, 2023

docs/html/user_guide.rst

src/pip/_internal/cli/progress_bars.py (outdated)

pfmoore reviewed Jun 14, 2023

src/pip/_internal/cli/progress_bars.py (outdated)

PR suggestions

7567e2a

joeyballentine added 3 commits June 15, 2023 01:20

I didn't see the comment about using -u

c0fbf13

lint + add a test (that may or may not work)

31dee38

remove bad test

d73336b

Merge branch 'main' into machine-readable-progress

a3d3ebf

joeyballentine requested a review from pfmoore June 17, 2023 20:59

pfmoore requested changes Jun 19, 2023

docs/html/user_guide.rst (outdated)

docs/html/user_guide.rst (outdated)

src/pip/_internal/cli/progress_bars.py (outdated)

joeyballentine and others added 6 commits June 23, 2023 21:39

Update with more realistic example

15f0c8a

update null verbiage

23f95d2

error suggestion

fd16cbc

Merge remote-tracking branch 'origin/machine-readable-progress' into machine-readable-progress

e88444e

ruff

d81de49

Merge branch 'main' into machine-readable-progress

0d6bfc3

pradyunsg removed the bot:chronographer:provided label Dec 20, 2023

psf-chronographer bot added the bot:chronographer:provided label Dec 20, 2023

arenasys mentioned this pull request Mar 22, 2024

Add simple download progress option #12586

Merged

joeyballentine closed this Mar 27, 2024

github-actions bot locked as resolved and limited conversation to collaborators Apr 11, 2024
