Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attempt to improve upload performance #3501

Merged

Conversation

webbnh
Copy link
Member

@webbnh webbnh commented Jul 24, 2023

The recent "Ops Review" session revealed that upload requests time out if the file approaches 7Gb in size. Profiling the execution indicates that the bulk of the time is spent extracting the metadata.log file from the results tarball. This is due to a "feature" of tar archives: the same file may appear more than once, and so an extraction has to search the entire archive to ensure that the member being extracted is the most up-to-date (i.e., last) one, and this is punishing as the archive size grows. (Also, the Python Standard library tarfile package is not as performant as the standalone tar(1) utility, and this costs additional time.)

This PR changes the Server to spawn a subprocess to run the tar(1) utility, which allows us to specify the --occurrence option which allows us to abort the search after the first instance of the member is found. Since Pbench owns the construction of the archive, we know that each member occurs only once, so there is no need to search for other instances of it. Also, since the metadata.log file occurs near the beginning of the archive, by aborting the search when we find the first instance of the member, we are able to reduce the overall search time from multiple minutes to a fraction of a second.

This PR consists of a series of commits. Two of them are the changes to the cache manager module to add the subprocess execution and the changes to the corresponding unit tests. Two of them are passes over the cache manager module and the corresponding tests to "pick lint". The others are iteration on the respective changes to the code and the tests.

@webbnh webbnh added the Server label Jul 24, 2023
@webbnh webbnh added this to the v0.73 milestone Jul 24, 2023
@webbnh webbnh self-assigned this Jul 24, 2023
@webbnh webbnh force-pushed the extract_first_metadata branch from 09b0581 to f733bdd Compare July 24, 2023 23:24
lib/pbench/server/cache_manager.py Outdated Show resolved Hide resolved
lib/pbench/server/cache_manager.py Outdated Show resolved Hide resolved
lib/pbench/server/cache_manager.py Outdated Show resolved Hide resolved
lib/pbench/server/cache_manager.py Show resolved Hide resolved
@webbnh webbnh force-pushed the extract_first_metadata branch from f733bdd to de1e124 Compare July 26, 2023 18:56
@webbnh webbnh force-pushed the extract_first_metadata branch 2 times, most recently from 3c9b2fb to 806af82 Compare July 26, 2023 21:16
dbutenhof
dbutenhof previously approved these changes Jul 26, 2023
lib/pbench/server/cache_manager.py Outdated Show resolved Hide resolved
lib/pbench/server/cache_manager.py Show resolved Hide resolved
@webbnh webbnh force-pushed the extract_first_metadata branch 3 times, most recently from 85502f5 to 828d815 Compare July 29, 2023 01:12
@webbnh webbnh force-pushed the extract_first_metadata branch 9 times, most recently from 940f84c to 9ba86ed Compare July 31, 2023 18:52
@webbnh webbnh force-pushed the extract_first_metadata branch from 9ba86ed to b22cee5 Compare July 31, 2023 19:13
@webbnh webbnh marked this pull request as ready for review July 31, 2023 20:13
@webbnh webbnh requested review from dbutenhof and ndokos July 31, 2023 20:13
@webbnh
Copy link
Member Author

webbnh commented Jul 31, 2023

This is now ready for review.

Copy link
Member

@dbutenhof dbutenhof left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly great, but unless I'm missing something, a few tests seem a bit weak?

lib/pbench/server/cache_manager.py Show resolved Hide resolved
lib/pbench/server/cache_manager.py Show resolved Hide resolved
lib/pbench/test/unit/server/test_cache_manager.py Outdated Show resolved Hide resolved
lib/pbench/test/unit/server/test_cache_manager.py Outdated Show resolved Hide resolved
@webbnh webbnh force-pushed the extract_first_metadata branch from a8d06fc to 6507be1 Compare July 31, 2023 22:17
Copy link
Member

@dbutenhof dbutenhof left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test-cache-manager.py needs to be blackened

dbutenhof
dbutenhof previously approved these changes Jul 31, 2023
Copy link
Member

@dbutenhof dbutenhof left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_cache_manager.py needs to be blackened... but I'll grant a symbolic victory lap ...

Copy link
Member

@ndokos ndokos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still looking at the test code, but the rest LGTM.

@@ -1042,6 +1040,7 @@ def mock_shutil_which(
with monkeypatch.context() as m:
m.setattr(shutil, "which", mock_shutil_which)
m.setattr(subprocess, "Popen", MockPopen)
m.setattr(Inventory, "close", MockBufferedReader.close)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose that's fine that you're mocking Inventory.close to a different close you also don't expect to be called, and you cover all of the Inventory class below, but it looks odd and the self-documentation here (plus the assert if it is called) could be better. Ah, well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be fine to mock Inventory.close() with a different close(), given that they are all supposed to be providing the same io.IOBase.close() interface! 😆 (And, yeah, since I expect none of them to be called, using separate implementations seems silly and redundant. 😉)

Nevertheless, I don't disagree that, on the surface, it looks funny. 🙂

As for the documentation...well, Nick is still looking at the test code, so there is still a chance to fix that. 🙃 What in particular would you like to see?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, the clearest documentation would be a mocked Inventory.close that doesn't claim to be a stream close. I recognize that's "wasted" code, but it's not much. Second best would be a comment that we're just being lazy here because it doesn't really matter as we don't expect it to be called. (And if you're at all like me, once you start to write that and think about how it'll look, you'll probably just write the "proper" mock instead.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

once you start to write that and think about how it'll look, you'll probably just write the "proper" mock instead.

One of the problems that I'm trying to address is to minimize the amount of code in the unit tests which is not executed per the coverage report. So, adding an additional mock which causes the test to fail if it is called actually moves away from that goal.

the clearest documentation would be a mocked Inventory.close that doesn't claim to be a stream close.

I find this to be curious, because, AIUI, we want Inventory to act like a stream, right? (I mean, it provides all the same methods with basically the same semantics.) So, it makes sense to me to use a stream-close mock for Inventory.close() (in this weird instance, anyway)....

Second best would be a comment that we're just being lazy here because it doesn't really matter as we don't expect it to be called.

I'll do that, if you think it's worth it (although, I don't think I'll term it as "lazy", because that's not really what it's accomplishing for me... 🙂).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should not wait for me to check the test code before you merge. I don't know how long it will take me 😁

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this to be curious, because, AIUI, we want Inventory to act like a stream, right? (I mean, it provides all the same methods with basically the same semantics.) So, it makes sense to me to use a stream-close mock for Inventory.close() (in this weird instance, anyway)....

Inventory composes a stream and a "stream provider" so they can be managed together. That composition is critical, and eliding that to mock it as if it were a stream just seems really obscure and awkward to me. But what you've got will work, even if it might confuse a future reader who even notices it. I approved, so I'll leave the rest to you.

@webbnh webbnh merged commit ca36407 into distributed-system-analysis:main Aug 2, 2023
@webbnh webbnh deleted the extract_first_metadata branch August 2, 2023 16:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants