Ridiculous benchmark #15

Merged
merged 5 commits into from
Dec 7, 2023

Contributor

@wpietri wpietri commented Dec 7, 2023

Just enough to make a number come out for one SUT and one Test, and to sketch an architectural direction. Some comments on open concerns in the code.

In theory, you should be able to do poetry install to set up the environment and PYTHONPATH=src pytest to run the tests. If that works, you can run the benchmark (including a full HELM run) for GPT2 and BBQ like this:

$ python src/run.py 
GPT2 scored 1.5 stars

…e out for one sut, and to sketch an architectural direction.
@wpietri wpietri requested a review from a team as a code owner December 7, 2023 00:55

github-actions bot commented Dec 7, 2023

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@wpietri wpietri force-pushed the ridiculous-benchmark branch from 1885696 to 8325b73 Compare December 7, 2023 01:05
Contributor Author

wpietri commented Dec 7, 2023

recheck

2 similar comments

Contributor

@brianwgoldman brianwgoldman left a comment

Poetry should be able to make it so you don't have to edit PYTHONPATH. It might just be that you need a directory named coffee that includes an __init__.py. It may also require prefixing modules with "coffee." when importing them.

Another symptom that something isn't set up right: when I run poetry install, I get "The current project could not be installed: No file/folder found for package coffee."

src/run.py Outdated
        self.prefix = prefix

    def runspecs(self) -> List[str]:
        raise NotImplementedError
Contributor

This seems to imply there should never be an actual HelmTest object, only derived classes of it. In that situation I recommend Python's Abstract Base Class support (the abc module). That way you can mark this method as @abstractmethod, and Python will refuse to instantiate any derived class that doesn't override it.

Not something you have to fix here, but a pattern I'd like to use in the future.
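
A minimal sketch of the ABC pattern suggested above. Only HelmTest, prefix, and runspecs come from the diff under review; the BbqHelmTest subclass, the runspec string it returns, and the can_instantiate helper are hypothetical and exist only for illustration.

```python
from abc import ABC, abstractmethod
from typing import List


class HelmTest(ABC):
    """Base class that cannot be instantiated directly."""

    def __init__(self, prefix: str):
        self.prefix = prefix

    @abstractmethod
    def runspecs(self) -> List[str]:
        """Return the HELM runspecs for this test."""


class BbqHelmTest(HelmTest):
    # Hypothetical subclass; the runspec format below is an assumption.
    def __init__(self):
        super().__init__("bbq")

    def runspecs(self) -> List[str]:
        return [f"{self.prefix}:subject=all"]


def can_instantiate(cls, *args) -> bool:
    """Report whether cls can be constructed with the given arguments."""
    try:
        cls(*args)
        return True
    except TypeError:
        # Raised when a class still has abstract methods that are not overridden.
        return False
```

With this in place, constructing HelmTest directly raises a TypeError, while a subclass that overrides runspecs constructs normally.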

Contributor Author

Sure, happy to do.


    def _make_output_dir(self):
        o = pathlib.Path.cwd()
        if o.name in ['src', 'test']:
Contributor

This feels very magic and potentially error prone. Can we pass output_dir into run?

Contributor Author

The magic needs to happen somewhere, as some ways of invoking this (such as from my IDE) are inclined to run it in the src or test directories, which then puts a bunch of output where it shouldn't be.

We could move the magic out and pass the directory in, but the theory here is that the CliHelmRunner is just going to do some stuff and give you a run result from which you can get scores. Under that theory, requiring a working directory be passed in violates encapsulation and increases the burden on the object's user, something we should generally seek to reduce.

Is there some plausible near-term use case you have in mind where this would cause a problem?

Contributor

An advantage of passing it in is that any pytest tests we write can use a temp directory, so multiple runs don't conflict and data generated during the test gets cleaned up.

> Is there some plausible near-term use case you have in mind where this would cause a problem?

If there is a subdirectory of src and you run this from there, you are back to getting output in unexpected places. If we move src to be in a coffee directory this logic also breaks.

> requiring a working directory be passed in violates encapsulation

HelmResult has to know about output_directory, so IMO it isn't encapsulated. I think there is also a reason for a user to want to find all these results after the fact, so asking them where to put the output feels reasonable.
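
A sketch of the passed-in variant discussed in this thread, assuming a hypothetical output_dir constructor parameter on CliHelmRunner. The run body and the marker file it writes are placeholders, not the PR's implementation. A test can then hand the runner a temporary directory; pytest's built-in tmp_path fixture serves the same purpose as the tempfile call used here.

```python
import pathlib
import tempfile


class CliHelmRunner:
    """Hypothetical variant that is told where to write instead of inspecting cwd."""

    def __init__(self, output_dir: pathlib.Path):
        self.output_dir = pathlib.Path(output_dir)

    def run(self) -> pathlib.Path:
        # Placeholder for invoking HELM: create the directory and write a
        # marker file so the example stays self-contained.
        self.output_dir.mkdir(parents=True, exist_ok=True)
        marker = self.output_dir / "run_complete.txt"
        marker.write_text("ok")
        return marker


def run_in_temp_dir() -> bool:
    """Run in a throwaway directory that is removed when the block exits."""
    with tempfile.TemporaryDirectory() as tmp:
        marker = CliHelmRunner(pathlib.Path(tmp) / "run").run()
        existed_during_run = marker.exists()
    # The temporary directory and everything written into it are gone now.
    return existed_during_run and not marker.exists()
```

Because the caller chooses the directory, the result no longer depends on where the process was started, which also covers the subdirectory-of-src case raised above.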

src/run.py Outdated
        return command


class Benchmark:
Contributor

I'd been thinking about a Benchmark as doing two things:

  1. Define the list of Tests it needs to run
  2. Define how to aggregate the Results from those Tests to produce a single Score.

In that context, it's strange to me to have a Benchmark specify a SUT. None of its functionality seems to use the SUT. Is this instead something like a BenchmarkScorer, responsible for doing the aggregation?

Contributor Author

We could split this into intent and result objects. The result object definitely needs a SUT, because it has scores on the SUT. You're correct that the intent doesn't. Eventually we'll probably have quite a rich result object, as well as a complicated intent. But for now they were simple enough that the code seemed fine with them together.
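
One way the intent/result split could look, as a rough sketch. All names here (BenchmarkDefinition, BenchmarkScore, the mean-based aggregation, and the example score values) are hypothetical; the PR keeps both roles in a single Benchmark class.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass(frozen=True)
class BenchmarkDefinition:
    """Intent: which tests to run and how to aggregate them. No SUT needed."""

    name: str
    test_names: List[str]

    def aggregate(self, test_scores: Dict[str, float]) -> float:
        # Assumed aggregation: a plain mean over the tests this benchmark lists.
        return mean(test_scores[t] for t in self.test_names)


@dataclass(frozen=True)
class BenchmarkScore:
    """Result: ties a definition to one SUT and the scores it earned."""

    definition: BenchmarkDefinition
    sut_name: str
    test_scores: Dict[str, float]

    def value(self) -> float:
        return self.definition.aggregate(self.test_scores)


# Made-up numbers, purely to exercise the classes.
definition = BenchmarkDefinition("example", ["bbq"])
score = BenchmarkScore(definition, "gpt2", {"bbq": 0.3})
```

The definition can be built and inspected without any SUT, while every score carries the SUT it belongs to.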

Contributor Author

wpietri commented Dec 7, 2023

Great points. Next is setting up CI so that we can make sure it works for everybody, not just on one person's machine.

@dhosterman
Collaborator

Good here. This is very preliminary work that we'll be ironing out as we learn more.

@dhosterman dhosterman merged commit c4da412 into main Dec 7, 2023
1 check passed
@github-actions github-actions bot locked and limited conversation to collaborators Dec 7, 2023
@wpietri wpietri deleted the ridiculous-benchmark branch December 7, 2023 19:40
3 participants