
Mutate sample #454

Merged: 8 commits merged into main on Jan 5, 2024
Conversation

granawkins (Member):

This PR refactors and adds to the Sampler utility scripts. Now we have:

  • validate: check the repo/context/diffs of each item in your samples dir
  • evaluate: run mentat on the sample and return the resulting diff
  • add-context: create n duplicates, each with lots of extra (useless) context
  • remove-context: create n duplicates, each missing some context and carrying a rejection message (interactive)
  • finetune: generate fine-tuning examples from Samples.

Run them with e.g. python scripts/sampler -v or python scripts/sampler --add-context -n 10.
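For orientation, here is a minimal sketch of how such a dispatcher could be wired up with argparse. Only the flag names above come from this PR; the structure, defaults, and placeholder dispatch are illustrative, not the actual scripts/sampler implementation.

```python
# Illustrative sketch only; the real scripts/sampler may be organized differently.
import argparse


def main() -> None:
    parser = argparse.ArgumentParser(description="Sampler utility scripts")
    parser.add_argument("--validate", "-v", action="store_true",
                        help="Validate samples instead of evaluating")
    parser.add_argument("--add-context", action="store_true",
                        help="Create duplicates with extra (useless) context")
    parser.add_argument("--remove-context", action="store_true",
                        help="Create duplicates missing some context (interactive)")
    parser.add_argument("--finetune", action="store_true",
                        help="Generate fine-tuning examples from Samples")
    parser.add_argument("-n", type=int, default=1,
                        help="Number of duplicates to create")
    args = parser.parse_args()
    # Dispatch to the selected operation here (omitted in this sketch);
    # with no flag given, evaluation would be the default.
    print(args)


if __name__ == "__main__":
    main()
```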

jakethekoenig (Member) left a comment:

All 5 operations worked great for me. Some small notes.

example = await generate_finetune_gpt(sample)
example_file = FINETUNE_DIR / f"finetune_{sample.id}.json"
with open(example_file, "w") as f:
    json.dump(example, f, indent=4)
jakethekoenig (Member):

Suddenly realizing there was no reason for me to have each fine-tuning example on its own line. Something about jsonl makes me think "json per line". Oops.

granawkins (Member, Author):

I've just finally read up on the actual difference between .json and .jsonl.

I think your interpretation was actually correct for jsonl: each line should be independently json-serializable. Theoretically you'd be able to read in and deserialize one record at a time, making it better for extremely large runs where the whole file is too big to read into memory.

I guess I'm more used to working with .json and don't expect we'll hit that limit soon, right? Any other thoughts?
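As a concrete illustration of the trade-off being discussed, a minimal JSONL sketch; the file name and record shape are made up for the example, not taken from the repo.

```python
# JSONL = one JSON document per line, so records can be appended and read
# back one at a time without loading the whole file into memory.
# File name and record contents are hypothetical.
import json

records = [{"id": i, "messages": []} for i in range(3)]

with open("examples.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")

with open("examples.jsonl") as f:
    for line in f:  # streams line by line
        record = json.loads(line)
        print(record["id"])
```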

jakethekoenig (Member):

No other thoughts.

print(f"Generating fine-tuning example for sample {sample.id[:8]}")
try:
    example = await generate_finetune_gpt(sample)
    example_file = FINETUNE_DIR / f"finetune_{sample.id}.json"
jakethekoenig (Member):

Can we print this? It was hard to find.

granawkins (Member, Author) commented on Jan 5, 2024:

I'll print the FINETUNE_DIR path at the end with the results.

Just kidding, inline is great.
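A sketch of what that inline print might look like, reusing the names from the snippet above; generate_finetune_gpt, FINETUNE_DIR, and the sample object are assumed to be available from the surrounding script, and the wrapper function and message wording are hypothetical.

```python
# Hypothetical wrapper around the snippet above, with the suggested inline
# print of the generated file path. Assumes the imports/constants from the
# surrounding script.
import json


async def save_finetune_example(sample) -> None:
    example = await generate_finetune_gpt(sample)
    example_file = FINETUNE_DIR / f"finetune_{sample.id}.json"
    with open(example_file, "w") as f:
        json.dump(example, f, indent=4)
    print(f"Fine-tuning example saved to {example_file}")
```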

SAMPLES_DIR = mentat_dir_path / "samples"
os.makedirs(SAMPLES_DIR, exist_ok=True)
FINETUNE_DIR = mentat_dir_path / "finetune"
os.makedirs(FINETUNE_DIR, exist_ok=True)
jakethekoenig (Member):

This is fine for now, but I wonder if we should think about the ergonomics of it if we want other people to use it. I'd kind of expect these things to be generated in the user's working directory. I guess if the goal is to collect them from many repos to put together a finetuning/benchmarking data set, it makes sense to put them all in one place.

4. If using a Coding Assistant tool, process the response to apply edits to codebase.
5. Return the text portion of the conversation and the git diff, corresponding to `message_edit` and `diff_edit`

We provide two implementations of this:
- Run `scripts/evaluate_samples.py [<id>...]` from the command line, in the mentat repo. Prints to terminal.
- Run `python scripts/samples [<id>...]` from the command line, in the mentat repo. Prints to terminal.
jakethekoenig (Member):

scripts/sampler, not scripts/samples

"--validate",
"-v",
action="store_true",
help="Validate samples instead of evaluating",
jakethekoenig (Member):

Can we change this help message to "Validate samples conform to spec"?

if not sample_file.exists():
warn(f"Sample file {sample_file} does not exist.")
continue
sample = Sample.load(sample_file)
jakethekoenig (Member):

Because we load samples before validating them, certain errors are thrown as exceptions instead of being logged, e.g. if the sample is not valid json or has extra fields.
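One way to surface those errors as validation warnings rather than exceptions, sketched under the assumption that Sample.load raises on malformed input; the surrounding loop and the warn helper are inferred from the snippet above.

```python
# Hypothetical sketch: treat load failures (invalid JSON, unexpected fields)
# as validation errors instead of letting the exception propagate.
for sample_file in sample_files:
    if not sample_file.exists():
        warn(f"Sample file {sample_file} does not exist.")
        continue
    try:
        sample = Sample.load(sample_file)
    except Exception as e:
        warn(f"Could not load sample {sample_file}: {e}")
        continue
    # ...continue with the existing validation checks on `sample`
```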

continue
try:
    new_sample = await remove_context(sample)
    new_sample.save(SAMPLES_DIR / f"sample_{new_sample.id}.json")
jakethekoenig (Member):

Same comment here and above. Can we print the files generated?


from mentat.sampler.sample import Sample
from mentat.utils import mentat_dir_path
from tests.benchmarks.benchmark_runner import (
jakethekoenig (Member):

I couldn't run this until I added tests to find_packages in setup.py. Btw, what do you think about moving the benchmark code out of the tests directory into its own top-level benchmarking directory?

granawkins (Member, Author):

Ah makes sense, I'll add that to setup.py.

Ya I like the idea of moving benchmarks to their own directory, mostly because I always forget the --benchmark flag.
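For context, a hedged sketch of the kind of setup.py change being described, assuming the project uses setuptools' find_packages; the include patterns are illustrative, not copied from the repo.

```python
# Hypothetical setup.py fragment: also package `tests` so that
# `from tests.benchmarks.benchmark_runner import ...` resolves after install.
from setuptools import find_packages, setup

setup(
    name="mentat",
    packages=find_packages(include=["mentat*", "tests*"]),
)
```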

diff_active=sample.diff_active,
)
cwd = Path(repo.working_dir)
python_client = PythonClient(cwd=Path("."), paths=[])
jakethekoenig (Member):

The python client needs to be shut down.
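A sketch of one way to guarantee that cleanup, assuming PythonClient exposes an async shutdown() method; the method name and import path are assumptions, not verified against the codebase.

```python
from pathlib import Path

from mentat.python_client.client import PythonClient  # import path assumed


async def run_with_client() -> None:
    # Ensure the client is shut down even if the evaluation raises.
    python_client = PythonClient(cwd=Path("."), paths=[])
    try:
        ...  # drive the sample conversation through the client here
    finally:
        await python_client.shutdown()  # assumed async shutdown method
```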

else:
print(f"Evaluating sample {sample.id[:8]}")
print(f" Prompt: {sample.message_prompt}")
diff_eval = await evaluate_sample(sample)
jakethekoenig (Member):

To me, evaluate_sample would imply running and grading the sample. Do you think run_sample or execute_sample would be better?

from mentat.session_context import SESSION_CONTEXT


async def evaluate_sample(sample, cwd: Path | str | None = None):
jakethekoenig (Member) commented on Jan 5, 2024:

What do you think about making this one an instance method of samples? I'd like to use it in the benchmark runner itself and it'd be easy to get there.

I changed my mind on this one because it leads to a hard-to-fix circular import, and I wanted to write it slightly differently for the benchmarks here.
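For readers unfamiliar with the issue being alluded to: if Sample gained an evaluate method that imports the evaluation code while the evaluation code already imports Sample, the two modules would depend on each other at import time. A common workaround is a local import inside the method; the module paths below are hypothetical stand-ins, not the repo's actual layout.

```python
# sample.py (hypothetical layout illustrating the circular-import workaround)
class Sample:
    ...

    async def evaluate(self, cwd=None):
        # Importing here, instead of at module level, avoids the cycle:
        # the evaluation module imports Sample, and a top-level import of
        # the evaluation module from this file would close the loop.
        from scripts.sampler.evaluate import evaluate_sample  # hypothetical path
        return await evaluate_sample(self, cwd=cwd)
```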

jakethekoenig added a commit that referenced this pull request Jan 5, 2024
If you put a sample in the tests/benchmarks/benchmarks directory it will
be picked up and evaluated by the benchmark runner. Test with:
```
pytest tests/benchmarks/benchmark_runner.py --benchmark -s --benchmarks \
Dummy clojure
```
Based off PR #454 to make use of setup_repo. There's some duplication of
sample evaluation.
granawkins merged commit 9560fa5 into main on Jan 5, 2024
16 checks passed