This repository has been archived by the owner on Jan 7, 2025. It is now read-only.

Where should we save generated Samples and Finetune examples? #456

Closed
granawkins opened this issue Jan 5, 2024 · 0 comments

Comments

@granawkins
Member

In #454 these are saved to the mentat dir, i.e. ~/Home/.mentat/samples. Here's the conversation from that PR:

@jakethekoenig:

This is fine for now, but I wonder if we should think about the ergonomics of it if we want other people to use it. I'd kind of expect these things to be generated in the user's working directory. I guess if the goal is to collect them from many repos to put together a finetuning/benchmarking dataset, it makes sense to put them all in one place.

@granawkins:

My main motivation was that users should be able to use the sampler with just a pip install, without cloning all of mentat locally.
I do like the idea of these being in the target directory, i.e. ~/Users/latent-dictionary/.mentat/samples/*. Allowing both (a .mentat dir in your home dir as well as one in each repo) follows the pattern of .mentat_config.json too, and may be useful in the future for things like code summaries.
Another issue I run into: when you run any of the scripts, it takes over your repo and resets everything, so in most cases it's preferable to run those operations on a copy of your repo, not the main one. E.g. I'm in the middle of a big commit, run mentat and save a sample, and want to quickly validate the sample. If I run validate against my active workspace, all my in-progress changes will disappear.
So IF we allow those operations on a pip install, we either warn users, or we create a clone of the target repo in ~/Users/.mentat to run the operations on. And in that case, having the samples there (not in the target repo) might make more sense.
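To make the "allow both" idea concrete, here is a minimal sketch of looking for samples in a repo-level .mentat dir first and then the home-level one. It is not taken from #454: the function names, the lookup order, and the `.json` extension for sample files are all assumptions for illustration.

```python
from pathlib import Path
from typing import List, Optional


def candidate_sample_dirs(repo_root: Path, home: Optional[Path] = None) -> List[Path]:
    """Return possible samples dirs: repo-local first, then home-level.

    Hypothetical helper; the ordering is an assumption, not mentat's behavior.
    """
    home = home if home is not None else Path.home()
    dirs = [repo_root / ".mentat" / "samples", home / ".mentat" / "samples"]
    # Drop duplicates while keeping order (repo_root may be the home dir).
    unique: List[Path] = []
    for d in dirs:
        if d not in unique:
            unique.append(d)
    return unique


def find_samples(repo_root: Path, home: Optional[Path] = None) -> List[Path]:
    """Collect sample files from every candidate dir that exists.

    Assumes samples are stored as *.json files.
    """
    found: List[Path] = []
    for d in candidate_sample_dirs(repo_root, home):
        if d.is_dir():
            found.extend(sorted(d.glob("*.json")))
    return found
```

Calling `find_samples(Path.cwd())` would then list repo-local samples ahead of any stored under the home dir.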
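A rough sketch of the "operate on a copy" option, under stated assumptions: it copies the whole working directory (including `.git` and any uncommitted changes) with `shutil.copytree` rather than running `git clone`, and the helper name `make_scratch_copy` and the scratch location are hypothetical, not part of mentat.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def make_scratch_copy(repo_root: Path, scratch_parent: Optional[Path] = None) -> Path:
    """Copy repo_root into a scratch dir and return the path of the copy.

    Destructive operations (e.g. validating a sample) can then run on the
    copy, leaving the active workspace untouched.
    """
    repo_root = Path(repo_root).resolve()
    if scratch_parent is None:
        # Assumption: a fresh temp dir; a real tool might use a dir under ~/.mentat.
        scratch_parent = Path(tempfile.mkdtemp(prefix="mentat-scratch-"))
    else:
        scratch_parent = Path(scratch_parent).resolve()
        # Refuse to copy a repo into itself, which would recurse.
        if scratch_parent == repo_root or repo_root in scratch_parent.parents:
            raise ValueError("scratch_parent must be outside repo_root")
        scratch_parent.mkdir(parents=True, exist_ok=True)

    dest = scratch_parent / repo_root.name
    if dest.exists():
        shutil.rmtree(dest)  # start from a clean copy each time
    shutil.copytree(repo_root, dest, symlinks=True)
    return dest
```

A plain copy keeps in-progress edits in the scratch dir, which a `git clone` would not; which of the two is preferable here is an open question in this thread.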


2 participants