Initial design for this plugin #1
Comments
It would be interesting if this mechanism could handle human-powered enrichments too - after all, saying "run OCR against everything in this column and write the discovered text back to this other column" isn't really any different from saying "ask a human being to type in the text from this image". They can work from the same APIs!
The main things that need to be designed then are:
I'm inclined to say that enrichments that want to work in parallel should implement that themselves - so a job can only be worked on by a single worker, but that worker is welcome to grab a batch of 100 items at once and execute a massively parallel architecture of some sort to crunch through that batch as fast as possible. Or grab 10x100 batches and process 1000 in parallel. That way I can outsource managing that parallelism and keep the core mechanism in Datasette as simple as possible.
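A minimal sketch of that split, assuming an asyncio-based worker: the job is walked one batch at a time by a single worker, and the worker decides how much concurrency to use inside each batch. The names `batches`, `process_batch`, `run_job` and `fake_enrich` are hypothetical and are not part of Datasette or this plugin.

```python
import asyncio


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


async def process_batch(batch, enrich_one, concurrency=10):
    """Enrich every item in one batch, at most `concurrency` at a time."""
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(item):
        async with semaphore:
            return await enrich_one(item)

    # gather() returns results in the same order as the inputs
    return await asyncio.gather(*(guarded(item) for item in batch))


async def run_job(items, enrich_one, batch_size=100):
    """Single worker: one batch after another, parallel inside each batch."""
    results = []
    for batch in batches(items, batch_size):
        results.extend(await process_batch(batch, enrich_one))
    return results


async def fake_enrich(value):
    """Stand-in for a real enrichment such as an OCR or embedding call."""
    await asyncio.sleep(0)
    return value * 2


if __name__ == "__main__":
    print(len(asyncio.run(run_job(list(range(250)), fake_enrich))))
```

The core only needs to know about `run_job`-style sequential batches; everything inside `process_batch` is the enrichment's own business and could just as well be threads, processes or a remote cluster.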
The prototype now successfully handles an embedding run against OpenAI! It needs a bunch of tidying up but it's looking very promising. Here's the table after the demo run completed:
Persisting the OpenAI API key like that is clearly not good. I'm also not convinced I got the cost calculation right - I think rounding is throwing away too much information.
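One way rounding could lose information here, as a guess at the cause rather than a diagnosis of the prototype: if the cost of each batch is rounded to whole cents before being summed, batches that each cost a fraction of a cent all round to zero. The sketch below compares that with summing tokens first and converting once. The price constant is an illustrative placeholder, not a real OpenAI price, and both function names are hypothetical.

```python
from decimal import Decimal

# Illustrative placeholder price, NOT a real OpenAI rate
PRICE_PER_1K_TOKENS_USD = Decimal("0.0001")


def cost_rounded_per_batch(batch_token_counts):
    """Round each batch to whole cents, then sum: small batches vanish."""
    total_cents = 0
    for tokens in batch_token_counts:
        cents = Decimal(tokens) / 1000 * PRICE_PER_1K_TOKENS_USD * 100
        total_cents += int(cents.quantize(Decimal("1")))
    return total_cents


def cost_from_total_tokens(batch_token_counts):
    """Sum tokens first, convert to dollars once, keep full precision."""
    total_tokens = sum(batch_token_counts)
    return Decimal(total_tokens) / 1000 * PRICE_PER_1K_TOKENS_USD


if __name__ == "__main__":
    run = [8000] * 1000  # 1000 batches of 8000 tokens each
    print(cost_rounded_per_batch(run), "cents when rounding per batch")
    print(cost_from_total_tokens(run), "USD when rounding is deferred")
```

Storing the raw token count (an integer) and deriving the cost at display time would sidestep the problem entirely.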
Thoughts on the API token problem:
This plugin will work by providing its own plugin hook that can be used to register "enrichments" - classes that can enrich data in some way, for example:
Each of these enrichments will itself be a plugin. The datasette-enrichments plugin will be responsible for tracking which enrichments are to run against which columns and tracking progress along the way.
Crucially, many enrichment implementations will be expected to run as separate processes - so this plugin will offer an API that external enrichment processes can use to ask "what do I need to do?" and to then record their results back to the Datasette instance.