
v0.3.1.5: Multi-Eval Node

@ianarawjo ianarawjo released this 25 Apr 18:01
· 7 commits to main since this release
6fa3092

This is the first release adding the MultiEval node to ChainForge proper, alongside:

  • improvements to the response inspector's table view, which now displays multi-criteria scores in columns
  • the table view is now the default when multiple evaluators are detected

Voilà:

*(Screenshot of the new MultiEval node in action)*

As you can see, MultiEval lets you define multiple per-response evaluators inside the same node, so you can score responses across multiple criteria at once. Evaluators can be any mix of code and LLM evaluators, as you see fit, and you can change the LLM scorer model on a per-evaluator basis.
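To make the per-criterion idea concrete, here is a minimal sketch in plain Python of what "multiple evaluators over the same response" amounts to. This is not the ChainForge API; the function names and criteria below are hypothetical stand-ins for the code evaluators you would define in the node:

```python
# Illustrative sketch (not ChainForge's actual API): several per-response
# evaluators, one per criterion, all applied to the same response text.

def concise(response: str) -> bool:
    # Hypothetical criterion 1: the response is reasonably short.
    return len(response.split()) <= 50

def polite(response: str) -> bool:
    # Hypothetical criterion 2: the response avoids a toy blocklist.
    return not any(w in response.lower() for w in ("stupid", "useless"))

# MultiEval-style registry: criterion name -> evaluator function.
EVALUATORS = {"concise": concise, "polite": polite}

def multi_eval(response: str) -> dict:
    # Score one response on every criterion, yielding one score per column
    # in the inspector's table view.
    return {name: fn(response) for name, fn in EVALUATORS.items()}

print(multi_eval("Sure, here is a brief answer."))
```

In the real node, an LLM evaluator would simply replace one of these functions with a call to a scorer model, while keeping the same one-score-per-criterion shape.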

This is a "beta" version of the MultiEval node, for two reasons:

  • The output handle of MultiEval is disabled, since it doesn't yet work with Vis Nodes to plot data across multiple criteria. That is a separate issue that I didn't want holding up this push; it is coming.
  • There are no genAI features in MultiEval yet, like there are in Code Evaluator nodes. I want to do this right (beyond EvalGen, which is another matter). The idea is that you describe a criterion in a prompt and the AI adds the evaluator it thinks best to the list, on a per-criterion basis. As a workaround for now, you can use the genAI feature inside a single Code Evaluator to generate code, then port that code over.

The EvalGen wizard is also coming, to help users automatically generate evaluation metrics with human supervision. We have a version of it on the multi-eval branch (which, due to the TypeScript front-end rewrite, we cannot merge directly into main), but it doesn't yet integrate Shreya's fixes.