Measuring and benchmarking the safety of the fine-tuned models #5

fcanogab · 2024-06-21T06:01:18Z

There are several frameworks for measuring the safety/harmfulness of a fine-tuned model and benchmarking it against other models. For example, MLCommons defines a framework that can be used for this.

hemajv · 2024-07-09T16:15:39Z

Thanks for bringing this up! I think trying out and evaluating this benchmark would be a worthwhile exercise for us. It looks like the benchmark is still at the proof-of-concept stage, but they have a repo with steps outlining how to test it: https://github.com/mlcommons/modelbench

hemajv · 2024-07-09T16:17:55Z

Is this something you might have the bandwidth to try or look into, @fcanogab?

erikerlandson · 2024-07-15T17:02:30Z

We might also look at unitxt (an IBM open source project).

Jonathan Bnayahu has added some safety-related benchmarks, among others; see this search for a list:

https://github.com/IBM/unitxt/issues?q=author%3Abnayahu+

fcanogab · 2024-07-22T17:30:26Z

@hemajv, yes, I would like to try to work on this myself.

Thanks for the hint, @erikerlandson. I'll take a look at it.

hemajv added this to Data Science WG Jul 12, 2024

hemajv moved this to Ready in Data Science WG Jul 12, 2024
