fix(moderation prompt): lower false positives in toxic category #102

mantagen · 2024-09-09T18:20:22Z

Description

I tried a few prompt/model variants locally, and this was the best I found. (Note: gpt-4o-mini is no good for this!)

updates openai from 4.52.0 to 4.58.1 (allowing for json_schema mode)
moderation agent now uses "json_schema" mode
adds a 'note' to the Toxic moderation category description
amends the 'instruction' setting of the prompt
makes justification non-optional
filters categories by parent category score
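
As a rough illustration of the last three points, here is a minimal sketch. It is not the code from this PR: the result shape, the `parent/child` category codes, the score scale (assumed here to mean "lower is more concerning") and the threshold value are all hypothetical. Only the `response_format` shape (`type: "json_schema"` with a nested `json_schema` object) follows OpenAI's structured-output request format, and `strict` is left out, in line with the later commit that removes it.

```typescript
// Hypothetical result shape; the real field names in the repo may differ.
type ModerationResult = {
  justification: string; // required (no longer optional)
  scores: Record<string, number>; // parent category -> score (assumed: lower = more concern)
  categories: string[]; // flagged codes, assumed to look like "<parent>/<child>"
};

// Assumed request fragment for "json_schema" mode. The schema name and
// property names are illustrative only.
const responseFormat = {
  type: "json_schema",
  json_schema: {
    name: "moderation_result",
    schema: {
      type: "object",
      properties: {
        justification: { type: "string" },
        scores: { type: "object", additionalProperties: { type: "number" } },
        categories: { type: "array", items: { type: "string" } },
      },
      // Listing "justification" here is what makes it non-optional.
      required: ["justification", "scores", "categories"],
    },
  },
} as const;

// Assumed cut-off: a parent score below this counts as flagged.
const PARENT_THRESHOLD = 5;

// Keep a flagged category only if its parent category's score is itself
// below the threshold, so one noisy child code cannot flag everything.
function filterByParentScore(
  result: ModerationResult,
  threshold: number = PARENT_THRESHOLD,
): string[] {
  return result.categories.filter((code) => {
    const parent = code.split("/", 1)[0] ?? code;
    const score = result.scores[parent];
    return score !== undefined && score < threshold;
  });
}
```

With this sketch, a response that lists a child category whose parent scored at or above the threshold has that category dropped before any flag is raised, which is one way to reduce false positives of the kind described in the linked issue.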

I'll explore some more in-depth changes later, but those carry more risk, so I want to wait until there's a dataset (and more time) to give the confidence required.

Issue(s)

https://www.notion.so/oaknationalacademy/Fix-moderation-bug-where-all-categories-get-flagged-3c93a1e819bb438f971187cb8be653bc

How to test

Moderation should generally work (e.g. test some sensitive and toxic material)

vercel · 2024-09-09T18:20:25Z

Name	Status	Preview	Comments	Updated (UTC)
oak-ai-lesson-assistant	✅ Ready (Inspect)	Visit Preview	💬 Add feedback	Sep 10, 2024 9:58am

github-actions · 2024-09-09T18:26:59Z

Playwright e2e tests

Job summary

Download report

To view traces locally, unzip the report and run:

npx playwright show-report ~/Downloads/playwright-report

sonarqubecloud · 2024-09-10T09:55:42Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud

codeincontext

Makes sense to me 👍

oak-machine-user · 2024-09-16T13:57:38Z

🎉 This PR is included in version 1.7.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

mantagen added 2 commits September 9, 2024 18:52

fix(moderation prompt): lower false positives in toxic category

4b71e92

add parent threshold check

23e111f

vercel bot deployed to Preview September 9, 2024 18:24 View deployment

mantagen marked this pull request as ready for review September 10, 2024 08:04

mantagen added 2 commits September 10, 2024 09:35

Merge branch 'main' into fix/moderation-prompt

336175f

use model that supports json_schema

3b20f29

vercel bot deployed to Preview September 10, 2024 09:25 View deployment

remove 'strict' from chat.completion call

7553340

vercel bot deployed to Preview September 10, 2024 09:58 View deployment

mantagen requested a review from a team September 10, 2024 10:01

codeincontext approved these changes Sep 10, 2024

mantagen merged commit 76e3cd5 into main Sep 10, 2024
13 checks passed

mantagen deleted the fix/moderation-prompt branch September 10, 2024 10:08

mikeritson-oak mentioned this pull request Sep 11, 2024

build: release candidate #117

Closed

codeincontext mentioned this pull request Sep 12, 2024

build: release candidate 2024-09-12 #128

Merged

simonrose121 mentioned this pull request Sep 16, 2024

build: release candidate 2024-09-12 (attempt 2) #138

Merged

oak-machine-user added the released label Sep 16, 2024

mantagen added a commit that referenced this pull request Oct 3, 2024

fix(moderation prompt): lower false positives in toxic category (#102)

507eedc
