Handle hallucinated tool execution requests #1244

JanHuege · 2025-01-25T16:48:22Z

Not sure why I needed to add the quarkus-junit5 dependency for my test to run.

Added behavior to filter out hallucinated toolExecutionRequests.
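For readers skimming the thread, the filtering idea can be sketched roughly as below. This is not the code from this PR: `ToolRequest`, `HallucinatedToolFilter` and `keepKnown` are hypothetical stand-ins written in plain Java, and the assumption is that a request counts as hallucinated when its tool name is not among the tools that were offered to the model.

```java
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a tool execution request; not the langchain4j type.
record ToolRequest(String id, String name, String arguments) {}

final class HallucinatedToolFilter {
    private HallucinatedToolFilter() {}

    // Keeps only the requests whose tool name was actually offered to the model.
    // Anything else is treated as hallucinated and dropped (assumption of this sketch).
    static List<ToolRequest> keepKnown(List<ToolRequest> requested, Set<String> knownToolNames) {
        return requested.stream()
                .filter(request -> knownToolNames.contains(request.name()))
                .toList();
    }
}
```

With the known tools `{"add"}`, a response that requests both `add` and an unknown tool would keep only the `add` request.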

I had to use some reflection to write a test with a mocked client in OllamaChatLanguageModel. Suggestions are welcome.

I am also not happy with the assertions in the test "doesNotCrashIfToolNotPresent". I could not find out how to get the content of the message builder from "chatResponseWithHalucinatedTool" to show up in the resulting ChatResponse. I think it has something to do with missing ToolExecutors, but adding breakpoints in the ToolExecutors present in the repo did not help, since they were never hit.

Fixes #1232 (hallucinated toolExecutionRequests)

geoand

Thanks a lot!

I'll have a closer look tomorrow, but for now I added a comment about the test

...ama/runtime/src/test/java/io/quarkiverse/langchain4j/ollama/OllamaChatLanguageModelTest.java

model-providers/ollama/runtime/pom.xml

geoand · 2025-01-27T07:54:13Z

As for the tests, I would much prefer to have a tool test like the ones in the OpenAI module.

JanHuege · 2025-01-27T09:05:27Z

I moved the tests and rewrote them to match the WireMock style of the other Ollama tests, but I only saw your latest remark after I was done.

Are you talking about ToolBoxTest and others in quarkus-langchain4j-openai-deployment?

I am not sure if I understand the setup of those.

geoand · 2025-01-27T09:26:41Z

> Are you talking about ToolBoxTest and others in quarkus-langchain4j-openai-deployment?

Right. It's similar to the other Wiremock tests and the idea is to be able to track whether a tool has been called or not.

JanHuege · 2025-01-27T10:39:37Z

I have added a scenario test similar to the ones in the openai package. I had to make the called property static: I could not use an AtomicInteger as in ToolsApplicationScopedWithInterceptorTest.java, because injecting the Tool creates a second ApplicationScoped instance, which makes it impossible to check that called is 1. So I went with this somewhat hacky solution. If you have a nicer way of verifying the call, for example with a spy, I'd appreciate it.
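To illustrate the problem described above, here is a minimal plain-Java sketch with no CDI involved. `CountingTool` and its fields are hypothetical names, and the sketch simply assumes, as reported, that the test ends up inspecting a different instance than the one the tool call ran on.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical tool class used only for illustration; plain Java, no CDI.
final class CountingTool {
    // Shared by all instances, so the count is visible no matter which instance was invoked.
    static final AtomicInteger STATIC_CALLS = new AtomicInteger();

    // Per-instance counter: only incremented on the instance that was actually invoked.
    final AtomicInteger instanceCalls = new AtomicInteger();

    String run() {
        STATIC_CALLS.incrementAndGet();
        instanceCalls.incrementAndGet();
        return "ok";
    }
}
```

If `run()` is invoked on one instance and the assertion reads `instanceCalls` on another, the per-instance counter stays at 0 while the static counter reports the call. The static counter should be reset between tests, since it outlives any single instance.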

I kept the other ones because I feel that asserting the filtering of the execution requests is necessary to ensure that hallucinated calls are filtered out. Without those test cases, I think we would have blind spots.

geoand

Thanks!

JanHuege · 2025-01-27T12:15:16Z

Fixed/ran the formatting

🌘 This workflow status is outdated as a new workflow run has been triggered.

Status for workflow Build (on pull request)

This is the status report for running Build (on pull request) on commit 62bf751.

Failing Jobs

| Status | Name | Step | Failures | Logs | Raw logs |
| --- | --- | --- | --- | --- | --- |
| ✖ | Quick Build | Build with Maven | Failures | Logs | Raw logs |

Failures

⚙️ Quick Build
- Failing: model-providers/ollama/deployment 
! Skipped: codestarts codestarts/chatbot codestarts/chatbot/deployment and 65 more
📦 model-providers/ollama/deployment

✖ Failed to execute goal net.revelc.code.formatter:formatter-maven-plugin:2.24.1:validate (default) on project quarkus-langchain4j-ollama-deployment: File '/home/runner/work/quarkus-langchain4j/quarkus-langchain4j/model-providers/ollama/deployment/src/test/java/io/quarkiverse/langchain4j/ollama/deployment/OllamaChatLanguageModelToolTest.java' has not been previously formatted. Please format file (for example by invoking `mvn -f model-providers/ollama/deployment net.revelc.code.formatter:formatter-maven-plugin:2.24.1:format`) and commit before running validation!

JanHuege · 2025-01-27T12:52:50Z

fixed sorting of imports

quarkus-bot · 2025-01-27T13:18:15Z

Status for workflow `Build (on pull request)`

This is the status report for running Build (on pull request) on commit 2cbebc1.

✅ The latest workflow run for the pull request has completed successfully.

It should be safe to merge provided you have a look at the other checks in the summary.

JanHuege requested a review from a team as a code owner January 25, 2025 16:48

JanHuege force-pushed the handleHallucinatedToolExecutionRequests branch from f700e5b to 30f2ddb Compare January 25, 2025 16:54

geoand reviewed Jan 26, 2025

JanHuege force-pushed the handleHallucinatedToolExecutionRequests branch from 30f2ddb to 7191574 Compare January 27, 2025 09:00

JanHuege force-pushed the handleHallucinatedToolExecutionRequests branch from 7191574 to 16e7724 Compare January 27, 2025 10:34

JanHuege force-pushed the handleHallucinatedToolExecutionRequests branch from 16e7724 to 62bf751 Compare January 27, 2025 10:40

geoand approved these changes Jan 27, 2025

JanHuege force-pushed the handleHallucinatedToolExecutionRequests branch from 62bf751 to 6adf4d5 Compare January 27, 2025 12:14

add handling for hallucinated toolcall requests

2cbebc1

JanHuege force-pushed the handleHallucinatedToolExecutionRequests branch from 6adf4d5 to 2cbebc1 Compare January 27, 2025 12:52

geoand merged commit ae79ef3 into quarkiverse:main Jan 27, 2025
71 checks passed
