Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle hallucinated tool execution requests #1244

Merged

Conversation

JanHuege
Copy link
Contributor

@JanHuege JanHuege commented Jan 25, 2025

Not sure why I needed to add the quarkus-junit5 dependency for my test to run.

Added behavior to filter out hallucinated toolExecutionRequests.

Had to add some reflection to write a test with mocked client in OllamaChatLanguageModel. I am happy about suggestions.

Also I am not happy with the assertions in the test "doesNotCrashIfToolNotPresent". But I did not find out how to get the content of the messagebuilder from "chatResponseWithHalucinatedTool" to be present in the resultin ChatResponse. I think it has something todo with missing ToolExecutors but added breakpoints into the ToolExecutors present in the repo did not help since they didn't hit.

Fixes #1232 (hallucinated toolExecutionRequests)

@JanHuege JanHuege requested a review from a team as a code owner January 25, 2025 16:48
@JanHuege JanHuege force-pushed the handleHallucinatedToolExecutionRequests branch from f700e5b to 30f2ddb Compare January 25, 2025 16:54
Copy link
Collaborator

@geoand geoand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot!

I'll have a closer look tomorrow, but for now I added a comment about the test

@geoand
Copy link
Collaborator

geoand commented Jan 27, 2025

As for the tests, I would much rather prefer to have a tool test like the ones in the OpenAI module

@JanHuege JanHuege force-pushed the handleHallucinatedToolExecutionRequests branch from 30f2ddb to 7191574 Compare January 27, 2025 09:00
@JanHuege
Copy link
Contributor Author

JanHuege commented Jan 27, 2025

I moved the tests and I've rewritten them to match the wiremock style of the others in ollama but saw your latest remark after I was done.

Are you talking about ToolBoxTest and others in quarkus-langchain4j-openai-deployment ?

I am not sure if I understand the setup of those.

@geoand
Copy link
Collaborator

geoand commented Jan 27, 2025

Are you talking about ToolBoxTest and others in quarkus-langchain4j-openai-deployment ?

Right. It's similar to the other Wiremock tests and the idea is to be able to track whether a tool has been called or not.

@JanHuege JanHuege force-pushed the handleHallucinatedToolExecutionRequests branch from 7191574 to 16e7724 Compare January 27, 2025 10:34
@JanHuege
Copy link
Contributor Author

I have added a scenario test somewhat equal to the ones in the openai package. I had to add the called property as a static one since I was unable to use an AtomicInteger like in ToolsApplicationScopedWithInterceptorTest.java since injecting the Tool creates a second AppScoped Instance making it impossible to check if called is 1. So I helped myself with this somewhat hacky solution. If you got a nicer way of ensuring the call with something like a spy I'd appreciate it.

I kept the other ones since I feel like checking/asserting the filtering of the execution is somewhat neccessary to ensure that calls are filtered out. For me without the other testcases this would cause blindspots.

@JanHuege JanHuege force-pushed the handleHallucinatedToolExecutionRequests branch from 16e7724 to 62bf751 Compare January 27, 2025 10:40
Copy link
Collaborator

@geoand geoand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

This comment has been minimized.

@JanHuege JanHuege force-pushed the handleHallucinatedToolExecutionRequests branch from 62bf751 to 6adf4d5 Compare January 27, 2025 12:14
@JanHuege
Copy link
Contributor Author

JanHuege commented Jan 27, 2025

Fixed/ran the formatting

🌘 This workflow status is outdated as a new workflow run has been triggered.

Status for workflow Build (on pull request)

This is the status report for running Build (on pull request) on commit 62bf751.

Failing Jobs

Status Name Step Failures Logs Raw logs
✖ Quick Build Build with Maven Failures Logs Raw logs

Failures

⚙️ Quick Build #

- Failing: model-providers/ollama/deployment 
! Skipped: codestarts codestarts/chatbot codestarts/chatbot/deployment and 65 more

📦 model-providers/ollama/deployment

Failed to execute goal net.revelc.code.formatter:formatter-maven-plugin:2.24.1:validate (default) on project quarkus-langchain4j-ollama-deployment: File '/home/runner/work/quarkus-langchain4j/quarkus-langchain4j/model-providers/ollama/deployment/src/test/java/io/quarkiverse/langchain4j/ollama/deployment/OllamaChatLanguageModelToolTest.java' has not been previously formatted. Please format file (for example by invoking `mvn -f model-providers/ollama/deployment net.revelc.code.formatter:formatter-maven-plugin:2.24.1:format`) and commit before running validation!

This comment has been minimized.

@JanHuege JanHuege force-pushed the handleHallucinatedToolExecutionRequests branch from 6adf4d5 to 2cbebc1 Compare January 27, 2025 12:52
@JanHuege
Copy link
Contributor Author

fixed sorting of imports

Copy link

quarkus-bot bot commented Jan 27, 2025

Status for workflow Build (on pull request)

This is the status report for running Build (on pull request) on commit 2cbebc1.

✅ The latest workflow run for the pull request has completed successfully.

It should be safe to merge provided you have a look at the other checks in the summary.

@geoand geoand merged commit ae79ef3 into quarkiverse:main Jan 27, 2025
71 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Question] Memory deleted at the end of the request
2 participants