Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[8.x] [Security GenAI][BUG] KB index entry created via pdf upload does not give the right response (#198020) #198076

Merged
merged 1 commit into from
Oct 28, 2024

Conversation

kibanamachine
Copy link
Contributor

Backport

This will backport the following commits from main to 8.x:

Questions ?

Please refer to the Backport tool documentation

…give the right response (elastic#198020)

## Summary

These changes fix the issue with the wrong response of the AI Assistant
using knowledge base tool and index entry generated from a PDF file.

The issue happens because we are using the first chunk of uploaded PDF
document as a context that we pass to LLM instead of using inner hits
chunks which are actual parts of the document relevant to the questions.

Here is [the blog
post](https://www.elastic.co/search-labs/blog/semantic-text-with-amazon-bedrock)
that talks about the strategy of using inner hits to get the most
relevant documents. (see `Strategy 1: API Calls` section)

### Upload + index PDF

1. Navigate to Integrations page
2. Select "Upload a file"
3. Select and upload a PDF file
4. Press Import button
5. Switch to Advanced tab
6. Fill in "Index name"
7. Add additional field > Add semantic text field > Fill in form
  * Field: `attachment.content`
  * Copy to field: `content`
  * Inference endpoint: `elser_model_2`
8. Press Add button
9. Press Import button

### Add KB index entry (with uploaded PDF data)

1. Navigate to AI Assistant's Knowledge Base page
2. New > Index
3. Fill in "New index entry" form (below are main fields)
  * Name: `[add entry name]`
  * Index: `[select index name created during uploading a PDF file]`
  * Field: `content`
4. Press Save button

### Testing notes

Enable knowledge base feature via

```
xpack.securitySolution.enableExperimental:
  - 'assistantKnowledgeBaseByDefault'
```

### Example PDF for testing

**PDF document**:
[Elastic Global Threat Report
2024](https://github.com/user-attachments/files/17544720/elastic-global-threat-report-2024.pdf)

**KB Index entry**:
Data Description: "Use this tool to answer questions about the Elastic
Global Threat Report (GTR) 2024"
Query Instruction: "Key terms to return data relevant to the Elastic
Global Threat Report (GTR) 2024"

**Questions**:
1. Who are the authors of the GTR 2024?
2. What is the forecast for the coming year in GTR 2024?
3. What are top 10 Process Injection by rules in Windows endpoints in
GTR 2024?
4. What is the most widely adopted cloud service provider this year
according to GTR 2024?
6. Give a brief conclusion of the GTR 2024

**Current behaviour**:

<img width="656" alt="Screenshot 2024-10-28 at 16 43 48"
src="https://github.com/user-attachments/assets/90615356-8807-4786-b58d-ca28c83aaec9">

**Fixed behaviour**:

<img width="655" alt="Screenshot 2024-10-28 at 16 44 47"
src="https://github.com/user-attachments/assets/9ebefbcc-20c2-4c79-98f3-11fa6acf3da6">

(cherry picked from commit af2bff4)
@kibanamachine kibanamachine merged commit c113396 into elastic:8.x Oct 28, 2024
32 checks passed
@elasticmachine
Copy link
Contributor

💚 Build Succeeded

Metrics [docs]

✅ unchanged

cc @e40pud

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants