[Security Solution] [Elastic AI Assistant] LangChain Agents and Tools integration for ES|QL query generation via ELSER #167097

andrew-goldstein · 2023-09-22T23:15:48Z

[Security Solution] [Elastic AI Assistant] LangChain Agents and Tools integration for ES|QL query generation via ELSER

This PR integrates LangChain Agents and Tools with the Elastic AI Assistant.

These abstractions enable the LLM to dynamically choose whether or not to query, via ELSER, an ES|QL knowledge base. Context from the knowledge base is used to generate ES|QL queries, or answer questions about ES|QL.

Registration of the tool occurs in x-pack/plugins/elastic_assistant/server/lib/langchain/execute_custom_llm_chain/index.ts:

  const tools: Tool[] = [
    new ChainTool({
      name: 'esql-language-knowledge-base',
      description:
        'Call this for knowledge on how to build an ESQL query, or answer questions about the ES|QL query language.',
      chain,
    }),
  ];

The tools array above may be updated in future PRs to include, for example, an ES|QL query validator endpoint.

Details

The callAgentExecutor function in x-pack/plugins/elastic_assistant/server/lib/langchain/execute_custom_llm_chain/index.ts:

Creates a RetrievalQAChain from an ELSER backed ElasticsearchStore, which serves as a knowledge base for ES|QL:

  // ELSER backed ElasticsearchStore for Knowledge Base
  const esStore = new ElasticsearchStore(esClient, KNOWLEDGE_BASE_INDEX_PATTERN, logger);
  const chain = RetrievalQAChain.fromLLM(llm, esStore.asRetriever());

Registers the chain as a tool, which may be invoked by the LLM based on its description:

  const tools: Tool[] = [
    new ChainTool({
      name: 'esql-language-knowledge-base',
      description:
        'Call this for knowledge on how to build an ESQL query, or answer questions about the ES|QL query language.',
      chain,
    }),
  ];

Creates an Agent executor that combines the tools above, the ActionsClientLlm (an abstraction that calls actionsClient.execute), and memory of the previous messages in the conversation:

  const executor = await initializeAgentExecutorWithOptions(tools, llm, {
    agentType: 'chat-conversational-react-description',
    memory,
    verbose: false,
  });

Note: Set verbose above to true to for detailed debugging output from LangChain.

Calls the executor, kicking it off with latestMessage:

    await executor.call({ input: latestMessage[0].content });

Changes to `x-pack/packages/kbn-elastic-assistant`

A client side change was required to the assistant, because the response returned from the agent executor is JSON. This response is parsed on the client in x-pack/packages/kbn-elastic-assistant/impl/assistant/api.tsx:

  return assistantLangChain ? getFormattedMessageContent(result) : result;

Client-side parsing of the response only happens when then assistantLangChain feature flag is true.

Desk testing

Set

assistantLangChain={true}

in x-pack/plugins/security_solution/public/assistant/provider.tsx to enable this experimental feature in development environments.

Also (optionally) set verbose to true in the following code in x-pack/plugins/elastic_assistant/server/lib/langchain/execute_custom_llm_chain/index.ts:

  const executor = await initializeAgentExecutorWithOptions(tools, llm, {
    agentType: 'chat-conversational-react-description',
    memory,
    verbose: true,
  });

After setting the feature flag and optionally enabling verbose debugging output, you may ask the assistant to generate an ES|QL query, per the example in the next section.

Example output

When the Elastic AI Assistant is asked:

From employees, I want to see the 5 earliest employees (hire_date), I want to display only the month and the year that they were hired in and their employee number (emp_no). Format the date as e.g. "September 2019". Only show the query

it replies:

Here is the query to get the employee number and the formatted hire date for the 5 earliest employees by hire_date:

FROM employees
| KEEP emp_no, hire_date
| EVAL month_year = DATE_FORMAT(hire_date, "MMMM YYYY")
| SORT hire_date
| LIMIT 5

Per the screenshot below:

The verbose: true output from LangChain logged to the console reveals that the prompt sent to the LLM includes text like the following:

Assistant can ask the user to use tools to look up information that may be helpful in answering the users original question. The tools the human can use are:\\n\\nesql-language-knowledge-base: Call this for knowledge on how to build an ESQL query, or answer questions about the ES|QL query language.

along with instructions for "calling" the tool like a function.

The debugging output also reveals the agent selecting the tool, and returning results from ESLR:

[agent/action] [1:chain:AgentExecutor] Agent selected action: {
  "tool": "esql-language-knowledge-base",
  "toolInput": "Display the 'emp_no', month and year of the 5 earliest employees by 'hire_date'. Format the date as 'Month Year'.",
  "log": "```json\n{\n    \"action\": \"esql-language-knowledge-base\",\n    \"action_input\": \"Display the 'emp_no', month and year of the 5 earliest employees by 'hire_date'. Format the date as 'Month Year'.\"\n}\n```"
}
[tool/start] [1:chain:AgentExecutor > 4:tool:ChainTool] Entering Tool run with input: "Display the 'emp_no', month and year of the 5 earliest employees by 'hire_date'. Format the date as 'Month Year'."
[chain/start] [1:chain:AgentExecutor > 4:tool:ChainTool > 5:chain:RetrievalQAChain] Entering Chain run with input: {
  "query": "Display the 'emp_no', month and year of the 5 earliest employees by 'hire_date'. Format the date as 'Month Year'."
}
[retriever/start] [1:chain:AgentExecutor > 4:tool:ChainTool > 5:chain:RetrievalQAChain > 6:retriever:VectorStoreRetriever] Entering Retriever run with input: {
  "query": "Display the 'emp_no', month and year of the 5 earliest employees by 'hire_date'. Format the date as 'Month Year'."
}
[retriever/end] [1:chain:AgentExecutor > 4:tool:ChainTool > 5:chain:RetrievalQAChain > 6:retriever:VectorStoreRetriever] [115ms] Exiting Retriever run with output: {
  "documents": [
    {
      "pageContent": "[[esql-date_format]]\n=== `DATE_FORMAT`\nReturns a string representation of a date in the provided format. If no format\nis specified, the `yyyy-MM-dd'T'HH:mm:ss.SSSZ` format is used.\n\n[source,esql]\n----\nFROM employees\n| KEEP first_name, last_name, hire_date\n| EVAL hired = DATE_FORMAT(hire_date, \"YYYY-MM-dd\")\n----\n",

The documents containing ES|QL examples, retrieved from ELSER, are sent back to the LLM to answer the original question, per the abridged output below:

[llm/start] [1:chain:AgentExecutor > 4:tool:ChainTool > 5:chain:RetrievalQAChain > 7:chain:StuffDocumentsChain > 8:chain:LLMChain > 9:llm:ActionsClientLlm] Entering LLM run with input: {
  "prompts": [
    "Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\n\n[[esql-date_format]]\n=== `DATE_FORMAT`\nReturns a string representation of a date in the provided format. If no format\nis specified, the `yyyy-MM-dd'T'HH:mm:ss.SSSZ` format is used.\n\n[source,esql]\n----\nFROM employees\n| KEEP first_name, last_name, hire_date\n| EVAL hired = DATE_FORMAT(hire_date, \"YYYY-MM-dd\")\n----\n\n\n[[esql-date_trunc]]\n=== `DATE_TRUNC`\nRounds down a date to the closest interval. Intervals can be expressed using the\n<<esql-timespan-literals,timespan literal syntax>>.\n\n[source,esql]\n----\nFROM employees\n| EVAL year_hired = DATE_TRUNC(1 year, hire_date)\n| STATS count(emp_no) BY year_hired\n| SORT year_hired\n----\n\n\n[[esql-from]]\n=== `FROM`\n\nThe `FROM` source command returns a table with up to 10,000 documents from a\ndata stream, index,

Complete (verbose) LangChain output from the example

The following verbose: true output from LangChain below was produced via the example in the previous section:

[chain/start] [1:chain:AgentExecutor] Entering Chain run with input: {
  "input": "\n\n\n\nFrom employees, I want to see the 5 earliest employees (hire_date), I want to display only the month and the year that they were hired in and their employee number (emp_no). Format the date as e.g. \"September 2019\". Only show the query",
  "chat_history": []
}
[chain/start] [1:chain:AgentExecutor > 2:chain:LLMChain] Entering Chain run with input: {
  "input": "\n\n\n\nFrom employees, I want to see the 5 earliest employees (hire_date), I want to display only the month and the year that they were hired in and their employee number (emp_no). Format the date as e.g. \"September 2019\". Only show the query",
  "chat_history": [],
  "agent_scratchpad": [],
  "stop": [
    "Observation:"
  ]
}
[llm/start] [1:chain:AgentExecutor > 2:chain:LLMChain > 3:llm:ActionsClientLlm] Entering LLM run with input: {
  "prompts": [
    "[{\"lc\":1,\"type\":\"constructor\",\"id\":[\"langchain\",\"schema\",\"SystemMessage\"],\"kwargs\":{\"content\":\"Assistant is a large language model trained by OpenAI.\\n\\nAssistant is designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. As a language model, Assistant is able to generate human-like text based on the input it receives, allowing it to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand.\\n\\nAssistant is constantly learning and improving, and its capabilities are constantly evolving. It is able to process and understand large amounts of text, and can use this knowledge to provide accurate and informative responses to a wide range of questions. Additionally, Assistant is able to generate its own text based on the input it receives, allowing it to engage in discussions and provide explanations and descriptions on a wide range of topics.\\n\\nOverall, Assistant is a powerful system that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether you need help with a specific question or just want to have a conversation about a particular topic, Assistant is here to assist. However, above all else, all responses must adhere to the format of RESPONSE FORMAT INSTRUCTIONS.\",\"additional_kwargs\":{}}},{\"lc\":1,\"type\":\"constructor\",\"id\":[\"langchain\",\"schema\",\"HumanMessage\"],\"kwargs\":{\"content\":\"TOOLS\\n------\\nAssistant can ask the user to use tools to look up information that may be helpful in answering the users original question. The tools the human can use are:\\n\\nesql-language-knowledge-base: Call this for knowledge on how to build an ESQL query, or answer questions about the ES|QL query language.\\n\\nRESPONSE FORMAT INSTRUCTIONS\\n----------------------------\\n\\nOutput a JSON markdown code snippet containing a valid JSON object in one of two formats:\\n\\n**Option 1:**\\nUse this if you want the human to use a tool.\\nMarkdown code snippet formatted in the following schema:\\n\\n```json\\n{\\n    \\\"action\\\": string, // The action to take. Must be one of [esql-language-knowledge-base]\\n    \\\"action_input\\\": string // The input to the action. May be a stringified object.\\n}\\n```\\n\\n**Option #2:**\\nUse this if you want to respond directly and conversationally to the human. Markdown code snippet formatted in the following schema:\\n\\n```json\\n{\\n    \\\"action\\\": \\\"Final Answer\\\",\\n    \\\"action_input\\\": string // You should put what you want to return to use here and make sure to use valid json newline characters.\\n}\\n```\\n\\nFor both options, remember to always include the surrounding markdown code snippet delimiters (begin with \\\"```json\\\" and end with \\\"```\\\")!\\n\\n\\nUSER'S INPUT\\n--------------------\\nHere is the user's input (remember to respond with a markdown code snippet of a json blob with a single action, and NOTHING else):\\n\\n\\n\\n\\n\\nFrom employees, I want to see the 5 earliest employees (hire_date), I want to display only the month and the year that they were hired in and their employee number (emp_no). Format the date as e.g. \\\"September 2019\\\". Only show the query\",\"additional_kwargs\":{}}}]"
  ]
}
[llm/end] [1:chain:AgentExecutor > 2:chain:LLMChain > 3:llm:ActionsClientLlm] [3.08s] Exiting LLM run with output: {
  "generations": [
    [
      {
        "text": "```json\n{\n    \"action\": \"esql-language-knowledge-base\",\n    \"action_input\": \"Display the 'emp_no', month and year of the 5 earliest employees by 'hire_date'. Format the date as 'Month Year'.\"\n}\n```"
      }
    ]
  ]
}
[chain/end] [1:chain:AgentExecutor > 2:chain:LLMChain] [3.09s] Exiting Chain run with output: {
  "text": "```json\n{\n    \"action\": \"esql-language-knowledge-base\",\n    \"action_input\": \"Display the 'emp_no', month and year of the 5 earliest employees by 'hire_date'. Format the date as 'Month Year'.\"\n}\n```"
}
[agent/action] [1:chain:AgentExecutor] Agent selected action: {
  "tool": "esql-language-knowledge-base",
  "toolInput": "Display the 'emp_no', month and year of the 5 earliest employees by 'hire_date'. Format the date as 'Month Year'.",
  "log": "```json\n{\n    \"action\": \"esql-language-knowledge-base\",\n    \"action_input\": \"Display the 'emp_no', month and year of the 5 earliest employees by 'hire_date'. Format the date as 'Month Year'.\"\n}\n```"
}
[tool/start] [1:chain:AgentExecutor > 4:tool:ChainTool] Entering Tool run with input: "Display the 'emp_no', month and year of the 5 earliest employees by 'hire_date'. Format the date as 'Month Year'."
[chain/start] [1:chain:AgentExecutor > 4:tool:ChainTool > 5:chain:RetrievalQAChain] Entering Chain run with input: {
  "query": "Display the 'emp_no', month and year of the 5 earliest employees by 'hire_date'. Format the date as 'Month Year'."
}
[retriever/start] [1:chain:AgentExecutor > 4:tool:ChainTool > 5:chain:RetrievalQAChain > 6:retriever:VectorStoreRetriever] Entering Retriever run with input: {
  "query": "Display the 'emp_no', month and year of the 5 earliest employees by 'hire_date'. Format the date as 'Month Year'."
}
[retriever/end] [1:chain:AgentExecutor > 4:tool:ChainTool > 5:chain:RetrievalQAChain > 6:retriever:VectorStoreRetriever] [115ms] Exiting Retriever run with output: {
  "documents": [
    {
      "pageContent": "[[esql-date_format]]\n=== `DATE_FORMAT`\nReturns a string representation of a date in the provided format. If no format\nis specified, the `yyyy-MM-dd'T'HH:mm:ss.SSSZ` format is used.\n\n[source,esql]\n----\nFROM employees\n| KEEP first_name, last_name, hire_date\n| EVAL hired = DATE_FORMAT(hire_date, \"YYYY-MM-dd\")\n----\n",
      "metadata": {
        "source": "/Users/andrew.goldstein/Projects/forks/spong/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/docs/functions/date_format.asciidoc"
      }
    },
    {
      "pageContent": "[[esql-date_trunc]]\n=== `DATE_TRUNC`\nRounds down a date to the closest interval. Intervals can be expressed using the\n<<esql-timespan-literals,timespan literal syntax>>.\n\n[source,esql]\n----\nFROM employees\n| EVAL year_hired = DATE_TRUNC(1 year, hire_date)\n| STATS count(emp_no) BY year_hired\n| SORT year_hired\n----\n",
      "metadata": {
        "source": "/Users/andrew.goldstein/Projects/forks/spong/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/docs/functions/date_trunc.asciidoc"
      }
    },
    {
      "pageContent": "[[esql-from]]\n=== `FROM`\n\nThe `FROM` source command returns a table with up to 10,000 documents from a\ndata stream, index, or alias. Each row in the resulting table represents a\ndocument. Each column corresponds to a field, and can be accessed by the name\nof that field.\n\n[source,esql]\n----\nFROM employees\n----\n\nYou can use <<api-date-math-index-names,date math>> to refer to indices, aliases\nand data streams. This can be useful for time series data, for example to access\ntoday's index:\n\n[source,esql]\n----\nFROM <logs-{now/d}>\n----\n\nUse comma-separated lists or wildcards to query multiple data streams, indices,\nor aliases:\n\n[source,esql]\n----\nFROM employees-00001,employees-*\n----\n",
      "metadata": {
        "source": "/Users/andrew.goldstein/Projects/forks/spong/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/docs/source_commands/from.asciidoc"
      }
    },
    {
      "pageContent": "[[esql-where]]\n=== `WHERE`\n\nUse `WHERE` to produce a table that contains all the rows from the input table\nfor which the provided condition evaluates to `true`:\n\n[source,esql]\n----\ninclude::{esql-specs}/docs.csv-spec[tag=where]\n----\n\nWhich, if `still_hired` is a boolean field, can be simplified to:\n\n[source,esql]\n----\ninclude::{esql-specs}/docs.csv-spec[tag=whereBoolean]\n----\n\n[discrete]\n==== Operators\n\nRefer to <<esql-operators>> for an overview of the supported operators.\n\n[discrete]\n==== Functions\n`WHERE` supports various functions for calculating values. Refer to\n<<esql-functions,Functions>> for more information.\n\n[source,esql]\n----\ninclude::{esql-specs}/docs.csv-spec[tag=whereFunction]\n----\n",
      "metadata": {
        "source": "/Users/andrew.goldstein/Projects/forks/spong/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/docs/processing_commands/where.asciidoc"
      }
    }
  ]
}
[chain/start] [1:chain:AgentExecutor > 4:tool:ChainTool > 5:chain:RetrievalQAChain > 7:chain:StuffDocumentsChain] Entering Chain run with input: {
  "question": "Display the 'emp_no', month and year of the 5 earliest employees by 'hire_date'. Format the date as 'Month Year'.",
  "input_documents": [
    {
      "pageContent": "[[esql-date_format]]\n=== `DATE_FORMAT`\nReturns a string representation of a date in the provided format. If no format\nis specified, the `yyyy-MM-dd'T'HH:mm:ss.SSSZ` format is used.\n\n[source,esql]\n----\nFROM employees\n| KEEP first_name, last_name, hire_date\n| EVAL hired = DATE_FORMAT(hire_date, \"YYYY-MM-dd\")\n----\n",
      "metadata": {
        "source": "/Users/andrew.goldstein/Projects/forks/spong/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/docs/functions/date_format.asciidoc"
      }
    },
    {
      "pageContent": "[[esql-date_trunc]]\n=== `DATE_TRUNC`\nRounds down a date to the closest interval. Intervals can be expressed using the\n<<esql-timespan-literals,timespan literal syntax>>.\n\n[source,esql]\n----\nFROM employees\n| EVAL year_hired = DATE_TRUNC(1 year, hire_date)\n| STATS count(emp_no) BY year_hired\n| SORT year_hired\n----\n",
      "metadata": {
        "source": "/Users/andrew.goldstein/Projects/forks/spong/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/docs/functions/date_trunc.asciidoc"
      }
    },
    {
      "pageContent": "[[esql-from]]\n=== `FROM`\n\nThe `FROM` source command returns a table with up to 10,000 documents from a\ndata stream, index, or alias. Each row in the resulting table represents a\ndocument. Each column corresponds to a field, and can be accessed by the name\nof that field.\n\n[source,esql]\n----\nFROM employees\n----\n\nYou can use <<api-date-math-index-names,date math>> to refer to indices, aliases\nand data streams. This can be useful for time series data, for example to access\ntoday's index:\n\n[source,esql]\n----\nFROM <logs-{now/d}>\n----\n\nUse comma-separated lists or wildcards to query multiple data streams, indices,\nor aliases:\n\n[source,esql]\n----\nFROM employees-00001,employees-*\n----\n",
      "metadata": {
        "source": "/Users/andrew.goldstein/Projects/forks/spong/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/docs/source_commands/from.asciidoc"
      }
    },
    {
      "pageContent": "[[esql-where]]\n=== `WHERE`\n\nUse `WHERE` to produce a table that contains all the rows from the input table\nfor which the provided condition evaluates to `true`:\n\n[source,esql]\n----\ninclude::{esql-specs}/docs.csv-spec[tag=where]\n----\n\nWhich, if `still_hired` is a boolean field, can be simplified to:\n\n[source,esql]\n----\ninclude::{esql-specs}/docs.csv-spec[tag=whereBoolean]\n----\n\n[discrete]\n==== Operators\n\nRefer to <<esql-operators>> for an overview of the supported operators.\n\n[discrete]\n==== Functions\n`WHERE` supports various functions for calculating values. Refer to\n<<esql-functions,Functions>> for more information.\n\n[source,esql]\n----\ninclude::{esql-specs}/docs.csv-spec[tag=whereFunction]\n----\n",
      "metadata": {
        "source": "/Users/andrew.goldstein/Projects/forks/spong/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/docs/processing_commands/where.asciidoc"
      }
    }
  ],
  "query": "Display the 'emp_no', month and year of the 5 earliest employees by 'hire_date'. Format the date as 'Month Year'."
}
[chain/start] [1:chain:AgentExecutor > 4:tool:ChainTool > 5:chain:RetrievalQAChain > 7:chain:StuffDocumentsChain > 8:chain:LLMChain] Entering Chain run with input: {
  "question": "Display the 'emp_no', month and year of the 5 earliest employees by 'hire_date'. Format the date as 'Month Year'.",
  "query": "Display the 'emp_no', month and year of the 5 earliest employees by 'hire_date'. Format the date as 'Month Year'.",
  "context": "[[esql-date_format]]\n=== `DATE_FORMAT`\nReturns a string representation of a date in the provided format. If no format\nis specified, the `yyyy-MM-dd'T'HH:mm:ss.SSSZ` format is used.\n\n[source,esql]\n----\nFROM employees\n| KEEP first_name, last_name, hire_date\n| EVAL hired = DATE_FORMAT(hire_date, \"YYYY-MM-dd\")\n----\n\n\n[[esql-date_trunc]]\n=== `DATE_TRUNC`\nRounds down a date to the closest interval. Intervals can be expressed using the\n<<esql-timespan-literals,timespan literal syntax>>.\n\n[source,esql]\n----\nFROM employees\n| EVAL year_hired = DATE_TRUNC(1 year, hire_date)\n| STATS count(emp_no) BY year_hired\n| SORT year_hired\n----\n\n\n[[esql-from]]\n=== `FROM`\n\nThe `FROM` source command returns a table with up to 10,000 documents from a\ndata stream, index, or alias. Each row in the resulting table represents a\ndocument. Each column corresponds to a field, and can be accessed by the name\nof that field.\n\n[source,esql]\n----\nFROM employees\n----\n\nYou can use <<api-date-math-index-names,date math>> to refer to indices, aliases\nand data streams. This can be useful for time series data, for example to access\ntoday's index:\n\n[source,esql]\n----\nFROM <logs-{now/d}>\n----\n\nUse comma-separated lists or wildcards to query multiple data streams, indices,\nor aliases:\n\n[source,esql]\n----\nFROM employees-00001,employees-*\n----\n\n\n[[esql-where]]\n=== `WHERE`\n\nUse `WHERE` to produce a table that contains all the rows from the input table\nfor which the provided condition evaluates to `true`:\n\n[source,esql]\n----\ninclude::{esql-specs}/docs.csv-spec[tag=where]\n----\n\nWhich, if `still_hired` is a boolean field, can be simplified to:\n\n[source,esql]\n----\ninclude::{esql-specs}/docs.csv-spec[tag=whereBoolean]\n----\n\n[discrete]\n==== Operators\n\nRefer to <<esql-operators>> for an overview of the supported operators.\n\n[discrete]\n==== Functions\n`WHERE` supports various functions for calculating values. Refer to\n<<esql-functions,Functions>> for more information.\n\n[source,esql]\n----\ninclude::{esql-specs}/docs.csv-spec[tag=whereFunction]\n----\n"
}
[llm/start] [1:chain:AgentExecutor > 4:tool:ChainTool > 5:chain:RetrievalQAChain > 7:chain:StuffDocumentsChain > 8:chain:LLMChain > 9:llm:ActionsClientLlm] Entering LLM run with input: {
  "prompts": [
    "Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\n\n[[esql-date_format]]\n=== `DATE_FORMAT`\nReturns a string representation of a date in the provided format. If no format\nis specified, the `yyyy-MM-dd'T'HH:mm:ss.SSSZ` format is used.\n\n[source,esql]\n----\nFROM employees\n| KEEP first_name, last_name, hire_date\n| EVAL hired = DATE_FORMAT(hire_date, \"YYYY-MM-dd\")\n----\n\n\n[[esql-date_trunc]]\n=== `DATE_TRUNC`\nRounds down a date to the closest interval. Intervals can be expressed using the\n<<esql-timespan-literals,timespan literal syntax>>.\n\n[source,esql]\n----\nFROM employees\n| EVAL year_hired = DATE_TRUNC(1 year, hire_date)\n| STATS count(emp_no) BY year_hired\n| SORT year_hired\n----\n\n\n[[esql-from]]\n=== `FROM`\n\nThe `FROM` source command returns a table with up to 10,000 documents from a\ndata stream, index, or alias. Each row in the resulting table represents a\ndocument. Each column corresponds to a field, and can be accessed by the name\nof that field.\n\n[source,esql]\n----\nFROM employees\n----\n\nYou can use <<api-date-math-index-names,date math>> to refer to indices, aliases\nand data streams. This can be useful for time series data, for example to access\ntoday's index:\n\n[source,esql]\n----\nFROM <logs-{now/d}>\n----\n\nUse comma-separated lists or wildcards to query multiple data streams, indices,\nor aliases:\n\n[source,esql]\n----\nFROM employees-00001,employees-*\n----\n\n\n[[esql-where]]\n=== `WHERE`\n\nUse `WHERE` to produce a table that contains all the rows from the input table\nfor which the provided condition evaluates to `true`:\n\n[source,esql]\n----\ninclude::{esql-specs}/docs.csv-spec[tag=where]\n----\n\nWhich, if `still_hired` is a boolean field, can be simplified to:\n\n[source,esql]\n----\ninclude::{esql-specs}/docs.csv-spec[tag=whereBoolean]\n----\n\n[discrete]\n==== Operators\n\nRefer to <<esql-operators>> for an overview of the supported operators.\n\n[discrete]\n==== Functions\n`WHERE` supports various functions for calculating values. Refer to\n<<esql-functions,Functions>> for more information.\n\n[source,esql]\n----\ninclude::{esql-specs}/docs.csv-spec[tag=whereFunction]\n----\n\n\nQuestion: Display the 'emp_no', month and year of the 5 earliest employees by 'hire_date'. Format the date as 'Month Year'.\nHelpful Answer:"
  ]
}
[llm/end] [1:chain:AgentExecutor > 4:tool:ChainTool > 5:chain:RetrievalQAChain > 7:chain:StuffDocumentsChain > 8:chain:LLMChain > 9:llm:ActionsClientLlm] [2.23s] Exiting LLM run with output: {
  "generations": [
    [
      {
        "text": "FROM employees\n| KEEP emp_no, hire_date\n| EVAL month_year = DATE_FORMAT(hire_date, \"MMMM YYYY\")\n| SORT hire_date\n| LIMIT 5"
      }
    ]
  ]
}
[chain/end] [1:chain:AgentExecutor > 4:tool:ChainTool > 5:chain:RetrievalQAChain > 7:chain:StuffDocumentsChain > 8:chain:LLMChain] [2.23s] Exiting Chain run with output: {
  "text": "FROM employees\n| KEEP emp_no, hire_date\n| EVAL month_year = DATE_FORMAT(hire_date, \"MMMM YYYY\")\n| SORT hire_date\n| LIMIT 5"
}
[chain/end] [1:chain:AgentExecutor > 4:tool:ChainTool > 5:chain:RetrievalQAChain > 7:chain:StuffDocumentsChain] [2.23s] Exiting Chain run with output: {
  "text": "FROM employees\n| KEEP emp_no, hire_date\n| EVAL month_year = DATE_FORMAT(hire_date, \"MMMM YYYY\")\n| SORT hire_date\n| LIMIT 5"
}
[chain/end] [1:chain:AgentExecutor > 4:tool:ChainTool > 5:chain:RetrievalQAChain] [2.35s] Exiting Chain run with output: {
  "text": "FROM employees\n| KEEP emp_no, hire_date\n| EVAL month_year = DATE_FORMAT(hire_date, \"MMMM YYYY\")\n| SORT hire_date\n| LIMIT 5"
}
[tool/end] [1:chain:AgentExecutor > 4:tool:ChainTool] [2.35s] Exiting Tool run with output: "FROM employees
| KEEP emp_no, hire_date
| EVAL month_year = DATE_FORMAT(hire_date, "MMMM YYYY")
| SORT hire_date
| LIMIT 5"
[chain/start] [1:chain:AgentExecutor > 10:chain:LLMChain] Entering Chain run with input: {
  "input": "\n\n\n\nFrom employees, I want to see the 5 earliest employees (hire_date), I want to display only the month and the year that they were hired in and their employee number (emp_no). Format the date as e.g. \"September 2019\". Only show the query",
  "chat_history": [],
  "agent_scratchpad": [
    {
      "lc": 1,
      "type": "constructor",
      "id": [
        "langchain",
        "schema",
        "AIMessage"
      ],
      "kwargs": {
        "content": "```json\n{\n    \"action\": \"esql-language-knowledge-base\",\n    \"action_input\": \"Display the 'emp_no', month and year of the 5 earliest employees by 'hire_date'. Format the date as 'Month Year'.\"\n}\n```",
        "additional_kwargs": {}
      }
    },
    {
      "lc": 1,
      "type": "constructor",
      "id": [
        "langchain",
        "schema",
        "HumanMessage"
      ],
      "kwargs": {
        "content": "TOOL RESPONSE:\n---------------------\nFROM employees\n| KEEP emp_no, hire_date\n| EVAL month_year = DATE_FORMAT(hire_date, \"MMMM YYYY\")\n| SORT hire_date\n| LIMIT 5\n\nUSER'S INPUT\n--------------------\n\nOkay, so what is the response to my last comment? If using information obtained from the tools you must mention it explicitly without mentioning the tool names - I have forgotten all TOOL RESPONSES! Remember to respond with a markdown code snippet of a json blob with a single action, and NOTHING else.",
        "additional_kwargs": {}
      }
    }
  ],
  "stop": [
    "Observation:"
  ]
}
[llm/start] [1:chain:AgentExecutor > 10:chain:LLMChain > 11:llm:ActionsClientLlm] Entering LLM run with input: {
  "prompts": [
    "[{\"lc\":1,\"type\":\"constructor\",\"id\":[\"langchain\",\"schema\",\"SystemMessage\"],\"kwargs\":{\"content\":\"Assistant is a large language model trained by OpenAI.\\n\\nAssistant is designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. As a language model, Assistant is able to generate human-like text based on the input it receives, allowing it to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand.\\n\\nAssistant is constantly learning and improving, and its capabilities are constantly evolving. It is able to process and understand large amounts of text, and can use this knowledge to provide accurate and informative responses to a wide range of questions. Additionally, Assistant is able to generate its own text based on the input it receives, allowing it to engage in discussions and provide explanations and descriptions on a wide range of topics.\\n\\nOverall, Assistant is a powerful system that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether you need help with a specific question or just want to have a conversation about a particular topic, Assistant is here to assist. However, above all else, all responses must adhere to the format of RESPONSE FORMAT INSTRUCTIONS.\",\"additional_kwargs\":{}}},{\"lc\":1,\"type\":\"constructor\",\"id\":[\"langchain\",\"schema\",\"HumanMessage\"],\"kwargs\":{\"content\":\"TOOLS\\n------\\nAssistant can ask the user to use tools to look up information that may be helpful in answering the users original question. The tools the human can use are:\\n\\nesql-language-knowledge-base: Call this for knowledge on how to build an ESQL query, or answer questions about the ES|QL query language.\\n\\nRESPONSE FORMAT INSTRUCTIONS\\n----------------------------\\n\\nOutput a JSON markdown code snippet containing a valid JSON object in one of two formats:\\n\\n**Option 1:**\\nUse this if you want the human to use a tool.\\nMarkdown code snippet formatted in the following schema:\\n\\n```json\\n{\\n    \\\"action\\\": string, // The action to take. Must be one of [esql-language-knowledge-base]\\n    \\\"action_input\\\": string // The input to the action. May be a stringified object.\\n}\\n```\\n\\n**Option #2:**\\nUse this if you want to respond directly and conversationally to the human. Markdown code snippet formatted in the following schema:\\n\\n```json\\n{\\n    \\\"action\\\": \\\"Final Answer\\\",\\n    \\\"action_input\\\": string // You should put what you want to return to use here and make sure to use valid json newline characters.\\n}\\n```\\n\\nFor both options, remember to always include the surrounding markdown code snippet delimiters (begin with \\\"```json\\\" and end with \\\"```\\\")!\\n\\n\\nUSER'S INPUT\\n--------------------\\nHere is the user's input (remember to respond with a markdown code snippet of a json blob with a single action, and NOTHING else):\\n\\n\\n\\n\\n\\nFrom employees, I want to see the 5 earliest employees (hire_date), I want to display only the month and the year that they were hired in and their employee number (emp_no). Format the date as e.g. \\\"September 2019\\\". Only show the query\",\"additional_kwargs\":{}}},{\"lc\":1,\"type\":\"constructor\",\"id\":[\"langchain\",\"schema\",\"AIMessage\"],\"kwargs\":{\"content\":\"```json\\n{\\n    \\\"action\\\": \\\"esql-language-knowledge-base\\\",\\n    \\\"action_input\\\": \\\"Display the 'emp_no', month and year of the 5 earliest employees by 'hire_date'. Format the date as 'Month Year'.\\\"\\n}\\n```\",\"additional_kwargs\":{}}},{\"lc\":1,\"type\":\"constructor\",\"id\":[\"langchain\",\"schema\",\"HumanMessage\"],\"kwargs\":{\"content\":\"TOOL RESPONSE:\\n---------------------\\nFROM employees\\n| KEEP emp_no, hire_date\\n| EVAL month_year = DATE_FORMAT(hire_date, \\\"MMMM YYYY\\\")\\n| SORT hire_date\\n| LIMIT 5\\n\\nUSER'S INPUT\\n--------------------\\n\\nOkay, so what is the response to my last comment? If using information obtained from the tools you must mention it explicitly without mentioning the tool names - I have forgotten all TOOL RESPONSES! Remember to respond with a markdown code snippet of a json blob with a single action, and NOTHING else.\",\"additional_kwargs\":{}}}]"
  ]
}
[llm/end] [1:chain:AgentExecutor > 10:chain:LLMChain > 11:llm:ActionsClientLlm] [6.47s] Exiting LLM run with output: {
  "generations": [
    [
      {
        "text": "```json\n{\n    \"action\": \"Final Answer\",\n    \"action_input\": \"Here is the query to get the employee number and the formatted hire date for the 5 earliest employees by hire_date:\\n\\nFROM employees\\n| KEEP emp_no, hire_date\\n| EVAL month_year = DATE_FORMAT(hire_date, \\\"MMMM YYYY\\\")\\n| SORT hire_date\\n| LIMIT 5\"\n}\n```"
      }
    ]
  ]
}
[chain/end] [1:chain:AgentExecutor > 10:chain:LLMChain] [6.47s] Exiting Chain run with output: {
  "text": "```json\n{\n    \"action\": \"Final Answer\",\n    \"action_input\": \"Here is the query to get the employee number and the formatted hire date for the 5 earliest employees by hire_date:\\n\\nFROM employees\\n| KEEP emp_no, hire_date\\n| EVAL month_year = DATE_FORMAT(hire_date, \\\"MMMM YYYY\\\")\\n| SORT hire_date\\n| LIMIT 5\"\n}\n```"
}
[chain/end] [1:chain:AgentExecutor] [11.91s] Exiting Chain run with output: {
  "output": "Here is the query to get the employee number and the formatted hire date for the 5 earliest employees by hire_date:\n\nFROM employees\n| KEEP emp_no, hire_date\n| EVAL month_year = DATE_FORMAT(hire_date, \"MMMM YYYY\")\n| SORT hire_date\n| LIMIT 5"
}

…ols integration for ES|QL query generation via ELSER This PR integrates [LangChain](https://www.langchain.com/) [Agents](https://js.langchain.com/docs/modules/agents/) and [Tools](https://js.langchain.com/docs/modules/agents/tools/) with the [Elastic AI Assistant](https://www.elastic.co/blog/introducing-elastic-ai-assistant). These abstractions enable the LLM to dynamically choose whether or not to query, via [ELSER](https://www.elastic.co/guide/en/machine-learning/current/ml-nlp-elser.html), an [ES|QL](https://www.elastic.co/blog/elasticsearch-query-language-esql) knowledge base. Context from the knowledge base is used to generate `ES|QL` queries, or answer questions about `ES|QL`. Registration of the tool occurs in `x-pack/plugins/elastic_assistant/server/lib/langchain/execute_custom_llm_chain/index.ts`: ```typescript const tools: Tool[] = [ new ChainTool({ name: 'esql-language-knowledge-base', description: 'Call this for knowledge on how to build an ESQL query, or answer questions about the ES|QL query language.', chain, }), ]; ``` The `tools` array above may be updated in future PRs to include, for example, an `ES|QL` query validator endpoint. ### Details The `callAgentExecutor` function in `x-pack/plugins/elastic_assistant/server/lib/langchain/execute_custom_llm_chain/index.ts`: 1. Creates a `RetrievalQAChain` from an `ELSER` backed `ElasticsearchStore`, which serves as a knowledge base for `ES|QL`: ```typescript // ELSER backed ElasticsearchStore for Knowledge Base const esStore = new ElasticsearchStore(esClient, KNOWLEDGE_BASE_INDEX_PATTERN, logger); const chain = RetrievalQAChain.fromLLM(llm, esStore.asRetriever()); ``` 2. Registers the chain as a tool, which may be invoked by the LLM based on its description: ```typescript const tools: Tool[] = [ new ChainTool({ name: 'esql-language-knowledge-base', description: 'Call this for knowledge on how to build an ESQL query, or answer questions about the ES|QL query language.', chain, }), ]; ``` 3. Creates an Agent executor that combines the `tools` above, the `ActionsClientLlm` (an abstraction that calls `actionsClient.execute`), and memory of the previous messages in the conversation: ```typescript const executor = await initializeAgentExecutorWithOptions(tools, llm, { agentType: 'chat-conversational-react-description', memory, verbose: false, }); ``` Note: Set `verbose` above to `true` to for detailed debugging output from LangChain. 4. Calls the `executor`, kicking it off with `latestMessage`: ```typescript await executor.call({ input: latestMessage[0].content }); ``` ### Changes to `x-pack/packages/kbn-elastic-assistant` A client side change was required to the assistant, because the response returned from the agent executor is JSON. This response is parsed on the client in `x-pack/packages/kbn-elastic-assistant/impl/assistant/api.tsx`: ```typescript return assistantLangChain ? getFormattedMessageContent(result) : result; ``` Client-side parsing of the response only happens when then `assistantLangChain` feature flag is `true`. ## Desk testing Set ```typescript assistantLangChain={true} ``` in `x-pack/plugins/security_solution/public/assistant/provider.tsx` to enable this experimental feature in development environments. Also (optionally) set `verbose` to `true` in the following code in ``x-pack/plugins/elastic_assistant/server/lib/langchain/execute_custom_llm_chain/index.ts``: ```typescript const executor = await initializeAgentExecutorWithOptions(tools, llm, { agentType: 'chat-conversational-react-description', memory, verbose: true, }); ``` After setting the feature flag and optionally enabling verbose debugging output, you may ask the assistant to generate an `ES|QL` query, per the example in the next section. ### Example output When the Elastic AI Assistant is asked: ``` From employees, I want to see the 5 earliest employees (hire_date), I want to display only the month and the year that they were hired in and their employee number (emp_no). Format the date as e.g. "September 2019". Only show the query ``` it replies: ``` Here is the query to get the employee number and the formatted hire date for the 5 earliest employees by hire_date: FROM employees | KEEP emp_no, hire_date | EVAL month_year = DATE_FORMAT(hire_date, "MMMM YYYY") | SORT hire_date | LIMIT 5 ``` Per the screenshot below: ![ESQL_query_via_langchain_agents_and_tools](https://github.com/elastic/kibana/assets/4459398/c5cc75da-f7aa-4a12-9078-ed531f3463e7) The `verbose: true` output from LangChain logged to the console reveals that the prompt sent to the LLM includes text like the following: ``` Assistant can ask the user to use tools to look up information that may be helpful in answering the users original question. The tools the human can use are:\\n\\nesql-language-knowledge-base: Call this for knowledge on how to build an ESQL query, or answer questions about the ES|QL query language. ``` along with instructions for "calling" the tool like a function. The debugging output also reveals the agent selecting the tool, and returning results from ESLR: ``` [agent/action] [1:chain:AgentExecutor] Agent selected action: { "tool": "esql-language-knowledge-base", "toolInput": "Display the 'emp_no', month and year of the 5 earliest employees by 'hire_date'. Format the date as 'Month Year'.", "log": "```json\n{\n \"action\": \"esql-language-knowledge-base\",\n \"action_input\": \"Display the 'emp_no', month and year of the 5 earliest employees by 'hire_date'. Format the date as 'Month Year'.\"\n}\n```" } [tool/start] [1:chain:AgentExecutor > 4:tool:ChainTool] Entering Tool run with input: "Display the 'emp_no', month and year of the 5 earliest employees by 'hire_date'. Format the date as 'Month Year'." [chain/start] [1:chain:AgentExecutor > 4:tool:ChainTool > 5:chain:RetrievalQAChain] Entering Chain run with input: { "query": "Display the 'emp_no', month and year of the 5 earliest employees by 'hire_date'. Format the date as 'Month Year'." } [retriever/start] [1:chain:AgentExecutor > 4:tool:ChainTool > 5:chain:RetrievalQAChain > 6:retriever:VectorStoreRetriever] Entering Retriever run with input: { "query": "Display the 'emp_no', month and year of the 5 earliest employees by 'hire_date'. Format the date as 'Month Year'." } [retriever/end] [1:chain:AgentExecutor > 4:tool:ChainTool > 5:chain:RetrievalQAChain > 6:retriever:VectorStoreRetriever] [115ms] Exiting Retriever run with output: { "documents": [ { "pageContent": "[[esql-date_format]]\n=== `DATE_FORMAT`\nReturns a string representation of a date in the provided format. If no format\nis specified, the `yyyy-MM-dd'T'HH:mm:ss.SSSZ` format is used.\n\n[source,esql]\n----\nFROM employees\n| KEEP first_name, last_name, hire_date\n| EVAL hired = DATE_FORMAT(hire_date, \"YYYY-MM-dd\")\n----\n", ``` The documents containing `ES|QL` examples, retrieved from ELSER, are sent back to the LLM to answer the original question, per the abridged output below: ``` [llm/start] [1:chain:AgentExecutor > 4:tool:ChainTool > 5:chain:RetrievalQAChain > 7:chain:StuffDocumentsChain > 8:chain:LLMChain > 9:llm:ActionsClientLlm] Entering LLM run with input: { "prompts": [ "Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\n\n[[esql-date_format]]\n=== `DATE_FORMAT`\nReturns a string representation of a date in the provided format. If no format\nis specified, the `yyyy-MM-dd'T'HH:mm:ss.SSSZ` format is used.\n\n[source,esql]\n----\nFROM employees\n| KEEP first_name, last_name, hire_date\n| EVAL hired = DATE_FORMAT(hire_date, \"YYYY-MM-dd\")\n----\n\n\n[[esql-date_trunc]]\n=== `DATE_TRUNC`\nRounds down a date to the closest interval. Intervals can be expressed using the\n<<esql-timespan-literals,timespan literal syntax>>.\n\n[source,esql]\n----\nFROM employees\n| EVAL year_hired = DATE_TRUNC(1 year, hire_date)\n| STATS count(emp_no) BY year_hired\n| SORT year_hired\n----\n\n\n[[esql-from]]\n=== `FROM`\n\nThe `FROM` source command returns a table with up to 10,000 documents from a\ndata stream, index, ``` ### Complete (verbose) LangChain output from the example The following `verbose: true` output from LangChain below was produced via the example in the previous section: ``` [chain/start] [1:chain:AgentExecutor] Entering Chain run with input: { "input": "\n\n\n\nFrom employees, I want to see the 5 earliest employees (hire_date), I want to display only the month and the year that they were hired in and their employee number (emp_no). Format the date as e.g. \"September 2019\". Only show the query", "chat_history": [] } [chain/start] [1:chain:AgentExecutor > 2:chain:LLMChain] Entering Chain run with input: { "input": "\n\n\n\nFrom employees, I want to see the 5 earliest employees (hire_date), I want to display only the month and the year that they were hired in and their employee number (emp_no). Format the date as e.g. \"September 2019\". Only show the query", "chat_history": [], "agent_scratchpad": [], "stop": [ "Observation:" ] } [llm/start] [1:chain:AgentExecutor > 2:chain:LLMChain > 3:llm:ActionsClientLlm] Entering LLM run with input: { "prompts": [ "[{\"lc\":1,\"type\":\"constructor\",\"id\":[\"langchain\",\"schema\",\"SystemMessage\"],\"kwargs\":{\"content\":\"Assistant is a large language model trained by OpenAI.\\n\\nAssistant is designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. As a language model, Assistant is able to generate human-like text based on the input it receives, allowing it to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand.\\n\\nAssistant is constantly learning and improving, and its capabilities are constantly evolving. It is able to process and understand large amounts of text, and can use this knowledge to provide accurate and informative responses to a wide range of questions. Additionally, Assistant is able to generate its own text based on the input it receives, allowing it to engage in discussions and provide explanations and descriptions on a wide range of topics.\\n\\nOverall, Assistant is a powerful system that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether you need help with a specific question or just want to have a conversation about a particular topic, Assistant is here to assist. However, above all else, all responses must adhere to the format of RESPONSE FORMAT INSTRUCTIONS.\",\"additional_kwargs\":{}}},{\"lc\":1,\"type\":\"constructor\",\"id\":[\"langchain\",\"schema\",\"HumanMessage\"],\"kwargs\":{\"content\":\"TOOLS\\n------\\nAssistant can ask the user to use tools to look up information that may be helpful in answering the users original question. The tools the human can use are:\\n\\nesql-language-knowledge-base: Call this for knowledge on how to build an ESQL query, or answer questions about the ES|QL query language.\\n\\nRESPONSE FORMAT INSTRUCTIONS\\n----------------------------\\n\\nOutput a JSON markdown code snippet containing a valid JSON object in one of two formats:\\n\\n**Option 1:**\\nUse this if you want the human to use a tool.\\nMarkdown code snippet formatted in the following schema:\\n\\n```json\\n{\\n \\\"action\\\": string, // The action to take. Must be one of [esql-language-knowledge-base]\\n \\\"action_input\\\": string // The input to the action. May be a stringified object.\\n}\\n```\\n\\n**Option #2:**\\nUse this if you want to respond directly and conversationally to the human. Markdown code snippet formatted in the following schema:\\n\\n```json\\n{\\n \\\"action\\\": \\\"Final Answer\\\",\\n \\\"action_input\\\": string // You should put what you want to return to use here and make sure to use valid json newline characters.\\n}\\n```\\n\\nFor both options, remember to always include the surrounding markdown code snippet delimiters (begin with \\\"```json\\\" and end with \\\"```\\\")!\\n\\n\\nUSER'S INPUT\\n--------------------\\nHere is the user's input (remember to respond with a markdown code snippet of a json blob with a single action, and NOTHING else):\\n\\n\\n\\n\\n\\nFrom employees, I want to see the 5 earliest employees (hire_date), I want to display only the month and the year that they were hired in and their employee number (emp_no). Format the date as e.g. \\\"September 2019\\\". Only show the query\",\"additional_kwargs\":{}}}]" ] } [llm/end] [1:chain:AgentExecutor > 2:chain:LLMChain > 3:llm:ActionsClientLlm] [3.08s] Exiting LLM run with output: { "generations": [ [ { "text": "```json\n{\n \"action\": \"esql-language-knowledge-base\",\n \"action_input\": \"Display the 'emp_no', month and year of the 5 earliest employees by 'hire_date'. Format the date as 'Month Year'.\"\n}\n```" } ] ] } [chain/end] [1:chain:AgentExecutor > 2:chain:LLMChain] [3.09s] Exiting Chain run with output: { "text": "```json\n{\n \"action\": \"esql-language-knowledge-base\",\n \"action_input\": \"Display the 'emp_no', month and year of the 5 earliest employees by 'hire_date'. Format the date as 'Month Year'.\"\n}\n```" } [agent/action] [1:chain:AgentExecutor] Agent selected action: { "tool": "esql-language-knowledge-base", "toolInput": "Display the 'emp_no', month and year of the 5 earliest employees by 'hire_date'. Format the date as 'Month Year'.", "log": "```json\n{\n \"action\": \"esql-language-knowledge-base\",\n \"action_input\": \"Display the 'emp_no', month and year of the 5 earliest employees by 'hire_date'. Format the date as 'Month Year'.\"\n}\n```" } [tool/start] [1:chain:AgentExecutor > 4:tool:ChainTool] Entering Tool run with input: "Display the 'emp_no', month and year of the 5 earliest employees by 'hire_date'. Format the date as 'Month Year'." [chain/start] [1:chain:AgentExecutor > 4:tool:ChainTool > 5:chain:RetrievalQAChain] Entering Chain run with input: { "query": "Display the 'emp_no', month and year of the 5 earliest employees by 'hire_date'. Format the date as 'Month Year'." } [retriever/start] [1:chain:AgentExecutor > 4:tool:ChainTool > 5:chain:RetrievalQAChain > 6:retriever:VectorStoreRetriever] Entering Retriever run with input: { "query": "Display the 'emp_no', month and year of the 5 earliest employees by 'hire_date'. Format the date as 'Month Year'." } [retriever/end] [1:chain:AgentExecutor > 4:tool:ChainTool > 5:chain:RetrievalQAChain > 6:retriever:VectorStoreRetriever] [115ms] Exiting Retriever run with output: { "documents": [ { "pageContent": "[[esql-date_format]]\n=== `DATE_FORMAT`\nReturns a string representation of a date in the provided format. If no format\nis specified, the `yyyy-MM-dd'T'HH:mm:ss.SSSZ` format is used.\n\n[source,esql]\n----\nFROM employees\n| KEEP first_name, last_name, hire_date\n| EVAL hired = DATE_FORMAT(hire_date, \"YYYY-MM-dd\")\n----\n", "metadata": { "source": "/Users/andrew.goldstein/Projects/forks/spong/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/docs/functions/date_format.asciidoc" } }, { "pageContent": "[[esql-date_trunc]]\n=== `DATE_TRUNC`\nRounds down a date to the closest interval. Intervals can be expressed using the\n<<esql-timespan-literals,timespan literal syntax>>.\n\n[source,esql]\n----\nFROM employees\n| EVAL year_hired = DATE_TRUNC(1 year, hire_date)\n| STATS count(emp_no) BY year_hired\n| SORT year_hired\n----\n", "metadata": { "source": "/Users/andrew.goldstein/Projects/forks/spong/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/docs/functions/date_trunc.asciidoc" } }, { "pageContent": "[[esql-from]]\n=== `FROM`\n\nThe `FROM` source command returns a table with up to 10,000 documents from a\ndata stream, index, or alias. Each row in the resulting table represents a\ndocument. Each column corresponds to a field, and can be accessed by the name\nof that field.\n\n[source,esql]\n----\nFROM employees\n----\n\nYou can use <<api-date-math-index-names,date math>> to refer to indices, aliases\nand data streams. This can be useful for time series data, for example to access\ntoday's index:\n\n[source,esql]\n----\nFROM <logs-{now/d}>\n----\n\nUse comma-separated lists or wildcards to query multiple data streams, indices,\nor aliases:\n\n[source,esql]\n----\nFROM employees-00001,employees-*\n----\n", "metadata": { "source": "/Users/andrew.goldstein/Projects/forks/spong/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/docs/source_commands/from.asciidoc" } }, { "pageContent": "[[esql-where]]\n=== `WHERE`\n\nUse `WHERE` to produce a table that contains all the rows from the input table\nfor which the provided condition evaluates to `true`:\n\n[source,esql]\n----\ninclude::{esql-specs}/docs.csv-spec[tag=where]\n----\n\nWhich, if `still_hired` is a boolean field, can be simplified to:\n\n[source,esql]\n----\ninclude::{esql-specs}/docs.csv-spec[tag=whereBoolean]\n----\n\n[discrete]\n==== Operators\n\nRefer to <<esql-operators>> for an overview of the supported operators.\n\n[discrete]\n==== Functions\n`WHERE` supports various functions for calculating values. Refer to\n<<esql-functions,Functions>> for more information.\n\n[source,esql]\n----\ninclude::{esql-specs}/docs.csv-spec[tag=whereFunction]\n----\n", "metadata": { "source": "/Users/andrew.goldstein/Projects/forks/spong/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/docs/processing_commands/where.asciidoc" } } ] } [chain/start] [1:chain:AgentExecutor > 4:tool:ChainTool > 5:chain:RetrievalQAChain > 7:chain:StuffDocumentsChain] Entering Chain run with input: { "question": "Display the 'emp_no', month and year of the 5 earliest employees by 'hire_date'. Format the date as 'Month Year'.", "input_documents": [ { "pageContent": "[[esql-date_format]]\n=== `DATE_FORMAT`\nReturns a string representation of a date in the provided format. If no format\nis specified, the `yyyy-MM-dd'T'HH:mm:ss.SSSZ` format is used.\n\n[source,esql]\n----\nFROM employees\n| KEEP first_name, last_name, hire_date\n| EVAL hired = DATE_FORMAT(hire_date, \"YYYY-MM-dd\")\n----\n", "metadata": { "source": "/Users/andrew.goldstein/Projects/forks/spong/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/docs/functions/date_format.asciidoc" } }, { "pageContent": "[[esql-date_trunc]]\n=== `DATE_TRUNC`\nRounds down a date to the closest interval. Intervals can be expressed using the\n<<esql-timespan-literals,timespan literal syntax>>.\n\n[source,esql]\n----\nFROM employees\n| EVAL year_hired = DATE_TRUNC(1 year, hire_date)\n| STATS count(emp_no) BY year_hired\n| SORT year_hired\n----\n", "metadata": { "source": "/Users/andrew.goldstein/Projects/forks/spong/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/docs/functions/date_trunc.asciidoc" } }, { "pageContent": "[[esql-from]]\n=== `FROM`\n\nThe `FROM` source command returns a table with up to 10,000 documents from a\ndata stream, index, or alias. Each row in the resulting table represents a\ndocument. Each column corresponds to a field, and can be accessed by the name\nof that field.\n\n[source,esql]\n----\nFROM employees\n----\n\nYou can use <<api-date-math-index-names,date math>> to refer to indices, aliases\nand data streams. This can be useful for time series data, for example to access\ntoday's index:\n\n[source,esql]\n----\nFROM <logs-{now/d}>\n----\n\nUse comma-separated lists or wildcards to query multiple data streams, indices,\nor aliases:\n\n[source,esql]\n----\nFROM employees-00001,employees-*\n----\n", "metadata": { "source": "/Users/andrew.goldstein/Projects/forks/spong/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/docs/source_commands/from.asciidoc" } }, { "pageContent": "[[esql-where]]\n=== `WHERE`\n\nUse `WHERE` to produce a table that contains all the rows from the input table\nfor which the provided condition evaluates to `true`:\n\n[source,esql]\n----\ninclude::{esql-specs}/docs.csv-spec[tag=where]\n----\n\nWhich, if `still_hired` is a boolean field, can be simplified to:\n\n[source,esql]\n----\ninclude::{esql-specs}/docs.csv-spec[tag=whereBoolean]\n----\n\n[discrete]\n==== Operators\n\nRefer to <<esql-operators>> for an overview of the supported operators.\n\n[discrete]\n==== Functions\n`WHERE` supports various functions for calculating values. Refer to\n<<esql-functions,Functions>> for more information.\n\n[source,esql]\n----\ninclude::{esql-specs}/docs.csv-spec[tag=whereFunction]\n----\n", "metadata": { "source": "/Users/andrew.goldstein/Projects/forks/spong/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/docs/processing_commands/where.asciidoc" } } ], "query": "Display the 'emp_no', month and year of the 5 earliest employees by 'hire_date'. Format the date as 'Month Year'." } [chain/start] [1:chain:AgentExecutor > 4:tool:ChainTool > 5:chain:RetrievalQAChain > 7:chain:StuffDocumentsChain > 8:chain:LLMChain] Entering Chain run with input: { "question": "Display the 'emp_no', month and year of the 5 earliest employees by 'hire_date'. Format the date as 'Month Year'.", "query": "Display the 'emp_no', month and year of the 5 earliest employees by 'hire_date'. Format the date as 'Month Year'.", "context": "[[esql-date_format]]\n=== `DATE_FORMAT`\nReturns a string representation of a date in the provided format. If no format\nis specified, the `yyyy-MM-dd'T'HH:mm:ss.SSSZ` format is used.\n\n[source,esql]\n----\nFROM employees\n| KEEP first_name, last_name, hire_date\n| EVAL hired = DATE_FORMAT(hire_date, \"YYYY-MM-dd\")\n----\n\n\n[[esql-date_trunc]]\n=== `DATE_TRUNC`\nRounds down a date to the closest interval. Intervals can be expressed using the\n<<esql-timespan-literals,timespan literal syntax>>.\n\n[source,esql]\n----\nFROM employees\n| EVAL year_hired = DATE_TRUNC(1 year, hire_date)\n| STATS count(emp_no) BY year_hired\n| SORT year_hired\n----\n\n\n[[esql-from]]\n=== `FROM`\n\nThe `FROM` source command returns a table with up to 10,000 documents from a\ndata stream, index, or alias. Each row in the resulting table represents a\ndocument. Each column corresponds to a field, and can be accessed by the name\nof that field.\n\n[source,esql]\n----\nFROM employees\n----\n\nYou can use <<api-date-math-index-names,date math>> to refer to indices, aliases\nand data streams. This can be useful for time series data, for example to access\ntoday's index:\n\n[source,esql]\n----\nFROM <logs-{now/d}>\n----\n\nUse comma-separated lists or wildcards to query multiple data streams, indices,\nor aliases:\n\n[source,esql]\n----\nFROM employees-00001,employees-*\n----\n\n\n[[esql-where]]\n=== `WHERE`\n\nUse `WHERE` to produce a table that contains all the rows from the input table\nfor which the provided condition evaluates to `true`:\n\n[source,esql]\n----\ninclude::{esql-specs}/docs.csv-spec[tag=where]\n----\n\nWhich, if `still_hired` is a boolean field, can be simplified to:\n\n[source,esql]\n----\ninclude::{esql-specs}/docs.csv-spec[tag=whereBoolean]\n----\n\n[discrete]\n==== Operators\n\nRefer to <<esql-operators>> for an overview of the supported operators.\n\n[discrete]\n==== Functions\n`WHERE` supports various functions for calculating values. Refer to\n<<esql-functions,Functions>> for more information.\n\n[source,esql]\n----\ninclude::{esql-specs}/docs.csv-spec[tag=whereFunction]\n----\n" } [llm/start] [1:chain:AgentExecutor > 4:tool:ChainTool > 5:chain:RetrievalQAChain > 7:chain:StuffDocumentsChain > 8:chain:LLMChain > 9:llm:ActionsClientLlm] Entering LLM run with input: { "prompts": [ "Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\n\n[[esql-date_format]]\n=== `DATE_FORMAT`\nReturns a string representation of a date in the provided format. If no format\nis specified, the `yyyy-MM-dd'T'HH:mm:ss.SSSZ` format is used.\n\n[source,esql]\n----\nFROM employees\n| KEEP first_name, last_name, hire_date\n| EVAL hired = DATE_FORMAT(hire_date, \"YYYY-MM-dd\")\n----\n\n\n[[esql-date_trunc]]\n=== `DATE_TRUNC`\nRounds down a date to the closest interval. Intervals can be expressed using the\n<<esql-timespan-literals,timespan literal syntax>>.\n\n[source,esql]\n----\nFROM employees\n| EVAL year_hired = DATE_TRUNC(1 year, hire_date)\n| STATS count(emp_no) BY year_hired\n| SORT year_hired\n----\n\n\n[[esql-from]]\n=== `FROM`\n\nThe `FROM` source command returns a table with up to 10,000 documents from a\ndata stream, index, or alias. Each row in the resulting table represents a\ndocument. Each column corresponds to a field, and can be accessed by the name\nof that field.\n\n[source,esql]\n----\nFROM employees\n----\n\nYou can use <<api-date-math-index-names,date math>> to refer to indices, aliases\nand data streams. This can be useful for time series data, for example to access\ntoday's index:\n\n[source,esql]\n----\nFROM <logs-{now/d}>\n----\n\nUse comma-separated lists or wildcards to query multiple data streams, indices,\nor aliases:\n\n[source,esql]\n----\nFROM employees-00001,employees-*\n----\n\n\n[[esql-where]]\n=== `WHERE`\n\nUse `WHERE` to produce a table that contains all the rows from the input table\nfor which the provided condition evaluates to `true`:\n\n[source,esql]\n----\ninclude::{esql-specs}/docs.csv-spec[tag=where]\n----\n\nWhich, if `still_hired` is a boolean field, can be simplified to:\n\n[source,esql]\n----\ninclude::{esql-specs}/docs.csv-spec[tag=whereBoolean]\n----\n\n[discrete]\n==== Operators\n\nRefer to <<esql-operators>> for an overview of the supported operators.\n\n[discrete]\n==== Functions\n`WHERE` supports various functions for calculating values. Refer to\n<<esql-functions,Functions>> for more information.\n\n[source,esql]\n----\ninclude::{esql-specs}/docs.csv-spec[tag=whereFunction]\n----\n\n\nQuestion: Display the 'emp_no', month and year of the 5 earliest employees by 'hire_date'. Format the date as 'Month Year'.\nHelpful Answer:" ] } [llm/end] [1:chain:AgentExecutor > 4:tool:ChainTool > 5:chain:RetrievalQAChain > 7:chain:StuffDocumentsChain > 8:chain:LLMChain > 9:llm:ActionsClientLlm] [2.23s] Exiting LLM run with output: { "generations": [ [ { "text": "FROM employees\n| KEEP emp_no, hire_date\n| EVAL month_year = DATE_FORMAT(hire_date, \"MMMM YYYY\")\n| SORT hire_date\n| LIMIT 5" } ] ] } [chain/end] [1:chain:AgentExecutor > 4:tool:ChainTool > 5:chain:RetrievalQAChain > 7:chain:StuffDocumentsChain > 8:chain:LLMChain] [2.23s] Exiting Chain run with output: { "text": "FROM employees\n| KEEP emp_no, hire_date\n| EVAL month_year = DATE_FORMAT(hire_date, \"MMMM YYYY\")\n| SORT hire_date\n| LIMIT 5" } [chain/end] [1:chain:AgentExecutor > 4:tool:ChainTool > 5:chain:RetrievalQAChain > 7:chain:StuffDocumentsChain] [2.23s] Exiting Chain run with output: { "text": "FROM employees\n| KEEP emp_no, hire_date\n| EVAL month_year = DATE_FORMAT(hire_date, \"MMMM YYYY\")\n| SORT hire_date\n| LIMIT 5" } [chain/end] [1:chain:AgentExecutor > 4:tool:ChainTool > 5:chain:RetrievalQAChain] [2.35s] Exiting Chain run with output: { "text": "FROM employees\n| KEEP emp_no, hire_date\n| EVAL month_year = DATE_FORMAT(hire_date, \"MMMM YYYY\")\n| SORT hire_date\n| LIMIT 5" } [tool/end] [1:chain:AgentExecutor > 4:tool:ChainTool] [2.35s] Exiting Tool run with output: "FROM employees | KEEP emp_no, hire_date | EVAL month_year = DATE_FORMAT(hire_date, "MMMM YYYY") | SORT hire_date | LIMIT 5" [chain/start] [1:chain:AgentExecutor > 10:chain:LLMChain] Entering Chain run with input: { "input": "\n\n\n\nFrom employees, I want to see the 5 earliest employees (hire_date), I want to display only the month and the year that they were hired in and their employee number (emp_no). Format the date as e.g. \"September 2019\". Only show the query", "chat_history": [], "agent_scratchpad": [ { "lc": 1, "type": "constructor", "id": [ "langchain", "schema", "AIMessage" ], "kwargs": { "content": "```json\n{\n \"action\": \"esql-language-knowledge-base\",\n \"action_input\": \"Display the 'emp_no', month and year of the 5 earliest employees by 'hire_date'. Format the date as 'Month Year'.\"\n}\n```", "additional_kwargs": {} } }, { "lc": 1, "type": "constructor", "id": [ "langchain", "schema", "HumanMessage" ], "kwargs": { "content": "TOOL RESPONSE:\n---------------------\nFROM employees\n| KEEP emp_no, hire_date\n| EVAL month_year = DATE_FORMAT(hire_date, \"MMMM YYYY\")\n| SORT hire_date\n| LIMIT 5\n\nUSER'S INPUT\n--------------------\n\nOkay, so what is the response to my last comment? If using information obtained from the tools you must mention it explicitly without mentioning the tool names - I have forgotten all TOOL RESPONSES! Remember to respond with a markdown code snippet of a json blob with a single action, and NOTHING else.", "additional_kwargs": {} } } ], "stop": [ "Observation:" ] } [llm/start] [1:chain:AgentExecutor > 10:chain:LLMChain > 11:llm:ActionsClientLlm] Entering LLM run with input: { "prompts": [ "[{\"lc\":1,\"type\":\"constructor\",\"id\":[\"langchain\",\"schema\",\"SystemMessage\"],\"kwargs\":{\"content\":\"Assistant is a large language model trained by OpenAI.\\n\\nAssistant is designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. As a language model, Assistant is able to generate human-like text based on the input it receives, allowing it to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand.\\n\\nAssistant is constantly learning and improving, and its capabilities are constantly evolving. It is able to process and understand large amounts of text, and can use this knowledge to provide accurate and informative responses to a wide range of questions. Additionally, Assistant is able to generate its own text based on the input it receives, allowing it to engage in discussions and provide explanations and descriptions on a wide range of topics.\\n\\nOverall, Assistant is a powerful system that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether you need help with a specific question or just want to have a conversation about a particular topic, Assistant is here to assist. However, above all else, all responses must adhere to the format of RESPONSE FORMAT INSTRUCTIONS.\",\"additional_kwargs\":{}}},{\"lc\":1,\"type\":\"constructor\",\"id\":[\"langchain\",\"schema\",\"HumanMessage\"],\"kwargs\":{\"content\":\"TOOLS\\n------\\nAssistant can ask the user to use tools to look up information that may be helpful in answering the users original question. The tools the human can use are:\\n\\nesql-language-knowledge-base: Call this for knowledge on how to build an ESQL query, or answer questions about the ES|QL query language.\\n\\nRESPONSE FORMAT INSTRUCTIONS\\n----------------------------\\n\\nOutput a JSON markdown code snippet containing a valid JSON object in one of two formats:\\n\\n**Option 1:**\\nUse this if you want the human to use a tool.\\nMarkdown code snippet formatted in the following schema:\\n\\n```json\\n{\\n \\\"action\\\": string, // The action to take. Must be one of [esql-language-knowledge-base]\\n \\\"action_input\\\": string // The input to the action. May be a stringified object.\\n}\\n```\\n\\n**Option #2:**\\nUse this if you want to respond directly and conversationally to the human. Markdown code snippet formatted in the following schema:\\n\\n```json\\n{\\n \\\"action\\\": \\\"Final Answer\\\",\\n \\\"action_input\\\": string // You should put what you want to return to use here and make sure to use valid json newline characters.\\n}\\n```\\n\\nFor both options, remember to always include the surrounding markdown code snippet delimiters (begin with \\\"```json\\\" and end with \\\"```\\\")!\\n\\n\\nUSER'S INPUT\\n--------------------\\nHere is the user's input (remember to respond with a markdown code snippet of a json blob with a single action, and NOTHING else):\\n\\n\\n\\n\\n\\nFrom employees, I want to see the 5 earliest employees (hire_date), I want to display only the month and the year that they were hired in and their employee number (emp_no). Format the date as e.g. \\\"September 2019\\\". Only show the query\",\"additional_kwargs\":{}}},{\"lc\":1,\"type\":\"constructor\",\"id\":[\"langchain\",\"schema\",\"AIMessage\"],\"kwargs\":{\"content\":\"```json\\n{\\n \\\"action\\\": \\\"esql-language-knowledge-base\\\",\\n \\\"action_input\\\": \\\"Display the 'emp_no', month and year of the 5 earliest employees by 'hire_date'. Format the date as 'Month Year'.\\\"\\n}\\n```\",\"additional_kwargs\":{}}},{\"lc\":1,\"type\":\"constructor\",\"id\":[\"langchain\",\"schema\",\"HumanMessage\"],\"kwargs\":{\"content\":\"TOOL RESPONSE:\\n---------------------\\nFROM employees\\n| KEEP emp_no, hire_date\\n| EVAL month_year = DATE_FORMAT(hire_date, \\\"MMMM YYYY\\\")\\n| SORT hire_date\\n| LIMIT 5\\n\\nUSER'S INPUT\\n--------------------\\n\\nOkay, so what is the response to my last comment? If using information obtained from the tools you must mention it explicitly without mentioning the tool names - I have forgotten all TOOL RESPONSES! Remember to respond with a markdown code snippet of a json blob with a single action, and NOTHING else.\",\"additional_kwargs\":{}}}]" ] } [llm/end] [1:chain:AgentExecutor > 10:chain:LLMChain > 11:llm:ActionsClientLlm] [6.47s] Exiting LLM run with output: { "generations": [ [ { "text": "```json\n{\n \"action\": \"Final Answer\",\n \"action_input\": \"Here is the query to get the employee number and the formatted hire date for the 5 earliest employees by hire_date:\\n\\nFROM employees\\n| KEEP emp_no, hire_date\\n| EVAL month_year = DATE_FORMAT(hire_date, \\\"MMMM YYYY\\\")\\n| SORT hire_date\\n| LIMIT 5\"\n}\n```" } ] ] } [chain/end] [1:chain:AgentExecutor > 10:chain:LLMChain] [6.47s] Exiting Chain run with output: { "text": "```json\n{\n \"action\": \"Final Answer\",\n \"action_input\": \"Here is the query to get the employee number and the formatted hire date for the 5 earliest employees by hire_date:\\n\\nFROM employees\\n| KEEP emp_no, hire_date\\n| EVAL month_year = DATE_FORMAT(hire_date, \\\"MMMM YYYY\\\")\\n| SORT hire_date\\n| LIMIT 5\"\n}\n```" } [chain/end] [1:chain:AgentExecutor] [11.91s] Exiting Chain run with output: { "output": "Here is the query to get the employee number and the formatted hire date for the 5 earliest employees by hire_date:\n\nFROM employees\n| KEEP emp_no, hire_date\n| EVAL month_year = DATE_FORMAT(hire_date, \"MMMM YYYY\")\n| SORT hire_date\n| LIMIT 5" } ```

elasticmachine · 2023-09-22T23:15:51Z

Pinging @elastic/security-solution (Team: SecuritySolution)

andrew-goldstein · 2023-09-22T23:16:55Z

x-pack/plugins/elastic_assistant/server/lib/langchain/execute_custom_llm_chain/index.ts

    memory,
-    // See `qaChainOptions` from https://js.langchain.com/docs/modules/chains/popular/chat_vector_db
-    qaChainOptions: { type: 'stuff' },
+    verbose: false,


set this to true for verbose debugging per the PR description

spong · 2023-09-22T23:41:23Z

x-pack/packages/kbn-elastic-assistant/impl/assistant/api.tsx

@@ -78,7 +79,8 @@ export const fetchConnectorExecuteAction = async ({

    if (data.choices && data.choices.length > 0 && data.choices[0].message.content) {
      const result = data.choices[0].message.content.trim();
-      return result;
+
+      return assistantLangChain ? getFormattedMessageContent(result) : result;


A client side change was required to the assistant, because the response returned from the agent executor is JSON.

Thoughts on doing this server side as to not leak the agent executor abstraction to the client? Or was there a specific reason this needed to be pushed to the client?

A client side change was required to the assistant, because the response returned from the agent executor is JSON.

Thoughts on doing this server side as to not leak the agent executor abstraction to the client? Or was there a specific reason this needed to be pushed to the client?

In summary of our offline discussion:

Yes, it's possible to do this server side by introducing an additional JSON.parse of the response from the LLM

In short, the server side code in x-pack/plugins/elastic_assistant/server/lib/langchain/execute_custom_llm_chain/index.ts would look something like the following (psudocode):

await executor.call({ input: latestMessage[0].content }); const rawData = llm.getActionResultData(); // the response from the actions framework if (rawData.choices && rawData.choices.length > 0 && rawData.choices[0].message.content) { const result = rawData.choices[0].message.content.trim(); const formatted = getFormattedMessageContent(result); const data = convertToRecord(formatted); return { connector_id: connectorId, data, status: 'ok', }; } else { throw new Error('Unexpected raw response from the LLM'); }

The psudocode above is very similar to the client side code in x-pack/packages/kbn-elastic-assistant/impl/assistant/api.tsx

If server-side parsing is implemented, the existing client side code should be extracted into a reusable function (exported by the package), for reuse on the server

In summary, it's possible, but given the above adds an additional server side JSON parse of the LLM response and requires some additional client side refactoring, we'll reconsider the above post FF.

Thank you for taking the time to discuss offline and for summarizing here 🙏 , path forward sounds good to me 👍

spong

Checked out, tested locally, code reviewed and LGTM! 👍 Thank you for the very detailed (verbose) description, very helpful! 🙂

Note I: Still seeing some wonkiness w/ immediate memory references falling short as we discussed offline. You can just call me CIDR_MATCH from now on... 😅

Note II: Still seeing a SimilaritySearch being kicked off when prompting nothing about ESQL. I don't recall seeing this when we were testing locally, so just curious if you might know why this would be happening even though we're now registering the ESQL tool?

andrew-goldstein · 2023-09-25T14:17:49Z

@elasticmachine merge upstream

kibana-ci · 2023-09-25T15:33:31Z

💚 Build Succeeded

Buildkite Build
Commit: a76398c

Metrics [docs]

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id	before	after	diff
`securitySolution`	12.8MB	12.8MB	+2.9KB

History

💔 Build #161427 failed 484072b

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @andrew-goldstein

andrew-goldstein · 2023-09-25T17:03:45Z

Note II: Still seeing a SimilaritySearch being kicked off when prompting nothing about ESQL. I don't recall seeing this when we were testing locally, so just curious if you might know why this would be happening even though we're now registering the ESQL tool?

Summary

When running locally, I don't observe a SimilaritySearch being kicked off for the same prompt:

Talk to me about the sunshine please

Per the details in the next section (below):

The verbose: true output does NOT appear to contain invocations of the RetrievalQAChain for the prompt above
The instrumented console.log '!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! output also does NOT appear for the prompt above

When an ESQL question is added to the conversation, per the screenshot below:

The verbose: true output DOES contain invocations of the RetrievalQAChain, which is required access the ELSER backed ElasticsearchStore for the Knowledge Base
The instrumented console.log '!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! output also DOES appear for the ESQL prompt

Details

Talk to me about the sunshine please

Before submitting the prompt above, I made the following changes locally:

set verbose: true in x-pack/plugins/elastic_assistant/server/lib/langchain/execute_custom_llm_chain/index.ts
added the following console.log to the similaritySearch function in x-pack/plugins/elastic_assistant/server/lib/langchain/elasticsearch_store/elasticsearch_store.ts:

    const queryBody: QueryDslQueryContainer = {
      bool: {
        must: [
          {
            text_expansion: {
              'vector.tokens': {
                model_id: this.model,
                model_text: query,
              },
            } as unknown as QueryDslTextExpansionQuery,
          },
        ],
        filter,
      },
    };

    console.log(
      '!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! queryBody',
      JSON.stringify(queryBody)
    );

After making the changes above, I submitted the prompt, per the screenshot below:

The verbose: true output from LangChain for the prompt above was:

[chain/start] [1:chain:AgentExecutor] Entering Chain run with input: {
  "input": "\n\n\n\nTalk to me about the sunshine please",
  "chat_history": []
}
[chain/start] [1:chain:AgentExecutor > 2:chain:LLMChain] Entering Chain run with input: {
  "input": "\n\n\n\nTalk to me about the sunshine please",
  "chat_history": [],
  "agent_scratchpad": [],
  "stop": [
    "Observation:"
  ]
}
[llm/start] [1:chain:AgentExecutor > 2:chain:LLMChain > 3:llm:ActionsClientLlm] Entering LLM run with input: {
  "prompts": [
    "[{\"lc\":1,\"type\":\"constructor\",\"id\":[\"langchain\",\"schema\",\"SystemMessage\"],\"kwargs\":{\"content\":\"Assistant is a large language model trained by OpenAI.\\n\\nAssistant is designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. As a language model, Assistant is able to generate human-like text based on the input it receives, allowing it to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand.\\n\\nAssistant is constantly learning and improving, and its capabilities are constantly evolving. It is able to process and understand large amounts of text, and can use this knowledge to provide accurate and informative responses to a wide range of questions. Additionally, Assistant is able to generate its own text based on the input it receives, allowing it to engage in discussions and provide explanations and descriptions on a wide range of topics.\\n\\nOverall, Assistant is a powerful system that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether you need help with a specific question or just want to have a conversation about a particular topic, Assistant is here to assist. However, above all else, all responses must adhere to the format of RESPONSE FORMAT INSTRUCTIONS.\",\"additional_kwargs\":{}}},{\"lc\":1,\"type\":\"constructor\",\"id\":[\"langchain\",\"schema\",\"HumanMessage\"],\"kwargs\":{\"content\":\"TOOLS\\n------\\nAssistant can ask the user to use tools to look up information that may be helpful in answering the users original question. The tools the human can use are:\\n\\nesql-language-knowledge-base: Call this for knowledge on how to build an ESQL query, or answer questions about the ES|QL query language.\\n\\nRESPONSE FORMAT INSTRUCTIONS\\n----------------------------\\n\\nOutput a JSON markdown code snippet containing a valid JSON object in one of two formats:\\n\\n**Option 1:**\\nUse this if you want the human to use a tool.\\nMarkdown code snippet formatted in the following schema:\\n\\n```json\\n{\\n    \\\"action\\\": string, // The action to take. Must be one of [esql-language-knowledge-base]\\n    \\\"action_input\\\": string // The input to the action. May be a stringified object.\\n}\\n```\\n\\n**Option #2:**\\nUse this if you want to respond directly and conversationally to the human. Markdown code snippet formatted in the following schema:\\n\\n```json\\n{\\n    \\\"action\\\": \\\"Final Answer\\\",\\n    \\\"action_input\\\": string // You should put what you want to return to use here and make sure to use valid json newline characters.\\n}\\n```\\n\\nFor both options, remember to always include the surrounding markdown code snippet delimiters (begin with \\\"```json\\\" and end with \\\"```\\\")!\\n\\n\\nUSER'S INPUT\\n--------------------\\nHere is the user's input (remember to respond with a markdown code snippet of a json blob with a single action, and NOTHING else):\\n\\n\\n\\n\\n\\nTalk to me about the sunshine please\",\"additional_kwargs\":{}}}]"
  ]
}
[llm/end] [1:chain:AgentExecutor > 2:chain:LLMChain > 3:llm:ActionsClientLlm] [11.84s] Exiting LLM run with output: {
  "generations": [
    [
      {
        "text": "```json\n{\n    \"action\": \"Final Answer\",\n    \"action_input\": \"The sunshine is the light and warmth that we receive from the sun. It is an essential source of energy for life on Earth, supporting photosynthesis in plants, regulating the Earth's climate and weather patterns, and providing countless other benefits. Sunshine can positively affect our mood and overall well-being, as it helps produce vitamin D in our body when our skin is exposed to sunlight. However, it is essential to enjoy sunshine responsibly, using sun protection to prevent skin damage and other harmful effects. Sunshine can also be harnessed for solar power, providing a renewable and sustainable energy source.\"\n}\n```"
      }
    ]
  ]
}
[chain/end] [1:chain:AgentExecutor > 2:chain:LLMChain] [11.84s] Exiting Chain run with output: {
  "text": "```json\n{\n    \"action\": \"Final Answer\",\n    \"action_input\": \"The sunshine is the light and warmth that we receive from the sun. It is an essential source of energy for life on Earth, supporting photosynthesis in plants, regulating the Earth's climate and weather patterns, and providing countless other benefits. Sunshine can positively affect our mood and overall well-being, as it helps produce vitamin D in our body when our skin is exposed to sunlight. However, it is essential to enjoy sunshine responsibly, using sun protection to prevent skin damage and other harmful effects. Sunshine can also be harnessed for solar power, providing a renewable and sustainable energy source.\"\n}\n```"
}
[chain/end] [1:chain:AgentExecutor] [11.84s] Exiting Chain run with output: {
  "output": "The sunshine is the light and warmth that we receive from the sun. It is an essential source of energy for life on Earth, supporting photosynthesis in plants, regulating the Earth's climate and weather patterns, and providing countless other benefits. Sunshine can positively affect our mood and overall well-being, as it helps produce vitamin D in our body when our skin is exposed to sunlight. However, it is essential to enjoy sunshine responsibly, using sun protection to prevent skin damage and other harmful effects. Sunshine can also be harnessed for solar power, providing a renewable and sustainable energy source."
}

Observations

The verbose: true output does NOT appear to contain invocations of the RetrievalQAChain for the prompt above
The instrumented console.log '!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! output also does NOT appear for the prompt above

Next I added the same prompt from the Desk testing section of this PR:

From employees, I want to see the 5 earliest employees (hire_date), I want to display only the month and the year that they were hired in and their employee number (emp_no). Format the date as e.g. "September 2019". Only show the query

to the conversation, per the screenshot below:

The verbose: true and console.log output from the above is displayed below:

[chain/start] [1:chain:AgentExecutor] Entering Chain run with input: {
  "input": "\n\nFrom employees, I want to see the 5 earliest employees (hire_date), I want to display only the month and the year that they were hired in and their employee number (emp_no). Format the date as e.g. \"September 2019\". Only show the query",
  "chat_history": [
    {
      "lc": 1,
      "type": "constructor",
      "id": [
        "langchain",
        "schema",
        "HumanMessage"
      ],
      "kwargs": {
        "content": "\n\n\n\nTalk to me about the sunshine please",
        "additional_kwargs": {}
      }
    },
    {
      "lc": 1,
      "type": "constructor",
      "id": [
        "langchain",
        "schema",
        "AIMessage"
      ],
      "kwargs": {
        "content": "The sunshine is the light and warmth that we receive from the sun. It is an essential source of energy for life on Earth, supporting photosynthesis in plants, regulating the Earth's climate and weather patterns, and providing countless other benefits. Sunshine can positively affect our mood and overall well-being, as it helps produce vitamin D in our body when our skin is exposed to sunlight. However, it is essential to enjoy sunshine responsibly, using sun protection to prevent skin damage and other harmful effects. Sunshine can also be harnessed for solar power, providing a renewable and sustainable energy source.",
        "additional_kwargs": {}
      }
    }
  ]
}
[chain/start] [1:chain:AgentExecutor > 2:chain:LLMChain] Entering Chain run with input: {
  "input": "\n\nFrom employees, I want to see the 5 earliest employees (hire_date), I want to display only the month and the year that they were hired in and their employee number (emp_no). Format the date as e.g. \"September 2019\". Only show the query",
  "chat_history": [
    {
      "lc": 1,
      "type": "constructor",
      "id": [
        "langchain",
        "schema",
        "HumanMessage"
      ],
      "kwargs": {
        "content": "\n\n\n\nTalk to me about the sunshine please",
        "additional_kwargs": {}
      }
    },
    {
      "lc": 1,
      "type": "constructor",
      "id": [
        "langchain",
        "schema",
        "AIMessage"
      ],
      "kwargs": {
        "content": "The sunshine is the light and warmth that we receive from the sun. It is an essential source of energy for life on Earth, supporting photosynthesis in plants, regulating the Earth's climate and weather patterns, and providing countless other benefits. Sunshine can positively affect our mood and overall well-being, as it helps produce vitamin D in our body when our skin is exposed to sunlight. However, it is essential to enjoy sunshine responsibly, using sun protection to prevent skin damage and other harmful effects. Sunshine can also be harnessed for solar power, providing a renewable and sustainable energy source.",
        "additional_kwargs": {}
      }
    }
  ],
  "agent_scratchpad": [],
  "stop": [
    "Observation:"
  ]
}
[llm/start] [1:chain:AgentExecutor > 2:chain:LLMChain > 3:llm:ActionsClientLlm] Entering LLM run with input: {
  "prompts": [
    "[{\"lc\":1,\"type\":\"constructor\",\"id\":[\"langchain\",\"schema\",\"SystemMessage\"],\"kwargs\":{\"content\":\"Assistant is a large language model trained by OpenAI.\\n\\nAssistant is designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. As a language model, Assistant is able to generate human-like text based on the input it receives, allowing it to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand.\\n\\nAssistant is constantly learning and improving, and its capabilities are constantly evolving. It is able to process and understand large amounts of text, and can use this knowledge to provide accurate and informative responses to a wide range of questions. Additionally, Assistant is able to generate its own text based on the input it receives, allowing it to engage in discussions and provide explanations and descriptions on a wide range of topics.\\n\\nOverall, Assistant is a powerful system that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether you need help with a specific question or just want to have a conversation about a particular topic, Assistant is here to assist. However, above all else, all responses must adhere to the format of RESPONSE FORMAT INSTRUCTIONS.\",\"additional_kwargs\":{}}},{\"lc\":1,\"type\":\"constructor\",\"id\":[\"langchain\",\"schema\",\"HumanMessage\"],\"kwargs\":{\"content\":\"\\n\\n\\n\\nTalk to me about the sunshine please\",\"additional_kwargs\":{}}},{\"lc\":1,\"type\":\"constructor\",\"id\":[\"langchain\",\"schema\",\"AIMessage\"],\"kwargs\":{\"content\":\"The sunshine is the light and warmth that we receive from the sun. It is an essential source of energy for life on Earth, supporting photosynthesis in plants, regulating the Earth's climate and weather patterns, and providing countless other benefits. Sunshine can positively affect our mood and overall well-being, as it helps produce vitamin D in our body when our skin is exposed to sunlight. However, it is essential to enjoy sunshine responsibly, using sun protection to prevent skin damage and other harmful effects. Sunshine can also be harnessed for solar power, providing a renewable and sustainable energy source.\",\"additional_kwargs\":{}}},{\"lc\":1,\"type\":\"constructor\",\"id\":[\"langchain\",\"schema\",\"HumanMessage\"],\"kwargs\":{\"content\":\"TOOLS\\n------\\nAssistant can ask the user to use tools to look up information that may be helpful in answering the users original question. The tools the human can use are:\\n\\nesql-language-knowledge-base: Call this for knowledge on how to build an ESQL query, or answer questions about the ES|QL query language.\\n\\nRESPONSE FORMAT INSTRUCTIONS\\n----------------------------\\n\\nOutput a JSON markdown code snippet containing a valid JSON object in one of two formats:\\n\\n**Option 1:**\\nUse this if you want the human to use a tool.\\nMarkdown code snippet formatted in the following schema:\\n\\n```json\\n{\\n    \\\"action\\\": string, // The action to take. Must be one of [esql-language-knowledge-base]\\n    \\\"action_input\\\": string // The input to the action. May be a stringified object.\\n}\\n```\\n\\n**Option #2:**\\nUse this if you want to respond directly and conversationally to the human. Markdown code snippet formatted in the following schema:\\n\\n```json\\n{\\n    \\\"action\\\": \\\"Final Answer\\\",\\n    \\\"action_input\\\": string // You should put what you want to return to use here and make sure to use valid json newline characters.\\n}\\n```\\n\\nFor both options, remember to always include the surrounding markdown code snippet delimiters (begin with \\\"```json\\\" and end with \\\"```\\\")!\\n\\n\\nUSER'S INPUT\\n--------------------\\nHere is the user's input (remember to respond with a markdown code snippet of a json blob with a single action, and NOTHING else):\\n\\n\\n\\nFrom employees, I want to see the 5 earliest employees (hire_date), I want to display only the month and the year that they were hired in and their employee number (emp_no). Format the date as e.g. \\\"September 2019\\\". Only show the query\",\"additional_kwargs\":{}}}]"
  ]
}
[llm/end] [1:chain:AgentExecutor > 2:chain:LLMChain > 3:llm:ActionsClientLlm] [6.75s] Exiting LLM run with output: {
  "generations": [
    [
      {
        "text": "```json\n{\n    \"action\": \"esql-language-knowledge-base\",\n    \"action_input\": \"From employees, I want to see the 5 earliest employees (hire_date), I want to display only the month and the year that they were hired in and their employee number (emp_no). Format the date as e.g. \\\"September 2019\\\". Only show the query\"\n}\n```"
      }
    ]
  ]
}
[chain/end] [1:chain:AgentExecutor > 2:chain:LLMChain] [6.75s] Exiting Chain run with output: {
  "text": "```json\n{\n    \"action\": \"esql-language-knowledge-base\",\n    \"action_input\": \"From employees, I want to see the 5 earliest employees (hire_date), I want to display only the month and the year that they were hired in and their employee number (emp_no). Format the date as e.g. \\\"September 2019\\\". Only show the query\"\n}\n```"
}
[agent/action] [1:chain:AgentExecutor] Agent selected action: {
  "tool": "esql-language-knowledge-base",
  "toolInput": "From employees, I want to see the 5 earliest employees (hire_date), I want to display only the month and the year that they were hired in and their employee number (emp_no). Format the date as e.g. \"September 2019\". Only show the query",
  "log": "```json\n{\n    \"action\": \"esql-language-knowledge-base\",\n    \"action_input\": \"From employees, I want to see the 5 earliest employees (hire_date), I want to display only the month and the year that they were hired in and their employee number (emp_no). Format the date as e.g. \\\"September 2019\\\". Only show the query\"\n}\n```"
}
[tool/start] [1:chain:AgentExecutor > 4:tool:ChainTool] Entering Tool run with input: "From employees, I want to see the 5 earliest employees (hire_date), I want to display only the month and the year that they were hired in and their employee number (emp_no). Format the date as e.g. "September 2019". Only show the query"
[chain/start] [1:chain:AgentExecutor > 4:tool:ChainTool > 5:chain:RetrievalQAChain] Entering Chain run with input: {
  "query": "From employees, I want to see the 5 earliest employees (hire_date), I want to display only the month and the year that they were hired in and their employee number (emp_no). Format the date as e.g. \"September 2019\". Only show the query"
}
[retriever/start] [1:chain:AgentExecutor > 4:tool:ChainTool > 5:chain:RetrievalQAChain > 6:retriever:VectorStoreRetriever] Entering Retriever run with input: {
  "query": "From employees, I want to see the 5 earliest employees (hire_date), I want to display only the month and the year that they were hired in and their employee number (emp_no). Format the date as e.g. \"September 2019\". Only show the query"
}
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! queryBody {"bool":{"must":[{"text_expansion":{"vector.tokens":{"model_id":".elser_model_1","model_text":"From employees, I want to see the 5 earliest employees (hire_date), I want to display only the month and the year that they were hired in and their employee number (emp_no). Format the date as e.g. \"September 2019\". Only show the query"}}}]}}
[2023-09-25T10:26:19.455-06:00][INFO ][plugins.elasticAssistant] Similarity Search Query:
 {"bool":{"must":[{"text_expansion":{"vector.tokens":{"model_id":".elser_model_1","model_text":"From employees, I want to see the 5 earliest employees (hire_date), I want to display only the month and the year that they were hired in and their employee number (emp_no). Format the date as e.g. \"September 2019\". Only show the query"}}}]}}
[2023-09-25T10:26:19.455-06:00][INFO ][plugins.elasticAssistant] Similarity Search Results:
 [{"pageContent":"[[esql-date_format]]\n=== `DATE_FORMAT`\nReturns a string representation of a date in the provided format. If no format\nis specified, the `yyyy-MM-dd'T'HH:mm:ss.SSSZ` format is used.\n\n[source,esql]\n----\nFROM employees\n| KEEP first_name, last_name, hire_date\n| EVAL hired = DATE_FORMAT(hire_date, \"YYYY-MM-dd\")\n----\n","metadata":{"source":"/Users/andrew.goldstein/Projects/forks/spong/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/docs/functions/date_format.asciidoc"}},{"pageContent":"[[esql-from]]\n=== `FROM`\n\nThe `FROM` source command returns a table with up to 10,000 documents from a\ndata stream, index, or alias. Each row in the resulting table represents a\ndocument. Each column corresponds to a field, and can be accessed by the name\nof that field.\n\n[source,esql]\n----\nFROM employees\n----\n\nYou can use <<api-date-math-index-names,date math>> to refer to indices, aliases\nand data streams. This can be useful for time series data, for example to access\ntoday's index:\n\n[source,esql]\n----\nFROM <logs-{now/d}>\n----\n\nUse comma-separated lists or wildcards to query multiple data streams, indices,\nor aliases:\n\n[source,esql]\n----\nFROM employees-00001,employees-*\n----\n","metadata":{"source":"/Users/andrew.goldstein/Projects/forks/spong/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/docs/source_commands/from.asciidoc"}},{"pageContent":"[[esql-date_trunc]]\n=== `DATE_TRUNC`\nRounds down a date to the closest interval. Intervals can be expressed using the\n<<esql-timespan-literals,timespan literal syntax>>.\n\n[source,esql]\n----\nFROM employees\n| EVAL year_hired = DATE_TRUNC(1 year, hire_date)\n| STATS count(emp_no) BY year_hired\n| SORT year_hired\n----\n","metadata":{"source":"/Users/andrew.goldstein/Projects/forks/spong/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/docs/functions/date_trunc.asciidoc"}},{"pageContent":"[[esql-date_extract]]\n=== `DATE_EXTRACT`\nExtracts parts of a date, like year, month, day, hour.\nThe supported field types are those provided by https://docs.oracle.com/javase/8/docs/api/java/time/temporal/ChronoField.html[java.time.temporal.ChronoField]\n\n[source.merge.styled,esql]\n----\ninclude::{esql-specs}/docs.csv-spec[tag=dateExtract]\n----\n[%header.monospaced.styled,format=dsv,separator=|]\n|===\ninclude::{esql-specs}/docs.csv-spec[tag=dateExtract-result]\n|===\n\n","metadata":{"source":"/Users/andrew.goldstein/Projects/forks/spong/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/docs/functions/date_extract.asciidoc"}}]
[retriever/end] [1:chain:AgentExecutor > 4:tool:ChainTool > 5:chain:RetrievalQAChain > 6:retriever:VectorStoreRetriever] [121ms] Exiting Retriever run with output: {
  "documents": [
    {
      "pageContent": "[[esql-date_format]]\n=== `DATE_FORMAT`\nReturns a string representation of a date in the provided format. If no format\nis specified, the `yyyy-MM-dd'T'HH:mm:ss.SSSZ` format is used.\n\n[source,esql]\n----\nFROM employees\n| KEEP first_name, last_name, hire_date\n| EVAL hired = DATE_FORMAT(hire_date, \"YYYY-MM-dd\")\n----\n",
      "metadata": {
        "source": "/Users/andrew.goldstein/Projects/forks/spong/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/docs/functions/date_format.asciidoc"
      }
    },
    {
      "pageContent": "[[esql-from]]\n=== `FROM`\n\nThe `FROM` source command returns a table with up to 10,000 documents from a\ndata stream, index, or alias. Each row in the resulting table represents a\ndocument. Each column corresponds to a field, and can be accessed by the name\nof that field.\n\n[source,esql]\n----\nFROM employees\n----\n\nYou can use <<api-date-math-index-names,date math>> to refer to indices, aliases\nand data streams. This can be useful for time series data, for example to access\ntoday's index:\n\n[source,esql]\n----\nFROM <logs-{now/d}>\n----\n\nUse comma-separated lists or wildcards to query multiple data streams, indices,\nor aliases:\n\n[source,esql]\n----\nFROM employees-00001,employees-*\n----\n",
      "metadata": {
        "source": "/Users/andrew.goldstein/Projects/forks/spong/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/docs/source_commands/from.asciidoc"
      }
    },
    {
      "pageContent": "[[esql-date_trunc]]\n=== `DATE_TRUNC`\nRounds down a date to the closest interval. Intervals can be expressed using the\n<<esql-timespan-literals,timespan literal syntax>>.\n\n[source,esql]\n----\nFROM employees\n| EVAL year_hired = DATE_TRUNC(1 year, hire_date)\n| STATS count(emp_no) BY year_hired\n| SORT year_hired\n----\n",
      "metadata": {
        "source": "/Users/andrew.goldstein/Projects/forks/spong/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/docs/functions/date_trunc.asciidoc"
      }
    },
    {
      "pageContent": "[[esql-date_extract]]\n=== `DATE_EXTRACT`\nExtracts parts of a date, like year, month, day, hour.\nThe supported field types are those provided by https://docs.oracle.com/javase/8/docs/api/java/time/temporal/ChronoField.html[java.time.temporal.ChronoField]\n\n[source.merge.styled,esql]\n----\ninclude::{esql-specs}/docs.csv-spec[tag=dateExtract]\n----\n[%header.monospaced.styled,format=dsv,separator=|]\n|===\ninclude::{esql-specs}/docs.csv-spec[tag=dateExtract-result]\n|===\n\n",
      "metadata": {
        "source": "/Users/andrew.goldstein/Projects/forks/spong/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/docs/functions/date_extract.asciidoc"
      }
    }
  ]
}
[chain/start] [1:chain:AgentExecutor > 4:tool:ChainTool > 5:chain:RetrievalQAChain > 7:chain:StuffDocumentsChain] Entering Chain run with input: {
  "question": "From employees, I want to see the 5 earliest employees (hire_date), I want to display only the month and the year that they were hired in and their employee number (emp_no). Format the date as e.g. \"September 2019\". Only show the query",
  "input_documents": [
    {
      "pageContent": "[[esql-date_format]]\n=== `DATE_FORMAT`\nReturns a string representation of a date in the provided format. If no format\nis specified, the `yyyy-MM-dd'T'HH:mm:ss.SSSZ` format is used.\n\n[source,esql]\n----\nFROM employees\n| KEEP first_name, last_name, hire_date\n| EVAL hired = DATE_FORMAT(hire_date, \"YYYY-MM-dd\")\n----\n",
      "metadata": {
        "source": "/Users/andrew.goldstein/Projects/forks/spong/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/docs/functions/date_format.asciidoc"
      }
    },
    {
      "pageContent": "[[esql-from]]\n=== `FROM`\n\nThe `FROM` source command returns a table with up to 10,000 documents from a\ndata stream, index, or alias. Each row in the resulting table represents a\ndocument. Each column corresponds to a field, and can be accessed by the name\nof that field.\n\n[source,esql]\n----\nFROM employees\n----\n\nYou can use <<api-date-math-index-names,date math>> to refer to indices, aliases\nand data streams. This can be useful for time series data, for example to access\ntoday's index:\n\n[source,esql]\n----\nFROM <logs-{now/d}>\n----\n\nUse comma-separated lists or wildcards to query multiple data streams, indices,\nor aliases:\n\n[source,esql]\n----\nFROM employees-00001,employees-*\n----\n",
      "metadata": {
        "source": "/Users/andrew.goldstein/Projects/forks/spong/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/docs/source_commands/from.asciidoc"
      }
    },
    {
      "pageContent": "[[esql-date_trunc]]\n=== `DATE_TRUNC`\nRounds down a date to the closest interval. Intervals can be expressed using the\n<<esql-timespan-literals,timespan literal syntax>>.\n\n[source,esql]\n----\nFROM employees\n| EVAL year_hired = DATE_TRUNC(1 year, hire_date)\n| STATS count(emp_no) BY year_hired\n| SORT year_hired\n----\n",
      "metadata": {
        "source": "/Users/andrew.goldstein/Projects/forks/spong/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/docs/functions/date_trunc.asciidoc"
      }
    },
    {
      "pageContent": "[[esql-date_extract]]\n=== `DATE_EXTRACT`\nExtracts parts of a date, like year, month, day, hour.\nThe supported field types are those provided by https://docs.oracle.com/javase/8/docs/api/java/time/temporal/ChronoField.html[java.time.temporal.ChronoField]\n\n[source.merge.styled,esql]\n----\ninclude::{esql-specs}/docs.csv-spec[tag=dateExtract]\n----\n[%header.monospaced.styled,format=dsv,separator=|]\n|===\ninclude::{esql-specs}/docs.csv-spec[tag=dateExtract-result]\n|===\n\n",
      "metadata": {
        "source": "/Users/andrew.goldstein/Projects/forks/spong/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/docs/functions/date_extract.asciidoc"
      }
    }
  ],
  "query": "From employees, I want to see the 5 earliest employees (hire_date), I want to display only the month and the year that they were hired in and their employee number (emp_no). Format the date as e.g. \"September 2019\". Only show the query"
}
[chain/start] [1:chain:AgentExecutor > 4:tool:ChainTool > 5:chain:RetrievalQAChain > 7:chain:StuffDocumentsChain > 8:chain:LLMChain] Entering Chain run with input: {
  "question": "From employees, I want to see the 5 earliest employees (hire_date), I want to display only the month and the year that they were hired in and their employee number (emp_no). Format the date as e.g. \"September 2019\". Only show the query",
  "query": "From employees, I want to see the 5 earliest employees (hire_date), I want to display only the month and the year that they were hired in and their employee number (emp_no). Format the date as e.g. \"September 2019\". Only show the query",
  "context": "[[esql-date_format]]\n=== `DATE_FORMAT`\nReturns a string representation of a date in the provided format. If no format\nis specified, the `yyyy-MM-dd'T'HH:mm:ss.SSSZ` format is used.\n\n[source,esql]\n----\nFROM employees\n| KEEP first_name, last_name, hire_date\n| EVAL hired = DATE_FORMAT(hire_date, \"YYYY-MM-dd\")\n----\n\n\n[[esql-from]]\n=== `FROM`\n\nThe `FROM` source command returns a table with up to 10,000 documents from a\ndata stream, index, or alias. Each row in the resulting table represents a\ndocument. Each column corresponds to a field, and can be accessed by the name\nof that field.\n\n[source,esql]\n----\nFROM employees\n----\n\nYou can use <<api-date-math-index-names,date math>> to refer to indices, aliases\nand data streams. This can be useful for time series data, for example to access\ntoday's index:\n\n[source,esql]\n----\nFROM <logs-{now/d}>\n----\n\nUse comma-separated lists or wildcards to query multiple data streams, indices,\nor aliases:\n\n[source,esql]\n----\nFROM employees-00001,employees-*\n----\n\n\n[[esql-date_trunc]]\n=== `DATE_TRUNC`\nRounds down a date to the closest interval. Intervals can be expressed using the\n<<esql-timespan-literals,timespan literal syntax>>.\n\n[source,esql]\n----\nFROM employees\n| EVAL year_hired = DATE_TRUNC(1 year, hire_date)\n| STATS count(emp_no) BY year_hired\n| SORT year_hired\n----\n\n\n[[esql-date_extract]]\n=== `DATE_EXTRACT`\nExtracts parts of a date, like year, month, day, hour.\nThe supported field types are those provided by https://docs.oracle.com/javase/8/docs/api/java/time/temporal/ChronoField.html[java.time.temporal.ChronoField]\n\n[source.merge.styled,esql]\n----\ninclude::{esql-specs}/docs.csv-spec[tag=dateExtract]\n----\n[%header.monospaced.styled,format=dsv,separator=|]\n|===\ninclude::{esql-specs}/docs.csv-spec[tag=dateExtract-result]\n|===\n\n"
}
[llm/start] [1:chain:AgentExecutor > 4:tool:ChainTool > 5:chain:RetrievalQAChain > 7:chain:StuffDocumentsChain > 8:chain:LLMChain > 9:llm:ActionsClientLlm] Entering LLM run with input: {
  "prompts": [
    "Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\n\n[[esql-date_format]]\n=== `DATE_FORMAT`\nReturns a string representation of a date in the provided format. If no format\nis specified, the `yyyy-MM-dd'T'HH:mm:ss.SSSZ` format is used.\n\n[source,esql]\n----\nFROM employees\n| KEEP first_name, last_name, hire_date\n| EVAL hired = DATE_FORMAT(hire_date, \"YYYY-MM-dd\")\n----\n\n\n[[esql-from]]\n=== `FROM`\n\nThe `FROM` source command returns a table with up to 10,000 documents from a\ndata stream, index, or alias. Each row in the resulting table represents a\ndocument. Each column corresponds to a field, and can be accessed by the name\nof that field.\n\n[source,esql]\n----\nFROM employees\n----\n\nYou can use <<api-date-math-index-names,date math>> to refer to indices, aliases\nand data streams. This can be useful for time series data, for example to access\ntoday's index:\n\n[source,esql]\n----\nFROM <logs-{now/d}>\n----\n\nUse comma-separated lists or wildcards to query multiple data streams, indices,\nor aliases:\n\n[source,esql]\n----\nFROM employees-00001,employees-*\n----\n\n\n[[esql-date_trunc]]\n=== `DATE_TRUNC`\nRounds down a date to the closest interval. Intervals can be expressed using the\n<<esql-timespan-literals,timespan literal syntax>>.\n\n[source,esql]\n----\nFROM employees\n| EVAL year_hired = DATE_TRUNC(1 year, hire_date)\n| STATS count(emp_no) BY year_hired\n| SORT year_hired\n----\n\n\n[[esql-date_extract]]\n=== `DATE_EXTRACT`\nExtracts parts of a date, like year, month, day, hour.\nThe supported field types are those provided by https://docs.oracle.com/javase/8/docs/api/java/time/temporal/ChronoField.html[java.time.temporal.ChronoField]\n\n[source.merge.styled,esql]\n----\ninclude::{esql-specs}/docs.csv-spec[tag=dateExtract]\n----\n[%header.monospaced.styled,format=dsv,separator=|]\n|===\ninclude::{esql-specs}/docs.csv-spec[tag=dateExtract-result]\n|===\n\n\n\nQuestion: From employees, I want to see the 5 earliest employees (hire_date), I want to display only the month and the year that they were hired in and their employee number (emp_no). Format the date as e.g. \"September 2019\". Only show the query\nHelpful Answer:"
  ]
}
[llm/end] [1:chain:AgentExecutor > 4:tool:ChainTool > 5:chain:RetrievalQAChain > 7:chain:StuffDocumentsChain > 8:chain:LLMChain > 9:llm:ActionsClientLlm] [3.23s] Exiting LLM run with output: {
  "generations": [
    [
      {
        "text": "FROM employees\n| KEEP emp_no, hire_date\n| EVAL month_year = DATE_FORMAT(hire_date, \"MMMM YYYY\")\n| SORT hire_date\n| LIMIT 5"
      }
    ]
  ]
}
[chain/end] [1:chain:AgentExecutor > 4:tool:ChainTool > 5:chain:RetrievalQAChain > 7:chain:StuffDocumentsChain > 8:chain:LLMChain] [3.24s] Exiting Chain run with output: {
  "text": "FROM employees\n| KEEP emp_no, hire_date\n| EVAL month_year = DATE_FORMAT(hire_date, \"MMMM YYYY\")\n| SORT hire_date\n| LIMIT 5"
}
[chain/end] [1:chain:AgentExecutor > 4:tool:ChainTool > 5:chain:RetrievalQAChain > 7:chain:StuffDocumentsChain] [3.24s] Exiting Chain run with output: {
  "text": "FROM employees\n| KEEP emp_no, hire_date\n| EVAL month_year = DATE_FORMAT(hire_date, \"MMMM YYYY\")\n| SORT hire_date\n| LIMIT 5"
}
[chain/end] [1:chain:AgentExecutor > 4:tool:ChainTool > 5:chain:RetrievalQAChain] [3.36s] Exiting Chain run with output: {
  "text": "FROM employees\n| KEEP emp_no, hire_date\n| EVAL month_year = DATE_FORMAT(hire_date, \"MMMM YYYY\")\n| SORT hire_date\n| LIMIT 5"
}
[tool/end] [1:chain:AgentExecutor > 4:tool:ChainTool] [3.36s] Exiting Tool run with output: "FROM employees
| KEEP emp_no, hire_date
| EVAL month_year = DATE_FORMAT(hire_date, "MMMM YYYY")
| SORT hire_date
| LIMIT 5"
[chain/start] [1:chain:AgentExecutor > 10:chain:LLMChain] Entering Chain run with input: {
  "input": "\n\nFrom employees, I want to see the 5 earliest employees (hire_date), I want to display only the month and the year that they were hired in and their employee number (emp_no). Format the date as e.g. \"September 2019\". Only show the query",
  "chat_history": [
    {
      "lc": 1,
      "type": "constructor",
      "id": [
        "langchain",
        "schema",
        "HumanMessage"
      ],
      "kwargs": {
        "content": "\n\n\n\nTalk to me about the sunshine please",
        "additional_kwargs": {}
      }
    },
    {
      "lc": 1,
      "type": "constructor",
      "id": [
        "langchain",
        "schema",
        "AIMessage"
      ],
      "kwargs": {
        "content": "The sunshine is the light and warmth that we receive from the sun. It is an essential source of energy for life on Earth, supporting photosynthesis in plants, regulating the Earth's climate and weather patterns, and providing countless other benefits. Sunshine can positively affect our mood and overall well-being, as it helps produce vitamin D in our body when our skin is exposed to sunlight. However, it is essential to enjoy sunshine responsibly, using sun protection to prevent skin damage and other harmful effects. Sunshine can also be harnessed for solar power, providing a renewable and sustainable energy source.",
        "additional_kwargs": {}
      }
    }
  ],
  "agent_scratchpad": [
    {
      "lc": 1,
      "type": "constructor",
      "id": [
        "langchain",
        "schema",
        "AIMessage"
      ],
      "kwargs": {
        "content": "```json\n{\n    \"action\": \"esql-language-knowledge-base\",\n    \"action_input\": \"From employees, I want to see the 5 earliest employees (hire_date), I want to display only the month and the year that they were hired in and their employee number (emp_no). Format the date as e.g. \\\"September 2019\\\". Only show the query\"\n}\n```",
        "additional_kwargs": {}
      }
    },
    {
      "lc": 1,
      "type": "constructor",
      "id": [
        "langchain",
        "schema",
        "HumanMessage"
      ],
      "kwargs": {
        "content": "TOOL RESPONSE:\n---------------------\nFROM employees\n| KEEP emp_no, hire_date\n| EVAL month_year = DATE_FORMAT(hire_date, \"MMMM YYYY\")\n| SORT hire_date\n| LIMIT 5\n\nUSER'S INPUT\n--------------------\n\nOkay, so what is the response to my last comment? If using information obtained from the tools you must mention it explicitly without mentioning the tool names - I have forgotten all TOOL RESPONSES! Remember to respond with a markdown code snippet of a json blob with a single action, and NOTHING else.",
        "additional_kwargs": {}
      }
    }
  ],
  "stop": [
    "Observation:"
  ]
}
[llm/start] [1:chain:AgentExecutor > 10:chain:LLMChain > 11:llm:ActionsClientLlm] Entering LLM run with input: {
  "prompts": [
    "[{\"lc\":1,\"type\":\"constructor\",\"id\":[\"langchain\",\"schema\",\"SystemMessage\"],\"kwargs\":{\"content\":\"Assistant is a large language model trained by OpenAI.\\n\\nAssistant is designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. As a language model, Assistant is able to generate human-like text based on the input it receives, allowing it to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand.\\n\\nAssistant is constantly learning and improving, and its capabilities are constantly evolving. It is able to process and understand large amounts of text, and can use this knowledge to provide accurate and informative responses to a wide range of questions. Additionally, Assistant is able to generate its own text based on the input it receives, allowing it to engage in discussions and provide explanations and descriptions on a wide range of topics.\\n\\nOverall, Assistant is a powerful system that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether you need help with a specific question or just want to have a conversation about a particular topic, Assistant is here to assist. However, above all else, all responses must adhere to the format of RESPONSE FORMAT INSTRUCTIONS.\",\"additional_kwargs\":{}}},{\"lc\":1,\"type\":\"constructor\",\"id\":[\"langchain\",\"schema\",\"HumanMessage\"],\"kwargs\":{\"content\":\"\\n\\n\\n\\nTalk to me about the sunshine please\",\"additional_kwargs\":{}}},{\"lc\":1,\"type\":\"constructor\",\"id\":[\"langchain\",\"schema\",\"AIMessage\"],\"kwargs\":{\"content\":\"The sunshine is the light and warmth that we receive from the sun. It is an essential source of energy for life on Earth, supporting photosynthesis in plants, regulating the Earth's climate and weather patterns, and providing countless other benefits. Sunshine can positively affect our mood and overall well-being, as it helps produce vitamin D in our body when our skin is exposed to sunlight. However, it is essential to enjoy sunshine responsibly, using sun protection to prevent skin damage and other harmful effects. Sunshine can also be harnessed for solar power, providing a renewable and sustainable energy source.\",\"additional_kwargs\":{}}},{\"lc\":1,\"type\":\"constructor\",\"id\":[\"langchain\",\"schema\",\"HumanMessage\"],\"kwargs\":{\"content\":\"TOOLS\\n------\\nAssistant can ask the user to use tools to look up information that may be helpful in answering the users original question. The tools the human can use are:\\n\\nesql-language-knowledge-base: Call this for knowledge on how to build an ESQL query, or answer questions about the ES|QL query language.\\n\\nRESPONSE FORMAT INSTRUCTIONS\\n----------------------------\\n\\nOutput a JSON markdown code snippet containing a valid JSON object in one of two formats:\\n\\n**Option 1:**\\nUse this if you want the human to use a tool.\\nMarkdown code snippet formatted in the following schema:\\n\\n```json\\n{\\n    \\\"action\\\": string, // The action to take. Must be one of [esql-language-knowledge-base]\\n    \\\"action_input\\\": string // The input to the action. May be a stringified object.\\n}\\n```\\n\\n**Option #2:**\\nUse this if you want to respond directly and conversationally to the human. Markdown code snippet formatted in the following schema:\\n\\n```json\\n{\\n    \\\"action\\\": \\\"Final Answer\\\",\\n    \\\"action_input\\\": string // You should put what you want to return to use here and make sure to use valid json newline characters.\\n}\\n```\\n\\nFor both options, remember to always include the surrounding markdown code snippet delimiters (begin with \\\"```json\\\" and end with \\\"```\\\")!\\n\\n\\nUSER'S INPUT\\n--------------------\\nHere is the user's input (remember to respond with a markdown code snippet of a json blob with a single action, and NOTHING else):\\n\\n\\n\\nFrom employees, I want to see the 5 earliest employees (hire_date), I want to display only the month and the year that they were hired in and their employee number (emp_no). Format the date as e.g. \\\"September 2019\\\". Only show the query\",\"additional_kwargs\":{}}},{\"lc\":1,\"type\":\"constructor\",\"id\":[\"langchain\",\"schema\",\"AIMessage\"],\"kwargs\":{\"content\":\"```json\\n{\\n    \\\"action\\\": \\\"esql-language-knowledge-base\\\",\\n    \\\"action_input\\\": \\\"From employees, I want to see the 5 earliest employees (hire_date), I want to display only the month and the year that they were hired in and their employee number (emp_no). Format the date as e.g. \\\\\\\"September 2019\\\\\\\". Only show the query\\\"\\n}\\n```\",\"additional_kwargs\":{}}},{\"lc\":1,\"type\":\"constructor\",\"id\":[\"langchain\",\"schema\",\"HumanMessage\"],\"kwargs\":{\"content\":\"TOOL RESPONSE:\\n---------------------\\nFROM employees\\n| KEEP emp_no, hire_date\\n| EVAL month_year = DATE_FORMAT(hire_date, \\\"MMMM YYYY\\\")\\n| SORT hire_date\\n| LIMIT 5\\n\\nUSER'S INPUT\\n--------------------\\n\\nOkay, so what is the response to my last comment? If using information obtained from the tools you must mention it explicitly without mentioning the tool names - I have forgotten all TOOL RESPONSES! Remember to respond with a markdown code snippet of a json blob with a single action, and NOTHING else.\",\"additional_kwargs\":{}}}]"
  ]
}
[llm/end] [1:chain:AgentExecutor > 10:chain:LLMChain > 11:llm:ActionsClientLlm] [6.29s] Exiting LLM run with output: {
  "generations": [
    [
      {
        "text": "```json\n{\n    \"action\": \"Final Answer\",\n    \"action_input\": \"Based on your request, here is the ESQL query to display the employee number (emp_no) and the hiring month and year for the 5 earliest employees in the employees table, formatted as \\\"September 2019\\\":\\n\\nFROM employees\\n| KEEP emp_no, hire_date\\n| EVAL month_year = DATE_FORMAT(hire_date, \\\"MMMM YYYY\\\")\\n| SORT hire_date\\n| LIMIT 5\"\n}\n```"
      }
    ]
  ]
}
[chain/end] [1:chain:AgentExecutor > 10:chain:LLMChain] [6.29s] Exiting Chain run with output: {
  "text": "```json\n{\n    \"action\": \"Final Answer\",\n    \"action_input\": \"Based on your request, here is the ESQL query to display the employee number (emp_no) and the hiring month and year for the 5 earliest employees in the employees table, formatted as \\\"September 2019\\\":\\n\\nFROM employees\\n| KEEP emp_no, hire_date\\n| EVAL month_year = DATE_FORMAT(hire_date, \\\"MMMM YYYY\\\")\\n| SORT hire_date\\n| LIMIT 5\"\n}\n```"
}
[chain/end] [1:chain:AgentExecutor] [16.42s] Exiting Chain run with output: {
  "output": "Based on your request, here is the ESQL query to display the employee number (emp_no) and the hiring month and year for the 5 earliest employees in the employees table, formatted as \"September 2019\":\n\nFROM employees\n| KEEP emp_no, hire_date\n| EVAL month_year = DATE_FORMAT(hire_date, \"MMMM YYYY\")\n| SORT hire_date\n| LIMIT 5"
}

Observations

The verbose: true output DOES contain invocations of the RetrievalQAChain, which is required access the ELSER backed ElasticsearchStore for the Knowledge Base, per the abridged output below:

[chain/start] [1:chain:AgentExecutor > 4:tool:ChainTool > 5:chain:RetrievalQAChain] Entering Chain run with input: {
[retriever/start] [1:chain:AgentExecutor > 4:tool:ChainTool > 5:chain:RetrievalQAChain > 6:retriever:VectorStoreRetriever] Entering Retriever run with input: {
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! queryBody {"bool":{"must":[{"text_expansion":{"vector.tokens":{"model_id":".elser_model_1","model_text":"From employees, I want to see the 5 earliest employees (hire_date), I want to display only the month and the year that they were hired in and their employee number (emp_no). Format the date as e.g. \"September 2019\". Only show the query"}}}]}}
[2023-09-25T10:26:19.455-06:00][INFO ][plugins.elasticAssistant] Similarity Search Query:
 {"bool":{"must":[{"text_expansion":{"vector.tokens":{"model_id":".elser_model_1","model_text":"From employees, I want to see the 5 earliest employees (hire_date), I want to display only the month and the year that they were hired in and their employee number (emp_no). Format the date as e.g. \"September 2019\". Only show the query"}}}]}}
[2023-09-25T10:26:19.455-06:00][INFO ][plugins.elasticAssistant] Similarity Search Results:
 [{"pageContent":"[[esql-date_format]]\n=== `DATE_FORMAT`\nReturns a string representation of a date in the provided format. If no format\nis specified, the `yyyy-MM-dd'T'HH:mm:ss.SSSZ` format is used.\n\n[source,esql]\n----\nFROM employees\n| KEEP first_name, last_name, hire_date\n| EVAL hired = DATE_FORMAT(hire_date, \"YYYY-MM-dd\")\n----\n","metadata":{"source":"/Users/andrew.goldstein/Projects/forks/spong/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/docs/functions/date_format.asciidoc"}},{"pageContent":"[[esql-from]]\n=== `FROM`\n\nThe `FROM` source command returns a table with up to 10,000 documents from a\ndata stream, index, or alias. Each row in the resulting table represents a\ndocument. Each column corresponds to a field, and can be accessed by the name\nof that field.\n\n[source,esql]\n----\nFROM employees\n----\n\nYou can use <<api-date-math-index-names,date math>> to refer to indices, aliases\nand data streams. This can be useful for time series data, for example to access\ntoday's index:\n\n[source,esql]\n----\nFROM <logs-{now/d}>\n----\n\nUse comma-separated lists or wildcards to query multiple data streams, indices,\nor aliases:\n\n[source,esql]\n----\nFROM employees-00001,employees-*\n----\n","metadata":{"source":"/Users/andrew.goldstein/Projects/forks/spong/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/docs/source_commands/from.asciidoc"}},{"pageContent":"[[esql-date_trunc]]\n=== `DATE_TRUNC`\nRounds down a date to the closest interval. Intervals can be expressed using the\n<<esql-timespan-literals,timespan literal syntax>>.\n\n[source,esql]\n----\nFROM employees\n| EVAL year_hired = DATE_TRUNC(1 year, hire_date)\n| STATS count(emp_no) BY year_hired\n| SORT year_hired\n----\n","metadata":{"source":"/Users/andrew.goldstein/Projects/forks/spong/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/docs/functions/date_trunc.asciidoc"}},{"pageContent":"[[esql-date_extract]]\n=== `DATE_EXTRACT`\nExtracts parts of a date, like year, month, day, hour.\nThe supported field types are those provided by https://docs.oracle.com/javase/8/docs/api/java/time/temporal/ChronoField.html[java.time.temporal.ChronoField]\n\n[source.merge.styled,esql]\n----\ninclude::{esql-specs}/docs.csv-spec[tag=dateExtract]\n----\n[%header.monospaced.styled,format=dsv,separator=|]\n|===\ninclude::{esql-specs}/docs.csv-spec[tag=dateExtract-result]\n|===\n\n","metadata":{"source":"/Users/andrew.goldstein/Projects/forks/spong/kibana/x-pack/plugins/elastic_assistant/server/knowledge_base/esql/docs/functions/date_extract.asciidoc"}}]
[retriever/end] [1:chain:AgentExecutor > 4:tool:ChainTool > 5:chain:RetrievalQAChain > 6:retriever:VectorStoreRetriever] [121ms] Exiting Retriever run with output: {
[chain/start] [1:chain:AgentExecutor > 4:tool:ChainTool > 5:chain:RetrievalQAChain > 7:chain:StuffDocumentsChain] Entering Chain run with input: {
[chain/start] [1:chain:AgentExecutor > 4:tool:ChainTool > 5:chain:RetrievalQAChain > 7:chain:StuffDocumentsChain > 8:chain:LLMChain] Entering Chain run with input: {
[llm/start] [1:chain:AgentExecutor > 4:tool:ChainTool > 5:chain:RetrievalQAChain > 7:chain:StuffDocumentsChain > 8:chain:LLMChain > 9:llm:ActionsClientLlm] Entering LLM run with
[llm/end] [1:chain:AgentExecutor > 4:tool:ChainTool > 5:chain:RetrievalQAChain > 7:chain:StuffDocumentsChain > 8:chain:LLMChain > 9:llm:ActionsClientLlm] [3.23s] Exiting LLM run
[chain/end] [1:chain:AgentExecutor > 4:tool:ChainTool > 5:chain:RetrievalQAChain > 7:chain:StuffDocumentsChain > 8:chain:LLMChain] [3.24s] Exiting Chain run with output: {
[chain/end] [1:chain:AgentExecutor > 4:tool:ChainTool > 5:chain:RetrievalQAChain > 7:chain:StuffDocumentsChain] [3.24s] Exiting Chain run with output: {
[chain/end] [1:chain:AgentExecutor > 4:tool:ChainTool > 5:chain:RetrievalQAChain] [3.36s] Exiting Chain run with output: {
[tool/end] [1:chain:AgentExecutor > 4:tool:ChainTool] [3.36s] Exiting Tool run with output: "FROM employees
| KEEP emp_no, hire_date
| EVAL month_year = DATE_FORMAT(hire_date, "MMMM YYYY")
| SORT hire_date
| LIMIT 5"

The instrumented console.log '!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! output also DOES appear for the ESQL prompt

@timestamp

…eration (RAG) for Alerts This PR implements _Retrieval Augmented Generation_ (RAG) for Alerts in the Security Solution. This feature enables users to ask the assistant questions about the latest and riskiest open alerts in their environment using natural language, for example: - _How many alerts are currently open?_ - _Which alerts should I look at first?_ - _Did we have any alerts with suspicious activity on Windows machines?_ ### More context Previously, the assistant relied solely on the knowledge of the configured LLM and _singular_ alerts or events passed _by the client_ to the LLM as prompt context. This new feature: - Enables _multiple_ alerts to be passed by the _server_ as context to the LLM, via [LangChain tools](elastic#167097) - Applies the user's [anonymization](elastic#159857) settings to those alerts - Only fields allowed by the user will be sent as context to the LLM - Users may enable or disable anonymization for specific fields (via settings) - Click the conversation's `Show anonymized` toggle to see the anonymized values sent to / received from the LLM: ![show_anonymized](https://github.com/elastic/kibana/assets/4459398/7db85f69-9352-4422-adbf-c97248ccb3dd) ### Settings This feature is enabled and configured via the `Knowledge Base` > `Alerts` settings in the screenshot below: ![rag_on_alerts_setting](https://github.com/elastic/kibana/assets/4459398/9161b6d4-b7c3-4f37-bcde-f032f5a02966) - The `Alerts` toggle enables or disables the feature - The slider has a range of `10` - `100` alerts (default: `20`) When the setting above is enabled, up to `n` alerts (as determined by the slider) that meet the following criteria will be returned: - the `kibana.alert.workflow_status` must be `open` - the alert must have been generated in the last `24 hours` - the alert must NOT be a `kibana.alert.building_block_type` alert - the `n` alerts are ordered by `kibana.alert.risk_score`, to prioritize the riskiest alerts ### Feature flag To use this feature: 1) Add the `assistantRagOnAlerts` feature flag to the `xpack.securitySolution.enableExperimental` setting in `config/kibana.yml` (or `config/kibana.dev.yml` in local development environments), per the example below: ``` xpack.securitySolution.enableExperimental: ['assistantRagOnAlerts'] ``` 2) Enable the `Alerts` toggle in the Assistant's `Knowledge Base` settings, per the screenshot below: ![alerts_toggle](https://github.com/elastic/kibana/assets/4459398/07f241ea-af4a-43a4-bd19-0dc6337db167) ## How it works - When the `Alerts` settings toggle is enabled, http `POST` requests to the `/internal/elastic_assistant/actions/connector/{id}/_execute` route include the following new (optional) parameters: - `alertsIndexPattern`, the alerts index for the current Kibana Space, e.g. `.alerts-security.alerts-default` - `allow`, the user's `Allowed` fields in the `Anonymization` settings, e.g. `["@timestamp", "cloud.availability_zone", "file.name", "user.name", ...]` - `allowReplacement`, the user's `Anonymized` fields in the `Anonymization` settings, e.g. `["cloud.availability_zone", "host.name", "user.name", ...]` - `replacements`, a `Record<string, string>` of replacements (generated on the server) that starts empty for a new conversation, and accumulates anonymized values until the conversation is cleared, e.g. ```json "replacements": { "e4f935c0-5a80-47b2-ac7f-816610790364": "Host-itk8qh4tjm", "cf61f946-d643-4b15-899f-6ffe3fd36097": "rpwmjvuuia", "7f80b092-fb1a-48a2-a634-3abc61b32157": "6astve9g6s", "f979c0d5-db1b-4506-b425-500821d00813": "Host-odqbow6tmc", // ... }, ``` - `size`, the numeric value set by the slider in the user's `Knowledge Base > Alerts` setting, e.g. `20` - The `postActionsConnectorExecuteRoute` function in `x-pack/plugins/elastic_assistant/server/routes/post_actions_connector_execute.ts` was updated to accept the new optional parameters, and to return an updated `replacements` with every response. (Every new request that is processed on the server may add additional anonymized values to the `replacements` returned in the response.) - The `callAgentExecutor` function in `x-pack/plugins/elastic_assistant/server/lib/langchain/execute_custom_llm_chain/index.ts` previously used a hard-coded array of LangChain tools that had just one entry, for the `ESQLKnowledgeBaseTool` tool. That hard-coded array was replaced in this PR with a call to the (new) `getApplicableTools` function: ```typescript const tools: Tool[] = getApplicableTools({ allow, allowReplacement, alertsIndexPattern, assistantLangChain, chain, esClient, modelExists, onNewReplacements, replacements, request, size, }); ``` - The `getApplicableTools` function in `x-pack/plugins/elastic_assistant/server/lib/langchain/tools/index.ts` examines the parameters in the `KibanaRequest` and only returns a filtered set of LangChain tools. If the request doesn't contain all the parameters required by a tool, it will NOT be returned by `getApplicableTools`. For example, if the required anonymization parameters are not included in the request, the `open-alerts` tool will not be returned. - The new `alert-counts` LangChain tool returned by the `getAlertCountsTool` function in `x-pack/plugins/elastic_assistant/server/lib/langchain/tools/alert_counts/get_alert_counts_tool.ts` provides the LLM the results of an aggregation on the last `24` hours of alerts (in the current Kibana Space), grouped by `kibana.alert.severity`. See the `getAlertsCountQuery` function in `x-pack/plugins/elastic_assistant/server/lib/langchain/tools/alert_counts/get_alert_counts_query.ts` for details - The new `open-alerts` LangChain tool returned by the `getOpenAlertsTool` function in `x-pack/plugins/elastic_assistant/server/lib/langchain/tools/open_alerts/get_open_alerts_tool.ts` provides the LLM up to `size` non-building-block alerts generated in the last `24` hours (in the current Kibana Space) with an `open` workflow status, ordered by `kibana.alert.risk_score` to prioritize the riskiest alerts. See the `getOpenAlertsQuery` function in `x-pack/plugins/elastic_assistant/server/lib/langchain/tools/open_alerts/get_open_alerts_query.ts` for details. - On the client, a conversation continues to accumulate additional `replacements` (and send them in subsequent requests) until the conversation is cleared - Anonymization functions that were only invoked by the browser were moved from the (browser) `kbn-elastic-assistant` package in `x-pack/packages/kbn-elastic-assistant/` to a new common package: `x-pack/packages/kbn-elastic-assistant-common` - The new `kbn-elastic-assistant-common` package is also consumed by the `elastic_assistant` (server) plugin: `x-pack/plugins/elastic_assistant`

@timestamp

…tion (RAG) for Alerts (#172542) ## [Security Solution] [Elastic AI Assistant] Retrieval Augmented Generation (RAG) for Alerts This PR implements _Retrieval Augmented Generation_ (RAG) for Alerts in the Security Solution. This feature enables users to ask the assistant questions about the latest and riskiest open alerts in their environment using natural language, for example: - _How many alerts are currently open?_ - _Which alerts should I look at first?_ - _Did we have any alerts with suspicious activity on Windows machines?_ ### More context Previously, the assistant relied solely on the knowledge of the configured LLM and _singular_ alerts or events passed _by the client_ to the LLM as prompt context. This new feature: - Enables _multiple_ alerts to be passed by the _server_ as context to the LLM, via [LangChain tools](#167097) - Applies the user's [anonymization](#159857) settings to those alerts - Only fields allowed by the user will be sent as context to the LLM - Users may enable or disable anonymization for specific fields (via settings) - Click the conversation's `Show anonymized` toggle to see the anonymized values sent to / received from the LLM: ![show_anonymized](https://github.com/elastic/kibana/assets/4459398/7db85f69-9352-4422-adbf-c97248ccb3dd) ### Settings This feature is enabled and configured via the `Knowledge Base` > `Alerts` settings in the screenshot below: ![rag_on_alerts_setting](https://github.com/elastic/kibana/assets/4459398/9161b6d4-b7c3-4f37-bcde-f032f5a02966) - The `Alerts` toggle enables or disables the feature - The slider has a range of `10` - `100` alerts (default: `20`) When the setting above is enabled, up to `n` alerts (as determined by the slider) that meet the following criteria will be returned: - the `kibana.alert.workflow_status` must be `open` - the alert must have been generated in the last `24 hours` - the alert must NOT be a `kibana.alert.building_block_type` alert - the `n` alerts are ordered by `kibana.alert.risk_score`, to prioritize the riskiest alerts ### Feature flag To use this feature: 1) Add the `assistantRagOnAlerts` feature flag to the `xpack.securitySolution.enableExperimental` setting in `config/kibana.yml` (or `config/kibana.dev.yml` in local development environments), per the example below: ``` xpack.securitySolution.enableExperimental: ['assistantRagOnAlerts'] ``` 2) Enable the `Alerts` toggle in the Assistant's `Knowledge Base` settings, per the screenshot below: ![alerts_toggle](https://github.com/elastic/kibana/assets/4459398/07f241ea-af4a-43a4-bd19-0dc6337db167) ## How it works - When the `Alerts` settings toggle is enabled, http `POST` requests to the `/internal/elastic_assistant/actions/connector/{id}/_execute` route include the following new (optional) parameters: - `alertsIndexPattern`, the alerts index for the current Kibana Space, e.g. `.alerts-security.alerts-default` - `allow`, the user's `Allowed` fields in the `Anonymization` settings, e.g. `["@timestamp", "cloud.availability_zone", "file.name", "user.name", ...]` - `allowReplacement`, the user's `Anonymized` fields in the `Anonymization` settings, e.g. `["cloud.availability_zone", "host.name", "user.name", ...]` - `replacements`, a `Record<string, string>` of replacements (generated on the server) that starts empty for a new conversation, and accumulates anonymized values until the conversation is cleared, e.g. ```json "replacements": { "e4f935c0-5a80-47b2-ac7f-816610790364": "Host-itk8qh4tjm", "cf61f946-d643-4b15-899f-6ffe3fd36097": "rpwmjvuuia", "7f80b092-fb1a-48a2-a634-3abc61b32157": "6astve9g6s", "f979c0d5-db1b-4506-b425-500821d00813": "Host-odqbow6tmc", // ... }, ``` - `size`, the numeric value set by the slider in the user's `Knowledge Base > Alerts` setting, e.g. `20` - The `postActionsConnectorExecuteRoute` function in `x-pack/plugins/elastic_assistant/server/routes/post_actions_connector_execute.ts` was updated to accept the new optional parameters, and to return an updated `replacements` with every response. (Every new request that is processed on the server may add additional anonymized values to the `replacements` returned in the response.) - The `callAgentExecutor` function in `x-pack/plugins/elastic_assistant/server/lib/langchain/execute_custom_llm_chain/index.ts` previously used a hard-coded array of LangChain tools that had just one entry, for the `ESQLKnowledgeBaseTool` tool. That hard-coded array was replaced in this PR with a call to the (new) `getApplicableTools` function: ```typescript const tools: Tool[] = getApplicableTools({ allow, allowReplacement, alertsIndexPattern, assistantLangChain, chain, esClient, modelExists, onNewReplacements, replacements, request, size, }); ``` - The `getApplicableTools` function in `x-pack/plugins/elastic_assistant/server/lib/langchain/tools/index.ts` examines the parameters in the `KibanaRequest` and only returns a filtered set of LangChain tools. If the request doesn't contain all the parameters required by a tool, it will NOT be returned by `getApplicableTools`. For example, if the required anonymization parameters are not included in the request, the `open-alerts` tool will not be returned. - The new `alert-counts` LangChain tool returned by the `getAlertCountsTool` function in `x-pack/plugins/elastic_assistant/server/lib/langchain/tools/alert_counts/get_alert_counts_tool.ts` provides the LLM the results of an aggregation on the last `24` hours of alerts (in the current Kibana Space), grouped by `kibana.alert.severity`. See the `getAlertsCountQuery` function in `x-pack/plugins/elastic_assistant/server/lib/langchain/tools/alert_counts/get_alert_counts_query.ts` for details - The new `open-alerts` LangChain tool returned by the `getOpenAlertsTool` function in `x-pack/plugins/elastic_assistant/server/lib/langchain/tools/open_alerts/get_open_alerts_tool.ts` provides the LLM up to `size` non-building-block alerts generated in the last `24` hours (in the current Kibana Space) with an `open` workflow status, ordered by `kibana.alert.risk_score` to prioritize the riskiest alerts. See the `getOpenAlertsQuery` function in `x-pack/plugins/elastic_assistant/server/lib/langchain/tools/open_alerts/get_open_alerts_query.ts` for details. - On the client, a conversation continues to accumulate additional `replacements` (and send them in subsequent requests) until the conversation is cleared - Anonymization functions that were only invoked by the browser were moved from the (browser) `kbn-elastic-assistant` package in `x-pack/packages/kbn-elastic-assistant/` to a new common package: `x-pack/packages/kbn-elastic-assistant-common` - The new `kbn-elastic-assistant-common` package is also consumed by the `elastic_assistant` (server) plugin: `x-pack/plugins/elastic_assistant`

andrew-goldstein requested a review from a team as a code owner September 22, 2023 23:15

andrew-goldstein self-assigned this Sep 22, 2023

andrew-goldstein commented Sep 22, 2023

View reviewed changes

spong reviewed Sep 22, 2023

View reviewed changes

spong approved these changes Sep 22, 2023

View reviewed changes

Merge branch 'main' into assistant-langchain-agents-tools

a76398c

andrew-goldstein merged commit 077be69 into elastic:main Sep 25, 2023

andrew-goldstein deleted the assistant-langchain-agents-tools branch September 25, 2023 17:55

andrew-goldstein mentioned this pull request Dec 5, 2023

[Security Solution] [Elastic AI Assistant] Retrieval Augmented Generation (RAG) for Alerts #172542

Merged

dhru42 mentioned this pull request Dec 11, 2023

[Request] Retrieval Augemented Generation (RAG) for Alerts elastic/security-docs#4456

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Security Solution] [Elastic AI Assistant] LangChain Agents and Tools integration for ES|QL query generation via ELSER #167097

[Security Solution] [Elastic AI Assistant] LangChain Agents and Tools integration for ES|QL query generation via ELSER #167097

andrew-goldstein commented Sep 22, 2023

elasticmachine commented Sep 22, 2023

andrew-goldstein Sep 22, 2023

spong Sep 22, 2023

andrew-goldstein Sep 25, 2023

spong Sep 25, 2023

spong left a comment

andrew-goldstein commented Sep 25, 2023

kibana-ci commented Sep 25, 2023

andrew-goldstein commented Sep 25, 2023

[Security Solution] [Elastic AI Assistant] LangChain Agents and Tools integration for ES|QL query generation via ELSER #167097

[Security Solution] [Elastic AI Assistant] LangChain Agents and Tools integration for ES|QL query generation via ELSER #167097

Conversation

andrew-goldstein commented Sep 22, 2023