Testing and Validation


Assessment Measures and Evaluation Technique

The following testing procedure verifies that the Agent correctly identifies and understands user intents for accessing customer data (e.g., account information), fulfilling business workflows through pre-defined intents (e.g., completing a loan application), and answering general queries (see Sample Prompts in the README). Response accuracy is judged by the relevancy, coherency, and human-like quality of the answers generated by the Anthropic Claude LLM hosted on Amazon Bedrock. Also confirm that the source links returned with each response, whether from Kendra data sources (e.g., the Web Crawler configured for 'octankfinancial.com') or the Bedrock LLM's training data, are credible.
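Part of the source-link check can be automated. The sketch below is not part of the sample solution: it flags any returned link that is not served over HTTPS from an allowlisted domain so that a reviewer can inspect it by hand. The allowlist contents and the function names `is_credible_source` and `flag_unverified_sources` are hypothetical; only 'octankfinancial.com' comes from the Web Crawler configuration mentioned above.

```python
from urllib.parse import urlparse

# Hypothetical allowlist of authoritative domains. 'octankfinancial.com' is the
# Kendra Web Crawler target; add any other domains your team trusts.
ALLOWED_DOMAINS = frozenset({"octankfinancial.com"})


def is_credible_source(url: str, allowed_domains: frozenset = ALLOWED_DOMAINS) -> bool:
    """Return True if the URL uses HTTPS and its host is an allowed domain or subdomain."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if parsed.scheme != "https" or not host:
        return False
    # Match exact domains or true subdomains, so "octankfinancial.com.evil.example" fails.
    return any(host == domain or host.endswith("." + domain) for domain in allowed_domains)


def flag_unverified_sources(source_links: list) -> list:
    """Return the source links that a human reviewer should check manually."""
    return [link for link in source_links if not is_credible_source(link)]
```

For example, `flag_unverified_sources(["https://www.octankfinancial.com/loans", "http://octankfinancial.com/rates"])` returns `["http://octankfinancial.com/rates"]`. Links that trace back to the LLM's training data rather than a Kendra source will always be flagged, which is intended: those still need a human judgment of credibility.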

Username: Demo User
PIN: 1234

  • Provide Personalized Responses: Verify that the Agent successfully accesses and uses relevant customer information in DynamoDB to tailor user-specific responses.

❗ The use of PIN authentication within the Agent is for demonstration purposes only and should not be used in any production implementation.

  • Curate Opinionated Answers: Validate that opinionated questions receive opinionated answers, with the Agent correctly sourcing replies from authoritative customer documents and webpages indexed by Kendra.

  • Deliver Contextual Generation: Determine the Agent's ability to provide contextually relevant responses based on previous prompt history.

  • Access General Knowledge: Confirm that the Agent can draw on general knowledge for non-customer-specific, non-opinionated queries, returning accurate and coherent responses based on the Bedrock LLM's training data.

  • Execute Pre-Defined Intents: Ensure that the Agent correctly interprets and conversationally fulfills user prompts intended to be routed to pre-defined intents, such as completing a loan application as part of a business workflow.

Resultant Loan Application Document completed through conversational flow:

Multi-channel support functionality can be tested in conjunction with the above assessment measures across Web, SMS, and Voice channels.
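A small harness can make these assessment measures repeatable across channels. The sketch below is an assumption, not part of the sample solution: `AgentFn`, `AssessmentCase`, `run_assessment`, and the stub agent are hypothetical names, and in practice `agent` would wrap whatever client you use to reach the deployed Agent on each channel. The keyword check is only a crude proxy; relevancy, coherency, and human-like tone still need human review.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical signature: send one prompt to the Agent on a channel
# ("Web", "SMS", or "Voice") and return the reply text.
AgentFn = Callable[[str, str], str]


@dataclass
class AssessmentCase:
    measure: str  # e.g. "Execute Pre-Defined Intents"
    prompt: str   # prompt sent to the Agent
    expected_keywords: List[str] = field(default_factory=list)  # crude pass criterion


def run_assessment(
    agent: AgentFn,
    cases: List[AssessmentCase],
    channels: Tuple[str, ...] = ("Web", "SMS", "Voice"),
) -> List[Tuple[str, str, bool]]:
    """Send every case on every channel; a case passes if all keywords appear in the reply."""
    results = []
    for channel in channels:
        for case in cases:
            reply = agent(channel, case.prompt).lower()
            passed = all(keyword.lower() in reply for keyword in case.expected_keywords)
            results.append((channel, case.measure, passed))
    return results


# Example with a stub standing in for the real Agent (hypothetical replies).
def stub_agent(channel: str, prompt: str) -> str:
    return "Sure, let's start your loan application." if "loan" in prompt.lower() else "Hello!"


cases = [AssessmentCase("Execute Pre-Defined Intents", "I want to apply for a loan", ["loan application"])]
print(run_assessment(stub_agent, cases, channels=("Web", "SMS")))
```

With the stub, this prints `[('Web', 'Execute Pre-Defined Intents', True), ('SMS', 'Execute Pre-Defined Intents', True)]`. Running the same cases on every channel makes regressions on a single channel easy to spot.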

Conclusion

While the demonstrated solution showcases the capabilities of a generative AI Financial Services agent powered by Amazon Bedrock, it is essential to recognize that this solution is not production-ready. Rather, it serves as an illustrative example for developers aiming to create their own personalized conversational agents for diverse applications such as virtual workers and customer support systems. A developer's path to production would iterate on this sample solution with the following considerations:

Security and Privacy

Ensure data security and user privacy throughout the implementation process. Implement appropriate access controls and encryption mechanisms to protect sensitive information. Solutions like the Generative AI Financial Services Agent benefit from data that is not yet available to the underlying LLM, which often means using your own private data for the biggest jump in capability.

  • Keep it secret. Keep it safe. You will want this data to stay completely protected, secure, and private during the generative process, and want control over how this data is shared and used.
  • Set some rules of the road. Understand how data is used by a service before making it available to your teams. Create and distribute the rules for what data can be used with what service. Make these clear to your teams so they can move quickly and prototype safely.
  • Involve Legal, sooner rather than later. Have your Legal teams review the T&Cs and service cards of the services you plan to use before you start running any sensitive data through them. Your Legal partners have never been more important than they are today.
  • As an example of how we are thinking about this at AWS with Amazon Bedrock: all data is encrypted and does not leave your VPC, and when you fine-tune a model, Bedrock makes a separate copy of the base foundation model that is accessible only to you and trains this private copy.
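One way to make the "rules of the road" easy to distribute and check is to keep them in a machine-readable form. The sketch below is illustrative only: the classification names, the service list, and `is_use_allowed` are hypothetical and would be replaced by whatever your security and Legal teams agree on. It denies by default, so any data class or service not explicitly approved is rejected.

```python
# Hypothetical data classifications and the services each may be sent to.
# The real rules must come from your own security and Legal reviews.
DATA_USE_RULES = {
    "public": {"bedrock", "kendra", "dynamodb"},
    "confidential": {"dynamodb"},
    "restricted": set(),
}


def is_use_allowed(classification: str, service: str) -> bool:
    """Deny by default: unknown classifications or unlisted services are rejected."""
    return service.lower() in DATA_USE_RULES.get(classification.lower(), set())
```

Keeping the rules in one shared structure lets teams prototype quickly while checks in code and in review use the same source of truth.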

User Acceptance Testing (UAT)

Conduct UAT with real users to evaluate the performance and usability of the Generative AI Financial Services Agent and users' satisfaction with it. Gather feedback and make the necessary improvements based on user input.

Deployment and Monitoring

Deploy the fully tested Agent on AWS, and implement monitoring and logging to track its performance, identify issues, and optimize the system as needed. AWS Lambda's built-in monitoring and troubleshooting features, such as Amazon CloudWatch metrics and logs, are enabled by default for the Agent's Lambda handler.
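Beyond the defaults, emitting one structured log line per invocation makes latency and failures easier to track. The sketch below assumes a Python handler named `lambda_handler` (a common default, not confirmed for this solution) and uses a hypothetical `handle_request` in place of the Agent's real logic. The Lambda Python runtime forwards standard `logging` output to CloudWatch Logs, where these JSON lines can be filtered and aggregated.

```python
import json
import logging
import time

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def handle_request(event: dict) -> dict:
    """Hypothetical stand-in for the Agent's real fulfillment logic."""
    return {"status": "ok"}


def lambda_handler(event, context):
    start = time.perf_counter()
    # context is the Lambda context object; fall back if it is missing (e.g. local tests).
    request_id = getattr(context, "aws_request_id", "unknown")
    outcome = "error"  # overwritten only if handle_request returns normally
    try:
        response = handle_request(event)
        outcome = "success"
        return response
    except Exception:
        # logger.exception records the stack trace for troubleshooting.
        logger.exception("Unhandled error for request %s", request_id)
        raise
    finally:
        # One JSON line per invocation, whether it succeeded or failed.
        logger.info(json.dumps({
            "request_id": request_id,
            "outcome": outcome,
            "latency_ms": round((time.perf_counter() - start) * 1000, 1),
        }))
```

Logging in `finally` guarantees a record for every invocation, including failed ones, while the exception is still re-raised so Lambda's own error metrics stay accurate.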

Maintenance and Updates

Regularly update the Agent with the latest LLM versions and data to improve its accuracy and effectiveness. Monitor customer-specific data in DynamoDB, and re-sync Kendra data sources as needed so the index stays current.
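When documents on the crawled site change, a sync can be triggered programmatically instead of from the console. The sketch below calls the boto3 Kendra client's `start_data_source_sync_job` and `list_data_source_sync_jobs` operations; the wrapper names are hypothetical, and the index and data source IDs are placeholders for the ones your deployment created. The client is passed in so the functions can be exercised with a stub.

```python
from typing import Optional


def start_kendra_sync(kendra_client, index_id: str, data_source_id: str) -> str:
    """Start a sync job for one Kendra data source and return its execution ID.

    kendra_client is expected to be a boto3 Kendra client, e.g. boto3.client("kendra").
    """
    response = kendra_client.start_data_source_sync_job(Id=data_source_id, IndexId=index_id)
    return response["ExecutionId"]


def get_sync_status(kendra_client, index_id: str, data_source_id: str, execution_id: str) -> Optional[str]:
    """Return the status of a sync execution (e.g. "SYNCING", "SUCCEEDED"), or None if not found.

    Only the first page of results is checked; NextToken pagination is omitted for brevity.
    """
    response = kendra_client.list_data_source_sync_jobs(Id=data_source_id, IndexId=index_id)
    for job in response.get("History", []):
        if job.get("ExecutionId") == execution_id:
            return job.get("Status")
    return None
```

In a real account, `start_kendra_sync(boto3.client("kendra"), "<index-id>", "<data-source-id>")` requires credentials allowed to call `kendra:StartDataSourceSyncJob`.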

By following this guide, you can implement, test, and validate a reliable Generative AI Financial Services Agent that provides users with accurate and personalized financial assistance through natural language conversations.

Resources

Please note: Sample code, software libraries, command line tools, proofs of concept, templates, or other related technology are provided as AWS Content or Third-Party Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content or Third-Party Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content or Third-Party Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content or Third-Party Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.


Clean Up

See Clean Up.


Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. SPDX-License-Identifier: MIT-0