🚀 Sangmin Lee
🐋 Sungbeom Choi
🦄 ChanHyuk Park
🌟 Minsu Park
We are developing an AI-powered online research tool aimed at streamlining repetitive and time-consuming tasks in online data research. Our goal is to enable individuals to focus on more important tasks by harnessing the capabilities of generative AI technology.
We plan to start by collecting data from trusted sources such as 통계청 (Statistics Korea) and 정책 브리핑 (Korea Policy Briefing), and then establish partnerships with other data-rich websites to expand our vector database. This will allow us to provide valuable and reliable information for research purposes.
Our tool will be versatile, capable of handling a wide range of data formats, including web pages, PDF documents, YouTube videos, and even audio content. This flexibility ensures that users can extract information from diverse sources efficiently.
To make our tool even more user-friendly and productive, we will implement an autonomous agent that can understand and execute user commands effectively. This agent will serve as a valuable assistant, helping users navigate and extract information from the vast pool of data available online.
In summary, our AI-powered online research tool aims to enhance the productivity and efficiency of data research by automating repetitive tasks, providing access to reliable data sources, and incorporating an autonomous agent to assist users in their research endeavors.
To make GPT more useful, we introduce an Agent: GPT enhanced to understand human language, think autonomously, and judge which tools to use.
It can retrieve the information it needs from our database or the web and, based on the data or figures it finds, draw graphs or charts as required.
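As a rough illustration of this tool-use loop, here is a minimal sketch assuming the pre-1.0 `openai` SDK's function-calling interface; the `search_documents` tool name, its schema, and the dispatch stub are hypothetical placeholders, not our exact tool set.

```python
import json
import openai  # assumes the pre-1.0 openai SDK with function calling

# One illustrative tool description; the real agent also exposes
# web search, scraping, and chart-drawing tools.
FUNCTIONS = [{
    "name": "search_documents",  # hypothetical tool name
    "description": "Semantic search over the research vector database.",
    "parameters": {
        "type": "object",
        "properties": {"query": {"type": "string", "description": "Search query."}},
        "required": ["query"],
    },
}]

def run_agent(user_message: str) -> str:
    """Let GPT decide whether to answer directly or call a tool."""
    messages = [{"role": "user", "content": user_message}]
    response = openai.ChatCompletion.create(
        model="gpt-4-0613",
        messages=messages,
        functions=FUNCTIONS,
        function_call="auto",  # GPT judges when a tool is appropriate
    )
    message = response["choices"][0]["message"]
    if message.get("function_call"):
        args = json.loads(message["function_call"]["arguments"])
        # Dispatch to the real tool here, append its output as a
        # "function" role message, and ask GPT to continue.
        return f"agent requested search_documents({args['query']!r})"
    return message["content"]
```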
We conducted extensive preprocessing so that GPT could better understand the data. This involved removing noise (such as ads and navigation bars) and incorporating visual layout information to capture context and structural details. With official authorization, we fine-tuned a Faster R-CNN model on a dataset of 200 images from 통계청 (Statistics Korea) and 정부 브리핑 (Government Briefings). Using this visual information, we can classify each chunk into the following categories (a detection sketch follows the table):
| Category | Description |
| --- | --- |
| Topic | Identifying the central subject or theme of the content. |
| Title | Recognizing and understanding the document or presentation's title. |
| Contents | Grasping the textual information within the document or presentation. |
| Figure | Identifying visual elements such as images or illustrations. |
| Graph | Recognizing and interpreting graphical representations of data. |
| Table | Understanding tabular data structures. |
| Table Caption | Recognizing and comprehending captions associated with tables. |
| Comment | Identifying and understanding comments or annotations within the content. |
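A minimal sketch of the detection step, assuming a LayoutParser/Detectron2 Faster R-CNN setup; the config and weight paths are placeholders for our fine-tuned model, and the index-to-category mapping is illustrative.

```python
import cv2
import layoutparser as lp  # wraps Detectron2 Faster R-CNN models

# Placeholder paths for the Faster R-CNN fine-tuned on the 200 annotated pages;
# the index -> category mapping below is illustrative, not the exact one used.
LABEL_MAP = {0: "Topic", 1: "Title", 2: "Contents", 3: "Figure",
             4: "Graph", 5: "Table", 6: "Table Caption", 7: "Comment"}

model = lp.Detectron2LayoutModel(
    config_path="finetuned/faster_rcnn_config.yaml",  # placeholder path
    model_path="finetuned/model_final.pth",           # placeholder path
    extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.5],
    label_map=LABEL_MAP,
)

image = cv2.imread("page.png")[..., ::-1]  # BGR -> RGB for the detector
layout = model.detect(image)

# Every detected block carries a category and a bounding box, which lets us
# route Tables, Graphs, etc. to their own chunking logic.
for block in layout:
    print(block.type, block.coordinates)
```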
When you do research, you collect many kinds of data. We made it possible to ingest multiple types of data, not just text.
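A sketch of per-format loaders, assuming common open-source libraries (requests/BeautifulSoup, pypdf, youtube-transcript-api, Whisper); our pipeline may rely on different loaders, such as the ones bundled with Embedchain cited below.

```python
import requests
import whisper                                   # speech-to-text for audio sources
from bs4 import BeautifulSoup
from pypdf import PdfReader
from youtube_transcript_api import YouTubeTranscriptApi

def load_web_page(url: str) -> str:
    """Fetch a page and keep only its visible text."""
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
    return soup.get_text(separator="\n", strip=True)

def load_pdf(path: str) -> str:
    """Concatenate the extracted text of every PDF page."""
    return "\n".join(page.extract_text() or "" for page in PdfReader(path).pages)

def load_youtube(video_id: str) -> str:
    """Join the caption segments of a YouTube video into one transcript."""
    return " ".join(seg["text"] for seg in YouTubeTranscriptApi.get_transcript(video_id))

def load_audio(path: str) -> str:
    """Transcribe an audio file."""
    return whisper.load_model("base").transcribe(path)["text"]
```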
We designed our pipeline to work with you through the entire research process.
통계청 (Statistics Korea) only returns results when keywords match exactly, which makes searching difficult. We instead surface valuable material that is semantically similar to your query.
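A minimal sketch of semantic retrieval, assuming OpenAI's `text-embedding-ada-002` embeddings (pre-1.0 SDK) and brute-force cosine similarity; in practice the vectors live in a vector database, and the embedding model is interchangeable.

```python
import numpy as np
import openai  # assumes the pre-1.0 openai SDK; any embedding model works similarly

def embed(texts):
    """Embed a batch of strings (the model choice is an assumption)."""
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=texts)
    return np.array([item["embedding"] for item in resp["data"]])

def semantic_search(query, documents, top_k=3):
    """Rank documents by meaning instead of exact keyword overlap."""
    doc_vecs = embed(documents)
    query_vec = embed([query])[0]
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
    )
    top = np.argsort(-sims)[:top_k]
    return [(documents[i], float(sims[i])) for i in top]
```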
Tables carry a lot of useful information in their structure. However, if you scrape them directly into plain text, GPT cannot understand them well enough to use them. To handle tables well, it is important to preprocess the table's structure into a form GPT can understand rather than scraping it as-is. When we scraped tables verbatim, GPT did not use the table information; with our chunking method, the tables were understood and used far better.
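A minimal sketch of the idea behind this preprocessing, using pandas to turn a scraped HTML table into markdown that keeps rows and columns explicit; the HTML snippet and numbers are dummy values, and our actual chunker is more involved.

```python
from io import StringIO
import pandas as pd  # read_html needs lxml/bs4; to_markdown needs tabulate

def table_html_to_markdown(html: str) -> str:
    """Parse the first <table> on a page and serialize it with explicit pipes,
    so GPT sees the row/column structure instead of a flattened run of text."""
    df = pd.read_html(StringIO(html))[0]
    return df.to_markdown(index=False)

DUMMY_TABLE = """
<table>
  <tr><th>Year</th><th>Indicator A</th><th>Indicator B</th></tr>
  <tr><td>2021</td><td>120</td><td>34.5</td></tr>
  <tr><td>2022</td><td>135</td><td>31.2</td></tr>
</table>
"""

print(table_html_to_markdown(DUMMY_TABLE))
```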
GPT outputs text by default, so it cannot generate visuals such as graphs on its own. We give our agent tools, so it can write Python code to plot graphs based on the information it receives as text.
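A sketch of the kind of plotting tool the agent can call once it has pulled numbers out of the text; the function name and the sample values are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; the agent returns an image file
import matplotlib.pyplot as plt

def plot_bar_chart(title, labels, values, path="chart.png"):
    """Draw a simple bar chart from values the agent extracted as text."""
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.bar(labels, values)
    ax.set_title(title)
    ax.set_ylabel("value")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
    return path

# Example call with dummy numbers the agent might have scraped from a table.
plot_bar_chart("Quarterly figures", ["Q1", "Q2", "Q3", "Q4"], [10, 12, 9, 15])
```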
This is the first draft from our model. Based on the data we provided, the model wrote a good draft with useful tables and thumbnail images (from DALL-E).
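A minimal sketch of how such a thumbnail could be requested, assuming the pre-1.0 `openai` SDK's image endpoint; the prompt and size are illustrative.

```python
import openai  # assumes the pre-1.0 openai SDK

def generate_thumbnail(topic: str) -> str:
    """Ask DALL-E for a thumbnail image and return its URL."""
    response = openai.Image.create(
        prompt=f"Clean, minimal thumbnail illustration for a research report about {topic}",
        n=1,
        size="512x512",
    )
    return response["data"][0]["url"]
```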
@misc{embedchain,
  author = {Taranjeet Singh},
  title = {Embedchain: Framework to easily create LLM powered bots over any dataset},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/embedchain/embedchain}},
}
@article{shen2021layoutparser,
  title = {LayoutParser: A Unified Toolkit for Deep Learning Based Document Image Analysis},
  author = {Shen, Zejiang and Zhang, Ruochen and Dell, Melissa and Lee, Benjamin Charles Germain and Carlson, Jacob and Li, Weining},
  journal = {arXiv preprint arXiv:2103.15348},
  year = {2021}
}