memaryParse #44

Open
kingjulio8238 opened this issue Jun 8, 2024 · 1 comment

Comments

@kingjulio8238
Owner

memary currently parses the agents' responses, which are stored in a .txt file, before inserting them into our knowledge graphs.

As we look to support agentic systems running real-world tasks, our memory unit needs to let the system's maintainer pre-populate the knowledge graph with relevant data. For example, an e-commerce company may want to upload its users' information so that the agent can respond with context from the start.

Companies may present this data in various file formats, such as .csv, .pdf, .txt, .pptx, or others. That is why memary must support many configurable parsers under a parent parser, memaryParse. For example, a company running an agent with data in .csv and .docx files can configure a parent parser that supports both formats to pre-process the data into the knowledge graph before running their agents using memary.
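To make the parent-parser idea concrete, here is a minimal sketch of how per-format parsers could be registered under one dispatcher keyed by file extension. Nothing here is memary's actual API: the class name `MemaryParse`, its `register` / `supports` / `parse` methods, and the `parse_txt` / `parse_csv` helpers are all hypothetical names chosen for illustration.

```python
import csv
from pathlib import Path
from typing import Callable, Dict

# Hypothetical type: a child parser takes a file path and returns extracted text.
Parser = Callable[[Path], str]


class MemaryParse:
    """Hypothetical parent parser that dispatches to per-format child parsers."""

    def __init__(self) -> None:
        self._parsers: Dict[str, Parser] = {}

    def register(self, extension: str, parser: Parser) -> None:
        # Normalize so "CSV", "csv", and ".csv" all map to the key ".csv".
        self._parsers["." + extension.lower().lstrip(".")] = parser

    def supports(self, path: Path) -> bool:
        return path.suffix.lower() in self._parsers

    def parse(self, path: Path) -> str:
        suffix = path.suffix.lower()
        if suffix not in self._parsers:
            raise ValueError(f"No parser configured for {suffix!r}")
        return self._parsers[suffix](path)


def parse_txt(path: Path) -> str:
    # Plain text needs no transformation.
    return path.read_text(encoding="utf-8")


def parse_csv(path: Path) -> str:
    # Flatten each row into "column: value" pairs, one row per line.
    with path.open(newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    return "\n".join(
        ", ".join(f"{key}: {value}" for key, value in row.items()) for row in rows
    )
```

A maintainer would then enable only the formats they need, for example `parser = MemaryParse(); parser.register(".txt", parse_txt); parser.register(".csv", parse_csv)`, and pass the text returned by `parser.parse(path)` to whatever step inserts it into the knowledge graph.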

We expect memaryParse to expand over time. Initially, we hope to support the following formats:

  • .txt (already configured)
  • Table extraction
  • JSON
  • Images (.jpg, .jpeg, .png, .gif)
  • Documents and presentations (.pdf, .doc / .docx, .rtf, .pages, .pptx, .xml, .key)
  • Web (.htm, .html)
  • Spreadsheets (.xlsx, .xls, .csv, .numbers)

memaryParse should also support the following result types: TXT, MD, and JSON (we will look to add others in the future).
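As a rough illustration of the three result types, the sketch below renders the same parsed records as TXT, MD, or JSON. The `ResultType` enum and the `render` function are hypothetical names, not part of memary, and representing parsed data as a list of flat dictionaries is an assumption made only for this example.

```python
import json
from enum import Enum
from typing import Dict, List


class ResultType(str, Enum):
    # Hypothetical enum mirroring the three result types named in this issue.
    TXT = "txt"
    MD = "md"
    JSON = "json"


def render(records: List[Dict[str, str]], result_type: ResultType) -> str:
    """Render parsed records (assumed to be flat dicts) in the requested type."""
    if result_type is ResultType.JSON:
        return json.dumps(records, indent=2)
    if result_type is ResultType.MD:
        if not records:
            return ""
        # Use the first record's keys as the Markdown table header.
        headers = list(records[0].keys())
        lines = [
            "| " + " | ".join(headers) + " |",
            "| " + " | ".join("---" for _ in headers) + " |",
        ]
        for record in records:
            cells = [str(record.get(header, "")) for header in headers]
            lines.append("| " + " | ".join(cells) + " |")
        return "\n".join(lines)
    # TXT: one "key: value" line per record.
    return "\n".join(
        ", ".join(f"{key}: {value}" for key, value in record.items())
        for record in records
    )
```

Keeping the output format separate from the per-format parsers means a new result type can be added in one place without touching each parser.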

Resource for inspiration: https://github.com/run-llama/llama_parse/blob/main/llama_parse/utils.py

@rawwerks

rawwerks commented Jun 16, 2024

For PDFs, please just let me supply my LlamaIndex API key to use LlamaParse. They have worked so hard on this; I would strongly advise against reinventing it.

for other datatypes (and for getting the text from llamaparse into a KG), here are some resources. neo4j has done a lot of work on this.
