Graph DB? #45

Open
FellowTraveler opened this issue Jul 5, 2024 · 1 comment
FellowTraveler commented Jul 5, 2024

Please give details of where/how I can help integrate Graph DB in this project. Pointers, bullets, etc.
I am interested in C++ ingestion, meaning the directory, all the files. Chunk them up with each function in its own chunk. Make summaries of all the chunks that contain exact scope-resolved names and then make embeddings of the summaries. Extract all the entities/relationships into a graph.
I have made some progress on this on my own by parsing doxygen's XML output, since doxygen has already done the hard work of extracting all the entities and relationships. I would rather contribute here, though, since AIlice IMO is doing it right and I don't want to reinvent the wheel.
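A minimal sketch of the doxygen-based extraction described above. The XML here is a simplified, hand-written fragment shaped like doxygen's output, not real output, and the sketch assumes the XML carries `references` elements, which depends on the doxygen configuration. The function name `extract_graph` and the triple format are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written fragment shaped like doxygen's XML output.
# Real files carry many more elements and attributes.
SAMPLE = """
<doxygen>
  <compounddef id="classfoo_1_1Bar" kind="class">
    <compoundname>foo::Bar</compoundname>
    <sectiondef kind="public-func">
      <memberdef kind="function" id="classfoo_1_1Bar_1a01">
        <name>run</name>
        <references refid="classfoo_1_1Bar_1a02">foo::Bar::step</references>
      </memberdef>
      <memberdef kind="function" id="classfoo_1_1Bar_1a02">
        <name>step</name>
      </memberdef>
    </sectiondef>
  </compounddef>
</doxygen>
"""


def extract_graph(xml_text):
    """Return (nodes, edges).

    nodes maps a member id to its scope-resolved name;
    edges are (caller_id, "references", callee_id) triples.
    """
    root = ET.fromstring(xml_text)
    nodes, edges = {}, []
    for compound in root.iter("compounddef"):
        scope = compound.findtext("compoundname", default="")
        for member in compound.iter("memberdef"):
            if member.get("kind") != "function":
                continue
            name = member.findtext("name", default="")
            nodes[member.get("id")] = f"{scope}::{name}" if scope else name
            for ref in member.findall("references"):
                edges.append((member.get("id"), "references", ref.get("refid")))
    return nodes, edges


nodes, edges = extract_graph(SAMPLE)
```

The resulting triples could be loaded into whichever graph store is chosen; the scope-resolved names in `nodes` are the same strings the chunk summaries would need to mention.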
EDIT: I also want this to be a general solution that works equally for Rust, Python, etc.
Basically, I want some kind of structure where the agent is forced to write a test function first, then write the implementation of the actual function, then make sure it runs and that the test passes, systematically, piece by piece, until it's all done. I don't want it to step all over the previous work it did, nor do I want it to start coding the next feature before it has ensured that the most recent change passes the tests. Only then should it continue on to the next piece. And even while debugging, it shouldn't lose the work it has previously done; otherwise it blows a lot of money debugging something, only to delete that work later.
What does it do now, and where does it need to go, and is there any specific area I can pitch in? Details please to speed me up.
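The test-first, checkpointed workflow requested above might be sketched as follows. This is only an illustration under stated assumptions: the project "state" is a plain dict, and `build_incrementally`, `write_test`, and `write_impl` are hypothetical names, not part of AIlice. A real version would snapshot files (for example with a version control commit) instead of copying a dict.

```python
import copy


def build_incrementally(steps, state=None):
    """Apply steps in order; each step is (name, write_test, write_impl).

    write_test(state) returns a callable test(state) -> bool.
    write_impl(state) returns a new candidate state.
    A step is committed only if its own test and all earlier tests pass.
    """
    committed = copy.deepcopy(state) if state is not None else {}
    tests = []  # accumulated regression tests
    log = []
    for name, write_test, write_impl in steps:
        test = write_test(committed)                      # test comes first
        candidate = write_impl(copy.deepcopy(committed))  # work on a copy
        if all(t(candidate) for t in tests + [test]):
            committed = candidate  # the candidate becomes the new checkpoint
            tests.append(test)
            log.append((name, "committed"))
        else:
            log.append((name, "rejected"))  # the checkpoint is untouched
            break                           # do not start the next feature
    return committed, log


steps = [
    ("add",
     lambda s: (lambda st: st["add"](2, 3) == 5),
     lambda s: {**s, "add": lambda a, b: a + b}),
    ("mul",
     lambda s: (lambda st: st["mul"](2, 3) == 6),
     lambda s: {**s, "mul": lambda a, b: a + b}),  # buggy on purpose
    ("sub",
     lambda s: (lambda st: st["sub"](5, 3) == 2),
     lambda s: {**s, "sub": lambda a, b: a - b}),
]
final_state, log = build_incrementally(steps)
```

Here the faulty `mul` step is rejected, the earlier `add` work survives, and `sub` is never attempted until `mul` is fixed.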

@stevenlu137
Collaborator

This feature would work well as a module that provides code-editing functions, essentially a code editor with advanced features. The module would expose a set of interface functions to AIlice, such as OPEN, SAVE, EDIT-FUNCTION, EDIT-CLASS, etc. On opening a project, it should return the project's structure and an overview, and it should be able to look up information about specified entities or code snippets, such as locating a function's definition from a call site or finding all occurrences of a function name. This way, AIlice can work on the higher-level structure of a complex software project.
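As a rough illustration of such an editor module, the sketch below uses Python's standard `ast` module on Python source; a C++ or Rust version would need a different parser. The class `CodeEditorSketch` and its method names are hypothetical and only mirror OPEN and EDIT-FUNCTION. It does not follow AIlice's actual module interface, for which the referenced example modules are the authority.

```python
import ast


class CodeEditorSketch:
    """Hypothetical editor; methods loosely mirror OPEN / EDIT-FUNCTION."""

    def __init__(self):
        self.source = ""
        self.tree = None

    def open(self, source):
        """Load source text and return an overview of its top-level structure."""
        self.source = source
        self.tree = ast.parse(source)
        overview = []
        for node in self.tree.body:
            if isinstance(node, ast.ClassDef):
                methods = [n.name for n in node.body
                           if isinstance(n, ast.FunctionDef)]
                overview.append(f"class {node.name}: {', '.join(methods)}")
            elif isinstance(node, ast.FunctionDef):
                overview.append(f"def {node.name}")
        return overview

    def find_definition(self, name):
        """Return the 1-based line of the function named `name`, or None."""
        for node in ast.walk(self.tree):
            if isinstance(node, ast.FunctionDef) and node.name == name:
                return node.lineno
        return None

    def find_calls(self, name):
        """Return the lines of all calls to a function or method `name`."""
        lines = []
        for node in ast.walk(self.tree):
            if isinstance(node, ast.Call):
                func = node.func
                called = getattr(func, "id", None) or getattr(func, "attr", None)
                if called == name:
                    lines.append(node.lineno)
        return sorted(lines)

    def edit_function(self, name, new_code):
        """Replace the whole definition of `name`; return the new overview.

        new_code must already carry the right indentation (Python 3.8+).
        """
        for node in ast.walk(self.tree):
            if isinstance(node, ast.FunctionDef) and node.name == name:
                lines = self.source.splitlines()
                lines[node.lineno - 1:node.end_lineno] = new_code.splitlines()
                return self.open("\n".join(lines) + "\n")
        raise KeyError(name)


DEMO = ("def helper(x):\n    return x + 1\n\n"
        "def main():\n    return helper(1) + helper(2)\n")
editor = CodeEditorSketch()
overview = editor.open(DEMO)
definition_line = editor.find_definition("helper")
call_lines = editor.find_calls("helper")
new_overview = editor.edit_function("helper", "def helper(x):\n    return x + 2")
```

Editing whole functions by name, instead of by line range, keeps the agent's edits aligned with the structure returned when the project was opened.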

For the design of the module, you can refer to modules/AArxiv.py, modules/ABrowser.py, and modules/ATextBrowser.py. The first file contains detailed comments and serves as example code for AIlice's self-expansion functionality, so it explains the module interface constraints thoroughly.

This is a very interesting idea, and it should let us make considerable progress on complex software engineering. Its potential drawback is that the prompts may become relatively complex. To address this, make the prompt as dynamic as possible: put only the entry functions (like OPEN) in prompt_coderproxy.txt, and return the prompts for other relevant functions as needed. You can refer to the design of the web and text browsers for this.
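One way to picture the dynamic-prompt idea is sketched below. Everything here is an assumption for illustration: the descriptions, the follow-up table, and the helper names are invented, and the real prompt syntax used by AIlice is not reproduced.

```python
# Hypothetical one-line descriptions; the wording is illustrative only.
DESCRIPTIONS = {
    "OPEN": "OPEN: open a project and return its structure and overview.",
    "EDIT-FUNCTION": "EDIT-FUNCTION: replace the body of a named function.",
    "EDIT-CLASS": "EDIT-CLASS: replace the definition of a named class.",
    "SAVE": "SAVE: write pending edits to disk.",
}

# Only entry functions go into the static prompt file.
ENTRY_FUNCTIONS = ["OPEN"]

# Which function prompts become relevant after each call (assumed mapping).
FOLLOW_UPS = {
    "OPEN": ["EDIT-FUNCTION", "EDIT-CLASS", "SAVE"],
    "EDIT-FUNCTION": ["SAVE"],
    "EDIT-CLASS": ["SAVE"],
    "SAVE": [],
}


def static_prompt():
    """Text that would live in the static prompt: entry functions only."""
    return "\n".join(DESCRIPTIONS[f] for f in ENTRY_FUNCTIONS)


def with_follow_ups(function, result):
    """Append the prompts for the next relevant functions to a call result."""
    hints = [DESCRIPTIONS[f] for f in FOLLOW_UPS[function]]
    if not hints:
        return result
    return result + "\n\nAvailable next:\n" + "\n".join(hints)


base = static_prompt()
after_open = with_follow_ups("OPEN", "Project opened: 2 files, 5 functions.")
after_save = with_follow_ups("SAVE", "Saved.")
```

The static prompt stays short, and the agent only sees the description of EDIT-FUNCTION once a project is actually open.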

Finally, an equally important task for AIlice might be to use a Graph DB to build the long-term memory mechanism. AIlice's long-term memory module (currently modules/AStorageVecDB.py) has a simple interface (STORE/QUERY), and its function is to give each agent semantic associations for recalling past information. Currently, our long-term memory relies solely on semantic similarity retrieval, but in the future we need to consider other kinds of semantic association. A Graph DB may play a crucial role here.
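A toy sketch of how graph links could complement similarity retrieval behind a STORE/QUERY style interface. It is not the interface of modules/AStorageVecDB.py: the class `GraphMemorySketch`, the `related_to` parameter, and the word-overlap similarity (a stand-in for embeddings) are all assumptions made for illustration.

```python
from collections import defaultdict


class GraphMemorySketch:
    """In-memory store: similarity search plus one hop along explicit links."""

    def __init__(self):
        self.texts = []
        self.links = defaultdict(set)  # index -> indices of related items

    def store(self, text, related_to=()):
        """Save text, optionally linking it to earlier items; return its index."""
        idx = len(self.texts)
        self.texts.append(text)
        for other in related_to:
            self.links[idx].add(other)
            self.links[other].add(idx)
        return idx

    @staticmethod
    def _similarity(a, b):
        # Toy stand-in for embedding similarity: Jaccard overlap of word sets.
        wa, wb = set(a.lower().split()), set(b.lower().split())
        return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

    def query(self, clue, top_k=1):
        """Return the top_k most similar texts, then their linked neighbours."""
        ranked = sorted(range(len(self.texts)),
                        key=lambda i: self._similarity(clue, self.texts[i]),
                        reverse=True)
        hits = ranked[:top_k]
        expanded = list(hits)
        for i in hits:
            for j in sorted(self.links[i]):
                if j not in expanded:
                    expanded.append(j)
        return [self.texts[i] for i in expanded]


memory = GraphMemorySketch()
bug = memory.store("the parser crashes on nested templates")
fix = memory.store("fixed by bumping the recursion limit", related_to=[bug])
memory.store("lunch menu for friday")
recalled = memory.query("parser crashes")
```

The fix note shares no words with the query, so similarity alone would not surface it; it is recalled only through its link to the bug note.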

These two points, complex software engineering and long-term memory, are currently the most important issues for AIlice. I am excited to see your further progress!
