Explore stack graphs / scope graphs #11

Open
0xdevalias opened this issue Apr 4, 2024 · 2 comments
@0xdevalias
Owner

0xdevalias commented Apr 4, 2024

Stack Graphs (an evolution of Scope Graphs) sound like they could be really interesting/useful for code navigation, symbol mapping, etc. Perhaps we could use them for module identification, for stabilising variable/function identifier naming, or similar?
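
To make the idea a little more concrete, here is a minimal Python sketch of the simpler scope graph model that stack graphs build on, assuming purely lexical scoping: each scope points to its parent, a reference resolves by walking outward until some scope declares that name, and inner declarations shadow outer ones. The `Scope` class, the `resolve` function, and the `module.js:N` declaration sites are all hypothetical; this is not the API of GitHub's stack-graphs library. Real stack graphs additionally push and pop symbols on a stack during path finding, so that imports and member access can be resolved from graphs built one file at a time.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Scope:
    """A node in a toy (hypothetical) scope graph: local declarations plus a parent edge."""
    name: str
    parent: Optional["Scope"] = None
    # Maps identifier -> a label for its declaration site (hypothetical "file:line" strings).
    declarations: dict[str, str] = field(default_factory=dict)

    def declare(self, ident: str, site: str) -> None:
        self.declarations[ident] = site


def resolve(ident: str, scope: Scope) -> Optional[str]:
    """Follow parent edges outward until a scope declares `ident`."""
    current: Optional[Scope] = scope
    while current is not None:
        if ident in current.declarations:
            return current.declarations[ident]
        current = current.parent
    return None  # unresolved, e.g. a global or an import this toy model doesn't cover


module = Scope("module")
module.declare("helper", "module.js:1")
func = Scope("function main", parent=module)
func.declare("x", "module.js:5")
block = Scope("block", parent=func)
block.declare("x", "module.js:7")  # shadows the outer x

print(resolve("x", block))        # module.js:7 (inner declaration wins)
print(resolve("x", func))         # module.js:5
print(resolve("helper", block))   # module.js:1
print(resolve("missing", block))  # None
```

Linking every reference to a declaration site like this is what would let renames or module matching key off declarations rather than raw identifier text, which is the kind of naming stabilisation suggested above.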

Potentially Related

  • https://en.wikipedia.org/wiki/Code_property_graph
    • A code property graph of a program is a graph representation of the program obtained by merging its abstract syntax trees (AST), control-flow graphs (CFG) and program dependence graphs (PDG) at statement and predicate nodes. The resulting graph is a property graph, which is the underlying graph model of graph databases such as Neo4j, JanusGraph and OrientDB where data is stored in the nodes and edges as key-value pairs. In effect, code property graphs can be stored in graph databases and queried using graph query languages.

    • Joern CPG. The original code property graph was implemented for C/C++ in 2013 at University of Göttingen as part of the open-source code analysis tool Joern. This original version has been discontinued and superseded by the open-source Joern Project, which provides a formal code property graph specification applicable to multiple programming languages. The project provides code property graph generators for C/C++, Java, Java bytecode, Kotlin, Python, JavaScript, TypeScript, LLVM bitcode, and x86 binaries (via the Ghidra disassembler).

      • https://github.com/joernio/joern
        • Open-source code analysis platform for C/C++/Java/Binary/Javascript/Python/Kotlin based on code property graphs.

        • Joern is a platform for analyzing source code, bytecode, and binary executables. It generates code property graphs (CPGs), a graph representation of code for cross-language code analysis. Code property graphs are stored in a custom graph database. This allows code to be mined using search queries formulated in a Scala-based domain-specific query language. Joern is developed with the goal of providing a useful tool for vulnerability discovery and research in static program analysis.

        • https://joern.io/
        • https://cpg.joern.io/
          • Code Property Graph Specification 1.1

          • This is the specification of the Code Property Graph, a language-agnostic intermediate graph representation of code designed for code querying.

            The code property graph is a directed, edge-labeled, attributed multigraph. This specification provides the graph schema, that is, the types of nodes and edges and their properties, as well as constraints that specify which source and destination nodes are permitted for each edge type.

            The graph schema is structured into multiple layers, each of which provide node, property, and edge type definitions. A layer may depend on multiple other layers and make use of the types it provides.

  • https://docs.openrewrite.org/concepts-explanations/lossless-semantic-trees
    • A Lossless Semantic Tree (LST) is a tree representation of code. Unlike the traditional Abstract Syntax Tree (AST), OpenRewrite's LST offers a unique set of characteristics that make it possible to perform accurate transformations and searches across a repository.
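
The specification's description of the CPG as a directed, edge-labeled, attributed multigraph, whose schema restricts which source and destination node types each edge type may connect, can be illustrated with a small Python sketch. The node labels (`METHOD`, `BLOCK`, `CALL`, `METHOD_RETURN`) and edge labels (`AST`, `CFG`) echo names used in the Joern CPG, but the permitted pairs in `SCHEMA` and all class and method names are hypothetical simplifications, not taken from cpg.joern.io or Joern's own API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: int
    label: str  # node type, e.g. "CALL"
    props: dict = field(default_factory=dict)  # attributes stored as key-value pairs


@dataclass
class Edge:
    src: int
    dst: int
    label: str  # edge type, e.g. "AST" or "CFG"
    props: dict = field(default_factory=dict)


# Hypothetical schema fragment: edge label -> permitted (source label, destination label) pairs.
SCHEMA = {
    "AST": {("METHOD", "BLOCK"), ("BLOCK", "CALL")},
    "CFG": {("CALL", "CALL"), ("CALL", "METHOD_RETURN")},
}


class PropertyGraph:
    def __init__(self) -> None:
        self.nodes: dict[int, Node] = {}
        # A list rather than a set, so parallel edges between the same nodes are allowed (multigraph).
        self.edges: list[Edge] = []

    def add_node(self, node_id: int, label: str, **props) -> None:
        self.nodes[node_id] = Node(node_id, label, props)

    def add_edge(self, src: int, dst: int, label: str, **props) -> None:
        # Enforce the schema: reject an edge type between disallowed node types.
        pair = (self.nodes[src].label, self.nodes[dst].label)
        if pair not in SCHEMA.get(label, set()):
            raise ValueError(f"{label} edge not permitted from {pair[0]} to {pair[1]}")
        self.edges.append(Edge(src, dst, label, props))

    def out(self, node_id: int, label: str) -> list[Node]:
        # Follow outgoing edges of one type: a one-step "query".
        return [self.nodes[e.dst] for e in self.edges if e.src == node_id and e.label == label]


g = PropertyGraph()
g.add_node(1, "METHOD", name="main")
g.add_node(2, "BLOCK")
g.add_node(3, "CALL", code="foo()")
g.add_node(4, "METHOD_RETURN")
g.add_edge(1, 2, "AST")  # syntax layer
g.add_edge(2, 3, "AST")
g.add_edge(3, 4, "CFG")  # control-flow layer over the same nodes

print([n.props["code"] for n in g.out(2, "AST")])  # ['foo()']
try:
    g.add_edge(1, 3, "CFG")  # METHOD -> CALL is not in the CFG fragment above
except ValueError as err:
    print(err)  # CFG edge not permitted from METHOD to CALL
```

Keeping AST and CFG edges over one shared node set is a toy version of the merging the Wikipedia excerpt describes; a real CPG also adds program-dependence edges and, in Joern's case, lives in a graph database queried through its Scala-based DSL.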
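
The property the OpenRewrite excerpt points to, a representation that can be transformed without losing the original formatting, can be sketched in Python by having every token carry the exact whitespace that precedes it, so printing the tokens back reproduces the source byte for byte. This is a hypothetical flat token list rather than a real tree, the regex lexer is deliberately naive (comments are just split into ordinary tokens), and none of these names come from OpenRewrite, whose LST is a much richer Java data model.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    prefix: str  # whitespace before the token, kept verbatim
    text: str    # the token itself


# Hypothetical, deliberately naive lexer: identifiers, integers, or any single other character.
TOKEN_RE = re.compile(r"(\s*)([A-Za-z_]\w*|\d+|\S)")


def parse(source: str) -> tuple[list[Token], str]:
    """Split source into tokens, keeping all whitespace so nothing is lost."""
    tokens: list[Token] = []
    end = 0
    for m in TOKEN_RE.finditer(source):
        tokens.append(Token(prefix=m.group(1), text=m.group(2)))
        end = m.end()
    return tokens, source[end:]  # trailing whitespace after the last token


def render(tokens: list[Token], trailer: str) -> str:
    return "".join(t.prefix + t.text for t in tokens) + trailer


def rename(tokens: list[Token], old: str, new: str) -> list[Token]:
    # Change token text only; every prefix (the formatting) is carried over untouched.
    return [Token(t.prefix, new if t.text == old else t.text) for t in tokens]


src = "function  add(a, b) {\n    return a +  b;  // sum\n}\n"
tokens, trailer = parse(src)
assert render(tokens, trailer) == src  # lossless round trip
print(render(rename(tokens, "a", "left"), trailer), end="")
```

The rename above matches raw token text, so it would also rewrite an unrelated `a` in another scope; combining a lossless representation with scope-aware resolution (the scope/stack graph idea this issue opens with) is what would make such edits both accurate and formatting-preserving.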

See Also

@0xdevalias
Owner Author

0xdevalias commented May 30, 2024

This seems to be aligned with how some other agents have chosen to go, e.g.:

Where they saw it as an improvement on their older method:

I understand it not being a current priority, but discounting the concept entirely (particularly without reasoning beyond what seems to be personal opinion) seems counterproductive to getting the best agent/outcome here.

Further to this, aider just set a new SOTA, topping the SWE-bench Lite leaderboard with 26.3%. While not all of that performance gain can be attributed solely to their smart code search / repo map, I would happily bet that it helped them achieve it:

It will be interesting to see whether they end up exploring stack graphs directly, and whether that improves their performance further:

Originally posted by @0xdevalias in SWE-agent/SWE-agent#38 (comment)

@0xdevalias
Owner Author

0xdevalias commented Oct 25, 2024

I've looked into this before; it's a super promising project that's not quite ready for primetime. The issues boil down to:

I believe that unless there is a team dedicated to active development of stack-graphs, it would be very hard to get value from integrating it.

Originally posted by @maks-ivanov in Aider-AI/aider#534 (comment)
