birdperson1970/pind

Python Interaction Dump (PIND) Project

Problem Statement

Python's dynamic typing makes it easy to write flexible code, but it makes it hard for developers to understand exactly what is going on. One approach to determining the actual arguments is to run the code in PDB (the Python Debugger). With the introduction of LLM assistants, understanding Python remains difficult because there is no easy way for them to understand the context of a method.

Solution

Pind's goal is to give users (LLMs and developers) context by capturing each debug step/frame in a Python application and then presenting it back in a clear and understandable format. Pind is broken up into two parts:

  • Stream Capture: Pind attaches to PDB and steps through each step/frame.
  • Stream Report: A reporting module ingests the stream and outputs the relevant information based on the user's needs.

With this detailed context, LLMs will be able to:

  • Generate detailed Event Trace diagrams of the entire system
  • Generate Python unit tests with mocks
  • Improve the code they generate

Features

Stream Capture

  • Capture all of the events for a PDB run from the command line:
python pind/pind.py tests/nested.py 
Trace output saved to .trace_dump/nested_20231128_095452_trace.json
  • Trigger Pind from within a PDB session by calling:
from pind import pind
pind.Pind("stream.json").run_till_break()

Stream Reporting

Pind Context Cache

This will ingest the entire stream and build an in-memory cache. You can then call methods on it to get contextual information about the project:

  • find_method(str method_name) -> List(method_ids): Returns the IDs of all methods whose names match the given text.
  • get_method_inbound_calls(method_id, unique=False): Returns a method descriptor showing who called the method, with what arguments, and what it returned:
{
    "name": "func_a",
    "inbound": [
        {
            "id": "2.43.23.4",
            "from_id": "321#func_c",
            "local_vars": {
                "x": "5"
            },
            "return": "12"
        }
    ]
}
  • get_method_outbound_calls(method_id, unique=False): Returns all of the outbound calls made by this method:
{
    "name": "func_a",
    "outbound": [
        {
            "id": "2.43.23.4",
            "to_id": "543#func_c",
            "local_vars": {
                "x": "5"
            },
            "return": "12"
        }
    ]
}
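
The descriptors above could be derived from the flat event stream by replaying `call` and `return` events against a stack. The following is only a rough sketch under several assumptions: the class name `ContextCache` is hypothetical, method IDs are assumed to take the form `<lineno>#<function>`, the per-call `id` field is omitted, and the `unique` parameter is not implemented.

```python
from collections import defaultdict


class ContextCache:
    """Hypothetical in-memory index over a flat Pind-style event stream."""

    def __init__(self, events):
        self.inbound = defaultdict(list)   # callee method_id -> call records
        self.outbound = defaultdict(list)  # caller method_id -> call records
        stack = []  # (method_id, record) for every call still in progress
        for ev in events:
            if ev["event"] == "call":
                # Assumption: a method is identified by its def line and name.
                method_id = f'{ev["lineno"]}#{ev["function"]}'
                caller_id = stack[-1][0] if stack else None
                record = {
                    "from_id": caller_id,
                    "to_id": method_id,
                    "local_vars": ev["local_vars"],
                    "return": None,  # filled in when the matching return arrives
                }
                self.inbound[method_id].append(record)
                if caller_id is not None:
                    self.outbound[caller_id].append(record)
                stack.append((method_id, record))
            elif ev["event"] == "return" and stack:
                _, record = stack.pop()
                record["return"] = ev["return_value"]

    def find_method(self, method_name):
        """Return the sorted IDs of all methods whose name contains method_name."""
        ids = set(self.inbound) | set(self.outbound)
        return sorted(i for i in ids if method_name in i.split("#", 1)[1])

    def get_method_inbound_calls(self, method_id):
        return {"name": method_id.split("#", 1)[1],
                "inbound": self.inbound.get(method_id, [])}

    def get_method_outbound_calls(self, method_id):
        return {"name": method_id.split("#", 1)[1],
                "outbound": self.outbound.get(method_id, [])}
```

Keeping one shared record per call means the same dictionary appears in the callee's inbound list and the caller's outbound list, so the return value only has to be filled in once.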

Pind Event Trace

This will generate a complete PlantUML Event Trace diagram showing each step.
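
As an illustration of the idea, `call` and `return` events can be mapped onto PlantUML sequence-diagram arrows. This is a minimal sketch, not Pind's reporting module: the function name `to_plantuml` and the top-level `main` participant are assumptions, and function names containing special characters (such as `<module>`) would need quoting before PlantUML accepts them.

```python
def to_plantuml(events):
    """Render call/return events as PlantUML sequence-diagram source text."""
    lines = ["@startuml"]
    stack = ["main"]  # assumed name for the top-level caller
    for ev in events:
        if ev["event"] == "call":
            args = ", ".join(
                f"{name}={value}" for name, value in (ev["local_vars"] or {}).items()
            )
            # Solid arrow from the current caller to the callee.
            lines.append(f'{stack[-1]} -> {ev["function"]}: {ev["function"]}({args})')
            stack.append(ev["function"])
        elif ev["event"] == "return" and len(stack) > 1:
            callee = stack.pop()
            # Dashed arrow back to the caller, labelled with the return value.
            lines.append(f'{callee} --> {stack[-1]}: {ev["return_value"]}')
    lines.append("@enduml")
    return "\n".join(lines)
```

The returned text can be saved to a `.puml` file and rendered with any PlantUML tool.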

A long-term goal would be to feed this information into AutoComplete.

Setup

To use PIND, you'll need to set up a Python virtual environment and install the required packages. Follow the steps below:

Usage

python pind/pind.py tests/nested.py 
Trace output saved to .trace_dump/nested_20231128_095452_trace.json
python pind/pind_normalise.py  .trace_dump/nested_20231128_095452_trace.json .trace_dump/normalised.json

Nested trace output saved to .trace_dump/normalised.json

How it works

Stream Capture

The following is an example of the event stream:

    {
        "event": "line",
        "filename": "nested.py",
        "lineno": 6,
        "function": "func_A",
        "code_context": "b = func_B(x)",
        "local_vars": null,
        "return_value": null
    },
    {
        "event": "call",
        "filename": "nested.py",
        "lineno": 10,
        "function": "func_B",
        "code_context": "def func_B(x):",
        "local_vars": {
            "x": "5"
        },
        "return_value": null
    },
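
Events of this shape can be produced by any trace hook. The following is a minimal sketch that uses `sys.settrace` rather than Pind's actual PDB-based capture; `capture_events` is a hypothetical helper name, and recording locals only on `call` events is an assumption based on the sample above.

```python
import json
import linecache
import os
import sys


def capture_events(func, *args):
    """Run func(*args) and return a list of Pind-style event dicts."""
    events = []

    def tracer(frame, event, arg):
        if event not in ("call", "line", "return"):
            return tracer  # ignore exception and opcode events
        code = frame.f_code
        source = linecache.getline(code.co_filename, frame.f_lineno).strip()
        events.append({
            "event": event,
            "filename": os.path.basename(code.co_filename),
            "lineno": frame.f_lineno,
            "function": code.co_name,
            "code_context": source,
            # Assumption: locals are stored as strings, and only for calls.
            "local_vars": ({k: str(v) for k, v in frame.f_locals.items()}
                           if event == "call" else None),
            "return_value": str(arg) if event == "return" else None,
        })
        return tracer  # keep tracing inside this frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)  # always restore the earlier trace hook
    return events


def func_B(x):
    return x + 1


def func_A(x):
    b = func_B(x)
    return b


if __name__ == "__main__":
    print(json.dumps(capture_events(func_A, 5), indent=4))
```

Values are converted with `str` so the result is always JSON-serialisable, which matches the string-valued `local_vars` in the sample stream.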

Frame Numbering

To uniquely identify each frame in each run, each frame is given a unique identifier:

[run_id].[step_count].[step_count].....

  • run_id: the last 4 characters of the encoded start time
  • step_count: the number of steps taken at that stack depth

This will allow the system to quickly traverse up and down the stack. The identifier is also incremental, so for any two frames you can easily tell which came first, which will be essential for the object history.

main          [gB2z.21]
│
└── func_a(4) [gB2z.22]
     │
     ├─── func_a(x=4)   [gB2z.22.1]
     │    │
     │    ├── x+=3      [gB2z.22.2]
     │    │
     │    └── func_b(x) [gB2z.22.3]
     │         │
     │         └─── func_b(y=7)  [gB2z.22.3.1]
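
One way to produce identifiers like those in the tree above is to keep one step counter per active stack depth. The sketch below is an assumption about how the numbering could work, not Pind's implementation: `FrameNumberer`, `sort_key`, and `parent_id` are hypothetical names, and the `run_id` is passed in rather than derived from the start time because the encoding is not specified here.

```python
class FrameNumberer:
    """Hands out [run_id].[step_count].[step_count]... identifiers."""

    def __init__(self, run_id):
        self.run_id = run_id
        self.path = [0]  # one step counter per active stack depth

    def step(self):
        """Record one step at the current depth and return its frame id."""
        self.path[-1] += 1
        return self.current_id()

    def call(self):
        """Enter a callee: its steps are numbered beneath the current step."""
        self.path.append(0)

    def ret(self):
        """Leave the callee and resume counting at the caller's depth."""
        if len(self.path) > 1:
            self.path.pop()

    def current_id(self):
        return ".".join([self.run_id, *map(str, self.path)])


def sort_key(frame_id):
    """Numeric key so frame ids from the same run compare in execution order."""
    return tuple(int(part) for part in frame_id.split(".")[1:])


def parent_id(frame_id):
    """Id of the enclosing step one level up the stack, or None at the top."""
    head, _, _ = frame_id.rpartition(".")
    return head if "." in head else None
```

Comparing the numeric tuples rather than the raw strings matters because `"gB2z.9"` sorts after `"gB2z.10"` as text, while `(9,)` sorts before `(10,)`.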

Object Versioning

Objects can be large and complex. To try to optimize their storage, only changes will be recorded. Objects will only be checked for changes when they are passed into local methods. This may still prove to be expensive, so an explicit skip list may be added to optimize this in the future.

"local_vars": {
    "ai_tool": "<__main__.OpenAITools object at 0x108296a40>"
},

Object stream storage:

[
{
    "id": "OpenAITools.23",
    "versions": [
        {
            "frame_id": "gB2z.22.3.1",
            "fields": [
                {
                    "name": "f1",
                    "type": "int",
                    "value": "4"
                },
                {
                    "name": "my_obj",
                    "type": "Object",
                    "value": "OpenAIApi.42"
                }
            ]
        }
    ]
},
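
The change-only recording described above could be sketched as follows. This is a simplified illustration under stated assumptions: `ObjectStore` is a hypothetical name, "only changes" is interpreted as storing just the fields whose value differs from the last recorded one, nested objects are stored via `str` rather than as references such as `OpenAIApi.42`, and deleted attributes are not tracked.

```python
class ObjectStore:
    """Stores a new version of an object only when some field has changed."""

    def __init__(self):
        self.versions = {}  # object_id -> list of {"frame_id", "fields"}
        self._latest = {}   # object_id -> {field name -> last recorded field}

    def record(self, object_id, obj, frame_id):
        """Snapshot obj at frame_id; return True if a new version was stored."""
        latest = self._latest.setdefault(object_id, {})
        changed = []
        for name, value in sorted(vars(obj).items()):
            field = {"name": name, "type": type(value).__name__, "value": str(value)}
            if latest.get(name) != field:
                changed.append(field)
                latest[name] = field
        if not changed:
            return False  # nothing differs from the last recorded state
        self.versions.setdefault(object_id, []).append(
            {"frame_id": frame_id, "fields": changed}
        )
        return True
```

Because every version carries the `frame_id` at which it was taken, the state of an object at any frame can be rebuilt by replaying its versions in frame order.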
