Python Interaction Dump (PIND) Project

Problem Statement

Python's dynamic typing makes it easy to write flexible code, but it also makes it hard for developers to understand exactly what is going on at runtime. One approach to determining the actual arguments passed to a method is to run the code in PDB (the Python debugger). With the introduction of LLM assistants, the problem gets sharper: they have no easy way to understand the runtime context of a method.

Solution

Pind's goal is to give users (LLMs and developers) context by capturing each debug step/frame in a Python application and then presenting it back in a clear and understandable format. Pind is broken up into two parts:

Stream Capture: Pind attaches to PDB and steps through each step/frame.
Stream Report: A reporting module ingests the stream and outputs the relevant information based on the user's needs.

With this detailed context, LLMs will be able to:

Generate detailed event trace diagrams of the entire system
Generate Python unit tests with mocks
Improve the code they generate

Features

Stream Capture

Capture all of the events for a PDB run from the command line:

python pind/pind.py tests/nested.py 
Trace output saved to .trace_dump/nested_20231128_095452_trace.json

Trigger Pind from a PDB session by calling:

from pind import pind
pind.Pind("stream.json").run_till_break()

Stream Reporting

Pind Context Cache

This will ingest the entire stream and build an in-memory cache. You can then call methods to gain contextual information about the project:

find_method(str method_name) -> List(method_ids): This will match the given text against all methods and return a list of the matching method IDs.
get_method_inbound_calls(method_id, unique=False): This will return a method descriptor showing who called the method, with what arguments, and what it returned:

{
    "name": "func_a",
    "inbound": [
        {
            "id": "2.43.23.4",
            "from_id": "321#func_c",
            "local_vars": {
                "x": "5"
            },
            "return": "12"
        }
    ]
}

get_method_outbound_calls(method_id, unique=False): This will return all of the outbound calls made by this method:

{
    "name": "func_a",
    "outbound": [
        {
            "id": "2.43.23.4",
            "to_id": "543#func_c",
            "local_vars": {
                "x": "5"
            },
            "return": "12"
        }
    ]
}
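
The two lookups above can be sketched against a flat list of captured events. This is only an illustration under stated assumptions, not Pind's implementation: the class name ContextCache is hypothetical, plain function names stand in for method IDs such as "321#func_c", and the unique flag is left out.

```python
from collections import defaultdict


class ContextCache:
    """Hypothetical in-memory index built from a list of trace events."""

    def __init__(self, events):
        self.inbound = defaultdict(list)   # callee name -> call records
        self.outbound = defaultdict(list)  # caller name -> call records
        stack = []  # (function name, call record) for calls still open
        for ev in events:
            if ev["event"] == "call":
                caller = stack[-1][0] if stack else None
                record = {
                    "from_id": caller,
                    "to_id": ev["function"],
                    "local_vars": ev.get("local_vars") or {},
                    "return": None,  # filled in when the matching return arrives
                }
                self.inbound[ev["function"]].append(record)
                if caller is not None:
                    self.outbound[caller].append(record)
                stack.append((ev["function"], record))
            elif ev["event"] == "return" and stack:
                _, record = stack.pop()
                record["return"] = ev.get("return_value")

    def find_method(self, text):
        # Substring match over every function name seen in the stream.
        return sorted(name for name in self.inbound if text in name)

    def get_method_inbound_calls(self, method_id):
        return {"name": method_id, "inbound": self.inbound.get(method_id, [])}

    def get_method_outbound_calls(self, method_id):
        return {"name": method_id, "outbound": self.outbound.get(method_id, [])}


# Small hand-written stream in the same shape as the captured events.
sample_events = [
    {"event": "call", "function": "func_A", "local_vars": {"x": "5"}, "return_value": None},
    {"event": "call", "function": "func_B", "local_vars": {"x": "5"}, "return_value": None},
    {"event": "return", "function": "func_B", "local_vars": None, "return_value": "12"},
    {"event": "return", "function": "func_A", "local_vars": None, "return_value": "12"},
]
cache = ContextCache(sample_events)
inbound_b = cache.get_method_inbound_calls("func_B")
outbound_a = cache.get_method_outbound_calls("func_A")
```

Inbound and outbound lists share the same record objects, so a return value recorded once shows up in both views.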

Pind Event Trace

This will generate a complete PlantUML event trace diagram showing each step.
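
As a rough sketch of how call and return events could be turned into PlantUML sequence-diagram text: the function name to_plantuml and the "main" participant used for top-level calls are assumptions made for illustration, and the real report may look different.

```python
def to_plantuml(events):
    """Render call/return events as PlantUML sequence-diagram source."""
    lines = ["@startuml"]
    stack = ["main"]  # assumed name for the top-level caller
    for ev in events:
        if ev["event"] == "call":
            args = ", ".join(
                f"{name}={value}"
                for name, value in (ev.get("local_vars") or {}).items()
            )
            # Solid arrow from the current caller to the callee.
            lines.append(f"{stack[-1]} -> {ev['function']}: {ev['function']}({args})")
            stack.append(ev["function"])
        elif ev["event"] == "return" and len(stack) > 1:
            callee = stack.pop()
            # Dashed arrow back to the caller, labelled with the return value.
            lines.append(f"{callee} --> {stack[-1]}: {ev.get('return_value')}")
    lines.append("@enduml")
    return "\n".join(lines)


diagram_events = [
    {"event": "call", "function": "func_A", "local_vars": {"x": "5"}, "return_value": None},
    {"event": "call", "function": "func_B", "local_vars": {"x": "5"}, "return_value": None},
    {"event": "return", "function": "func_B", "local_vars": None, "return_value": "12"},
    {"event": "return", "function": "func_A", "local_vars": None, "return_value": "12"},
]
diagram = to_plantuml(diagram_events)
```

The resulting text can be saved to a file and rendered with any PlantUML renderer.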

A long-term goal would be to feed this information into AutoComplete.

Setup

To use PIND, you'll need to set up a Python virtual environment and install the required packages (see requirements.txt).

Usage

python pind/pind.py tests/nested.py 
Trace output saved to .trace_dump/nested_20231128_095452_trace.json

python pind/pind_normalise.py .trace_dump/nested_20231128_095452_trace.json .trace_dump/normalised.json

Nested trace output saved to .trace_dump/normalised.json
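
The normalised format is not described here. One plausible reading of "nested" is that the flat event list is folded into a call tree, which could be sketched as follows. The function name nest_events and the node layout are assumptions, not the behaviour of pind_normalise.py.

```python
def nest_events(events):
    """Fold a flat event list into a tree with one node per call."""
    root = {"function": "<root>", "events": [], "children": []}
    stack = [root]
    for ev in events:
        if ev["event"] == "call":
            # A call opens a new node under the currently active one.
            node = {"function": ev["function"], "events": [ev], "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        else:
            stack[-1]["events"].append(ev)
            # A return closes the active node.
            if ev["event"] == "return" and len(stack) > 1:
                stack.pop()
    return root


flat_events = [
    {"event": "call", "function": "func_A"},
    {"event": "line", "function": "func_A"},
    {"event": "call", "function": "func_B"},
    {"event": "return", "function": "func_B"},
    {"event": "return", "function": "func_A"},
]
tree = nest_events(flat_events)
```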

How it works

Stream Capture

The following is an example of the event stream:

    {
        "event": "line",
        "filename": "nested.py",
        "lineno": 6,
        "function": "func_A",
        "code_context": "b = func_B(x)",
        "local_vars": null,
        "return_value": null
    },
    {
        "event": "call",
        "filename": "nested.py",
        "lineno": 10,
        "function": "func_B",
        "code_context": "def func_B(x):",
        "local_vars": {
            "x": "5"
        },
        "return_value": null
    },
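
Events of this shape can be produced with Python's standard tracing hook. The sketch below uses sys.settrace rather than attaching to PDB, so it shows the idea and is not Pind's capture code. The helpers make_tracer and capture are hypothetical names, and func_A and func_B are stand-ins for tests/nested.py.

```python
import linecache
import os
import sys


def make_tracer(events):
    """Return a trace function that appends one dict per call/line/return."""
    def tracer(frame, event, arg):
        if event in ("call", "line", "return"):
            code = frame.f_code
            events.append({
                "event": event,
                "filename": os.path.basename(code.co_filename),
                "lineno": frame.f_lineno,
                "function": code.co_name,
                # Empty string when the source line cannot be looked up.
                "code_context": linecache.getline(code.co_filename, frame.f_lineno).strip(),
                # Locals are only recorded on entry, as in the example above.
                "local_vars": {k: repr(v) for k, v in frame.f_locals.items()}
                if event == "call" else None,
                "return_value": repr(arg) if event == "return" else None,
            })
        return tracer  # keep tracing inside this frame
    return tracer


def capture(func, *args):
    """Run func under the tracer and return the recorded events."""
    events = []
    previous = sys.gettrace()
    sys.settrace(make_tracer(events))
    try:
        func(*args)
    finally:
        sys.settrace(previous)  # restore whatever tracer was active before
    return events


def func_B(x):
    return x + 7


def func_A(x):
    b = func_B(x)
    return b


trace = capture(func_A, 5)
```

Values are stored with repr() so that the stream stays JSON-serialisable, which matches the string values such as "5" shown above.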

Frame Numbering

To uniquely identify each frame in each run, each frame is given a unique identifier:

[run_id].[step_count].[step_count].....

run_id: the last 4 characters of the encoded start time
step_count: the number of steps taken at that stack depth

This will allow the system to quickly traverse up and down the stack. The identifier is also incremental, so for any two frames you can easily tell which came first, which will be essential for the object history.

main          [gB2z.21]
│
└── func_a(4) [gB2z.22]
     │
     ├─── func_a(x=4)   [gB2z.22.1]
     │    │
     │    ├── x+=3      [gB2z.22.2]
     │    │
     │    └── func_b(x) [gB2z.22.3]
     │         │
     │         └─── func_b(y=7)  [gB2z.22.3.1]
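
The numbering scheme can be sketched with one counter per stack depth. The class FrameNumberer and the helper frame_key are hypothetical names, and run_id is passed in as a plain string because the start-time encoding is not specified here.

```python
class FrameNumberer:
    """Hypothetical generator for [run_id].[step_count].[step_count]... IDs."""

    def __init__(self, run_id):
        self.run_id = run_id
        self.counts = [0]  # one step counter per stack depth

    def current(self):
        return ".".join([self.run_id] + [str(c) for c in self.counts])

    def step(self):
        # A new step at the current depth.
        self.counts[-1] += 1
        return self.current()

    def call(self):
        # Entering a function adds a depth level; the entry is its first step.
        self.counts.append(0)
        return self.step()

    def ret(self):
        # Leaving a function drops the deepest counter.
        if len(self.counts) > 1:
            self.counts.pop()


def frame_key(frame_id):
    """Sort key that compares step counts as integers, not as text."""
    run_id, *steps = frame_id.split(".")
    return (run_id, tuple(int(s) for s in steps))


numberer = FrameNumberer("gB2z")
ids = [
    numberer.step(),  # main
    numberer.step(),  # line that calls func_a
    numberer.call(),  # entering func_a
    numberer.step(),  # x += 3
    numberer.step(),  # line that calls func_b
    numberer.call(),  # entering func_b
]
numberer.ret()
numberer.ret()
ids.append(numberer.step())  # next step back in main
```

Comparing the parsed tuples rather than the raw strings keeps the ordering correct once a counter passes 9, since "10" sorts before "9" as text.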

Object Versioning

Objects can be large and complex. To optimize their storage, only changes will be recorded. Objects will only be checked for changes when they are passed into local methods. This may still prove to be expensive, so an explicit skip list may be added in the future to optimize it further.

"local_vars": {
    "ai_tool": "<__main__.OpenAITools object at 0x108296a40>"
},

Object stream storage:

[
    {
        "id": "OpenAITools.23",
        "versions": [
            {
                "frame_id": "gB2z.22.3.1",
                "fields": [
                    {
                        "name": "f1",
                        "type": "int",
                        "value": "4"
                    },
                    {
                        "name": "my_obj",
                        "type": "Object",
                        "value": "OpenAIApi.42"
                    }
                ]
            }
        ]
    }
]
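
Recording only the changes could work roughly as below: each time an object is checked, its fields are compared with the last recorded snapshot and a new version is stored only when something differs. The class ObjectStore is a hypothetical name, object IDs are supplied by the caller, nested objects are stored by repr() rather than by reference ID, and removed fields are not tracked.

```python
class ObjectStore:
    """Hypothetical change-only store keyed by object ID."""

    def __init__(self):
        self.objects = {}  # object ID -> {"id": ..., "versions": [...]}
        self._last = {}    # object ID -> last snapshot {field: (type, value)}

    def record(self, object_id, obj, frame_id):
        """Store the fields that changed since the last check.

        Returns True if a new version was written, False otherwise.
        """
        snapshot = {
            name: (type(value).__name__, repr(value))
            for name, value in vars(obj).items()
        }
        previous = self._last.get(object_id, {})
        changed = [
            {"name": name, "type": type_name, "value": value}
            for name, (type_name, value) in snapshot.items()
            if previous.get(name) != (type_name, value)
        ]
        if not changed:
            return False
        entry = self.objects.setdefault(object_id, {"id": object_id, "versions": []})
        entry["versions"].append({"frame_id": frame_id, "fields": changed})
        self._last[object_id] = snapshot
        return True


class Tool:
    """Stand-in for a traced application object."""

    def __init__(self):
        self.f1 = 4
        self.label = "a"


store = ObjectStore()
tool = Tool()
first = store.record("Tool.1", tool, "gB2z.2.1")   # every field is new
second = store.record("Tool.1", tool, "gB2z.2.2")  # nothing changed
tool.f1 = 5
third = store.record("Tool.1", tool, "gB2z.2.3")   # only f1 changed
versions = store.objects["Tool.1"]["versions"]
```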
