Python's dynamic typing makes it easy to write flexible code, but it makes it hard for developers to understand exactly what is going on. One approach to determining the actual arguments is to run the code in PDB (the Python Debugger). With the introduction of LLM assistants, understanding Python remains difficult because there is no easy way for them to understand the context of a method.
Pind's goal is to give users (LLMs and developers) context by capturing each debug step/frame in a Python application and then presenting it back in a clear and understandable format. Pind is broken up into two parts:
- Stream Capture: Pind attaches to PDB and steps through each step/frame.
- Stream Report: A reporting module ingests the stream and outputs the relevant information based on the user's needs.
With this detailed context, LLMs will be able to:
- Generate detailed Event Trace diagrams of the entire system
- Generate Python unit tests with mocks
- Improve the code they generate
- Capture all of the events for a PDB run from the command line:
python pind/pind.py tests/nested.py
Trace output saved to .trace_dump/nested_20231128_095452_trace.json
- Trigger Pind from a PDB session by calling:
from pind import pind
pind.Pind("stream.json").run_till_break()
This will ingest the entire stream and build an in-memory model. You can then call methods to gain contextual information about the project:
- find_method(str method_name) -> List(method_ids): This will match the given text against all methods and return a list of the matching method IDs.
- get_method_inbound_calls(method_id, unique=False): This will return a method descriptor showing who called the method, with what arguments, and what it returned:
{
"name": "func_a",
"inbound": [
{
"id": "2.43.23.4",
"from_id": "321#func_c",
"local_vars": {
"x": "5"
},
"return": "12"
}
]
}
- get_method_outbound_calls(method_id, unique=False): This will return all of the outbound calls made by this method:
{
"name": "func_a",
"outbound": [
{
"id": "2.43.23.4",
"to_id": "543#func_c",
"local_vars": {
"x": "5"
},
"return": "12"
}
]
}
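The lookups above can be pictured as queries over an in-memory index of call records. The following is a minimal sketch under stated assumptions, not Pind's actual implementation: `CallIndex`, `CallRecord` and `add_call` are hypothetical names, the method ID `12#func_a` is made up for illustration, and the meaning of `unique` is a guess. Only the shape of the returned descriptor follows the examples above.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CallRecord:
    """One observed call. Hypothetical structure, not Pind's real schema."""
    frame_id: str                      # e.g. "2.43.23.4"
    from_id: str                       # caller method id, e.g. "321#func_c"
    to_id: str                         # callee method id, e.g. "12#func_a"
    local_vars: dict = field(default_factory=dict)
    return_value: Optional[str] = None


class CallIndex:
    """In-memory index of calls, keyed by caller and by callee."""

    def __init__(self):
        self._inbound = defaultdict(list)   # callee id -> records
        self._outbound = defaultdict(list)  # caller id -> records

    def add_call(self, record):
        self._inbound[record.to_id].append(record)
        self._outbound[record.from_id].append(record)

    def find_method(self, text):
        """Return every known method id that contains `text`."""
        known = set(self._inbound) | set(self._outbound)
        return sorted(m for m in known if text in m)

    def get_method_inbound_calls(self, method_id, unique=False):
        records = self._inbound.get(method_id, [])
        if unique:
            # Assumption: "unique" collapses calls that share the same
            # caller, arguments and return value.
            seen, kept = set(), []
            for r in records:
                key = (r.from_id, tuple(sorted(r.local_vars.items())), r.return_value)
                if key not in seen:
                    seen.add(key)
                    kept.append(r)
            records = kept
        return {
            "name": method_id.split("#")[-1],
            "inbound": [
                {"id": r.frame_id, "from_id": r.from_id,
                 "local_vars": r.local_vars, "return": r.return_value}
                for r in records
            ],
        }
```

An outbound lookup would mirror `get_method_inbound_calls`, reading from `_outbound` and reporting `to_id` instead of `from_id`.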
This will generate a complete PlantUML Event Trace diagram showing each step.
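As an illustration of how such a diagram could be derived, the sketch below turns a list of `call` and `return` events into PlantUML sequence-diagram text. It is not Pind's report module: `to_plantuml` and its `root` parameter are hypothetical, and it assumes function names are simple identifiers that PlantUML accepts as participant names without quoting.

```python
def to_plantuml(events, root="main"):
    """Render call/return events as PlantUML sequence-diagram text."""
    lines = ["@startuml"]
    stack = [root]  # names of the functions currently on the call stack
    for event in events:
        if event["event"] == "call":
            caller, callee = stack[-1], event["function"]
            local_vars = event.get("local_vars") or {}
            args = ", ".join(f"{k}={v}" for k, v in local_vars.items())
            # Solid arrow: the caller invokes the callee.
            lines.append(f"{caller} -> {callee}: {callee}({args})")
            stack.append(callee)
        elif event["event"] == "return" and len(stack) > 1:
            callee = stack.pop()
            # Dashed arrow: the callee returns to its caller.
            lines.append(f"{callee} --> {stack[-1]}: {event.get('return_value')}")
    lines.append("@enduml")
    return "\n".join(lines)
```

Events other than `call` and `return` (for example `line`) are ignored, so the diagram shows one arrow per call and one per return.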
A long-term goal is to feed this information into autocomplete.
To use Pind, you'll need to set up a Python virtual environment and install the required packages. Follow the steps below:
python pind/pind.py tests/nested.py
Trace output saved to .trace_dump/nested_20231128_095452_trace.json
python pind/pind_normalise.py .trace_dump/nested_20231128_095452_trace.json .trace_dump/normalised.json
Nested trace output saved to .trace_dump/normalised.json
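The "Nested trace output" message suggests that the normalise step folds the flat event stream into a call tree. How `pind_normalise.py` does this is not shown here, so the following is only one plausible sketch: `nest_events` and the node fields (`lines`, `children`) are hypothetical names, and the input is assumed to be a list of event dictionaries with `event`, `function`, `lineno`, `local_vars` and `return_value` keys.

```python
def nest_events(events):
    """Fold a flat list of call/line/return events into a call tree."""
    root = {"function": "<root>", "children": []}
    stack = [root]  # path from the root to the frame currently executing
    for event in events:
        kind = event["event"]
        if kind == "call":
            node = {
                "function": event["function"],
                "local_vars": event.get("local_vars"),
                "return_value": None,
                "lines": [],      # line numbers executed inside this call
                "children": [],   # calls made from inside this call
            }
            stack[-1]["children"].append(node)
            stack.append(node)
        elif kind == "line" and len(stack) > 1:
            stack[-1]["lines"].append(event["lineno"])
        elif kind == "return" and len(stack) > 1:
            stack[-1]["return_value"] = event.get("return_value")
            stack.pop()
        # Line events seen before any call are dropped in this sketch.
    return root["children"]
```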
The following is an example of the event stream:
{
"event": "line",
"filename": "nested.py",
"lineno": 6,
"function": "func_A",
"code_context": "b = func_B(x)",
"local_vars": null,
"return_value": null
},
{
"event": "call",
"filename": "nested.py",
"lineno": 10,
"function": "func_B",
"code_context": "def func_B(x):",
"local_vars": {
"x": "5"
},
"return_value": null
},
To uniquely identify each frame in each run, every frame is given a unique identifier:
[run_id].[step_count].[step_count].....
- run_id: the last 4 characters of the encoded start time
- step_count: the number of steps taken at that stack depth
This will allow the system to quickly traverse up and down the stack. The identifier is also incremental, so for any two frames you can easily tell which came first, which will be essential for the object history.
main [gB2z.21]
│
└── func_a(4) [gB2z.22]
    │
    ├─── func_a(x=4) [gB2z.22.1]
    │    │
    │    ├── x+=3 [gB2z.22.2]
    │    │
    │    └── func_b(x) [gB2z.22.3]
    │         │
    │         └─── func_b(y=7) [gB2z.22.3.1]
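The two properties described above, stack traversal and ordering, can be sketched with a few helpers. These functions are hypothetical and not part of Pind. They assume that every segment after `run_id` is an integer step count and that frames are ordered by comparing those counts left to right, with a parent sorting before its children.

```python
def parse_frame_id(frame_id):
    """Split "gB2z.22.3.1" into ("gB2z", (22, 3, 1))."""
    run_id, *steps = frame_id.split(".")
    return run_id, tuple(int(step) for step in steps)


def parent_id(frame_id):
    """Return the id one level up the stack, or None at the top level."""
    run_id, steps = parse_frame_id(frame_id)
    if len(steps) <= 1:
        return None
    return ".".join([run_id, *map(str, steps[:-1])])


def came_before(first, second):
    """True if `first` was recorded before `second` in the same run."""
    run_a, steps_a = parse_frame_id(first)
    run_b, steps_b = parse_frame_id(second)
    if run_a != run_b:
        raise ValueError("frames belong to different runs")
    # Compare as integer tuples: a plain string comparison would
    # wrongly place step "10" before step "9".
    return steps_a < steps_b
```

With the tree above, `parent_id("gB2z.22.3.1")` gives `"gB2z.22.3"`, and `came_before("gB2z.22.2", "gB2z.22.3.1")` is true.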
Objects can be large and complex. To optimize their storage, only changes will be recorded. Objects will only be checked for changes when they are passed into local methods. This may still prove expensive, so an explicit skip list may be added in the future to optimize it.
"local_vars": {
"ai_tool": "<__main__.OpenAITools object at 0x108296a40>"
},
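One way to picture change-only recording is a history that snapshots an object's fields and stores a new version only when the snapshot differs from the previous one. This is a sketch under assumptions, not Pind's storage code: `ObjectHistory`, `record` and `dump` are hypothetical names, a whole new version is stored on any change (rather than only the changed fields), and nested objects are stored as `repr()` strings instead of being linked by ID.

```python
class ObjectHistory:
    """Keep a version list per object, adding a version only on change."""

    def __init__(self):
        self._versions = {}  # object id -> list of recorded versions
        self._last = {}      # object id -> most recent snapshot

    def record(self, object_id, obj, frame_id):
        """Snapshot obj; return True if a new version was stored."""
        snapshot = {
            name: (type(value).__name__, repr(value))
            for name, value in vars(obj).items()
        }
        if self._last.get(object_id) == snapshot:
            return False  # nothing changed since the last check
        self._last[object_id] = snapshot
        self._versions.setdefault(object_id, []).append({
            "frame_id": frame_id,
            "fields": [
                {"name": name, "type": type_name, "value": value}
                for name, (type_name, value) in snapshot.items()
            ],
        })
        return True

    def dump(self):
        """Return the history as a list of {"id", "versions"} entries."""
        return [
            {"id": object_id, "versions": versions}
            for object_id, versions in self._versions.items()
        ]
```

Calling `record` each time an object is passed into a local method would then add an entry only for the calls in which the object had actually changed.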
Object stream storage:
[
{
"id": "OpenAITools.23",
"versions": [
{
"frame_id": "gB2z.22.3.1",
"fields": [
{
"name": "f1",
"type": "int",
"value": "4"
},
{
"name": "my_obj",
"type": "Object",
"value": "OpenAIApi.42"
}
]
}
]
},