
Logging Instrumentation | Context & Prompt Logging Infra For Enhanced Understanding of Context Composition #196

Open
jkbrooks opened this issue Jan 7, 2025 · 29 comments

jkbrooks commented Jan 7, 2025

As the RSP team, we want deeper visibility into context construction via providers, so that we can understand how key details (Recent Messages, User Context, Relevant Facts) are assembled, for debugging and for optimizing context construction.

Acceptance Criteria:

  • Data is parsable and structured
  • There is a way to log events and stream them somewhere
  • Prompts sent to LLMs are logged
  • Output can be piped to other sinks (Datadog, etc.) for data analysis
  • Detailed logs are generated for each step of the context composition process
  • Logs include all relevant data: provider outputs, intermediate results, and final state
  • Logs are stored somewhere agents and team members can access easily (JSON, a log file, or a DB)
  • The target user is someone studying prompts and LLM output for prompt engineering and A/B prompt testing

We want to log and review these as they relate to constructing context:

const comprehensiveProvider: Provider = {
    get: async (runtime: IAgentRuntime, message: Memory, state?: State) => {
        try {
            // Get recent messages
            const messages = await runtime.messageManager.getMemories({
                roomId: message.roomId,
                count: 5,
            });

            // Get user context
            const userContext = await runtime.descriptionManager.getMemories({
                roomId: message.roomId,
                userId: message.userId,
            });

            // Get relevant facts
            const facts = await runtime.messageManager.getMemories({
                roomId: message.roomId,
                tableName: "facts",
                count: 3,
            });

            // Format comprehensive context with labeled sections
            return `
# Recent Messages
${messages.map((m) => `- ${m.content.text}`).join("\n")}

# User Context
${userContext.map((c) => c.content.text).join("\n")}

# Relevant Facts
${facts.map((f) => `- ${f.content.text}`).join("\n")}
`.trim();
        } catch (error) {
            console.error("Provider error:", error);
            return "Context temporarily unavailable";
        }
    },
};
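A minimal sketch of how the acceptance criteria ("parsable and structured", "a way to log events, and stream them somewhere") could be layered on top of a provider like the one above. `TraceEvent` and `logTrace` are illustrative names, not existing Eliza APIs:

```typescript
// Hypothetical structured trace record; the field names are assumptions.
interface TraceEvent {
    runId: string;      // groups all events belonging to one run
    agent: string;
    event: string;      // e.g. "provider:recentMessages", "llm:response"
    payload: unknown;   // provider output, interpolated prompt, LLM output...
    timestamp: string;
}

function logTrace(runId: string, agent: string, event: string, payload: unknown): TraceEvent {
    const record: TraceEvent = {
        runId,
        agent,
        event,
        payload,
        timestamp: new Date().toISOString(),
    };
    // One JSON object per line: easy to parse, and easy to pipe to Datadog,
    // a log file, or a database writer.
    console.log(JSON.stringify(record));
    return record;
}
```

A provider could then call something like `logTrace(runId, agentName, "provider:recentMessages", messages)` after each fetch, so every step of context composition leaves a structured record.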

@jkbrooks
Author

jkbrooks commented Jan 7, 2025

I would care substantially about where these logs would be stored and where they can be accessed, and am extremely interested in agents being able to access all logs.

@ArsalonAmini2024 ArsalonAmini2024 changed the title Provider Context Logging For Enhanced Understanding of Provider Context Composition Logging Instrumentation | Provider Context Logging For Enhanced Understanding of Provider Context Composition Jan 9, 2025
@jzvikart
Collaborator

jzvikart commented Jan 9, 2025

@monilpat What branch should this target?

@ArsalonAmini2024 ArsalonAmini2024 changed the title Logging Instrumentation | Provider Context Logging For Enhanced Understanding of Provider Context Composition Logging Instrumentation | Context & Prompt Logging Infra For Enhanced Understanding of Context Composition Jan 9, 2025
@ArsalonAmini2024
Collaborator

@monilpat What branch should this target?

Each issue should have a new branch created; after it's been completed, you can make a pull request to the next environment (dev ENV?), and after that to the main branch?

@monilpat what are your thoughts ^^

@monilpat
Collaborator

monilpat commented Jan 10, 2025 via email

@ArsalonAmini2024
Collaborator

@monilpat we discussed logging and some thoughts came up.

  1. Where to store them: Pub/Sub vs. PostgreSQL (@jzvikart recommends PostgreSQL)
  2. Add to the RAG system: allow the agent to access these logs using RAG so that it can get information about its own logs

More to be discussed.

@ArsalonAmini2024
Collaborator

ArsalonAmini2024 commented Jan 14, 2025

Adding notes from our call on 1/13/25 with Jure and Monil
@monilpat @jzvikart
@jkbrooks adding you here for visibility

  • Need to be able to query effectively
  • Need to "define a run" and the different logs that make up a run
  • Need to store data and query it; even the simplest version will need a DB
  • We have the Eliza logger; conditional logic can be written to send logs to a defined DB, keyed on the log level (.error, .debug, etc.)

Define the DB architecture

  • Jure suggests a simple PostgreSQL instance. Eliza's internal logging is for users; what we want is instrumentation, present but inactive by default unless enabled via env variables. We also need to decide what a run is: an instance of an agent, one instance of memory/context.
  • Monil is thinking of it in a more relational way:
  • a run table with a 1-to-n relationship to its events (easier to query)
  • a run has an ID, an agent, and the specific action being done (its own table); in the future, more info such as swarm_id. For a run, you can then grab whatever events belong to it.
  • a scenario table
  • each event row includes an event type
  • and the output we are logging
  • Monil thinks it should live in the same infra as Eliza: no reason to seclude it, it is very easy to add a table and use the underlying query methods they already have. He recommends creating it on the postgres adapter to start.
  • Jure's view: what Eliza uses internally has nothing to do with this; how much sense does it make to use the same DB?
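The relational design discussed here (a run table with a 1-to-n relationship to its events) might be typed roughly as follows. The table and field names are assumptions drawn from the notes above, not an agreed schema:

```typescript
// Assumed shape of the "run" table rows discussed above.
interface RunRow {
    id: string;          // run ID
    agentId: string;     // which agent instance produced this run
    action: string;      // specific action being done; could become its own table
    startedAt: string;   // ISO timestamp
    // future: swarmId?: string;
}

// Assumed shape of the event rows, 1 run -> n events.
interface RunEventRow {
    id: number;          // increasing within a run, giving execution order
    runId: string;       // foreign key back to RunRow.id
    eventType: string;   // type of event
    output: string;      // the output we are logging
}
```

With this shape, "for a run, grab whatever events" is a single filter-by-`runId` query ordered by `id`.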

Run (Definition)

  • An instance of an agent can have many runs, performing multiple different workflows and actions.
    Example: an agent reviewing a PR. Querying the current state, generating a dynamic template of the PR info to review, making the LLM call, getting the response, parsing the response, calling the associated GitHub API to create a review comment, and returning success/failure is a single run (one iteration through the loop).

  • A run is a unit of execution: start by pulling data from various sources, make one or more LLM queries, return a result. These are generic steps, and there is almost always a structured flow.

  • Categories of info to log every single time (this makes up a run):

  • state before doing anything

  • the interpolated prompt

  • the output of the LLM

  • any action taken and the output of the action

Query and UI

  • Want to query: for a run, what did the LLM say?
  • For a run, what did we pass into it?
    This makes a useful input/output pair for a prompt engineer.
  • How do we query? A non-technical user will not use SQL; how do we visualize?
  • Prompt and LLM output for the prompt engineer
  • A small script that fetches this info from the DB
    A non-technical person will use our UI (the agent chat UI), with some ability to review the logs and click around: for this agent, show the latest run for action X.
  • Could create a tiny app: enter a run_id into a text box, submit, and it prints all records and events for the run in execution order
  • Can edit the prompt in the character file, see the input and output, and update the character file with the new template
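The "tiny app" idea above (enter a run_id, print all records for the run in execution order) reduces to a small query. A sketch, with an assumed row shape and illustrative names:

```typescript
// Assumed shape of one logged record; names are illustrative.
interface TraceRow {
    seq: number;     // execution order within the run
    runId: string;
    eventType: string;
    output: string;
}

// Return all records for a run in execution order, one formatted line per
// record, as the tiny app would print them.
function printRun(rows: TraceRow[], runId: string): string[] {
    return rows
        .filter((r) => r.runId === runId)
        .sort((a, b) => a.seq - b.seq)
        .map((r) => `${r.seq} ${r.eventType}: ${r.output}`);
}
```

In the real tool the `rows` array would come from a SQL query against the trace table rather than memory; the filtering and ordering are the same either way.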

@jzvikart
Collaborator

jzvikart commented Jan 16, 2025

After some initial research, the approach that @monilpat suggested appears to have several drawbacks. In particular, if we use logging with the existing PostgreSQL adapter:

  1. Anybody who wants to use logging is forced to use PostgreSQL for everything else too (memories, etc.). This introduces a coupling constraint that in many scenarios might not be desirable.
  2. Using the PostgreSQL adapter requires installing third-party extensions (e.g. vector), which are typically not part of distribution packages and generally need to be built from source. This adds complexity to deployment and maintenance, reduces development efficiency, and limits the availability of easy-to-use solutions such as Docker containers or hosted database instances.
  3. The storage/processing requirements of logging might bloat the requirements of the Eliza database itself, resulting in lower operational performance and higher system requirements that could otherwise be avoided.
  4. Taken together, the above means that this ticket needs a prior decision and plan for the DB hosting infrastructure, plus a separate ticket for deploying it. In that case it would be best to start with the PostgreSQL deployment, because we will already need it during development for this ticket, and having one ready would avoid duplicate work. As already suggested, at this point we can also still decide to decouple the databases and use a separate instance for logging (without the need for extensions), or write logs to a text file.

@ArsalonAmini2024 @jkbrooks

@ArsalonAmini2024
Collaborator

@jzvikart @monilpat @jkbrooks I created this ADR (architectural decision record) for the feature. We may have jumped in quickly and skipped this technical scoping step. Let's fill it out, take a step back, and ensure we're all on the same page: https://docs.google.com/document/d/11CB3FyorvSxPxqbO4P35wTNuHJ-EDD2rKBEUiyO-ngc/edit?usp=sharing

@jzvikart
Collaborator

jzvikart commented Jan 20, 2025

What is the scenario that we want to instrument?
Steps to reproduce? (command to start, env settings, character files, prompts etc.)

@jzvikart
Collaborator

jzvikart commented Jan 21, 2025

The implementation is now working; the recommended next steps are:

  • pair up with the person who's going to be analyzing the data
  • decide on a particular scenario (see above) and set up a testing system
  • start analyzing the data and add/refine tracepoints to capture information that is needed
  • develop tools for analyzing and processing the trace data

Collection and refinement of trace data should be done selectively and iteratively. Capturing and analyzing "everything" is not realistic.

To kick this off I recommend doing a demo, or a pair coding session.

@ArsalonAmini2024
Collaborator

@jzvikart thanks for the update. A few PM comments -

  1. I don't see a pull request for this feature. Is this PR in review now?

  2. Can you comment on the solution approach or attach a Loom video so that our QA team can understand / begin manual testing scenario development

  • did you utilize the adapter-postgres and create additional table in the DB?
  • is this live in the test ENV running on an instance of PROSPER?
  • is there a simple UI client (did you create a new React client UI to visualize the run data, a simple table or a filter by RUN ID?)

@monilpat can you review this PR and give feedback

@jzvikart
Collaborator

1. I don't see a pull request for this feature. Is this PR in review now?

I did not create a PR yet since it would make sense to answer some of the questions first.

2. Can you comment on the solution approach or attach a Loom video so that our QA team can understand / begin manual testing scenario development

I cannot make a video, but I am happy to pair up and discuss whatever the person who will use this wants to know. Testing/QA does not make sense here.

* did you utilize the adapter-postgres and create additional table in the DB?

Yes, as Monil suggested, although it is now clear that it would be better to separate the trace data in a separate DB instance. We can still change that though.

* is this live in the test ENV running on an instance of PROSPER?

No, this is currently only on development machine due to reasons and limitations that I mentioned, and we should make a plan how/where to deploy it. See above.

* is there a simple UI client (did you create a new React client UI to visualize the run data, a simple table or a filter by RUN ID?)

No, the current interface is SQL, any additional tools need to be discussed and developed.

@jzvikart
Collaborator

One more thing: running and building are still failing non-deterministically. I've tried 3 different branches already and verified that the problem exists in versions prior to my changes. We should address this. I've been in contact with Caner, but so far there is no known cause or fix.

@monilpat
Collaborator

Hey, thanks so much for flagging the build issues. This is something that the V2 separation into community plugins is going to solve, since each becomes a separate repository. Note that with the way it currently works, you will need to run the build multiple times for it to succeed, and if it still fails you'll need to comment out the plugins at fault. We need to address this, but as long as your plugin is being built you are not blocked; if you read the logs, you can see whether your plugin has been built or not.

@monilpat
Collaborator

1. I don't see a pull request for this feature. Is this PR in review now?

I did not create a PR yet since it would make sense to answer some of the questions first.

2. Can you comment on the solution approach or attach a Loom video so that our QA team can understand / begin manual testing scenario development

I cannot make a video, but I am happy to pair up and discuss whatever the person who will use this wants to know. Testing/QA does not make sense here.

* did you utilize the adapter-postgres and create additional table in the DB?

Yes, as Monil suggested, although it is now clear that it would be better to separate the trace data in a separate DB instance. We can still change that though.

* is this live in the test ENV running on an instance of PROSPER?

No, this is currently only on development machine due to reasons and limitations that I mentioned, and we should make a plan how/where to deploy it. See above.

* is there a simple UI client (did you create a new React client UI to visualize the run data, a simple table or a filter by RUN ID?)

No, the current interface is SQL, any additional tools need to be discussed and developed.

Yeah, that makes sense regarding the PR note; we prefer to have draft PRs when possible for review. I think Arsalon would probably be the best person to pair up with at this point, and I'm happy to hop in as needed. Separating the database can be a fast follow if it makes sense, but right now getting something working is the priority. Ars and I were talking about a simple UI as part of the Eliza chat UI: for a conversation it shows the runs, and when you click on a run or select it from a drop-down, it shows you all the associated logs.

@jzvikart
Collaborator

@monilpat Thanks for explanation, that's exactly what I've been doing. If it's a known issue that's being worked on that's enough for me.

@jzvikart
Collaborator

Yeah, that makes sense regarding the PR note; we prefer to have draft PRs when possible for review. I think Arsalon would probably be the best person to pair up with at this point, and I'm happy to hop in as needed. Separating the database can be a fast follow if it makes sense, but right now getting something working is the priority. Ars and I were talking about a simple UI as part of the Eliza chat UI: for a conversation it shows the runs, and when you click on a run or select it from a drop-down, it shows you all the associated logs.

OK, I'll create a draft PR so that we can continue the discussion there. As for tools - everything is possible, but we need to decide on the right approach first, considering the tradeoffs and skills of the person who will be doing this. I think more than UI/dropdowns we will need some data analysis tools, scripting, etc. And if we do go into UI, it should definitely be separate from Eliza main UI.

@monilpat
Collaborator

monilpat commented Jan 21, 2025 via email

@jzvikart
Collaborator

#275

@TimKozak
Collaborator

@jzvikart @monilpat (tagging you on behalf of Ars) - how's everything going with this feature?

@jzvikart
Collaborator

jzvikart commented Jan 23, 2025

@TimKozak Tracing framework is implemented and works. To meaningfully continue this ticket, we would need a "customer" who will be analysing the data/prompts and one or more use cases. When we know who the "customer" is I can provide engineering support and everything that's needed. See my comments above.

@jzvikart
Collaborator

Next steps that we discussed so far:

  • Record a video
  • Trace a random scenario with unknown tracing criteria

@monilpat
Collaborator

monilpat commented Jan 23, 2025 via email

@monilpat
Collaborator

monilpat commented Jan 23, 2025 via email

@jzvikart
Collaborator

Note to self:

  1. Implement instrumentation according to Monil's suggestions as a baseline.
  2. Take a simple scenario such as Coinbase "create charge" (take something from example), or a simple generic character (Trump).
  3. Capture traces
  4. Make some screenshots, video, or export the data for review, analysis and further discussion.

This would wrap up this ticket.

@ArsalonAmini2024
Collaborator

ArsalonAmini2024 commented Jan 24, 2025

@jzvikart the above sounds good. The ultimate use case I want to support is this.

Have an API endpoint from which I can GET the traces for runs (with some pagination and optional query params like run ID, agent name, date range).

The endpoint will be consumed by a frontend (Swagger UI is fine) and displayed. If we don't have Swagger implemented in the codebase, please add the config and host it on a non-local (public) URL we can all access.

With the Swagger UI I can send in params like agent name, date range, etc., and get back the runs as a response.

I can then look into the response for details like what the prompt was, the action taken, etc.

It will also help me to see and understand the implementation, so I can give feedback on additional things we want to include.

If you can get the following done, we can consider this completed:

  1. Implement the logic to log the various components of a run in a well-defined table in some DB (your choice whether it is a separate DB instance or the same instance as the other info for an agent running on the server).
  2. Write tests to confirm the function/class logic returns what we want and appends to the DB appropriately.
  3. Expose this via an API endpoint so we can query it in Swagger UI.
  4. Add it to the Swagger documentation so I can play around with it.

I think if we have a Swagger UI that I can play around with this endpoint that will be good enough here.
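The core of the requested GET endpoint is filtering plus pagination. A sketch of that core as a pure function, assuming illustrative names and shapes; in the real implementation it would be wired to an HTTP framework and described in the Swagger docs:

```typescript
// Assumed trace-run shape and query params; names are not the final API.
interface TraceRun {
    runId: string;
    agentName: string;
    date: string;        // ISO date, so string comparison gives date order
    events: unknown[];   // the logged records for this run
}

interface TraceQuery {
    runId?: string;
    agentName?: string;
    from?: string;       // inclusive date range bounds
    to?: string;
    page?: number;       // simple offset pagination
    pageSize?: number;
}

// Filter runs by the optional params, then return one page of results.
function queryTraces(runs: TraceRun[], q: TraceQuery): TraceRun[] {
    const filtered = runs.filter((r) =>
        (!q.runId || r.runId === q.runId) &&
        (!q.agentName || r.agentName === q.agentName) &&
        (!q.from || r.date >= q.from) &&
        (!q.to || r.date <= q.to)
    );
    const page = q.page ?? 0;
    const size = q.pageSize ?? 20;
    return filtered.slice(page * size, (page + 1) * size);
}
```

An HTTP handler would parse `req.query` into a `TraceQuery`, call this against the trace table, and return the result as JSON.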

@ArsalonAmini2024
Collaborator

FYI - moving this into two separate tickets:

  1. Implement Swagger UI on a public URL for the dev ENV, and
  2. Implement a RESTful API (GET) for this data (to display in Swagger UI).

(screenshot attached)

@ArsalonAmini2024
Collaborator

Additional discussion on Docker and vector extensions -

(screenshot attached)

@ArsalonAmini2024
Collaborator

@jzvikart once the PR is merged, I will close this out.
