
Logging Instrumentation | Context & Prompt Logging Infra For Enhanced Understanding of Context Composition #196

Open
jkbrooks opened this issue Jan 7, 2025 · 29 comments

jkbrooks commented Jan 7, 2025

As the RSP team, we want deeper visibility into context construction via providers, so that we can understand how key details (Recent Messages, User Context, Relevant Facts) are assembled, for debugging and for optimizing context construction.

Acceptance Criteria:

  • Data is parsable and structured
  • There is a way to log events and stream them somewhere
  • Prompts sent to LLMs are logged
  • Output can be piped to other sinks (Datadog, etc.) for data analysis
  • Detailed logs are generated for each step of the context composition process
  • Logs include all relevant data: provider outputs, intermediate results, and final state
  • Logs are stored somewhere agents and team members can access easily (JSON, a log file, or a DB)
  • The target user is someone studying prompts and LLM output for prompt engineering and A/B prompt testing

We want to log and review these as they relate to constructing context:

const comprehensiveProvider: Provider = {
    get: async (runtime: IAgentRuntime, message: Memory, state?: State) => {
        try {
            // Get recent messages
            const messages = await runtime.messageManager.getMemories({
                roomId: message.roomId,
                count: 5,
            });

            // Get user context
            const userContext = await runtime.descriptionManager.getMemories({
                roomId: message.roomId,
                userId: message.userId,
            });

            // Get relevant facts
            const facts = await runtime.messageManager.getMemories({
                roomId: message.roomId,
                tableName: "facts",
                count: 3,
            });

            // Format comprehensive context with labeled sections
            return `
# Recent Messages
${messages.map((m) => `- ${m.content.text}`).join("\n")}

# User Context
${userContext.map((c) => c.content.text).join("\n")}

# Relevant Facts
${facts.map((f) => `- ${f.content.text}`).join("\n")}
`.trim();
        } catch (error) {
            console.error("Provider error:", error);
            return "Context temporarily unavailable";
        }
    },
};
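A minimal sketch of how the acceptance criteria ("parsable and structured", "a way to log events, and stream them somewhere") could be layered on top of a provider like the one above. `TraceEvent` and `logTrace` are illustrative names, not existing Eliza APIs:

```typescript
// Hypothetical structured trace record; the field names are assumptions.
interface TraceEvent {
    runId: string;      // groups all events belonging to one run
    agent: string;
    event: string;      // e.g. "provider:recentMessages", "llm:response"
    payload: unknown;   // provider output, interpolated prompt, LLM output...
    timestamp: string;
}

function logTrace(runId: string, agent: string, event: string, payload: unknown): TraceEvent {
    const record: TraceEvent = {
        runId,
        agent,
        event,
        payload,
        timestamp: new Date().toISOString(),
    };
    // One JSON object per line: easy to parse, and easy to pipe to Datadog,
    // a log file, or a database writer.
    console.log(JSON.stringify(record));
    return record;
}
```

A provider could then call something like `logTrace(runId, agentName, "provider:recentMessages", messages)` after each fetch, so every step of context composition leaves a structured record.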

@jkbrooks
Author

jkbrooks commented Jan 7, 2025

I would care substantially about where these logs would be stored and where they can be accessed, and am extremely interested in agents being able to access all logs.

@ArsalonAmini2024 ArsalonAmini2024 changed the title Provider Context Logging For Enhanced Understanding of Provider Context Composition Logging Instrumentation | Provider Context Logging For Enhanced Understanding of Provider Context Composition Jan 9, 2025
@jzvikart
Collaborator

jzvikart commented Jan 9, 2025

@monilpat What branch should this target?

@ArsalonAmini2024 ArsalonAmini2024 changed the title Logging Instrumentation | Provider Context Logging For Enhanced Understanding of Provider Context Composition Logging Instrumentation | Context & Prompt Logging Infra For Enhanced Understanding of Context Composition Jan 9, 2025
@ArsalonAmini2024
Collaborator

@monilpat What branch should this target?

Each issue should have a new branch created; after it's been completed, you can make a pull request to the next environment (dev ENV?), and after that to the main branch?

@monilpat what are your thoughts ^^

@monilpat
Collaborator

monilpat commented Jan 10, 2025 via email

@ArsalonAmini2024
Collaborator

@monilpat we discussed logging and some thoughts came up.

  1. Where to store them: Pub/Sub vs. PostgreSQL (@jzvikart recommends PostgreSQL)
  2. Add to the RAG system: allow the agent to access these logs using RAG so that it can get information about its own logs

More to be discussed.

@ArsalonAmini2024
Collaborator

ArsalonAmini2024 commented Jan 14, 2025

Adding notes from our call on 1/13/25 with Jure and Monil
@monilpat @jzvikart
@jkbrooks adding you here for visibility

  • Need to be able to query effectively
  • Need to "define a run" and the different logs that make up a run
  • Need to store data and query it; even the simplest version will need a DB
  • We have the Eliza logger; conditional logic can be written to send logs to a defined DB, keyed on the log level (.error, .debug, etc.)

Define the DB architecture

  • Jure suggests a simple PostgreSQL instance. Eliza's internal logging is for users; what we want is instrumentation, present but inactive by default unless enabled via env variables. We also need to decide what a run is: an instance of an agent, one instance of memory/context.
  • Monil is thinking of it in a more relational way:
  • a run table with a 1-to-n relationship to its events (easier to query)
  • a run has an ID, an agent, and the specific action being done (its own table); in the future, more info such as swarm_id. For a run, you can then grab whatever events belong to it.
  • a scenario table
  • each event row includes an event type
  • and the output we are logging
  • Monil thinks it should live in the same infra as Eliza: no reason to seclude it, it is very easy to add a table and use the underlying query methods they already have. He recommends creating it on the postgres adapter to start.
  • Jure's view: what Eliza uses internally has nothing to do with this; how much sense does it make to use the same DB?
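The relational design discussed here (a run table with a 1-to-n relationship to its events) might be typed roughly as follows. The table and field names are assumptions drawn from the notes above, not an agreed schema:

```typescript
// Assumed shape of the "run" table rows discussed above.
interface RunRow {
    id: string;          // run ID
    agentId: string;     // which agent instance produced this run
    action: string;      // specific action being done; could become its own table
    startedAt: string;   // ISO timestamp
    // future: swarmId?: string;
}

// Assumed shape of the event rows, 1 run -> n events.
interface RunEventRow {
    id: number;          // increasing within a run, giving execution order
    runId: string;       // foreign key back to RunRow.id
    eventType: string;   // type of event
    output: string;      // the output we are logging
}
```

With this shape, "for a run, grab whatever events" is a single filter-by-`runId` query ordered by `id`.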

Run (Definition)

  • An instance of an agent can have many runs, performing multiple different workflows and actions.
    Example: an agent reviewing a PR. Querying the current state, generating a dynamic template of the PR info to review, making the LLM call, getting the response, parsing the response, calling the associated GitHub API to create a review comment, and returning success/failure is a single run (one iteration through the loop).

  • A run is a unit of execution: start by pulling data from various sources, make one or more LLM queries, return a result. These are generic steps, and there is almost always a structured flow.

  • Categories of info to log every single time (this makes up a run):

  • state before doing anything

  • the interpolated prompt

  • the output of the LLM

  • any action taken and the output of the action

Query and UI

  • Want to query: for a run, what did the LLM say?
  • For a run, what did we pass into it?
    This makes a useful input/output pair for a prompt engineer.
  • How do we query? A non-technical user will not use SQL; how do we visualize?
  • Prompt and LLM output for the prompt engineer
  • A small script that fetches this info from the DB
    A non-technical person will use our UI (the agent chat UI), with some ability to review the logs and click around: for this agent, show the latest run for action X.
  • Could create a tiny app: enter a run_id into a text box, submit, and it prints all records and events for the run in execution order
  • Can edit the prompt in the character file, see the input and output, and update the character file with the new template
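The "tiny app" idea above (enter a run_id, print all records for the run in execution order) reduces to a small query. A sketch, with an assumed row shape and illustrative names:

```typescript
// Assumed shape of one logged record; names are illustrative.
interface TraceRow {
    seq: number;     // execution order within the run
    runId: string;
    eventType: string;
    output: string;
}

// Return all records for a run in execution order, one formatted line per
// record, as the tiny app would print them.
function printRun(rows: TraceRow[], runId: string): string[] {
    return rows
        .filter((r) => r.runId === runId)
        .sort((a, b) => a.seq - b.seq)
        .map((r) => `${r.seq} ${r.eventType}: ${r.output}`);
}
```

In the real tool the `rows` array would come from a SQL query against the trace table rather than memory; the filtering and ordering are the same either way.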

@jzvikart
Collaborator

jzvikart commented Jan 16, 2025

After some initial research, the approach that @monilpat suggested appears to have several drawbacks. In particular, if we use logging with the existing PostgreSQL adapter:

  1. Anybody who wants to use logging is forced to use PostgreSQL for everything else too (memories, etc.). This introduces a coupling constraint that in many scenarios might not be desirable.
  2. Using the PostgreSQL adapter requires installing third-party extensions (e.g. vector), which are typically not part of distribution packages and generally need to be built from source. This adds complexity to deployment and maintenance, reduces development efficiency, and limits the availability of easy-to-use solutions such as Docker containers or hosted database instances.
  3. The storage/processing requirements of logging might bloat the requirements of the Eliza database itself, resulting in lower operational performance and higher system requirements that could otherwise be avoided.
  4. Taken together, the above means that this ticket needs a prior decision and plan for the DB hosting infrastructure, plus a separate ticket for deploying it. In that case it would be best to start with the PostgreSQL deployment, because we will already need it during development for this ticket, and having one ready would avoid duplicate work. As already suggested, at this point we can also still decide to decouple the databases and use a separate instance for logging (without the need for extensions), or write logs to a text file.

@ArsalonAmini2024 @jkbrooks

@ArsalonAmini2024
Collaborator

@jzvikart @monilpat @jkbrooks I created this ADR (architectural decision record) for the feature. We may have jumped in quickly and skipped this technical scoping step. Let's fill it out, take a step back, and ensure we're all on the same page: https://docs.google.com/document/d/11CB3FyorvSxPxqbO4P35wTNuHJ-EDD2rKBEUiyO-ngc/edit?usp=sharing

@jzvikart
Collaborator

jzvikart commented Jan 20, 2025

What is the scenario that we want to instrument?
Steps to reproduce? (command to start, env settings, character files, prompts etc.)

@jzvikart
Collaborator

jzvikart commented Jan 21, 2025

The implementation is now working; the recommended next steps are:

  • pair up with the person who's going to be analyzing the data
  • decide on a particular scenario (see above) and set up a testing system
  • start analyzing the data and add/refine tracepoints to capture information that is needed
  • develop tools for analyzing and processing the trace data

Collection and refinement of trace data should be done selectively and iteratively. Capturing and analyzing "everything" is not realistic.

To kick this off I recommend doing a demo, or a pair coding session.

@ArsalonAmini2024
Collaborator

@jzvikart thanks for the update. A few PM comments -

  1. I don't see a pull request for this feature. Is this PR in review now?

  2. Can you comment on the solution approach or attach a Loom video so that our QA team can understand / begin manual testing scenario development

  • did you utilize the adapter-postgres and create additional table in the DB?
  • is this live in the test ENV running on an instance of PROSPER?
  • is there a simple UI client (did you create a new React client UI to visualize the run data, a simple table or a filter by RUN ID?)

@monilpat can you review this PR and give feedback

@jzvikart
Collaborator

1. I don't see a pull request for this feature. Is this PR in review now?

I did not create a PR yet since it would make sense to answer some of the questions first.

2. Can you comment on the solution approach or attach a Loom video so that our QA team can understand / begin manual testing scenario development

I cannot make a video, but I am happy to pair up and discuss whatever the person who will use this wants to know. Testing/QA does not make sense here.

* did you utilize the adapter-postgres and create additional table in the DB?

Yes, as Monil suggested, although it is now clear that it would be better to separate the trace data in a separate DB instance. We can still change that though.

* is this live in the test ENV running on an instance of PROSPER?

No, this is currently only on development machine due to reasons and limitations that I mentioned, and we should make a plan how/where to deploy it. See above.

* is there a simple UI client (did you create a new React client UI to visualize the run data, a simple table or a filter by RUN ID?)

No, the current interface is SQL, any additional tools need to be discussed and developed.

@jzvikart
Collaborator

One more thing: running and building are still failing non-deterministically. I've tried 3 different branches already and verified that the problem exists in versions prior to my changes. We should address this. I've been in contact with Caner, but so far there is no known cause or fix.

@monilpat
Collaborator

Hey, thanks so much for flagging the build issues. This is something that the V2 separation into community plugins is going to solve, since each becomes a separate repository. Note that with the way it currently works, you will need to run the build multiple times for it to succeed, and if it still fails you'll need to comment out the plugins at fault. We need to address this, but as long as your plugin is being built you are not blocked; if you read the logs, you can see whether your plugin has been built or not.

@monilpat
Collaborator

1. I don't see a pull request for this feature. Is this PR in review now?

I did not create a PR yet since it would make sense to answer some of the questions first.

2. Can you comment on the solution approach or attach a Loom video so that our QA team can understand / begin manual testing scenario development

I cannot make a video, but I am happy to pair up and discuss whatever the person who will use this wants to know. Testing/QA does not make sense here.

* did you utilize the adapter-postgres and create additional table in the DB?

Yes, as Monil suggested, although it is now clear that it would be better to separate the trace data in a separate DB instance. We can still change that though.

* is this live in the test ENV running on an instance of PROSPER?

No, this is currently only on development machine due to reasons and limitations that I mentioned, and we should make a plan how/where to deploy it. See above.

* is there a simple UI client (did you create a new React client UI to visualize the run data, a simple table or a filter by RUN ID?)

No, the current interface is SQL, any additional tools need to be discussed and developed.

Yeah, that makes sense regarding the PR note; we prefer to have draft PRs when possible for review. I think Arsalon would probably be the best person to pair up with at this point, and I'm happy to hop in as needed. Separating the database can be a fast follow if it makes sense, but right now getting something working is the priority. Ars and I were talking about a simple UI as part of the Eliza chat UI: for a conversation it shows the runs, and when you click on a run or select it from a drop-down, it shows you all the associated logs.

@jzvikart
Collaborator

@monilpat Thanks for explanation, that's exactly what I've been doing. If it's a known issue that's being worked on that's enough for me.

@jzvikart
Collaborator

Yeah, that makes sense regarding the PR note; we prefer to have draft PRs when possible for review. I think Arsalon would probably be the best person to pair up with at this point, and I'm happy to hop in as needed. Separating the database can be a fast follow if it makes sense, but right now getting something working is the priority. Ars and I were talking about a simple UI as part of the Eliza chat UI: for a conversation it shows the runs, and when you click on a run or select it from a drop-down, it shows you all the associated logs.

OK, I'll create a draft PR so that we can continue the discussion there. As for tools - everything is possible, but we need to decide on the right approach first, considering the tradeoffs and skills of the person who will be doing this. I think more than UI/dropdowns we will need some data analysis tools, scripting, etc. And if we do go into UI, it should definitely be separate from Eliza main UI.

@monilpat
Collaborator

monilpat commented Jan 21, 2025 via email

@jzvikart
Collaborator

#275

@TimKozak
Collaborator

@jzvikart @monilpat (tagging you on behalf of Ars) - how's everything going with this feature?

@jzvikart
Collaborator

jzvikart commented Jan 23, 2025

@TimKozak Tracing framework is implemented and works. To meaningfully continue this ticket, we would need a "customer" who will be analysing the data/prompts and one or more use cases. When we know who the "customer" is I can provide engineering support and everything that's needed. See my comments above.

@jzvikart
Collaborator

Next steps that we discussed so far:

  • Record a video
  • Trace a random scenario with unknown tracing criteria

@monilpat
Collaborator

monilpat commented Jan 23, 2025 via email

@monilpat
Collaborator

monilpat commented Jan 23, 2025 via email

@jzvikart
Collaborator

Note to self:

  1. Implement instrumentation according to Monil's suggestions as a baseline.
  2. Take a simple scenario such as Coinbase "create charge" (take something from example), or a simple generic character (Trump).
  3. Capture traces
  4. Make some screenshots, video, or export the data for review, analysis and further discussion.

This would wrap up this ticket.

@ArsalonAmini2024
Collaborator

ArsalonAmini2024 commented Jan 24, 2025

@jzvikart the above sounds good. The ultimate use case I want to support is this.

Have an API endpoint from which I can GET the traces for runs (with some pagination and optional query params like run ID, agent name, date range).

The endpoint will be consumed by a frontend (Swagger UI is fine) and displayed. If we don't have Swagger implemented in the codebase, please add the config and host it on a non-local (public) URL we can all access.

With the Swagger UI I can send in params like agent name, date range, etc., and get back the runs as a response.

I can then look into the response for details like what the prompt was, the action taken, etc.

It will also help me to see and understand the implementation, so I can give feedback on additional things we want to include.

If you can get the following done, we can consider this completed:

  1. Implement the logic to log the various components of a run in a well-defined table in some DB (your choice whether it is a separate DB instance or the same instance as the other info for an agent running on the server).
  2. Write tests to confirm the function/class logic returns what we want and appends to the DB appropriately.
  3. Expose this via an API endpoint so we can query it in Swagger UI.
  4. Add it to the Swagger documentation so I can play around with it.

I think if we have a Swagger UI that I can play around with this endpoint that will be good enough here.
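The core of the requested GET endpoint is filtering plus pagination. A sketch of that core as a pure function, assuming illustrative names and shapes; in the real implementation it would be wired to an HTTP framework and described in the Swagger docs:

```typescript
// Assumed trace-run shape and query params; names are not the final API.
interface TraceRun {
    runId: string;
    agentName: string;
    date: string;        // ISO date, so string comparison gives date order
    events: unknown[];   // the logged records for this run
}

interface TraceQuery {
    runId?: string;
    agentName?: string;
    from?: string;       // inclusive date range bounds
    to?: string;
    page?: number;       // simple offset pagination
    pageSize?: number;
}

// Filter runs by the optional params, then return one page of results.
function queryTraces(runs: TraceRun[], q: TraceQuery): TraceRun[] {
    const filtered = runs.filter((r) =>
        (!q.runId || r.runId === q.runId) &&
        (!q.agentName || r.agentName === q.agentName) &&
        (!q.from || r.date >= q.from) &&
        (!q.to || r.date <= q.to)
    );
    const page = q.page ?? 0;
    const size = q.pageSize ?? 20;
    return filtered.slice(page * size, (page + 1) * size);
}
```

An HTTP handler would parse `req.query` into a `TraceQuery`, call this against the trace table, and return the result as JSON.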

@ArsalonAmini2024
Collaborator

FYI - moving this into two separate tickets:

  1. Implement Swagger UI on a public URL for the dev ENV, and
  2. Implement a RESTful API (GET) for this data (to display in Swagger UI).

(screenshot attached)

@ArsalonAmini2024
Collaborator

Additional discussion on Docker and vector extensions -

(screenshot attached)

@ArsalonAmini2024
Collaborator

@jzvikart once the PR is merged, I will close this out.
