Philosophy behind usage of flows, prompts, and tools #731
Replies: 2 comments 1 reply
-
To give more context, here's some code I'm using to define a flow to list properties on abstract "objects", e.g. an
Of course, those are only two options, and this is still a relatively toy example, but hopefully it's fairly illustrative of the more general questions.
-
Hey Ariel, looks like we dropped the ball on this. Sorry about that. Could you say more about the following: how are you using flows to test prompts?
As to your main question:
I don't think we'll have a definitive, slam-dunk answer to this question. :) At a meta level, you are describing the continuum we are all on as we move from classic, imperatively programmed apps, to hybrid apps that mix imperative instructions with the LLM's natural-language "reasoning", all the way to the other end, where the LLM has full control. One of Genkit's jobs is to help developers make this transition over time.

There are many trade-offs, and they will evolve as models improve. For example, LLMs perform best when you describe the exact problem to be solved as specifically as you can: the more "tools" you provide, the more complex the model's "reasoning" becomes and the more hallucinations increase. Another consideration is at which points you need to go back to the user for confirmation before an action is taken, and so on.
Keep in mind that tool calling is going to incur multiple API calls. The flow (no pun intended) generally involves a back and forth between the codebase and the model. In a typical exchange, the model requests a tool call, the application code is responsible for running that request and then calling the model again with the results. From there the model generates the next response, which could be final, or yet another tool request. As the exchange continues, the context gets larger and the cost of each request increases. In general, expect latency and cost to grow with the number of tool round trips.
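That back-and-forth can be sketched roughly as follows; `callModel` and `runTool` are illustrative stand-ins for real model/tool invocations, not Genkit APIs:

```typescript
// Sketch of one tool-calling exchange with hypothetical stand-in functions.
type Message = { role: "user" | "model" | "tool"; content: string };
type ModelResponse =
  | { type: "final"; text: string }
  | { type: "toolRequest"; name: string; input: string };

async function callModel(history: Message[]): Promise<ModelResponse> {
  // Stand-in model: requests one tool call, then produces a final answer.
  const toolResult = history.find((m) => m.role === "tool");
  if (!toolResult) return { type: "toolRequest", name: "lookup", input: "q" };
  return { type: "final", text: `answer using ${toolResult.content}` };
}

async function runTool(name: string, input: string): Promise<string> {
  return `result of ${name}(${input})`; // stand-in tool implementation
}

async function runExchange(userPrompt: string): Promise<string> {
  const history: Message[] = [{ role: "user", content: userPrompt }];
  // Each iteration is one more API call; the context (and therefore the
  // per-request cost) grows with every round trip, as described above.
  for (;;) {
    const res = await callModel(history);
    if (res.type === "final") return res.text;
    history.push({ role: "model", content: `requested ${res.name}` });
    history.push({ role: "tool", content: await runTool(res.name, res.input) });
  }
}
```

Note that the application code, not the model, drives the loop: it decides when to run tools and when to stop, which is also where user-confirmation checkpoints would go.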
Also keep in mind that tool calling is only one workflow choice. RAG is another workflow, where you "prefetch" context relevant to a user query from a data store and provide it to the LLM along with the prompt.

What types of abstractions do you have in mind? Eager to hear where you find Genkit's abstractions lacking and how we might improve. I would expect to see abstractions from other frameworks up the stack a bit (e.g. Agents, Chatbots, or things like FewShot prompt templates), but I would not necessarily expect that Prompts and Flows need to be re-abstracted.
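As a rough sketch of that RAG shape, with the retriever and model call as hypothetical stand-ins (not Genkit APIs):

```typescript
// Illustrative RAG workflow: fetch relevant context first, then make a
// single model call with that context included in the prompt.
async function retrieveContext(query: string): Promise<string[]> {
  // A real app would query a vector store or database here.
  const docs = ["Doc A about pricing", "Doc B about shipping"];
  return docs.filter((d) => d.toLowerCase().includes(query.toLowerCase()));
}

async function answerWithRag(query: string): Promise<string> {
  const context = await retrieveContext(query);
  const prompt = `Context:\n${context.join("\n")}\n\nQuestion: ${query}`;
  // Stand-in for one model call with the prefetched context.
  return `model answer (prompt length ${prompt.length}, ${context.length} doc(s))`;
}
```

Compared to tool calling, this keeps the exchange to a single model call per query, at the cost of deciding up front what context is relevant.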
-
For background context, I'm working on an object-oriented wrapper around Genkit to complement an ODM I built for Firebase/Firestore. My motivation is the result of having seen both senior engineers, often coming from a .NET background, and junior engineers, fresh out of bootcamp, struggle to adapt to the Node/Express/React/Firebase (NERF?) ecosystem because of its lack of familiar structure. So, my design philosophy coming into this is to create libraries that give just enough structure to solve some of the pain points that I see both junior and senior engineers suffer. I don't want to over-engineer abstractions around flows or tools in this case for example, but I do find that developers are often more productive with a dash or pinch of object-orientation in just the right places.
I plan to make the ODM public soon, but, as a design reference, it contains Firebase, FirebaseCollection, and FirebaseDocument classes. It inverts the Firestore has-a Collection has-a Document dependency by injecting a parent `collection` instance into `document` instances, and a `firestore` instance into `collection` instances. This allows users to extend these classes to create custom Collections and Documents in order to model their domain using conventional object-oriented techniques. It also allows for very powerful automatic schema-driven denormalization within the persistence layer itself, but this feature is still under development. The big idea is to create libraries that let both API and front-end developers focus 100% on delivering new features for real-time serverless apps while still using conventional, tried-and-true OOP practices (instead of spinning their wheels building ad-hoc implementations of things like denormalization for every Firebase call).

So, with that background in mind, I'm curious what the architectural and design philosophy is behind the usage of flows, prompts, and tools?
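To make the inversion concrete, a minimal sketch of that shape might look like the following; all class and method names here are illustrative only, not the actual ODM's API:

```typescript
// Hypothetical sketch of the inverted dependency described above.
class Firestore {
  getDoc(path: string): Record<string, unknown> {
    return { path }; // stand-in for a real Firestore read
  }
}

class FirebaseCollection {
  constructor(public firestore: Firestore, public name: string) {}
  doc(id: string): FirebaseDocument {
    // The collection injects itself into every document it creates.
    return new FirebaseDocument(this, id);
  }
}

class FirebaseDocument {
  constructor(public collection: FirebaseCollection, public id: string) {}
  get path(): string {
    return `${this.collection.name}/${this.id}`;
  }
  load(): Record<string, unknown> {
    // The document reaches Firestore only through its injected parent.
    return this.collection.firestore.getDoc(this.path);
  }
}

// Domain modeling then happens by conventional subclassing:
class Users extends FirebaseCollection {}
```

Because each document holds its parent collection (rather than Firestore holding everything), subclasses can add domain behavior without touching the persistence wiring.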
I'm also curious if there's a specific philosophy behind using a "tool-based architecture" vs. a conventional "service-based" one?
More concretely, since flows are functions, it seems natural to use them to compose the results of multiple prompts; in practice, though, flows seem more like a facility for things like testing prompts. Since flows consume prompts, that would suggest it might be appropriate to invert that dependency and inject flows into prompts, with each flow having sole responsibility for that one prompt (and vice versa). That would then leave tools or custom services to compose results, but I don't want to couple flows and prompts without good reason.
If flows are designed as a facility on a single prompt, then is there a philosophy behind when to use tools to compose results vs. services?
By "service" I just mean an external class or function responsible for calling multiple flows, each with its own input/output schemata, resolving their promises, and composing the results. Compare this to defining multiple tools, each with its own input/output schemata, and then prompting the model to produce one composite response using those tools. I expect making multiple API calls and then composing them is less cost-effective, but I'm currently testing the quality of results from each approach out of curiosity.
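The "service" side of that comparison might be sketched like this, with hypothetical flow names standing in for real Genkit flows:

```typescript
// A "service" in the sense described above: plain application code that
// calls multiple flows, resolves their promises, and composes one result.
type Flow<I, O> = (input: I) => Promise<O>;

// Hypothetical child flows, each with its own input/output schema.
const summarizeFlow: Flow<string, string> = async (text) =>
  `summary of ${text.length} chars`;
const classifyFlow: Flow<string, string> = async (text) =>
  text.includes("error") ? "incident" : "normal";

// Composition lives in code, not in the model: one API call per flow,
// but no tool-calling round trips for the model to reason about.
async function reportService(log: string) {
  const [summary, category] = await Promise.all([
    summarizeFlow(log),
    classifyFlow(log),
  ]);
  return { summary, category };
}
```

In the tool-based alternative, the same two capabilities would be registered as tools and the model itself would decide when to call each and how to merge the results.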
I want to give the developer as much flexibility as possible, but from the perspective of feature development I've started taking a "prompt-centric" approach. As an API consumer, we just want a black-box oracle that can answer some query, or provide some dynamic result, without having to worry about details like what a flow is. If it's appropriate to invert the dependency between flows and prompts, then I could see myself going as far as completely subsuming flow management into a prompt class in order to abstract those details away from the developer. That would still leave tools (and possibly custom services) for composing prompt results, while streamlining the developer experience.
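A hedged sketch of what that "prompt-centric" subsumption could look like; the `Prompt` class, its methods, and `fakeModelCall` are all purely hypothetical:

```typescript
// Stand-in for a real model call.
async function fakeModelCall(prompt: string): Promise<string> {
  return `answer to: ${prompt}`;
}

// A Prompt class that owns its flow internally, so the API consumer only
// ever sees a black-box ask() method.
class Prompt<I, O> {
  private flow: (input: I) => Promise<O>;

  constructor(template: (input: I) => string, render: (raw: string) => O) {
    // Flow management is subsumed into the prompt; callers never touch it.
    this.flow = async (input) => render(await fakeModelCall(template(input)));
  }

  // The "black-box oracle" interface the consumer sees.
  ask(input: I): Promise<O> {
    return this.flow(input);
  }
}

const greeter = new Prompt(
  (name: string) => `Greet ${name}`,
  (raw) => raw.toUpperCase()
);
```

Here the flow never escapes the class, which is exactly the trade-off in question: a simpler surface for consumers, at the cost of hiding a composition point the framework exposes on purpose.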
On the other hand, I could also see myself internalizing all "service" logic within flows. This would mean, for example, resolving all promises from child flows and composing the results within a parent "service flow". It seems there's room for an affordance in either direction: flows could either become redundant at a high enough level or themselves take sole responsibility for service logic. It could also just be excessive at this level, but I do find my code tends to produce either "thin" flows or "thin" prompts, which suggests to me that something might be effectively abstracted away.
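The "parent service flow" option might be sketched like this, again with hypothetical names standing in for real flows:

```typescript
// Composition logic lives inside a flow rather than an external service.
type Flow<I, O> = (input: I) => Promise<O>;

// Hypothetical child flows.
const titleFlow: Flow<string, string> = async (topic) => `Title: ${topic}`;
const outlineFlow: Flow<string, string[]> = async (topic) => [
  `${topic}: intro`,
  `${topic}: details`,
];

// The parent flow resolves its child flows' promises and composes the
// results itself, so callers see one flow with one output schema.
const articleFlow: Flow<string, { title: string; outline: string[] }> = async (
  topic
) => {
  const [title, outline] = await Promise.all([
    titleFlow(topic),
    outlineFlow(topic),
  ]);
  return { title, outline };
};
```

Structurally this is the same composition as the external-service version; the only difference is whether the composing function is itself registered as a flow, which affects things like tracing and testability rather than behavior.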
Thanks in advance for any feedback!