Philosophy behind usage of flows, prompts, and tools #731
Replies: 2 comments 1 reply
-
To give more context, here's some code I'm using to define a flow to list properties on abstract "objects", e.g. an
Of course, those are only two options, and this is still a relatively toy example, but hopefully it's fairly illustrative of the more general questions.
-
Hey Ariel, looks like we dropped the ball on this. Sorry about that. Could you say more about the following: how are you using flows to test prompts?
As to your main question:
I don't think we'll have a definitive, slam-dunk answer to this question. :) At a meta level, you are describing the continuum we are all on as we move from classic, imperatively programmed apps, to hybrid apps that mix imperative instructions with the LLM's natural-language "reasoning", all the way to the other end, where the LLM has full control. One of Genkit's jobs is to help developers make this transition over time.

There are many trade-offs, and they will evolve as models improve. For example, LLMs perform best when you describe the exact problem to be solved as specifically as you can: the more "tools" you provide, the more complex the model's "reasoning" becomes and the more hallucinations increase. Another consideration is at which points you need to go back to the user for confirmation before an action is taken, and so on.
Keep in mind that tool calling is going to incur multiple API calls. The flow (no pun intended) generally involves a back and forth between the codebase and the model. In a typical exchange, the model requests a tool call, the application code is responsible for running that request and then calling the model again with the results. From there the model generates the next response, which could be final, or yet another tool request. As the exchange continues, the context gets larger and the cost of each request increases. In general, expect latency and cost to grow with the number of tool round trips.
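That back-and-forth can be sketched roughly as follows; `callModel` and `runTool` are illustrative stand-ins for real model/tool invocations, not Genkit APIs:

```typescript
// Sketch of one tool-calling exchange with hypothetical stand-in functions.
type Message = { role: "user" | "model" | "tool"; content: string };
type ModelResponse =
  | { type: "final"; text: string }
  | { type: "toolRequest"; name: string; input: string };

async function callModel(history: Message[]): Promise<ModelResponse> {
  // Stand-in model: requests one tool call, then produces a final answer.
  const toolResult = history.find((m) => m.role === "tool");
  if (!toolResult) return { type: "toolRequest", name: "lookup", input: "q" };
  return { type: "final", text: `answer using ${toolResult.content}` };
}

async function runTool(name: string, input: string): Promise<string> {
  return `result of ${name}(${input})`; // stand-in tool implementation
}

async function runExchange(userPrompt: string): Promise<string> {
  const history: Message[] = [{ role: "user", content: userPrompt }];
  // Each iteration is one more API call; the context (and therefore the
  // per-request cost) grows with every round trip, as described above.
  for (;;) {
    const res = await callModel(history);
    if (res.type === "final") return res.text;
    history.push({ role: "model", content: `requested ${res.name}` });
    history.push({ role: "tool", content: await runTool(res.name, res.input) });
  }
}
```

Note that the application code, not the model, drives the loop: it decides when to run tools and when to stop, which is also where user-confirmation checkpoints would go.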
Also keep in mind that tool calling is only one workflow choice. RAG is another workflow, where you "prefetch" context relevant to a user query from a data store and provide it to the LLM along with the prompt.

What types of abstractions do you have in mind? Eager to hear where you find Genkit's abstractions lacking and how we might improve. I would expect to see abstractions from other frameworks up the stack a bit (e.g. Agents, Chatbots, or things like FewShot prompt templates), but I would not necessarily expect that Prompts and Flows need to be re-abstracted.
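As a rough sketch of that RAG shape, with the retriever and model call as hypothetical stand-ins (not Genkit APIs):

```typescript
// Illustrative RAG workflow: fetch relevant context first, then make a
// single model call with that context included in the prompt.
async function retrieveContext(query: string): Promise<string[]> {
  // A real app would query a vector store or database here.
  const docs = ["Doc A about pricing", "Doc B about shipping"];
  return docs.filter((d) => d.toLowerCase().includes(query.toLowerCase()));
}

async function answerWithRag(query: string): Promise<string> {
  const context = await retrieveContext(query);
  const prompt = `Context:\n${context.join("\n")}\n\nQuestion: ${query}`;
  // Stand-in for one model call with the prefetched context.
  return `model answer (prompt length ${prompt.length}, ${context.length} doc(s))`;
}
```

Compared to tool calling, this keeps the exchange to a single model call per query, at the cost of deciding up front what context is relevant.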
-
For background context, I'm working on an object-oriented wrapper around Genkit to complement an ODM I built for Firebase/Firestore. My motivation is the result of having seen both senior engineers, often coming from a .NET background, and junior engineers, fresh out of bootcamp, struggle to adapt to the Node/Express/React/Firebase (NERF?) ecosystem because of its lack of familiar structure. So, my design philosophy coming into this is to create libraries that give just enough structure to solve some of the pain points that I see both junior and senior engineers suffer. I don't want to over-engineer abstractions around flows or tools in this case for example, but I do find that developers are often more productive with a dash or pinch of object-orientation in just the right places.
I plan to make the ODM public soon, but, as a design reference, it contains Firebase, FirebaseCollection, and FirebaseDocument classes. It inverts the Firestore has-a Collection has-a Document dependency by injecting a parent `collection` instance into `document` instances, and a `firestore` instance into `collection` instances. This allows users to extend these classes to create custom Collections and Documents in order to model their domain using conventional object-oriented techniques. It also allows for very powerful automatic schema-driven denormalization within the persistence layer itself, but this feature is still under development. The big idea is to create libraries that let both API and front-end developers focus 100% on delivering new features for real-time serverless apps while still using conventional, tried-and-true OOP practices (instead of spinning their wheels building ad-hoc implementations of things like denormalization for every Firebase call).

So, with that background in mind, I'm curious what the architectural and design philosophy is behind the usage of flows, prompts, and tools?
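To make the inversion concrete, a minimal sketch of that shape might look like the following; all class and method names here are illustrative only, not the actual ODM's API:

```typescript
// Hypothetical sketch of the inverted dependency described above.
class Firestore {
  getDoc(path: string): Record<string, unknown> {
    return { path }; // stand-in for a real Firestore read
  }
}

class FirebaseCollection {
  constructor(public firestore: Firestore, public name: string) {}
  doc(id: string): FirebaseDocument {
    // The collection injects itself into every document it creates.
    return new FirebaseDocument(this, id);
  }
}

class FirebaseDocument {
  constructor(public collection: FirebaseCollection, public id: string) {}
  get path(): string {
    return `${this.collection.name}/${this.id}`;
  }
  load(): Record<string, unknown> {
    // The document reaches Firestore only through its injected parent.
    return this.collection.firestore.getDoc(this.path);
  }
}

// Domain modeling then happens by conventional subclassing:
class Users extends FirebaseCollection {}
```

Because each document holds its parent collection (rather than Firestore holding everything), subclasses can add domain behavior without touching the persistence wiring.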
I'm also curious if there's a specific philosophy behind using a "tool-based architecture" vs. a conventional "service-based" one?
More concretely, since flows are functions, it seems natural to use them to compose the results of multiple prompts; in practice, though, flows seem more like a facility for things like testing prompts. Since flows consume prompts, that would suggest it might be appropriate to invert that dependency and inject flows into prompts, with each flow having sole responsibility for that one prompt (and vice versa). That would then leave tools or custom services to compose results, but I don't want to couple flows and prompts without good reason.
If flows are designed as a facility on a single prompt, then is there a philosophy behind when to use tools to compose results vs. services?
By "service" I just mean an external class or function responsible for calling multiple flows, each with its own input/output schemata, resolving their promises, and composing the results. Compare this to defining multiple tools, each with its own input/output schemata, and then prompting the model to produce one composite response using those tools. I expect making multiple API calls and then composing them is less cost-effective, but I'm currently testing the quality of results from each approach out of curiosity.
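The "service" side of that comparison might be sketched like this, with hypothetical flow names standing in for real Genkit flows:

```typescript
// A "service" in the sense described above: plain application code that
// calls multiple flows, resolves their promises, and composes one result.
type Flow<I, O> = (input: I) => Promise<O>;

// Hypothetical child flows, each with its own input/output schema.
const summarizeFlow: Flow<string, string> = async (text) =>
  `summary of ${text.length} chars`;
const classifyFlow: Flow<string, string> = async (text) =>
  text.includes("error") ? "incident" : "normal";

// Composition lives in code, not in the model: one API call per flow,
// but no tool-calling round trips for the model to reason about.
async function reportService(log: string) {
  const [summary, category] = await Promise.all([
    summarizeFlow(log),
    classifyFlow(log),
  ]);
  return { summary, category };
}
```

In the tool-based alternative, the same two capabilities would be registered as tools and the model itself would decide when to call each and how to merge the results.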
I want to give the developer as much flexibility as possible, but from the perspective of feature development I've started taking a "prompt-centric" approach. As an API consumer, we just want a black-box oracle that can answer some query, or provide some dynamic result, without having to worry about details like what a flow is. If it's appropriate to invert the dependency between flows and prompts, then I could see myself going as far as completely subsuming flow management into a prompt class in order to abstract those details away from the developer. That would still leave tools (and possibly custom services) for composing prompt results, while streamlining the developer experience.
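A hedged sketch of what that "prompt-centric" subsumption could look like; the `Prompt` class, its methods, and `fakeModelCall` are all purely hypothetical:

```typescript
// Stand-in for a real model call.
async function fakeModelCall(prompt: string): Promise<string> {
  return `answer to: ${prompt}`;
}

// A Prompt class that owns its flow internally, so the API consumer only
// ever sees a black-box ask() method.
class Prompt<I, O> {
  private flow: (input: I) => Promise<O>;

  constructor(template: (input: I) => string, render: (raw: string) => O) {
    // Flow management is subsumed into the prompt; callers never touch it.
    this.flow = async (input) => render(await fakeModelCall(template(input)));
  }

  // The "black-box oracle" interface the consumer sees.
  ask(input: I): Promise<O> {
    return this.flow(input);
  }
}

const greeter = new Prompt(
  (name: string) => `Greet ${name}`,
  (raw) => raw.toUpperCase()
);
```

Here the flow never escapes the class, which is exactly the trade-off in question: a simpler surface for consumers, at the cost of hiding a composition point the framework exposes on purpose.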
On the other hand, I could also see myself internalizing all "service" logic within flows. This would mean, for example, resolving all promises from child flows and composing the results within a parent "service flow". It seems there's room for an affordance in either direction: flows could either become redundant at a high enough level or themselves take sole responsibility for service logic. It could also just be excessive at this level, but I do find my code tends to produce either "thin" flows or "thin" prompts, which suggests to me that something might be effectively abstracted away.
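The "parent service flow" option might be sketched like this, again with hypothetical names standing in for real flows:

```typescript
// Composition logic lives inside a flow rather than an external service.
type Flow<I, O> = (input: I) => Promise<O>;

// Hypothetical child flows.
const titleFlow: Flow<string, string> = async (topic) => `Title: ${topic}`;
const outlineFlow: Flow<string, string[]> = async (topic) => [
  `${topic}: intro`,
  `${topic}: details`,
];

// The parent flow resolves its child flows' promises and composes the
// results itself, so callers see one flow with one output schema.
const articleFlow: Flow<string, { title: string; outline: string[] }> = async (
  topic
) => {
  const [title, outline] = await Promise.all([
    titleFlow(topic),
    outlineFlow(topic),
  ]);
  return { title, outline };
};
```

Structurally this is the same composition as the external-service version; the only difference is whether the composing function is itself registered as a flow, which affects things like tracing and testability rather than behavior.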
Thanks in advance for any feedback!