Interactivity Overhaul (User Interface & Model Instrumentation & Network Comms) #1054
Merged
Visualization components do better with state handled as traces that can rewind. As such, the definition and evaluation of a guidance grammar are separated here while minimizing changes needed at the grammar level.
Probably need to have separate fields for tracking, input and output of a given node.
Trace can now handle capture groups. State module moved to trace module.
Documentation added and some type changes.
Trace nodes have light adjustments. HTML renderer is connected but not fully working yet due to role closers.
Old HTML display now fully replaced. Fixed some role issues as well.
Uses stitch for kernel to client communication. Need to redesign and hook in instrumentation.
Tooling appears to create a nameless role. Fixed.
Kernel messages still need to be re-implemented.
This package is required for Jupyter kernel comms via a custom ipywidget.
Copyright headers now correctly pointing to Guidance Contributors.
Trace messages are now JSON serializable. Some minor fixes like adding a manifest for package.
Client has a race condition where it skips messages that have been fired by stitch before it loads.
Had to send a heartbeat first then send all messages in buffer.
Client messages can be handled in engine. Output for print and log not working due to being in an ipywidget. Will need to re-implement with asyncio later.
Separate thread for send/recv on messages.
Final message sent on cell completion. Still needs further testing.
No more dictionaries to recv_msg!
This includes the HTML renderer.
Queue instantiation now deferred to asyncio background thread.
Remove visual from extras_requires
`mesg_recv` will now receive execution start message -- might need to refactor part of renderer later.
Signed-off-by: JC1DA <[email protected]>
Fix missing echo in remote engine call & Refactor code
Signed-off-by: JC1DA <[email protected]>
Resume/Pause periodic_metrics_generator whenever entering/exiting cell
Signed-off-by: JC1DA <[email protected]>
Show avg_latency, token_consumed and token_reduction on the UI
Merge main repo
Adjusted renderer to property too for model.
Fix not-matching text and tokens in phi-3 special cases
Phantom root node not being picked up. I think this needs a refactor later not to rely on phantoms.
Updated python test for stitch and version.
It's big.
Harsha-Nori approved these changes on Dec 21, 2024
Interactivity Overhaul
Overview
This PR is the first of many focusing on interactivity. It introduces an updated user interface for notebooks, new instrumentation for models, and a corresponding network layer to handle bidirectional communication between the IPython kernel and the JavaScript client. To support this, model rendering has been reworked and tracing logic has been added to better support replays where required.
This PR also functions as a foundational step towards near-future work, including rendering across various environments (e.g. terminal support as a TUI and append-only outputs), upgraded benchmarking, and model inspection.
TL;DR
We added a lot of code to support better model metrics and visualization. We are getting ready for multimedia streaming, and want users to be able to deeply inspect all the models without overheating the computer.
Acknowledgements
Big shoutouts to:
Running this PR
cd packages/python/stitch && pip install -e .
User Interface
Design principle: All visibility. No magic.
Overall, we're trying to show as much as we can about model outputs. When debugging outputs, there can be real ugliness that is often hidden away, including tokenization concerns and critical points that may dictate the rest of the output. This need for inspection increases as users begin to define their own structured decoding grammars, where unexpected overconstraints can occur during development.
The old user interface, which displayed HTML as a side effect in notebooks while models compute, has been replaced with a custom Jupyter widget (see Network Communications for more detail) that hosts an interactive sandboxed iframe. We still support a legacy mode if users prefer the previous UI.
Before
After
We're getting more information into the output at the expense of text density. There is simply more going on, and to keep it legible we've increased text size and spacing, compensating for the two visual elements (highlighting and underlines) that are used to convey token info for scanning. A general metrics bar is also displayed for discoverability of token reduction and other efficiency metrics relevant when prompt engineering for reduced costs.
When users want further detail on tokens, we support a tooltip that contains the top 5 alternate token candidates alongside exact values for the visual elements. Highlighting has been applied to candidates, accentuating tokens that include spaces.
We use a monospace typeface so that data format outputs can be inspected more quickly (e.g. verticality can matter for balancing braces and indentation).
As users learn a system, a UI with easier discoverability can come at the cost of productivity. We've made all visual components optional to keep our power users in the flow, and in the future we intend to allow users to define defaults to fully support this.
For legacy mode (modeled after the previous UI), users can execute `guidance.legacy_mode(True)` at the start of their notebook.
Old school cool.
The Code
Added
- `guidance.visual` module. Handles renderer creation (stitch or HTML display) and all required messaging. This also handles Jupyter cell change detection for deciding when widgets need to be instantiated or reset.
- `guidance.trace` module. Tracks model inputs & outputs of an engine. Important for replaying for clients.
- `graphpaper-inline` NPM package. This handles all client-side rendering and messaging. Written with Svelte/TypeScript/Tailwind/D3.
Changed
- Rendering has been moved out of the `Model` class and delegated to its `Renderer` member where possible.
- `Model` class now generates role opener and closer text directly from its respective chat template.
Instrumentation
Instrumentation is key for model inspection, debugging and cost-sensitive prompt engineering. This includes backing the new UI. Metrics are now collected for both general compute resources (CPU/GPU/RAM) and model tokens (including token counts/reduction, latency, type, backtracking).
The Code
Added (metric collection feature)
Changed:
`VisBytesChunk` also stores the list of `EngineOutput` objects generated by the engine during chunk generation.
This makes it easier to check whether each token in the final state was generated, force-forwarded, or came from user input.
This function is used at the end of the cell execution to calculate the probabilities of model state in unconstrained mode.
Stats include issued token, probability, latency, top-k, masked-top-k if available.
Data from get_per_token_stats will be reported to the UI for new visualization.
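To make the shape of these per-token stats concrete, here is a minimal sketch of a record holding the fields listed above, plus a summary of the kind shown in the metrics bar. All names here (`TokenStat`, `TokenSource`, `summarize`) are hypothetical and are not the types used in this PR. The definition of token reduction (the share of model-side tokens that were force-forwarded rather than generated) is also an assumption made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional, Tuple


class TokenSource(Enum):
    """Where a token in the final state came from (hypothetical labels)."""
    GENERATED = "generated"
    FORCE_FORWARDED = "force_forwarded"
    USER_INPUT = "user_input"


@dataclass
class TokenStat:
    """One per-token record: issued token, probability, latency, top-k."""
    token: str
    prob: float
    latency_ms: float
    source: TokenSource
    top_k: List[Tuple[str, float]] = field(default_factory=list)
    # Only present when a grammar mask was applied to the candidates.
    masked_top_k: Optional[List[Tuple[str, float]]] = None


def summarize(stats: List[TokenStat]) -> Dict[str, float]:
    """Aggregate per-token records into a few headline metrics."""
    generated = [s for s in stats if s.source is TokenSource.GENERATED]
    forced = [s for s in stats if s.source is TokenSource.FORCE_FORWARDED]
    model_tokens = len(generated) + len(forced)
    # Average latency over generated tokens only (assumption: other tokens
    # do not require a forward pass of their own).
    avg_latency = (
        sum(s.latency_ms for s in generated) / len(generated) if generated else 0.0
    )
    # Assumed definition: fraction of model-side tokens that were forced.
    reduction = len(forced) / model_tokens if model_tokens else 0.0
    return {
        "token_count": float(len(stats)),
        "avg_latency_ms": avg_latency,
        "token_reduction": reduction,
    }


stats = [
    TokenStat("The", 1.0, 0.0, TokenSource.USER_INPUT),
    TokenStat(" answer", 0.8, 20.0, TokenSource.GENERATED,
              top_k=[(" answer", 0.8), (" result", 0.1)]),
    TokenStat(" is", 1.0, 0.0, TokenSource.FORCE_FORWARDED),
    TokenStat(" 42", 0.5, 30.0, TokenSource.GENERATED),
]
summary = summarize(stats)
```

Keeping the source of each token on the record is what lets a UI style generated, force-forwarded, and user-supplied tokens differently without re-deriving that information on the client.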
Network Communications
We have two emerging requirements that will impact future guidance development. One, the emergence of streaming multimedia around language models (audio/video). Two, user interactivity within the UI, requesting more data or computation that may not be feasible to `pre-(?:fetch|calculate)` to a static client.
For user interactivity from UI to Python, it's also important that we cover as many notebook environments as possible. Each cloud notebook provider has its own quirks, which complicate client development. Some providers love resizing cell outputs indefinitely, others refuse to display HTML unless it's secured away in an isolated iframe.
All in all, we need a solution that is isolated, somewhat available across providers and can allow streams of messages between server (Jupyter Python kernel) and client (cell output with a touch of JS).
Stitch
`stitch` is an auxiliary package we've created that handles bi-directional communication between a web client and a Jupyter Python kernel. It does this by creating a thin custom Jupyter widget that handles messages between the kernel and a sandboxed iframe hosting the web client. It looks something like this:
python code
-> kernel-side jupyter widget
-> kernel comms (ZMQ)
-> client-side jupyter widget
-> window event message
-> sandboxed iframe
-> web client (graphpaper-inline)
This package drives messages between the `guidance.visual` module and the `graphpaper-inline` client. All messages are streamed to allow near-real-time rendering within a notebook. Bi-directional comms are used to repair the display if initial messages have been missed (the client will request a full replay when it notices the first message it receives has a non-zero identifier).
The Code
- `stitch` Python package. Can be found at `packages/python/stitch`.
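The replay behaviour described in this section (a client asking for a full replay when the first message it sees has a non-zero identifier) can be sketched as a small in-process simulation. This is not the `stitch` or `graphpaper-inline` code: `Message`, `Kernel`, and `Client` are hypothetical names, and the direct method call stands in for the widget and iframe hops that a real replay request would travel through.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Message:
    id: int        # sequential identifier assigned by the kernel side
    payload: str


@dataclass
class Kernel:
    """Kernel side: keeps every emitted message so it can be replayed."""
    history: List[Message] = field(default_factory=list)

    def emit(self, payload: str) -> Message:
        msg = Message(id=len(self.history), payload=payload)
        self.history.append(msg)
        return msg

    def replay(self) -> List[Message]:
        return list(self.history)


@dataclass
class Client:
    """Client side: renders payloads in order, repairing gaps via replay."""
    kernel: Kernel
    rendered: List[str] = field(default_factory=list)
    replay_requests: int = 0

    def receive(self, msg: Message) -> None:
        if not self.rendered and msg.id != 0:
            # First message seen is not id 0, so earlier ones were missed
            # (for example, fired before the client finished loading).
            self.replay_requests += 1
            self.rendered = [m.payload for m in self.kernel.replay()]
            return
        if msg.id < len(self.rendered):
            return  # already covered by an earlier replay
        self.rendered.append(msg.payload)


kernel = Kernel()
client = Client(kernel)

kernel.emit("a")                   # lost: client not loaded yet
kernel.emit("b")                   # lost: client not loaded yet
client.receive(kernel.emit("c"))   # id 2 arrives first, triggers a replay
client.receive(kernel.emit("d"))   # normal streaming resumes
```

After this runs, `client.rendered` holds all four payloads in order and exactly one replay was requested, which is the repair path the paragraph above describes.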
Future work
We wanted to shoot for the stars, and ended up in the ocean. The following will occur after this PR.
Near future tasks: