Allow $ to literally denote quantities of USD in chat (#1068)
* edit system prompt to request dollar signs not be used as inline math delimiters

* escape dollar signs in chat frontend

* escape `$` only if alone and not in a code element

* update docstring

* avoid double-escaping `$` symbols
dlqqq authored Oct 31, 2024
1 parent 167a47d commit e1d99cc
Showing 2 changed files with 73 additions and 4 deletions.
8 changes: 6 additions & 2 deletions packages/jupyter-ai-magics/jupyter_ai_magics/providers.py
@@ -55,8 +55,12 @@
 You are not a language model, but rather an application built on a foundation model from {provider_name} called {local_model_id}.
 You are talkative and you provide lots of specific details from the foundation model's context.
 You may use Markdown to format your response.
-Code blocks must be formatted in Markdown.
-Math should be rendered with inline TeX markup, surrounded by $.
+If your response includes code, they must be enclosed in Markdown fenced code blocks (with triple backticks before and after).
+If your response includes mathematical notation, they must be expressed in LaTeX markup and enclosed in LaTeX delimiters.
+- Single dollar signs ($) should never be used as delimiters for inline math.
+- Valid inline math: `\\( \\infty \\)`
+- Valid display math: `\\[ \\infty \\]`
+- Invalid inline math: `$\\infty$`
 If you do not know the answer to a question, answer truthfully by responding that you do not know.
 The following is a friendly conversation between you and a human.
 """.strip()
69 changes: 67 additions & 2 deletions packages/jupyter-ai/src/components/rendermime-markdown.tsx
@@ -24,7 +24,12 @@ type RendermimeMarkdownProps = {
 };

 /**
- * Takes \( and returns \\(. Escapes LaTeX delimeters by adding extra backslashes where needed for proper rendering by @jupyterlab/rendermime.
+ * Escapes backslashes in LaTeX delimiters such that they appear in the DOM
+ * after the initial MarkDown render. For example, this function takes `\(` and
+ * returns `\\(`.
+ *
+ * Required for proper rendering of MarkDown + LaTeX markup in the chat by
+ * `ILatexTypesetter`.
  */
 function escapeLatexDelimiters(text: string) {
   return text
@@ -34,6 +39,61 @@ function escapeLatexDelimiters(text: string) {
     .replace(/\\\]/g, '\\\\]');
 }

+/**
+ * Type predicate function that determines whether a given DOM Node is a Text
+ * node.
+ */
+function isTextNode(node: Node | null): node is Text {
+  return node?.nodeType === Node.TEXT_NODE;
+}
+
+/**
+ * Escapes all `$` symbols present in an HTML element except those within the
+ * following elements: `pre`, `code`, `samp`, `kbd`.
+ *
+ * This prevents `$` symbols from being used as inline math delimiters, allowing
+ * `$` symbols to be used literally to denote quantities of USD. This does not
+ * escape literal `$` within elements that display their contents literally,
+ * like code elements. This overrides JupyterLab's default rendering of MarkDown
+ * w/ LaTeX.
+ *
+ * The Jupyter AI system prompt should explicitly request that the LLM not use
+ * `$` as an inline math delimiter. This is the default behavior.
+ */
+function escapeDollarSymbols(el: HTMLElement) {
+  // Get all text nodes that are not within pre, code, samp, or kbd elements
+  const walker = document.createTreeWalker(el, NodeFilter.SHOW_TEXT, {
+    acceptNode: node => {
+      const isInSkippedElements = node.parentElement?.closest(
+        'pre, code, samp, kbd'
+      );
+      return isInSkippedElements
+        ? NodeFilter.FILTER_SKIP
+        : NodeFilter.FILTER_ACCEPT;
+    }
+  });
+
+  // Collect all valid text nodes in an array.
+  const textNodes: Text[] = [];
+  let currentNode: Node | null;
+  while ((currentNode = walker.nextNode())) {
+    if (isTextNode(currentNode)) {
+      textNodes.push(currentNode);
+    }
+  }
+
+  // Replace each `$` symbol with `\$` for each text node, unless there is
+  // another `$` symbol adjacent or it is already escaped. Examples:
+  // - `$10 - $5` => `\$10 - \$5` (escaped)
+  // - `$$ \infty $$` => `$$ \infty $$` (unchanged)
+  // - `\$10` => `\$10` (unchanged, already escaped)
+  textNodes.forEach(node => {
+    if (node.textContent) {
+      node.textContent = node.textContent.replace(/(?<![$\\])\$(?!\$)/g, '\\$');
+    }
+  });
+}
+
 function RendermimeMarkdownBase(props: RendermimeMarkdownProps): JSX.Element {
   // create a single renderer object at component mount
   const [renderer] = useState(() => {
@@ -57,19 +117,24 @@ function RendermimeMarkdownBase(props: RendermimeMarkdownProps): JSX.Element {
    */
   useEffect(() => {
     const renderContent = async () => {
+      // initialize mime model
       const mdStr = escapeLatexDelimiters(props.markdownStr);
       const model = props.rmRegistry.createModel({
         data: { [MD_MIME_TYPE]: mdStr }
       });

+      // step 1: render markdown
       await renderer.renderModel(model);
-      props.rmRegistry.latexTypesetter?.typeset(renderer.node);
       if (!renderer.node) {
         throw new Error(
           'Rendermime was unable to render Markdown content within a chat message. Please report this upstream to Jupyter AI on GitHub.'
         );
       }

+      // step 2: render LaTeX via MathJax, while escaping single dollar symbols.
+      escapeDollarSymbols(renderer.node);
+      props.rmRegistry.latexTypesetter?.typeset(renderer.node);
+
       // insert the rendering into renderingContainer if not yet inserted
       if (renderingContainer.current !== null && !renderingInserted.current) {
         renderingContainer.current.appendChild(renderer.node);

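Review note: the behavior of the regex in `escapeDollarSymbols` can be checked without a DOM. The sketch below applies the same pattern to plain strings. `escapeLoneDollars` is a hypothetical helper introduced only for illustration and is not part of this commit; the sample inputs are the three cases listed in the code comment above the `forEach` loop.

```typescript
// Hypothetical helper (not in the commit): applies the regex that
// `escapeDollarSymbols` runs on each text node, but to a plain string.
function escapeLoneDollars(text: string): string {
  // (?<![$\\])  the `$` must not be preceded by another `$` or a backslash
  // \$          a literal dollar sign
  // (?!\$)      the `$` must not be followed by another `$`
  return text.replace(/(?<![$\\])\$(?!\$)/g, '\\$');
}

// The three cases from the code comment in the diff.
const samples: string[] = ['$10 - $5', '$$ \\infty $$', '\\$10'];
for (const sample of samples) {
  console.log(`${sample}  =>  ${escapeLoneDollars(sample)}`);
}
```

Lone `$` symbols gain a backslash, while `$$ ... $$` display-math delimiters and already-escaped `\$` pass through unchanged. Because already-escaped symbols are skipped, running the helper twice gives the same result as running it once, which matches the "avoid double-escaping" item in the commit message. The lookbehind assertion assumes a runtime with ES2018 regular expression support.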