Adding example that shows how to pass images in multi-modal prompts #258

Merged on Aug 7, 2024 (6 commits)
25 changes: 25 additions & 0 deletions typescript/examples/crossword/README.md
@@ -0,0 +1,25 @@
# Crossword

The Crossword example shows how to include an image in a multimodal prompt and use the image to answer a user's question. The responses follow the [`CrosswordActions`](./src/crosswordSchema.ts) type.

## Target models

This example explores multimodal input. To run it, you need a model that accepts images as input. The example has been tested with **gpt-4-vision** and **gpt-4-omni** models.

## Try Crossword
To run the Crossword example, follow the instructions in the [examples README](../README.md#step-1-configure-your-development-environment).

## Usage
Example prompts can be found in [`src/input.txt`](./src/input.txt).

For example, given the following input statement:

**Input**:
```
🏁> What is the clue for 61 across
```

**Output**:
```
"Monogram in French fashion"
```
24 changes: 24 additions & 0 deletions typescript/examples/crossword/package.json
@@ -0,0 +1,24 @@
{
  "name": "crossword",
  "version": "0.0.1",
  "private": true,
  "description": "",
  "main": "dist/main.js",
  "scripts": {
    "build": "tsc -p src",
    "postbuild": "copyfiles -u 1 src/**/*Schema.ts src/**/*.txt src/**/*.jpeg dist"
  },
  "author": "",
  "license": "MIT",
  "dependencies": {
    "dotenv": "^16.3.1",
    "find-config": "^1.0.0",
    "typechat": "^0.1.0",
    "typescript": "^5.3.3"
  },
  "devDependencies": {
    "@types/find-config": "1.0.4",
    "@types/node": "^20.10.4",
    "copyfiles": "^2.4.1"
  }
}
33 changes: 33 additions & 0 deletions typescript/examples/crossword/src/crosswordSchema.ts
@@ -0,0 +1,33 @@
// The following is a schema definition for actions that answer questions about a crossword puzzle.

export type GetClueText = {
    actionName: "getClueText";
    parameters: {
        clueNumber: number;
        clueDirection: "across" | "down";
        value: string;
    };
};

// This gives the answer for the requested crossword clue
export type GetAnswerValue = {
    actionName: "getAnswerValue";
    parameters: {
        proposedAnswer: string;
        clueNumber: number;
        clueDirection: "across" | "down";
    };
};

export type UnknownAction = {
    actionName: "unknown";
    parameters: {
        // text typed by the user that the system did not understand
        text: string;
    };
};

export type CrosswordActions =
    | GetClueText
    | GetAnswerValue
    | UnknownAction;
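Since every action carries a literal `actionName`, a consumer can narrow the union with a `switch`. The sketch below is illustrative only: `describeAction` is a hypothetical helper that is not part of this PR, and the schema types are repeated so the snippet stands alone.

```typescript
// Types repeated from crosswordSchema.ts so this sketch is self-contained.
type GetClueText = {
    actionName: "getClueText";
    parameters: { clueNumber: number; clueDirection: "across" | "down"; value: string };
};
type GetAnswerValue = {
    actionName: "getAnswerValue";
    parameters: { proposedAnswer: string; clueNumber: number; clueDirection: "across" | "down" };
};
type UnknownAction = { actionName: "unknown"; parameters: { text: string } };
type CrosswordActions = GetClueText | GetAnswerValue | UnknownAction;

// Hypothetical helper (not in the PR): turn a validated action into display text.
function describeAction(action: CrosswordActions): string {
    switch (action.actionName) {
        case "getClueText": {
            const { clueNumber, clueDirection, value } = action.parameters;
            return `${clueNumber} ${clueDirection}: ${value}`;
        }
        case "getAnswerValue": {
            const { clueNumber, clueDirection, proposedAnswer } = action.parameters;
            return `Proposed answer for ${clueNumber} ${clueDirection}: ${proposedAnswer}`;
        }
        case "unknown":
            return `Request not understood: ${action.parameters.text}`;
        default: {
            // Exhaustiveness check: this stops type-checking if a new action is added to the union.
            const unhandled: never = action;
            throw new Error(`Unhandled action: ${JSON.stringify(unhandled)}`);
        }
    }
}
```

The `never` assignment in the `default` branch is a design choice: it turns a forgotten action type into a compile-time error rather than a silent fall-through.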
2 changes: 2 additions & 0 deletions typescript/examples/crossword/src/input.txt
@@ -0,0 +1,2 @@
What is the clue for 1 down
Give me a hint for solving 4 down
34 changes: 34 additions & 0 deletions typescript/examples/crossword/src/main.ts
@@ -0,0 +1,34 @@
import assert from "assert";
import dotenv from "dotenv";
import findConfig from "find-config";
import fs from "fs";
import path from "path";
import { createLanguageModel } from "typechat";
import { processRequests } from "typechat/interactive";
import { createTypeScriptJsonValidator } from "typechat/ts";
import { CrosswordActions } from "./crosswordSchema";
import { createCrosswordActionTranslator } from "./translator";

const dotEnvPath = findConfig(".env");
assert(dotEnvPath, ".env file not found!");
dotenv.config({ path: dotEnvPath });

const model = createLanguageModel(process.env);
const schema = fs.readFileSync(path.join(__dirname, "crosswordSchema.ts"), "utf8");

const rawImage = fs.readFileSync(path.join(__dirname, "puzzleScreenshot.jpeg"), "base64");
const screenshot = `data:image/jpeg;base64,${rawImage}`;

const validator = createTypeScriptJsonValidator<CrosswordActions>(schema, "CrosswordActions");
const translator = createCrosswordActionTranslator(model, validator, screenshot);

// Process requests interactively or from the input file specified on the command line
processRequests("🏁> ", process.argv[2], async (request) => {
    const response = await translator.translate(request);
    if (!response.success) {
        console.log(response.message);
        return;
    }

    console.log(JSON.stringify(response.data));
});
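`main.ts` hardcodes the `image/jpeg` MIME type when it builds the data URL. If other image formats were ever needed, the prefix could be derived from the file extension. The following is a minimal sketch under that assumption; `toDataUrl` and `MIME_BY_EXTENSION` are hypothetical names that do not appear in the PR.

```typescript
// Hypothetical lookup table (not in the PR): file extension -> image MIME type.
const MIME_BY_EXTENSION: Record<string, string> = {
    ".jpeg": "image/jpeg",
    ".jpg": "image/jpeg",
    ".png": "image/png",
};

// Builds a data URL from a file name and its already base64-encoded contents,
// e.g. the string returned by fs.readFileSync(file, "base64").
function toDataUrl(fileName: string, base64Data: string): string {
    const dot = fileName.lastIndexOf(".");
    const extension = dot >= 0 ? fileName.slice(dot).toLowerCase() : "";
    const mimeType = MIME_BY_EXTENSION[extension];
    if (!mimeType) {
        throw new Error(`Unsupported image type: ${fileName}`);
    }
    return `data:${mimeType};base64,${base64Data}`;
}
```

With this helper, the `screenshot` constant could be written as `toDataUrl("puzzleScreenshot.jpeg", rawImage)`.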
82 changes: 82 additions & 0 deletions typescript/examples/crossword/src/translator.ts
@@ -0,0 +1,82 @@
import {
    TypeChatLanguageModel,
    createJsonTranslator,
    TypeChatJsonTranslator,
    MultimodalPromptContent,
    PromptContent,
} from "typechat";
import { TypeScriptJsonValidator } from "typechat/ts";

export function createCrosswordActionTranslator<T extends object>(
    model: TypeChatLanguageModel,
    validator: TypeScriptJsonValidator<T>,
    crosswordImage: string
): TypeChatJsonTranslator<T> {
    const _imageContent = crosswordImage;

    const _translator = createJsonTranslator(model, validator);
    _translator.createRequestPrompt = createRequestPrompt;

    return _translator;

    function createRequestPrompt(request: string): PromptContent {
        const screenshotSection = getScreenshotPromptSection(_imageContent);
        const contentSections = [
            {
                type: "text",
                text: "You are a virtual assistant that can help users to complete requests by interacting with the UI of a webpage.",
            },
            ...screenshotSection,
            {
                type: "text",
                text: `
Use the layout information provided to answer user queries.
The responses should be translated into JSON objects of type ${_translator.validator.getTypeName()} using the typescript schema below:

'''
${_translator.validator.getSchemaText()}
'''
`,
            },
            {
                type: "text",
                text: `
The following is a user request:
'''
${request}
'''
The following is the assistant's response translated into a JSON object with 2 spaces of indentation and no properties with the value undefined:
`,
            },
        ] as MultimodalPromptContent[];

        return contentSections;
    }

    function getScreenshotPromptSection(screenshot: string | undefined) {
        const screenshotSection = [];
        if (screenshot) {
            screenshotSection.push({
                type: "text",
                text: "Here is a screenshot of the currently visible webpage",
            });

            screenshotSection.push({
                type: "image_url",
                image_url: {
                    url: screenshot,
                    detail: "high"
                },
            });

            screenshotSection.push({
                type: "text",
                text: `Use the top left corner as coordinate 0,0 and draw a virtual grid of 1x1 pixels,
where x values increase for each pixel as you go from left to right, and y values increase
as you go from top to bottom.
`,
            });
        }
        return screenshotSection;
    }
}
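The translator above assembles its sections as untyped object literals and then casts the array with `as MultimodalPromptContent[]`, which hides any mistakes in the section shapes from the compiler. One alternative is to type the array up front. The sketch below shows that idea with local stand-in types: `TextPart`, `ImagePart`, `ContentPart`, and `buildPromptContent` are assumptions made for illustration, modeled on the objects built in `translator.ts`, and are not TypeChat's own declarations.

```typescript
// Local stand-ins for the multimodal content shapes (assumed, not imported from TypeChat).
type TextPart = { type: "text"; text: string };
type ImagePart = { type: "image_url"; image_url: { url: string; detail?: "auto" | "low" | "high" } };
type ContentPart = TextPart | ImagePart;

// Hypothetical builder: the same section order as translator.ts, without a cast.
function buildPromptContent(
    request: string,
    typeName: string,
    schemaText: string,
    imageUrl?: string,
): ContentPart[] {
    const parts: ContentPart[] = [
        { type: "text", text: "You are a virtual assistant that answers questions about a crossword puzzle." },
    ];

    // The image is optional, so the same builder also serves text-only prompts.
    if (imageUrl) {
        parts.push(
            { type: "text", text: "Here is a screenshot of the puzzle" },
            { type: "image_url", image_url: { url: imageUrl, detail: "high" } },
        );
    }

    parts.push({
        type: "text",
        text: [
            `Translate the response into a JSON object of type ${typeName} using the TypeScript schema below:`,
            "'''",
            schemaText,
            "'''",
            "The following is a user request:",
            "'''",
            request,
            "'''",
        ].join("\n"),
    });

    return parts;
}
```

Because `parts` is declared as `ContentPart[]`, a misspelled `type` value or a missing `image_url.url` would be reported at compile time.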
16 changes: 16 additions & 0 deletions typescript/examples/crossword/src/tsconfig.json
@@ -0,0 +1,16 @@
{
  "compilerOptions": {
    "target": "es2021",
    "lib": ["es2021"],
    "module": "node16",
    "types": ["node"],
    "outDir": "../dist",
    "esModuleInterop": true,
    "forceConsistentCasingInFileNames": true,
    "strict": true,
    "noUnusedLocals": true,
    "noUnusedParameters": true,
    "exactOptionalPropertyTypes": true,
    "inlineSourceMap": true
  }
}
19 changes: 19 additions & 0 deletions typescript/package-lock.json

Some generated files are not rendered by default.

6 changes: 5 additions & 1 deletion typescript/src/model.ts
@@ -13,9 +13,13 @@ export interface PromptSection {
     /**
      * Specifies the content of this section.
      */
-    content: string | MultimodalPromptContent[];
+    content: PromptContent;
 }

+export type PromptContent =
+    | string
+    | MultimodalPromptContent[];

 /**
  * GPT-4-vision, GPT-4-omni and GPT-4-turbo allow multi-modal input, where images and text can
  * be part of the prompt. To support this, the content section of the prompt has an array of objects.
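Since `PromptContent` is either a plain string or an array of parts, code that previously assumed a string (for example, logging a prompt) has to handle both cases. The following is a small sketch of such a helper. The type declarations are simplified local stand-ins, and `promptContentToText` is a hypothetical function; neither is taken from TypeChat.

```typescript
// Simplified local stand-ins for the types in model.ts (assumed shapes).
type TextPart = { type: "text"; text: string };
type ImagePart = { type: "image_url"; image_url: { url: string; detail?: "auto" | "low" | "high" } };
type PromptContent = string | (TextPart | ImagePart)[];

// Hypothetical helper: flatten prompt content to text, e.g. for logging.
// Images are replaced with a placeholder so base64 payloads do not flood the log.
function promptContentToText(content: PromptContent): string {
    if (typeof content === "string") {
        return content;
    }
    return content
        .map((part) => (part.type === "text" ? part.text : "[image]"))
        .join("\n");
}
```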
4 changes: 2 additions & 2 deletions typescript/src/typechat.ts
@@ -1,5 +1,5 @@
 import { Result, success, error } from "./result";
-import { TypeChatLanguageModel, PromptSection } from "./model";
+import { TypeChatLanguageModel, PromptSection, PromptContent } from "./model";

/**
* Represents an object that can translate natural language requests in JSON objects of the given type.
@@ -31,7 +31,7 @@ export interface TypeChatJsonTranslator<T extends object> {
      * @param request The natural language request.
      * @returns A prompt that combines the request with the schema and type name of the underlying validator.
      */
-    createRequestPrompt(request: string): string;
+    createRequestPrompt(request: string): PromptContent;
     /**
      * Creates a repair prompt to append to an original prompt/response in order to repair a JSON object that
      * failed to validate. This function is called by `completeAndValidate` when `attemptRepair` is true and the
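With the return type widened to `PromptContent`, an override of `createRequestPrompt` can return an array that includes an image. One possible pattern, sketched below, wraps the existing prompt function instead of rewriting the prompt text. `PromptBuilder` and `attachImage` are hypothetical names, and the types are simplified local stand-ins rather than TypeChat's declarations.

```typescript
// Simplified local stand-ins (assumed shapes, not imported from TypeChat).
type TextPart = { type: "text"; text: string };
type ImagePart = { type: "image_url"; image_url: { url: string; detail?: "auto" | "low" | "high" } };
type ContentPart = TextPart | ImagePart;
type PromptContent = string | ContentPart[];

// Only the part of the translator interface this sketch relies on.
interface PromptBuilder {
    createRequestPrompt(request: string): PromptContent;
}

// Hypothetical wrapper: prepend an image to whatever prompt the translator already builds.
function attachImage(translator: PromptBuilder, imageUrl: string): PromptBuilder {
    const basePrompt = translator.createRequestPrompt.bind(translator);
    translator.createRequestPrompt = (request: string): PromptContent => {
        const base = basePrompt(request);
        // Normalize a plain-string prompt into a single text part.
        const baseParts: ContentPart[] =
            typeof base === "string" ? [{ type: "text", text: base }] : base;
        return [{ type: "image_url", image_url: { url: imageUrl, detail: "high" } }, ...baseParts];
    };
    return translator;
}
```

This keeps the original schema-and-request text intact, at the cost of less control over where the image appears relative to the instructions.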