Ask Airy is an Airtable extension that lets you intuitively search and query data in Airtable with natural language, using GPT-3.5 and embeddings-based semantic search. Say you have a text-heavy base containing, for instance, product or customer reviews. With Ask Airy, you can ask a question like “Find and summarize negative reviews. Create a list of action items for improving the product design to address the feedback. Cite relevant review IDs in your response.” and get a data-backed response in natural language.

Airtable is a “no-code” relational database platform with a UI similar to Google Sheets. While this extension was built for Airtable, the architecture and logic apply to any application with relational/tabular data.

At its core, the application works by:
- Generating embeddings using OpenAI’s embeddings endpoint for each row in a table and storing them as another column within the table.
- When a user enters a query/question, generating a hypothetical document embedding (HyDE) to use for a semantic search. (link to code)
- Executing a semantic search via cosine similarity. (link to code)
- Stuffing the most relevant records, along with the user’s original query, into a GPT-3.5 prompt.
- Streaming the LLM response back to the UI and showing the user the (potentially) most relevant records for their query.
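The ranking step above can be sketched in a few lines of TypeScript. This is a simplified illustration, not the extension’s actual code: the `EmbeddedRecord` type and `rankRecords` helper are hypothetical names, standing in for records whose embeddings were precomputed and stored in a table column.

```typescript
// Illustrative record shape: a row's ID, its text, and its stored embedding.
type EmbeddedRecord = { id: string; text: string; embedding: number[] };

// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every record against the (HyDE) query embedding and keep the top k.
function rankRecords(
  queryEmbedding: number[],
  records: EmbeddedRecord[],
  k: number
): EmbeddedRecord[] {
  return records
    .map((r) => ({ record: r, score: cosineSimilarity(queryEmbedding, r.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((s) => s.record);
}
```

The top-k records returned here are what get stuffed into the GPT-3.5 prompt alongside the user’s original query.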
Takeaways/Lessons:
- All the prompt engineering for this project lives in OpenAIService.ts. An interesting problem was staying under the RPM/TPM rate limits of the OpenAI embeddings endpoint when bulk-generating embeddings for the database tables. My rate limiter approach can be found here.
- I was able to stream the chat completion response to a React component using npm: openai-streams along with a custom React hook I wrote.
- Given that this project employs context stuffing with GPT-3.5, staying under context window token limits was important. I found that estimating tokens with the “number of characters divided by 4” rule of thumb was not reliable, especially when things like URLs were in the input string. While OpenAI provides a tokenizer for counting tokens in Python, there is no official JS tokenizer. I ended up using npm: gpt3-tokenizer even though it only supports the r50k_base encoding. GPT-3.5+ uses the cl100k_base encoding, but I found the two encodings’ tokenization outputs were similar enough.
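To illustrate the rate-limiting problem, here is one common approach: a sliding-window limiter that tracks both requests and tokens over the last minute. This is a hedged sketch under assumed limits, not the project’s actual rate limiter (which is linked above); the `RateLimiter` class name and its constructor parameters are illustrative.

```typescript
// Sliding-window limiter: callers await acquire(tokens) before each
// embeddings request, and it blocks until both the requests-per-minute
// and tokens-per-minute budgets have room.
class RateLimiter {
  private events: { time: number; tokens: number }[] = [];

  constructor(
    private maxRequestsPerMin: number,
    private maxTokensPerMin: number
  ) {}

  async acquire(tokens: number): Promise<void> {
    for (;;) {
      // Drop events older than the 60-second window.
      const cutoff = Date.now() - 60_000;
      this.events = this.events.filter((e) => e.time > cutoff);
      const usedTokens = this.events.reduce((sum, e) => sum + e.tokens, 0);
      if (
        this.events.length < this.maxRequestsPerMin &&
        usedTokens + tokens <= this.maxTokensPerMin
      ) {
        this.events.push({ time: Date.now(), tokens });
        return;
      }
      // Over one of the limits: back off briefly, then re-check.
      await new Promise((resolve) => setTimeout(resolve, 250));
    }
  }
}
```

A bulk-embedding loop would then call `await limiter.acquire(estimatedTokens)` before each batch request, throttling itself instead of hitting 429 errors from the API.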
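The context-stuffing budget check can be sketched as a greedy loop over the ranked records. This is an assumed implementation for illustration: `stuffContext` and `estimateTokens` are hypothetical names, and the char/4 fallback shown here is exactly the unreliable heuristic discussed above — the real project swaps in the gpt3-tokenizer package for the count.

```typescript
// Rough heuristic: ~4 characters per token. Undercounts for URLs and
// other dense strings, which is why a real tokenizer is preferable.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Greedily keep the highest-ranked records until adding the next one
// would blow the token budget reserved for context in the prompt.
function stuffContext(
  records: string[],
  budget: number,
  countTokens: (s: string) => number = estimateTokens
): string[] {
  const kept: string[] = [];
  let used = 0;
  for (const record of records) {
    const cost = countTokens(record);
    if (used + cost > budget) break;
    kept.push(record);
    used += cost;
  }
  return kept;
}
```

Because `countTokens` is injected, the same loop works with the heuristic or with a proper tokenizer’s `encode(...).length`.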
This project is licensed under the MIT license.
See LICENSE for more information.