Skip to content

layterz/promptx

Repository files navigation

promptx

A framework for building AI systems.

promptx is a framework for working with generative models and embeddings. The example below goes through the basics of how to control the output of models and store the results as embeddings.

Blog posts:

Getting started

pip install pxx
from promptx import prompt

character = 'Batman'
prompt(f'Write a character profile for {character}')
'Name: Batman\nOccupation: Superhero/Vigilante\nAlias: The Dark Knight, The Caped Crusader\nReal Name: Bruce Wayne\nAge: Late 30s to early 40s\nHeight: 6\'2"\nWeight: 210 lbs\nEthnicity: Caucasian\nNationality: American\nPlace of Birth: Gotham City\nParents: Thomas and Martha Wayne (deceased)\nAffiliations: Justice League, Bat-Family, Gotham City Police Department (unofficially)\nSkills/Abilities: Exceptionally skilled detective, master martial artist and hand-to-hand combatant, peak human physical condition, proficient in various forms of weaponry and gadgets, expert in stealth and espionage, genius-level intellect and deductive reasoning\nCostume: Black bat-themed bodysuit, utility belt with various gadgets, protective body armor, cape with bat-like wingspan\nVehicle: Batmobile...

By default, this returns a plain string response, but to generate complex data you can pass in the expected schema along with the prompt input.

Note: Entity is a thin layer on top of pydantic.BaseModel that allows the object to be stored as an embedding. You can use pydantic.BaseModel directly if you don't need to store the object as an embedding and just want to use it as the prompt output schema.

from pydantic import Field
from promptx.collection import Entity

class Character(Entity):
    name: str = Field(..., embed=False),
    description: str = Field(..., description='Describe the character in a few sentences')
    age: int = Field(..., ge=0, le=120)

batman = prompt('Generate a character profile for Batman', output=Character)
batman
Character(
    id='dd740305-737a-4bde-8596-3525490d2f6f',
    type='character',
    name='Bruce Wayne',
    description='Batman is a superhero who operates in Gotham City. He is known for his detective skills, martial arts training, and use of high-tech gadgets to fight crime. Bruce Wayne, his alter ego, is a billionaire philanthropist and the owner of Wayne Enterprises.',
    age=35
)

This returns an instance of the specified schema using the generated response as the input data. Let's create a list of instead.

characters = prompt(
    'Generate some characters from the Batman universe',
    output=[Character],
)

characters
id	type	name	description	age
0	7ec67bd1-0626-4293-ab0a-509cee7f5b66	character	Batman	Bruce Wayne is a billionaire philanthropist by...	35
1	7d7e53e6-f1dd-4f0a-bda7-6533ba385960	character	Joker	The Joker is a deranged and unpredictable crim...	40
2	45157783-7006-476e-a492-2f6881d2410d	character	Catwoman	Selina Kyle is a skilled thief with a complica...	30

If the output is a list, prompt returns a Collection, which extends pd.DataFrame. To extract the Entity representations, use the objects property.

We can now store these generated objects as embeddings in a collection.

from promptx import store

store(*characters.objects)

This stores the object as an embedding, along with some metadata, in a vector database (ChromaDB by default). The process is quite simple, it embeds the whole object as a JSON string and each field individually. This allows us to query the database using any field in the object.

from promptx import query

query()
id	type	name	description	age
0	7ec67bd1-0626-4293-ab0a-509cee7f5b66	character	Batman	Bruce Wayne is a billionaire philanthropist by...	35
1	7d7e53e6-f1dd-4f0a-bda7-6533ba385960	character	Joker	The Joker is a deranged and unpredictable crim...	40
2	45157783-7006-476e-a492-2f6881d2410d	character	Catwoman	Selina Kyle is a skilled thief with a complica...	30

Now let's generate some more characters and add them to the collection. We'll first get any existing characters and extract their names, which we can pass to the prompt to avoid generating duplicates. Any characters generated will be added the list during iteration. Finally, we'll store all the generated characters in the collection.

n = 3
characters = query().objects

for _ in range(n):
    characters += prompt(
        '''
        Generate a list of new characters from the Batman universe.
        Don't use any of the existing characters.
        ''',
        input = {
            'existing_characters': [c.name for c in characters],
        },
        output=[Character],
    ).objects

store(*characters)
query()
id	type	name	description	age
0	7ec67bd1-0626-4293-ab0a-509cee7f5b66	character	Batman	Bruce Wayne is a billionaire philanthropist by...	35
1	7d7e53e6-f1dd-4f0a-bda7-6533ba385960	character	Joker	The Joker is a deranged and unpredictable crim...	40
2	45157783-7006-476e-a492-2f6881d2410d	character	Catwoman	Selina Kyle is a skilled thief with a complica...	30
3	2ad9eb88-406a-4d55-8e19-a5cb0dc7035f	character	Nightshade	Nightshade is a highly skilled acrobat and thi...	28
4	63472046-2be4-40a0-a4b0-ba918e5a7ad4	character	Shadowstrike	Shadowstrike is a master of stealth and weapon...	33
5	97fd9c56-9531-4444-b7d6-246070b98405	character	Twilight	Twilight is a vigilante who fights for justice...	25
6	b40dfcbf-107e-4867-adad-1de2c1260fef	character	Gotham Girl	Gotham Girl is a young superhuman who gained h...	18
7	97c965d9-2669-49c1-8817-1ab53f1f97ff	character	Midnight	Midnight is a master of stealth and prefers to...	28
8	16c569fe-18ee-4b18-8c86-428c05cfbd2d	character	Raven	Raven is a sorceress who possesses the ability...	25
9	dc555573-05f5-4aae-babe-16ece0c401f8	character	Harbinger	Harbinger is a mysterious and enigmatic vigila...	35
10	c8945846-4cf0-4e6f-ab25-82ec8fbdb8c8	character	Silverwolf	Silverwolf is a skilled martial artist with a ...	28
11	56e0004f-a656-41fb-902a-9b46522c4e23	character	Luna	Luna is a master of illusions and deception. S...	27

This compares the query text with the stored objects, returning results that are closest in vector space.

Note: the effectiveness of embedding queries will depend on what data has been embedded. In this case, ChatGPT will know some details about the generated characters and so does a decent job on this data. For other data, you may find generating synthetic intermediary data to be helpful. E.g. generating thoughts and/or quotes about a set of documents.

Because Collection extends pd.DataFrame, we can use all the usual Pandas methods to filter and sort the results.

villains[villains.age < 30]
	id	type	name	description	age
0	c8945846-4cf0-4e6f-ab25-82ec8fbdb8c8	character	Silverwolf	Silverwolf is a skilled martial artist with a ...	28
3	97fd9c56-9531-4444-b7d6-246070b98405	character	Twilight	Twilight is a vigilante who fights for justice...	25
4	97c965d9-2669-49c1-8817-1ab53f1f97ff	character	Midnight	Midnight is a master of stealth and prefers to...	28
5	56e0004f-a656-41fb-902a-9b46522c4e23	character	Luna	Luna is a master of illusions and deception. S...	27

Relationships can be defined by setting the field to a type which subclasses Entity (or a list of that type). Internally, this is stored as a query and then loaded when the field is accessed from the database.

class StoryIdea(Entity):
    title: str
    description: str = None
    characters: list[Character] = None

characters = query('they are a villain').sample(3).objects

ideas = prompt(
    'Generate some story ideas',
    input={
        'characters': characters,
    },
    output=[StoryIdea],
).objects

for idea in ideas:
    idea.characters = characters

store(*ideas, collection='story-ideas')
query(collection='story-ideas')
id	type	title	description	characters
0	886bc464-7b4a-4e4e-b228-e3791c6d7f77	storyidea	The Revenge of Silverwolf	After years of training and honing his skills,...	[{'id': 'c8945846-4cf0-4e6f-ab25-82ec8fbdb8c8'...
1	63668f1a-8a1f-4d68-9926-7b6cb83147fb	storyidea	The Shadow's Game	Midnight finds herself caught in a deadly game...	[{'id': 'c8945846-4cf0-4e6f-ab25-82ec8fbdb8c8'...
2	cedc978b-463f-43e1-9898-690aada5b832	storyidea	The Illusionist's Gambit	Luna finds herself lured into a dangerous game...	[{'id': 'c8945846-4cf0-4e6f-ab25-82ec8fbdb8c8'...

Note that the output is being stored in a collection called story-ideas, which is created if it doesn't exist. Previously, all the data we've stored has been in the 'default' collection.

Collections are widely used internally to represent stored models, templates, prompt history, etc. This provides a consistent interface for accessing and manipulating data.

So far we've used the default model (GPT-3.5) when generating data, but you can specify a custom model using the llm= parameter.

from promptx.models.openai import ChatGPT

gpt4 = ChatGPT(id='gpt4', model='gpt4')

characters = prompt(
    'Generate some characters from the Batman universe',
    output=[Character],
    llm=gpt4,
)

You can define any commonly used models, templates, etc, along with defining other settings, by creating a config.py file in the root of the project (i.e. adjacent to the .px/ directory). This file is loaded when the project is initialized and a setup function is expected. Here's a simple example that defines a few custom models and a template.

# ./config.py

from promptx.models.openai import ChatGPT

gpt4 = ChatGPT(id='gpt4', model='gpt4')

def setup(session):
    session.store(gpt4, collection='models')

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published