Gretel Navigator is a compound AI system for generating high-quality synthetic data using contextual tags and an evolutionary approach. The method iteratively improves, validates, and evaluates outputs to produce synthetic data of higher quality than the underlying LLM could produce on its own, and to reduce hallucinations in AI-generated content.
- Clone the repository:
    git clone https://github.com/gretelai/navigator-helpers.git
    cd navigator-helpers
- Create a virtual environment and activate it:
    python3 -m venv venv
    source venv/bin/activate
- Install the required dependencies:
    pip install -r requirements.txt
The data synthesis configuration is defined in a YAML file. Here's a simplified example:
    # Generator configuration
    api_key: prompt
    llm_model: gretelai/gpt-auto
    log_level: INFO
    use_reflection: true
    output_filename: synthetic_data.jsonl
    evolution_generations: 1
    num_examples: 1000

    # Data model definition
    generation_instructions: |
      You are a seasoned SQL expert specializing in crafting intricate, context-rich queries and explanations.
      Use the provided contextual tags as instructions for generation.

    fields:
      - name: sql_context
        type: str
        description: A single string comprising multiple valid PostgreSQL CREATE TABLE statements.
        validator: sql:postgres
      - name: prompt
        type: str
        description: A detailed, nuanced natural language prompt that a user might ask to a database for a particular task.

    contextual_tags:
      tags:
        - name: sql_complexity
          values:
            - value: Moderate
              weight: 0.4
            - value: Complex
              weight: 0.3
            - value: Very Complex
              weight: 0.2
            - value: Expert
              weight: 0.1

    evolution:
      rate: 0.1
      strategies:
        - Enhance the schema to include domain-specific tables and data types.
        - Add relevant indexes, constraints, and views reflecting real-world designs.
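
To make the weighted contextual tags concrete, here is a minimal sketch of how a value for sql_complexity could be drawn in proportion to its weight. It uses only the Python standard library; the `sample_tag` helper is hypothetical and is not taken from the navigator-helpers code, which may sample tags differently.

```python
import random

# Mirrors the sql_complexity tag from the example configuration.
SQL_COMPLEXITY = {
    "name": "sql_complexity",
    "values": [
        {"value": "Moderate", "weight": 0.4},
        {"value": "Complex", "weight": 0.3},
        {"value": "Very Complex", "weight": 0.2},
        {"value": "Expert", "weight": 0.1},
    ],
}


def sample_tag(tag, rng):
    """Pick one value for a tag with probability proportional to its weight.

    Hypothetical helper for illustration; not part of the library's API.
    """
    values = [entry["value"] for entry in tag["values"]]
    weights = [entry["weight"] for entry in tag["values"]]
    return rng.choices(values, weights=weights, k=1)[0]


rng = random.Random(0)  # seeded only so the sketch is repeatable
samples = [sample_tag(SQL_COMPLEXITY, rng) for _ in range(10_000)]

# Compare the observed share of each value with its configured weight.
for entry in SQL_COMPLEXITY["values"]:
    share = samples.count(entry["value"]) / len(samples)
    print(f"{entry['value']:<13} target={entry['weight']:.1f} observed={share:.3f}")
```

Over many draws the observed shares should approach the configured weights, which is the practical meaning of setting, for example, 0.4 for Moderate and 0.1 for Expert.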
To generate synthetic data, use the run_generation.py script:
    python examples/run_generation.py examples/example_nl2sql.yml
This script will:
- Load the YAML configuration
- Create a DataModel from the YAML
- Initialize the SyntheticDataGenerator
- Generate the data
The output will be saved to the file specified by output_filename in the YAML configuration.
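
Once a run has finished, the output can be inspected with a few lines of Python. The sketch below assumes the file is in JSON Lines form (one JSON object per line, as the .jsonl extension suggests) and that each record is keyed by the field names from the configuration; both points are assumptions and have not been checked against the generator's actual output. The helpers `read_jsonl` and `missing_fields` are hypothetical names.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Return a list of records, one per non-empty line of a JSON Lines file."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines between records
                records.append(json.loads(line))
    return records


def missing_fields(record, expected):
    """List the expected field names that are absent from a record."""
    return [name for name in expected if name not in record]


# Field names taken from the example configuration (assumed to be the record keys).
EXPECTED_FIELDS = ["sql_context", "prompt"]

output_path = Path("synthetic_data.jsonl")  # output_filename from the example config
if output_path.exists():
    records = read_jsonl(output_path)
    incomplete = [r for r in records if missing_fields(r, EXPECTED_FIELDS)]
    print(f"{len(records)} records read, {len(incomplete)} missing at least one field")
```

A quick check like this catches truncated or partially generated records before the data is used downstream.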
- YAML Configuration: All configuration is centralized in a YAML file, making it easier to manage and modify.
- Evolutionary Process: Controlled by the evolution section in the YAML config, including strategies and rate.
- Contextual Tags: Defined in the YAML config under contextual_tags, supporting weighted values.
- Field-specific Configuration: Each field can have its own type, description, and validator.
- Flexible Validation: Validators can be specified for each field in the YAML config.
- Reflection: Controlled by the use_reflection parameter in the YAML config.
- Customizable Output: The output_filename in the YAML config determines the output file name.
For more detailed information on these features, please refer to the source code and comments within the project.
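
As an illustration of how the evolution settings might interact, the sketch below treats rate as the probability that a record is evolved in a given generation and picks one strategy uniformly at random when it is. This interpretation is an assumption made for the example; the README does not define the exact semantics, and `pick_strategy` is a hypothetical helper rather than a function from the project.

```python
import random

# Mirrors the evolution section of the example configuration.
EVOLUTION = {
    "rate": 0.1,
    "strategies": [
        "Enhance the schema to include domain-specific tables and data types.",
        "Add relevant indexes, constraints, and views reflecting real-world designs.",
    ],
}


def pick_strategy(evolution, rng):
    """Return a strategy with probability `rate`, otherwise None.

    ASSUMED semantics for illustration only; the real generator may
    apply the rate and choose strategies differently.
    """
    if rng.random() < evolution["rate"]:
        return rng.choice(evolution["strategies"])
    return None


rng = random.Random(42)  # seeded only so the sketch is repeatable
picks = [pick_strategy(EVOLUTION, rng) for _ in range(10_000)]
evolved = [p for p in picks if p is not None]
print(f"evolved {len(evolved)} of {len(picks)} records "
      f"({len(evolved) / len(picks):.1%}, configured rate {EVOLUTION['rate']:.0%})")
```

Under this reading, raising rate makes more records go through an evolution step per generation, while evolution_generations controls how many such passes are made.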
Example YAML configurations and usage scenarios can be found in the examples/ directory.
Contributions are welcome! Please feel free to submit a pull request or open an issue.
This project is licensed under the Gretel License. See the LICENSE file for details.