
AI-powered automatic dataset creation from the web, with support for LoRA and SFT question generation.


data-dream-gdsp/Hello-Happy-World


Hello-Happy-World

Hello-Happy-World is an AI-powered automatic dataset creation tool that supports generating LoRA datasets and SFT question sets from the web.

Features

  • Automatic LoRA Dataset Generation: Generate LoRA datasets using pre-defined prompts.
  • Automatic SFT Question Set Generation: Create SFT question sets from prompt files.
  • DuckDuckGo Search Integration: Search specific topics via DuckDuckGo, and inject search results into the generated content.
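As a rough illustration of the search integration, the sketch below queries DuckDuckGo's public Instant Answer endpoint and collects text snippets. This is an assumption for illustration only: the README does not say which DuckDuckGo interface the project uses, and the helper names `build_search_url`, `extract_snippets`, and `search` are hypothetical. It uses the standard library rather than requests to stay self-contained.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the Instant Answer endpoint; the project may use another interface.
DDG_ENDPOINT = "https://api.duckduckgo.com/"


def build_search_url(topic: str) -> str:
    """Build the query URL for a topic (hypothetical helper)."""
    query = urllib.parse.urlencode({"q": topic, "format": "json", "no_html": 1})
    return f"{DDG_ENDPOINT}?{query}"


def extract_snippets(payload: dict) -> list:
    """Collect the abstract and related-topic texts from a response payload."""
    snippets = []
    if payload.get("AbstractText"):
        snippets.append(payload["AbstractText"])
    for item in payload.get("RelatedTopics", []):
        # Grouped entries carry no "Text" key and are skipped here.
        text = item.get("Text")
        if text:
            snippets.append(text)
    return snippets


def search(topic: str, timeout: float = 10.0) -> list:
    """Fetch and parse snippets for a topic (performs a network request)."""
    with urllib.request.urlopen(build_search_url(topic), timeout=timeout) as resp:
        return extract_snippets(json.load(resp))
```

The resulting snippets are the kind of text that could be injected into the generated content.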

Installation

Ensure you have installed the necessary libraries, such as requests and rich:

pip install requests rich

Usage

python main.py --task [LoRA/SFT] --output [output_file_name] --topic [search_topic]

  • --task: Choose the type of generation, either LoRA or SFT.
  • --output: Specify the output file name.
  • --topic: Specify the topic for DuckDuckGo search. The search results will replace the {data} placeholder in the prompt file.
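The three options above could be parsed along the following lines. This is a minimal sketch, not the project's actual main.py: the function name `build_parser` is hypothetical, and whether --output is required or --topic is optional is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (hypothetical sketch)."""
    parser = argparse.ArgumentParser(
        description="Generate LoRA datasets or SFT question sets."
    )
    # The README names exactly two task types.
    parser.add_argument("--task", choices=["LoRA", "SFT"], required=True,
                        help="Type of generation: LoRA or SFT.")
    # Assumption: an output file name must always be given.
    parser.add_argument("--output", required=True,
                        help="Output file name.")
    # Assumption: the topic is optional; without it no search is run.
    parser.add_argument("--topic", default=None,
                        help="Topic for the DuckDuckGo search.")
    return parser
```

Passing a list such as `["--task", "SFT", "--output", "sft_output.json"]` to `build_parser().parse_args(...)` yields a namespace with `task`, `output`, and `topic` attributes.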

Examples

Generate an SFT question set from DuckDuckGo search results for the topic "AI trends" and save it to sft_output.json:

python main.py --task SFT --output sft_output.json --topic "AI trends"
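Conceptually, the search results for the topic replace the {data} placeholder in the prompt file before the prompt is sent to the model. A minimal sketch of that substitution follows; the function name `inject_search_results` and the bullet formatting of the results are assumptions, not the project's actual code.

```python
def inject_search_results(prompt_template: str, search_results: list) -> str:
    """Replace the {data} placeholder with formatted search results."""
    # Assumption: results are joined as a simple dash-prefixed list.
    data = "\n".join(f"- {result}" for result in search_results)
    # str.replace is used instead of str.format so that other braces in the
    # prompt (for example JSON samples) are left untouched.
    return prompt_template.replace("{data}", data)
```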

File Structure

  • static/LoRA/prompt.md: Prompt file for generating LoRA datasets.
  • static/SFT/prompt.md: Prompt file for generating SFT question sets.
  • AI/config.yaml: Configuration file for the LLM model.
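Given this layout, the value passed to --task maps directly to a prompt file. The sketch below shows one way to resolve it; the names `PROMPT_FILES` and `prompt_path` are hypothetical, and the project may locate its files differently.

```python
from pathlib import Path

# Paths taken from the file structure listed above.
PROMPT_FILES = {
    "LoRA": Path("static/LoRA/prompt.md"),
    "SFT": Path("static/SFT/prompt.md"),
}


def prompt_path(task: str, root: Path = Path(".")) -> Path:
    """Return the prompt file for a task, relative to the repository root."""
    try:
        relative = PROMPT_FILES[task]
    except KeyError:
        raise ValueError(f"Unknown task {task!r}; expected LoRA or SFT") from None
    return root / relative
```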

License

This project is licensed under the Apache License. For more details, see the LICENSE file.
