TiddlyWiki-LLM-dataset

WikiText syntax dataset for auto UI generation in TiddlyWiki.

Overview

Read folders in the config. This will create snapshot of TW core and plugins. And skip the duplication if text is same as first item's input (See Data for detail about snapshot).
Generate more QA pair with templates
Generate missing Q or A using LLM
Generate review API call and upload to review platform
Ask community for help reviewing
Import updated wikitext when TiddlyWiki version bump and rerun the above pipeline
Export dataset for LLM fine-tune

AI prompt

We generate material so human don't need to create them from scratch.

Data

Prompts are WikiText tiddlers with variable transclusion, in the wiki's 'prompts folder.

Each tiddler's wikified body will be put to review platform. And they can use following variables from pipeline.

InputTiddler .tid file content with metadata part, read from TW core or other plugins.
inputWikiText Text part of tiddler, extracted by pipeline
aIOutput GPT generated output, generated by pipeline

Pipeline

Use pure JS to get InputTiddler and inputWikiText
These variables will be available in tiddlers in the pipeline's 'prompts folder.
We process "input" X "Data prompt tiddler" matrix member one by one, we can get prompt field of data tiddler, as DataPrompt variable
Compose the variables using WikiText, and we use wikified tiddler text as AI input, to get the content for aIOutput variable.

graph TD
    A[TW core or other plugins] --> B[InputTiddler and inputWikiText]
    B --> D[Pipeline Template]
    C[Data Prompt Templates' Metadata] --> D
    D --> E[aIOutput]
    E --> F[Data Prompt Templates' Body]
    B --> F
    F --> G[Text to Review]

Review

In the review platform, there are "original language" and "translation" areas, because we are using a translating review platform.

Original WikiText + AI prompt that generate the material will be the "original language", and AI generated chat material can will put in the "translation" area, open for human to review.

You can join the project in Paratranz, and review each material, edit them to be correct (AI will have mistake, they don't understand WikiText well, because there was never good learning material before like we are creating).

How to run

Have Deno installed
Clone this project and TiddlyWiki5 repo side by side in a folder.
"cd" into this project's folder
1. Get a DeepSeek API key or OpenAI API key, put it in a .env file, copied from .env.template file.
2. Run deno task dev

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
.github/workflows		.github/workflows
.vscode		.vscode
data		data
wiki		wiki
.env.template		.env.template
.gitattributes		.gitattributes
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
deno.json		deno.json
deno.lock		deno.lock
generateChatML.ts		generateChatML.ts
main.ts		main.ts
main_test.ts		main_test.ts

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

TiddlyWiki-LLM-dataset

Overview

AI prompt

Data

Pipeline

Review

How to run

About

Releases

Languages

License

tiddly-gittly/TiddlyWiki-LLM-dataset

Folders and files

Latest commit

History

Repository files navigation

TiddlyWiki-LLM-dataset

Overview

AI prompt

Data

Pipeline

Review

How to run

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Languages