This repository contains the source code to extract the dialogs used in the following paper:
"The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems", arXiv:1506.08909.
The extraction requires:
- PostgreSQL
- Enchant
- PyPy, with the pyenchant and psycopg2 packages
- NodeJS, with the bluebird, knex and mkdirp packages
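Before running the extraction, it can help to confirm that the Python side of the toolchain is in place. The script below is a minimal sketch, not part of this repository: it assumes that pyenchant is imported as `enchant` and that `psql`, `pypy` and `node` are expected on the PATH. Run it with `pypy` so it checks the same interpreter that runs main.py. It does not check the Node packages.

```python
import importlib.util
import shutil

# Python packages listed for PyPy; pyenchant is imported under the name "enchant".
PYTHON_MODULES = ["enchant", "psycopg2"]
# Command-line tools used by the extraction steps (assumed to be on PATH).
EXECUTABLES = ["psql", "pypy", "node"]


def missing_modules(names):
    """Return the top-level module names the current interpreter cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def missing_executables(names):
    """Return the command names that shutil.which cannot find on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    for name in missing_modules(PYTHON_MODULES):
        print("missing Python module:", name)
    for name in missing_executables(EXECUTABLES):
        print("missing executable:", name)
```

`find_spec` only locates a package without importing it, so a clean result does not prove that the Enchant C library itself loads. A final `import enchant` under PyPy is the definitive test.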
First, create the database from a psql session:
# psql -d template1
> create database ubuntu;
Next, link the corpus directory as data, run createTable.js, and run the main.py parser with PyPy:
# ln -s /path/to/ubuntu/corpus data
# node createTable.js
# pypy main.py
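This produces the file ubuntu.sql. Load it into the messages table. Because COPY reads the file on the database server, place it at /tmp/ubuntu.sql or adjust the path accordingly: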
# psql -d ubuntu
> copy messages from '/tmp/ubuntu.sql';
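A quick way to confirm that the COPY step loaded data is to count the rows in messages. The following is a minimal sketch using psycopg2, which is already a dependency. The connection string `dbname=ubuntu` is an assumption that relies on the same local defaults as `psql -d ubuntu`, so adjust user and host as needed.

```python
# Hypothetical connection settings: a libpq connection string for the ubuntu database.
DSN = "dbname=ubuntu"


def count_messages(conn):
    """Return the number of rows in the messages table."""
    with conn.cursor() as cur:
        cur.execute("SELECT count(*) FROM messages")
        return cur.fetchone()[0]


if __name__ == "__main__":
    # Imported here so count_messages can be exercised without a live database.
    import psycopg2

    conn = psycopg2.connect(DSN)
    try:
        print("messages:", count_messages(conn))
    finally:
        conn.close()
```

A count of zero means the COPY did not load anything, and the remaining steps should not be run.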
Finally, create the index and extract the dialogs:
# node createTable.js index
# node extractDialogs.js nicks.txt