DataDM 💬📊
DataDM is your private data assistant. A conversational interface for your data where you can load, clean, transform, and visualize without a single line of code. DataDM is open source and can be run entirely locally, keeping your juicy data secrets fully private.
[Demo video: datadm_happiness_qa_and_plots.mp4]
Note: The demo above uses GPT-4, which sends the conversation to OpenAI's API. To run in fully local mode, select `starchat-alpha-cuda` or `starchat-beta-cuda` as the model. This uses the StarChat model, which is a bit less capable but runs entirely locally.
Join our Discord to connect with the community and share your thoughts!
- Persistent Jupyter kernel backend for data manipulation during the conversation (see the sketch after this list)
- Run entirely locally, keeping your data private
- Natural language chat, visualizations/plots, and direct download of data assets
- Easy-to-use Docker images for one-line deployment
- Load multiple tables directly into the chat
- Search for data and load CSVs directly from GitHub
- Option to use OpenAI's GPT-3.5 or GPT-4 (requires API key)
- WIP: GGML-based mode (CPU only, no GPU required)
- WIP: Roll back kernel state on undo using `criu` (currently re-executes all cells)
- TODO: Support for more data sources (e.g. SQL, S3, PySpark)
- TODO: Export a conversation as a notebook or html
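The persistent kernel is the core of this design: one Jupyter kernel stays alive across the whole conversation, so the DataFrames created for one message are still in scope for the next. Here is a minimal sketch of that pattern using `jupyter_client`; it illustrates the mechanism only and is not DataDM's actual implementation.

```python
# Minimal sketch of a persistent Jupyter kernel backend
# (illustrative only; not DataDM's actual code).
from jupyter_client.manager import start_new_kernel

# Start one kernel and keep it alive for the whole conversation,
# so variables (loaded DataFrames, etc.) persist between messages.
km, kc = start_new_kernel(kernel_name="python3")

def run(code: str) -> None:
    """Execute a snippet in the persistent kernel and print its output."""
    kc.execute(code)
    while True:
        msg = kc.get_iopub_msg(timeout=30)
        mtype = msg["msg_type"]
        if mtype == "stream":
            print(msg["content"]["text"], end="")
        elif mtype == "execute_result":
            print(msg["content"]["data"].get("text/plain", ""))
        elif mtype == "status" and msg["content"]["execution_state"] == "idle":
            break  # kernel finished executing this snippet

run("import pandas as pd; df = pd.DataFrame({'a': [1, 2, 3]})")
run("df['a'].sum()")  # state persists: df is still defined

kc.stop_channels()
km.shutdown_kernel()
```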
- Load data from a URL
- Clean data by removing duplicates, nulls, outliers, etc.
- Join data from multiple tables into a single output table
- Visualize data with plots and charts
- Ask whatever you want of your very own private code interpreter (see the example below)
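Each of these is code generation under the hood. As a rough illustration, the kind of pandas code the assistant writes for these steps looks like the sketch below; the URLs and column names are hypothetical placeholders, not part of DataDM.

```python
# Illustrative sketch of assistant-generated code; file URLs and
# column names ("score", "country", "region") are hypothetical.
import pandas as pd
import matplotlib.pyplot as plt

# Load data from a URL
df = pd.read_csv("https://example.com/happiness.csv")

# Clean: drop duplicates and nulls, trim outliers beyond 3 standard deviations
df = df.drop_duplicates().dropna()
df = df[(df["score"] - df["score"].mean()).abs() <= 3 * df["score"].std()]

# Join with a second table on a shared key
regions = pd.read_csv("https://example.com/regions.csv")
merged = df.merge(regions, on="country", how="left")

# Visualize
merged.groupby("region")["score"].mean().plot(kind="bar")
plt.ylabel("mean happiness score")
plt.show()
```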
You can run DataDM with Docker, on Colab, or install it locally with pip.
For local-data, cloud-model mode (no GPU required; requires an OpenAI API key):

```bash
docker run -e OPENAI_API_KEY={{YOUR_API_KEY_HERE}} -p 7860:7860 -it ghcr.io/approximatelabs/datadm:latest
```
For local mode using the StarChat model (requires a CUDA device with at least 24 GB of VRAM):

```bash
docker run --gpus all -p 7860:7860 -it ghcr.io/approximatelabs/datadm:latest-cuda
```
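Either container serves the DataDM web UI on the mapped port; once it is running, browse to http://localhost:7860.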
⚠️ When installed with pip as below, datadm runs LLM-generated code directly in your userspace ⚠️
For local-data, cloud-model mode (no GPU required; requires an OpenAI API key):
```bash
$ pip install datadm
$ datadm
```
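The Docker invocation above passes the key via the `OPENAI_API_KEY` environment variable; when running from pip, export the same variable in your shell before launching `datadm` (an assumption based on that invocation).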
For local mode using the StarChat model (requires a CUDA device with at least 24 GB of VRAM):
$ pip install "datadm[cuda]"
$ datadm
- starchat-beta (StarCoder fine-tuned with databricks-dolly and OpenAssistant/oasst1)
- Guidance
- HuggingFace
- OpenAI
Contributions are welcome! Feel free to submit a PR or open an issue.
Join the Discord to chat with the team
Check out our other projects: sketch and approximatelabs