Skip to content

Latest commit

 

History

History
275 lines (206 loc) · 10.4 KB

README.md

File metadata and controls

275 lines (206 loc) · 10.4 KB

About

Data exists everywhere: your laptop, Postgres, Snowflake and as files in S3. It exists in various formats such as Parquet, CSV and JSON. Regardless, there will always be multiple steps spanning several destinations to get the insights you need.

GlareDB is designed to query your data wherever it lives using SQL that you already know.

Install

Install/update glaredb in the current directory:

curl https://glaredb.com/install.sh | sh

It may be helpful to install the binary in a location on your PATH. For example, ~/.local/bin.

If you prefer manual installation, download, extract and run the GlareDB binary from a release in our releases page.

Getting Started

After Installing, get up and running with:

Local CLI

To start a local session, run the binary:

./glaredb

Or, you can execute SQL and immediately return (try it out!):

# Query a CSV on Hugging Face
./glaredb --query "SELECT * FROM \
'https://huggingface.co/datasets/fka/awesome-chatgpt-prompts/raw/main/prompts.csv';"

To see all options use --help:

./glaredb --help

Hybrid Execution

  1. Sign up at https://console.glaredb.com for a free fully-managed deployment of GlareDB

  2. Copy the connection string from GlareDB Cloud, for example:

    ./glaredb --cloud-url="glaredb://user:pass@host:port/deployment"
    # or
    ./glaredb
    > \open "glaredb://user:pass@host:port/deployment

Read our announcement on Hybrid Execution for more information.

Using GlareDB in Python

  1. Install the official GlareDB Python library

    pip install glaredb
  2. Import and use glaredb.

    import glaredb
    con = glaredb.connect()
    con.sql("select 'hello world';").show()

To use Hybrid Execution, sign up at https://console.glaredb.com and use the connection string for your deployment. For example:

import glaredb
con = glaredb.connect("glaredb://user:pass@host:port/deployment")
con.sql("select 'hello hybrid exec';").show()

GlareDB work with Pandas and Polars DataFrames out of the box:

import glaredb
import polars as pl

df = pl.DataFrame(
    {
        "A": [1, 2, 3, 4, 5],
        "fruits": ["banana", "banana", "apple", "apple", "banana"],
        "B": [5, 4, 3, 2, 1],
        "cars": ["beetle", "audi", "beetle", "beetle", "beetle"],
    }
)

con = glaredb.connect()

df = con.sql("select * from df where fruits = 'banana'").to_polars();

print(df)

Local Server

The server subcommand can be used to launch a server process for GlareDB:

./glaredb server

To see all options for running in server mode, use --help:

./glaredb server --help

When launched as a server process, GlareDB can be reached on port 6543 using a Postgres client. The following example uses psql to connect to a locally running server:

psql "host=localhost user=glaredb dbname=glaredb port=6543"

Configure the First Data Source

You can use a demo Postgres instance at pg.demo.glaredb.com. Adding this Postgres instance as data source is as easy as running the following command:

CREATE EXTERNAL DATABASE my_pg
    FROM postgres
    OPTIONS (
        host = 'pg.demo.glaredb.com',
        port = '5432',
        user = 'demo',
        password = 'demo',
        database = 'postgres',
    );

Once the data source has been added, it can be queried using fully qualified table names:

SELECT *
FROM my_pg.public.lineitem
WHERE l_shipdate <= date '1998-12-01' - INTERVAL '90'
LIMIT 5;

Check out the docs to learn about all supported data sources. Many data sources can be connected to the same GlareDB instance.

Done with this data source? Remove it with the following command:

DROP DATABASE my_pg;

Supported Data Sources

Source Read INSERT INTO COPY TO Table Function External Table External Database
Databases -- -- -- -- -- --
MySQL
PostgreSQL
MariaDB (via mysql)
MongoDB
Microsoft SQL Server 🚧 🚧
Snowflake 🚧 🚧
BigQuery 🚧 🚧
Cassandra/ScyllaDB 🚧 🚧
ClickHouse 🚧 🚧
Oracle 🚧 🚧 🚧 🚧 🚧 🚧
ADBC 🚧 🚧 🚧 🚧 🚧 🚧
ODBC 🚧 🚧 🚧 🚧 🚧 🚧
Database Files -- -- -- -- -- --
SQLite 🚧
Microsoft Excel 🚧 🚧
DuckDB 🚧 🚧 🚧 🚧 🚧 🚧
File Formats -- -- -- -- -- --
Apache Arrow 🚧
Apache Parquet 🚧
CSV 🚧
JSON 🚧
BSON 🚧
Apache Avro 🚧 🚧 🚧 🚧 🚧
Apache ORC 🚧 🚧 🚧 🚧 🚧
Table Formats -- -- -- -- -- --
Lance
Delta
Iceberg 🚧 🚧

✅ = Supported ➖ = Not Applicable 🚧 = Not Yet Supported

Building from Source

Building GlareDB requires Rust/Cargo to be installed. Check out rustup for an easy way to install Rust on your system.

Running the following command will build a release binary:

just build --release

The compiled release binary can be found in target/release/glaredb.

Documentation

Browse GlareDB documentation on our docs.glaredb.com.

Contributing

Contributions welcome! Check out CONTRIBUTING.md for how to get started.

License

See LICENSE. Unless otherwise noted, this license applies to all files in this repository.

Acknowledgements

GlareDB is proudly powered by Apache Datafusion and Apache Arrow. We are grateful for the work of the Apache Software Foundation and the community around these projects.