Documentation: Improve structure and layout
amotl committed May 8, 2024
1 parent 862ffdb commit 5d40a6c
Showing 13 changed files with 240 additions and 115 deletions.
7 changes: 4 additions & 3 deletions README.md
@@ -41,7 +41,7 @@ Contributions of all kinds are much welcome, in order to make it more solid,
and to add features.

Breaking changes should be expected until a 1.0 release, so version pinning is
strongly recommended, especially when you use it as a library.
strongly recommended, especially when using it as a library.


## Install
@@ -53,12 +53,13 @@ pip install --upgrade cratedb-toolkit

Verify installation.
```shell
cratedb-toolkit --version
ctk --version
```

Run with Docker.
```shell
docker run --rm "ghcr.io/crate-workbench/cratedb-toolkit" cratedb-toolkit --version
alias ctk='docker run --rm ghcr.io/crate-workbench/cratedb-toolkit ctk'
ctk --version
```


20 changes: 0 additions & 20 deletions cratedb_toolkit/datasets/README.md

This file was deleted.

66 changes: 66 additions & 0 deletions doc/datasets.md
@@ -0,0 +1,66 @@
# Datasets API

Provide access to datasets, to be easily consumed by tutorials
and/or production applications.

## Install
```shell
pip install --upgrade 'cratedb-toolkit[datasets]'
```

## Synopsis

```python
from cratedb_toolkit.datasets import load_dataset

dataset = load_dataset("tutorial/weather-basic")
print(dataset.ddl)
```

## Usage

### Built-in datasets
Load an example dataset into a CrateDB database table.
```python
from cratedb_toolkit.datasets import load_dataset

dataset = load_dataset("tutorial/weather-basic")
dataset.dbtable(dburi="crate://crate@localhost/", table="weather_data").load()
```

### Kaggle
To access datasets on Kaggle, you need an account on their platform.

#### Authentication
Either create a configuration file `~/.kaggle/kaggle.json` in JSON format,
```json
{"username":"acme","key":"134af98bdb0bd0fa92078d9c37ac8f78"}
```
or, alternatively, set these environment variables.
```shell
export KAGGLE_USERNAME=acme
export KAGGLE_KEY=134af98bdb0bd0fa92078d9c37ac8f78
```
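
The two authentication options above can be sketched as a small lookup
routine. This is only an illustration using the standard library: the function
name `resolve_kaggle_credentials` is hypothetical, and the precedence of
environment variables over the configuration file is an assumption, not
something this documentation states.

```python
import json
import os
from pathlib import Path


def resolve_kaggle_credentials(environ=None, config_path=None):
    """Return (username, key) from environment variables or kaggle.json.

    Hypothetical helper for illustration; not part of cratedb-toolkit.
    """
    environ = os.environ if environ is None else environ
    if config_path is None:
        config_path = Path.home() / ".kaggle" / "kaggle.json"
    config_path = Path(config_path)

    # Assumption: environment variables win over the configuration file.
    username = environ.get("KAGGLE_USERNAME")
    key = environ.get("KAGGLE_KEY")
    if username and key:
        return username, key

    # Fall back to the JSON configuration file, when it exists.
    if config_path.is_file():
        config = json.loads(config_path.read_text(encoding="utf-8"))
        return config["username"], config["key"]

    raise LookupError("No Kaggle credentials found")
```

Calling `resolve_kaggle_credentials()` without arguments inspects the current
process environment and `~/.kaggle/kaggle.json`.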

#### Acquisition
Load a dataset from Kaggle into a CrateDB database table.
```python
from cratedb_toolkit.datasets import load_dataset

dataset = load_dataset("kaggle://guillemservera/global-daily-climate-data/daily_weather.parquet")
dataset.dbtable(dburi="crate://crate@localhost/", table="kaggle_daily_weather").load()
```
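
The `kaggle://` address in the example reads as owner, dataset name, and file
name. The sketch below shows that decomposition using only the standard
library. The helper `split_kaggle_address` is hypothetical, and the toolkit may
parse such addresses differently.

```python
from urllib.parse import urlsplit


def split_kaggle_address(address):
    """Split a kaggle:// address into (owner, dataset, filename).

    Hypothetical helper for illustration; not part of cratedb-toolkit.
    """
    url = urlsplit(address)
    if url.scheme != "kaggle":
        raise ValueError(f"Not a kaggle:// address: {address}")
    # The network location carries the owner, the path carries dataset and file.
    owner = url.netloc
    dataset, _, filename = url.path.lstrip("/").partition("/")
    return owner, dataset, filename


owner, dataset, filename = split_kaggle_address(
    "kaggle://guillemservera/global-daily-climate-data/daily_weather.parquet"
)
print(owner, dataset, filename)
```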


## In Practice

Please refer to these notebooks to learn how `load_dataset` works in practice.

- [How to Build Time Series Applications in CrateDB]
- [Exploratory data analysis with CrateDB]
- [Time series decomposition with CrateDB]


[Exploratory data analysis with CrateDB]: https://github.com/crate/cratedb-examples/blob/main/topic/timeseries/exploratory_data_analysis.ipynb
[How to Build Time Series Applications in CrateDB]: https://github.com/crate/cratedb-examples/blob/main/topic/timeseries/dask-weather-data-import.ipynb
[Time series decomposition with CrateDB]: https://github.com/crate/cratedb-examples/blob/main/topic/timeseries/time-series-decomposition.ipynb
10 changes: 2 additions & 8 deletions doc/index.md
@@ -89,7 +89,9 @@ about a possible feature.
:hidden:
install
io/index
retention
datasets
```

```{toctree}
@@ -103,14 +105,6 @@ backlog
```



[Changelog]: https://github.com/crate-workbench/cratedb-toolkit/blob/main/CHANGES.md
[development documentation]: https://cratedb-toolkit.readthedocs.io/sandbox.html
[Documentation]: https://cratedb-toolkit.readthedocs.io/
[Issues]: https://github.com/crate-workbench/cratedb-toolkit/issues
[License]: https://github.com/crate-workbench/cratedb-toolkit/blob/main/LICENSE
[PyPI]: https://pypi.org/project/cratedb-toolkit/
[Source code]: https://github.com/crate-workbench/cratedb-toolkit
[cratedb-toolkit]: https://cratedb-toolkit.readthedocs.io/
[influxio]: https://influxio.readthedocs.io/

84 changes: 14 additions & 70 deletions cratedb_toolkit/io/README.md → doc/io/index.md
@@ -1,9 +1,13 @@
# Load and extract data into/from CrateDB
# I/O Subsystem

Load and extract data into/from CrateDB.

## About
Using the InfluxDB and MongoDB I/O subsystems, you can transfer data from
[InfluxDB] and [MongoDB] to [CrateDB] and [CrateDB Cloud].

A one-stop command `ctk load table` to load data into CrateDB database tables.
## What's inside
A one-stop command `ctk load table` to load data into database tables.


## Installation
@@ -78,76 +82,16 @@ ctk shell --command="SELECT * FROM data_weather LIMIT 10;" --format=json
- Exercise data imports from AWS S3 and other Object Storage providers.


## InfluxDB
```{toctree}
:maxdepth: 2
:hidden:
Using the adapter to [influxio], you can transfer data from InfluxDB to CrateDB.

Import two data points into InfluxDB.
```shell
export INFLUX_ORG=example
export INFLUX_TOKEN=token
export INFLUX_BUCKET_NAME=testdrive
export INFLUX_MEASUREMENT=demo
influx bucket create
influx write --precision=s "${INFLUX_MEASUREMENT},region=amazonas temperature=42.42,humidity=84.84 1556896326"
influx write --precision=s "${INFLUX_MEASUREMENT},region=amazonas temperature=45.89,humidity=77.23,windspeed=5.4 1556896327"
influx query "from(bucket:\"${INFLUX_BUCKET_NAME}\") |> range(start:-100y)"
```

Transfer data.
```shell
export CRATEDB_SQLALCHEMY_URL=crate://crate@localhost:4200/testdrive/demo
ctk load table influxdb2://example:token@localhost:8086/testdrive/demo
crash --command "SELECT * FROM testdrive.demo;"
```

Todo: More convenient table querying.
```shell
export CRATEDB_SQLALCHEMY_URL=crate://crate@localhost:4200/testdrive/demo
ctk shell --command "SELECT * FROM testdrive.demo;"
ctk show table "testdrive.demo"
```


## MongoDB

Using the MongoDB subsystem, you can transfer data from MongoDB to CrateDB.

Import two data points into MongoDB.
```shell
mongosh mongodb://localhost:27017/testdrive <<EOF
db.demo.remove({})
db.demo.insertMany([
{
timestamp: new Date(1556896326),
region: "amazonas",
temperature: 42.42,
humidity: 84.84,
},
{
timestamp: new Date(1556896327),
region: "amazonas",
temperature: 45.89,
humidity: 77.23,
windspeed: 5.4,
},
])
db.demo.find({})
EOF
```

Todo: Use `mongoimport`.
```shell
mongoimport --uri 'mongodb+srv://MYUSERNAME:[email protected]/test?retryWrites=true&w=majority'
```

Transfer data.
```shell
export CRATEDB_SQLALCHEMY_URL=crate://crate@localhost:4200/testdrive/demo
ctk load table mongodb://localhost:27017/testdrive/demo
crash --command "SELECT * FROM testdrive.demo;"
InfluxDB <influxdb/index>
MongoDB <mongodb/index>
```


[CrateDB]: https://github.com/crate/crate
[CrateDB Cloud]: https://console.cratedb.cloud/
[influxio]: https://github.com/daq-tools/influxio
[InfluxDB]: https://github.com/influxdata/influxdb
[MongoDB]: https://github.com/mongodb/mongo
12 changes: 12 additions & 0 deletions doc/io/influxdb/index.md
@@ -0,0 +1,12 @@
(influxdb)=
# InfluxDB I/O Subsystem

## About
Import and export data into/from InfluxDB, for humans and machines.


```{toctree}
:maxdepth: 1
loader
```
52 changes: 52 additions & 0 deletions doc/io/influxdb/loader.md
@@ -0,0 +1,52 @@
(influxdb-loader)=
# InfluxDB Table Loader

## About
Load data from InfluxDB into CrateDB using a one-stop command
`ctk load table influxdb2://...`, in order to facilitate convenient
data transfers to be used within data pipelines or ad hoc operations.

## Details
The InfluxDB table loader is based on the [influxio] package. Please also
consult its documentation to learn more about its capabilities for working
with InfluxDB.

## Install
```shell
pip install --upgrade 'cratedb-toolkit[influxdb]'
```

## Example
Import two data points into InfluxDB.

```shell
export INFLUX_ORG=example
export INFLUX_TOKEN=token
export INFLUX_BUCKET_NAME=testdrive
export INFLUX_MEASUREMENT=demo
influx bucket create
influx write --precision=s "${INFLUX_MEASUREMENT},region=amazonas temperature=42.42,humidity=84.84 1556896326"
influx write --precision=s "${INFLUX_MEASUREMENT},region=amazonas temperature=45.89,humidity=77.23,windspeed=5.4 1556896327"
influx query "from(bucket:\"${INFLUX_BUCKET_NAME}\") |> range(start:-100y)"
```
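
The strings passed to `influx write` above use the InfluxDB line protocol:
measurement and tags, then fields, then a timestamp, which is given in seconds
here because of `--precision=s`. As a rough sketch, such a record can be
assembled in Python. The helper `line_protocol` is hypothetical, it handles
numeric fields only, and it does not escape special characters.

```python
def line_protocol(measurement, tags, fields, timestamp):
    """Format one line protocol record (numeric fields only, no escaping).

    Hypothetical helper for illustration; not part of cratedb-toolkit or influxio.
    """
    tag_part = ",".join(f"{key}={value}" for key, value in tags.items())
    field_part = ",".join(f"{key}={value}" for key, value in fields.items())
    # Tags are optional and are attached to the measurement with a comma.
    head = f"{measurement},{tag_part}" if tag_part else measurement
    return f"{head} {field_part} {timestamp}"


record = line_protocol(
    "demo",
    {"region": "amazonas"},
    {"temperature": 42.42, "humidity": 84.84},
    1556896326,
)
print(record)
```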

Transfer data from InfluxDB bucket/measurement into CrateDB schema/table.
```shell
export CRATEDB_SQLALCHEMY_URL=crate://crate@localhost:4200/testdrive/demo
ctk load table influxdb2://example:token@localhost:8086/testdrive/demo
crash --command "SELECT * FROM testdrive.demo;"
```
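
Both addresses in the transfer example follow the same URL layout. In the
source URL, the credentials appear to correspond to the InfluxDB organization
and token, and the path to bucket and measurement. In the target URL, the path
names the CrateDB schema and table. The following sketch takes them apart with
the standard library. The helper `split_transfer_url` and its dictionary keys
are hypothetical and only serve the illustration.

```python
from urllib.parse import urlsplit


def split_transfer_url(url):
    """Split a source or target URL into its components.

    Hypothetical helper for illustration; not part of cratedb-toolkit.
    """
    parts = urlsplit(url)
    # First path segment: bucket or schema. Second path segment: measurement or table.
    namespace, _, name = parts.path.lstrip("/").partition("/")
    return {
        "scheme": parts.scheme,
        "username": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "namespace": namespace,
        "name": name,
    }


source = split_transfer_url("influxdb2://example:token@localhost:8086/testdrive/demo")
target = split_transfer_url("crate://crate@localhost:4200/testdrive/demo")
print(source)
print(target)
```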

Query data in CrateDB.
```shell
export CRATEDB_SQLALCHEMY_URL=crate://crate@localhost:4200/testdrive/demo
ctk shell --command "SELECT * FROM testdrive.demo;"
ctk show table "testdrive.demo"
```

:::{todo}
- More convenient table querying.
:::


[influxio]: inv:influxio:*:label#index
13 changes: 13 additions & 0 deletions doc/io/mongodb/index.md
@@ -0,0 +1,13 @@
(mongodb)=
# MongoDB I/O Subsystem

## About
Using the MongoDB subsystem, you can transfer data from and to MongoDB.


```{toctree}
:maxdepth: 1
loader
migr8
```
58 changes: 58 additions & 0 deletions doc/io/mongodb/loader.md
@@ -0,0 +1,58 @@
(mongodb-loader)=
# MongoDB Table Loader

## About
Load data from MongoDB into CrateDB using a one-stop command
`ctk load table mongodb://...`, in order to facilitate convenient
data transfers to be used within data pipelines or ad hoc operations.

## Install
```shell
pip install --upgrade 'cratedb-toolkit[mongodb]'
```

## Example
Import two data points into MongoDB.

```shell
mongosh mongodb://localhost:27017/testdrive <<EOF
db.demo.remove({})
db.demo.insertMany([
{
timestamp: new Date(1556896326 * 1000),
region: "amazonas",
temperature: 42.42,
humidity: 84.84,
},
{
timestamp: new Date(1556896327 * 1000),
region: "amazonas",
temperature: 45.89,
humidity: 77.23,
windspeed: 5.4,
},
])
db.demo.find({})
EOF
```
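
JavaScript's `Date` constructor interprets a single numeric argument as
milliseconds since the Unix epoch, so whether an epoch value is given in
seconds or in milliseconds changes the stored timestamp considerably. A short
Python sketch of the difference, using the epoch value from the example:

```python
from datetime import datetime, timezone

EPOCH_VALUE = 1556896326

# Interpreted as seconds since the Unix epoch.
as_seconds = datetime.fromtimestamp(EPOCH_VALUE, tz=timezone.utc)

# Interpreted as milliseconds since the Unix epoch.
as_milliseconds = datetime.fromtimestamp(EPOCH_VALUE / 1000, tz=timezone.utc)

print(as_seconds.isoformat())
print(as_milliseconds.date().isoformat())
```

The first interpretation yields a date in May 2019, the second one a date in
January 1970.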

Transfer data from MongoDB database/collection into CrateDB schema/table.
```shell
export CRATEDB_SQLALCHEMY_URL=crate://crate@localhost:4200/testdrive/demo
ctk load table mongodb://localhost:27017/testdrive/demo
```

Query data in CrateDB.
```shell
export CRATEDB_SQLALCHEMY_URL=crate://crate@localhost:4200/testdrive/demo
ctk shell --command "SELECT * FROM testdrive.demo;"
ctk show table "testdrive.demo"
```


:::{todo}
Use `mongoimport`.
```shell
mongoimport --uri 'mongodb+srv://MYUSERNAME:[email protected]/test?retryWrites=true&w=majority'
```
:::
14 changes: 8 additions & 6 deletions cratedb_toolkit/io/mongodb/README.md → doc/io/mongodb/migr8.md
@@ -1,13 +1,15 @@
# MongoDB → CrateDB Migration Tool
(migr8)=
# migr8

## About

A utility program, called `migr8`, supporting data migrations
between MongoDB and CrateDB.

A one-stop command `ctk load table mongodb://...`, wrapping the `migr8`
steps into a complete pipeline, to facilitate convenient data transfers.


## About
:::{tip}
Please also visit the documentation about the [](#mongodb-loader)
to learn about a higher-level interface.
:::

### Details
