Documentation: Improve structure and layout #146

Merged
merged 1 commit into from
May 8, 2024
7 changes: 4 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
@@ -41,7 +41,7 @@ Contributions of all kinds are much welcome, in order to make it more solid,
and to add features.

Breaking changes should be expected until a 1.0 release, so version pinning is
strongly recommended, especially when you use it as a library.
strongly recommended, especially when using it as a library.


## Install
@@ -53,12 +53,13 @@ pip install --upgrade cratedb-toolkit

Verify installation.
```shell
cratedb-toolkit --version
ctk --version
```

Run with Docker.
```shell
docker run --rm "ghcr.io/crate-workbench/cratedb-toolkit" cratedb-toolkit --version
alias ctk='docker run --rm ghcr.io/crate-workbench/cratedb-toolkit ctk'
ctk --version
```


20 changes: 0 additions & 20 deletions cratedb_toolkit/datasets/README.md

This file was deleted.

66 changes: 66 additions & 0 deletions doc/datasets.md
@@ -0,0 +1,66 @@
# Datasets API

Provides access to datasets that tutorials and production applications
can consume easily.

## Install
```shell
pip install --upgrade 'cratedb-toolkit[datasets]'
```

## Synopsis

```python
from cratedb_toolkit.datasets import load_dataset

dataset = load_dataset("tutorial/weather-basic")
print(dataset.ddl)
```

## Usage

### Built-in datasets
Load an example dataset into a CrateDB database table.
```python
from cratedb_toolkit.datasets import load_dataset

dataset = load_dataset("tutorial/weather-basic")
dataset.dbtable(dburi="crate://crate@localhost/", table="weather_data").load()
```

### Kaggle
To access datasets on Kaggle, you need an account on their platform.

#### Authentication
Either create a configuration file `~/.kaggle/kaggle.json` in JSON format,
```json
{"username":"acme","key":"134af98bdb0bd0fa92078d9c37ac8f78"}
```
or, alternatively, set these environment variables.
```shell
export KAGGLE_USERNAME=acme
export KAGGLE_KEY=134af98bdb0bd0fa92078d9c37ac8f78
```
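
To make the two authentication options concrete, here is a minimal sketch of how credentials could be looked up from either place. The helper `resolve_kaggle_credentials` is hypothetical and is not part of cratedb-toolkit or the Kaggle client; the precedence (environment variables first, then the configuration file) is an assumption made for illustration.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional, Tuple


def resolve_kaggle_credentials(
    env: Optional[Mapping[str, str]] = None,
    config_file: Optional[Path] = None,
) -> Optional[Tuple[str, str]]:
    """Return (username, key), or None when nothing is configured.

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    config_file = config_file or Path.home() / ".kaggle" / "kaggle.json"

    # Assumption: environment variables take precedence over the file.
    if env.get("KAGGLE_USERNAME") and env.get("KAGGLE_KEY"):
        return env["KAGGLE_USERNAME"], env["KAGGLE_KEY"]

    # Fall back to the JSON configuration file, if it exists.
    if config_file.is_file():
        data = json.loads(config_file.read_text())
        return data["username"], data["key"]

    return None


if __name__ == "__main__":
    found = resolve_kaggle_credentials()
    print("Kaggle credentials configured:", found is not None)
```

Running the script before loading a Kaggle dataset gives a quick indication of whether either option is in place, without printing the secret itself.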

#### Acquisition
Load a dataset from Kaggle into a CrateDB database table.
```python
from cratedb_toolkit.datasets import load_dataset

dataset = load_dataset("kaggle://guillemservera/global-daily-climate-data/daily_weather.parquet")
dataset.dbtable(dburi="crate://crate@localhost/", table="kaggle_daily_weather").load()
```
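
The `kaggle://` address above appears to follow an owner/dataset/file layout. The sketch below splits such an address with the standard library; `parse_kaggle_url` and `KaggleResource` are hypothetical names, and the three-part layout is an assumption read off the example, not a documented contract of `load_dataset`.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class KaggleResource(NamedTuple):
    owner: str
    dataset: str
    file: str


def parse_kaggle_url(url: str) -> KaggleResource:
    """Split kaggle://<owner>/<dataset>/<file> into its parts (hypothetical helper)."""
    parsed = urlparse(url)
    if parsed.scheme != "kaggle":
        raise ValueError(f"Not a kaggle:// URL: {url}")
    # Assumption: the path holds the dataset slug followed by a file name.
    dataset, _, file = parsed.path.lstrip("/").partition("/")
    return KaggleResource(owner=parsed.netloc, dataset=dataset, file=file)


if __name__ == "__main__":
    print(parse_kaggle_url(
        "kaggle://guillemservera/global-daily-climate-data/daily_weather.parquet"
    ))
```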


## In Practice

Please refer to these notebooks to learn how `load_dataset` works in practice.

- [How to Build Time Series Applications in CrateDB]
- [Exploratory data analysis with CrateDB]
- [Time series decomposition with CrateDB]


[Exploratory data analysis with CrateDB]: https://github.com/crate/cratedb-examples/blob/main/topic/timeseries/exploratory_data_analysis.ipynb
[How to Build Time Series Applications in CrateDB]: https://github.com/crate/cratedb-examples/blob/main/topic/timeseries/dask-weather-data-import.ipynb
[Time series decomposition with CrateDB]: https://github.com/crate/cratedb-examples/blob/main/topic/timeseries/time-series-decomposition.ipynb
10 changes: 2 additions & 8 deletions doc/index.md
@@ -89,7 +89,9 @@ about a possible feature.
:hidden:

install
io/index
retention
datasets
```

```{toctree}
@@ -103,14 +105,6 @@ backlog
```



[Changelog]: https://github.com/crate-workbench/cratedb-toolkit/blob/main/CHANGES.md
[development documentation]: https://cratedb-toolkit.readthedocs.io/sandbox.html
[Documentation]: https://cratedb-toolkit.readthedocs.io/
[Issues]: https://github.com/crate-workbench/cratedb-toolkit/issues
[License]: https://github.com/crate-workbench/cratedb-toolkit/blob/main/LICENSE
[PyPI]: https://pypi.org/project/cratedb-toolkit/
[Source code]: https://github.com/crate-workbench/cratedb-toolkit
[cratedb-toolkit]: https://cratedb-toolkit.readthedocs.io/
[influxio]: https://influxio.readthedocs.io/

84 changes: 14 additions & 70 deletions cratedb_toolkit/io/README.md → doc/io/index.md
@@ -1,9 +1,13 @@
# Load and extract data into/from CrateDB
# I/O Subsystem

Load and extract data into/from CrateDB.

## About
Using the InfluxDB and MongoDB I/O subsystems, you can transfer data from
[InfluxDB] and [MongoDB] to [CrateDB] and [CrateDB Cloud].

A one-stop command `ctk load table` to load data into CrateDB database tables.
## What's inside
A one-stop command `ctk load table` to load data into database tables.


## Installation
@@ -78,76 +82,16 @@ ctk shell --command="SELECT * FROM data_weather LIMIT 10;" --format=json
- Exercise data imports from AWS S3 and other Object Storage providers.


## InfluxDB
```{toctree}
:maxdepth: 2
:hidden:

Using the adapter to [influxio], you can transfer data from InfluxDB to CrateDB.

Import two data points into InfluxDB.
```shell
export INFLUX_ORG=example
export INFLUX_TOKEN=token
export INFLUX_BUCKET_NAME=testdrive
export INFLUX_MEASUREMENT=demo
influx bucket create
influx write --precision=s "${INFLUX_MEASUREMENT},region=amazonas temperature=42.42,humidity=84.84 1556896326"
influx write --precision=s "${INFLUX_MEASUREMENT},region=amazonas temperature=45.89,humidity=77.23,windspeed=5.4 1556896327"
influx query "from(bucket:\"${INFLUX_BUCKET_NAME}\") |> range(start:-100y)"
```

Transfer data.
```shell
export CRATEDB_SQLALCHEMY_URL=crate://crate@localhost:4200/testdrive/demo
ctk load table influxdb2://example:token@localhost:8086/testdrive/demo
crash --command "SELECT * FROM testdrive.demo;"
```

Todo: More convenient table querying.
```shell
export CRATEDB_SQLALCHEMY_URL=crate://crate@localhost:4200/testdrive/demo
ctk shell --command "SELECT * FROM testdrive.demo;"
ctk show table "testdrive.demo"
```


## MongoDB

Using the MongoDB subsystem, you can transfer data from MongoDB to CrateDB.

Import two data points into MongoDB.
```shell
mongosh mongodb://localhost:27017/testdrive <<EOF
db.demo.remove({})
db.demo.insertMany([
{
timestamp: new Date(1556896326),
region: "amazonas",
temperature: 42.42,
humidity: 84.84,
},
{
timestamp: new Date(1556896327),
region: "amazonas",
temperature: 45.89,
humidity: 77.23,
windspeed: 5.4,
},
])
db.demo.find({})
EOF
```

Todo: Use `mongoimport`.
```shell
mongoimport --uri 'mongodb+srv://MYUSERNAME:[email protected]/test?retryWrites=true&w=majority'
```

Transfer data.
```shell
export CRATEDB_SQLALCHEMY_URL=crate://crate@localhost:4200/testdrive/demo
ctk load table mongodb://localhost:27017/testdrive/demo
crash --command "SELECT * FROM testdrive.demo;"
InfluxDB <influxdb/index>
MongoDB <mongodb/index>
```


[CrateDB]: https://github.com/crate/crate
[CrateDB Cloud]: https://console.cratedb.cloud/
[influxio]: https://github.com/daq-tools/influxio
[InfluxDB]: https://github.com/influxdata/influxdb
[MongoDB]: https://github.com/mongodb/mongo
12 changes: 12 additions & 0 deletions doc/io/influxdb/index.md
@@ -0,0 +1,12 @@
(influxdb)=
# InfluxDB I/O Subsystem

## About
Import and export data into/from InfluxDB, for humans and machines.


```{toctree}
:maxdepth: 1

loader
```
52 changes: 52 additions & 0 deletions doc/io/influxdb/loader.md
@@ -0,0 +1,52 @@
(influxdb-loader)=
# InfluxDB Table Loader

## About
Load data from InfluxDB into CrateDB using the one-stop command
`ctk load table influxdb2://...`, for convenient data transfers
within data pipelines or ad hoc operations.

## Details
The InfluxDB table loader is based on the [influxio] package. Its documentation
describes further capabilities that can help when working with InfluxDB.

## Install
```shell
pip install --upgrade 'cratedb-toolkit[influxdb]'
```

## Example
Import two data points into InfluxDB.

```shell
export INFLUX_ORG=example
export INFLUX_TOKEN=token
export INFLUX_BUCKET_NAME=testdrive
export INFLUX_MEASUREMENT=demo
influx bucket create
influx write --precision=s "${INFLUX_MEASUREMENT},region=amazonas temperature=42.42,humidity=84.84 1556896326"
influx write --precision=s "${INFLUX_MEASUREMENT},region=amazonas temperature=45.89,humidity=77.23,windspeed=5.4 1556896327"
influx query "from(bucket:\"${INFLUX_BUCKET_NAME}\") |> range(start:-100y)"
```
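
The two `influx write` calls above pass records in InfluxDB line protocol: measurement and tags, then fields, then a timestamp. As a rough sketch, the same strings can be assembled in Python. `to_line_protocol` is a hypothetical helper, handles float fields only, and deliberately skips the escaping and type suffixes that the real protocol requires for other values.

```python
from typing import Mapping


def to_line_protocol(
    measurement: str,
    tags: Mapping[str, str],
    fields: Mapping[str, float],
    timestamp: int,
) -> str:
    """Render one record as `measurement,tags fields timestamp`.

    Illustrative only: float fields, no escaping of spaces or commas.
    """
    tag_part = ",".join(f"{key}={value}" for key, value in tags.items())
    field_part = ",".join(f"{key}={value}" for key, value in fields.items())
    # Tags are optional; without them the measurement stands alone.
    head = f"{measurement},{tag_part}" if tag_part else measurement
    return f"{head} {field_part} {timestamp}"


if __name__ == "__main__":
    print(to_line_protocol(
        "demo",
        {"region": "amazonas"},
        {"temperature": 42.42, "humidity": 84.84},
        1556896326,
    ))
```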

Transfer data from InfluxDB bucket/measurement into CrateDB schema/table.
```shell
export CRATEDB_SQLALCHEMY_URL=crate://crate@localhost:4200/testdrive/demo
ctk load table influxdb2://example:token@localhost:8086/testdrive/demo
crash --command "SELECT * FROM testdrive.demo;"
```
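
Judging from the environment variables in the example, the source URL carries the organization and token as user and password, and the bucket and measurement as path segments. The sketch below decomposes it accordingly; `parse_influxdb_url` and `InfluxSource` are hypothetical names, and the mapping is an assumption inferred from this example, not taken from the loader's implementation.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlparse


class InfluxSource(NamedTuple):
    org: Optional[str]
    token: Optional[str]
    host: Optional[str]
    port: Optional[int]
    bucket: str
    measurement: str


def parse_influxdb_url(url: str) -> InfluxSource:
    """Decompose influxdb2://<org>:<token>@<host>:<port>/<bucket>/<measurement>."""
    parsed = urlparse(url)
    # Assumption: first path segment is the bucket, second the measurement.
    bucket, _, measurement = parsed.path.lstrip("/").partition("/")
    return InfluxSource(
        org=parsed.username,
        token=parsed.password,
        host=parsed.hostname,
        port=parsed.port,
        bucket=bucket,
        measurement=measurement,
    )


if __name__ == "__main__":
    print(parse_influxdb_url("influxdb2://example:token@localhost:8086/testdrive/demo"))
```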

Query data in CrateDB.
```shell
export CRATEDB_SQLALCHEMY_URL=crate://crate@localhost:4200/testdrive/demo
ctk shell --command "SELECT * FROM testdrive.demo;"
ctk show table "testdrive.demo"
```

:::{todo}
- More convenient table querying.
:::


[influxio]: inv:influxio:*:label#index
13 changes: 13 additions & 0 deletions doc/io/mongodb/index.md
@@ -0,0 +1,13 @@
(mongodb)=
# MongoDB I/O Subsystem

## About
Using the MongoDB subsystem, you can transfer data from and to MongoDB.


```{toctree}
:maxdepth: 1

loader
migr8
```
58 changes: 58 additions & 0 deletions doc/io/mongodb/loader.md
@@ -0,0 +1,58 @@
(mongodb-loader)=
# MongoDB Table Loader

## About
Load data from MongoDB into CrateDB using the one-stop command
`ctk load table mongodb://...`, for convenient data transfers
within data pipelines or ad hoc operations.

## Install
```shell
pip install --upgrade 'cratedb-toolkit[mongodb]'
```

## Example
Import two data points into MongoDB.

```shell
mongosh mongodb://localhost:27017/testdrive <<EOF
db.demo.deleteMany({})
db.demo.insertMany([
{
timestamp: new Date(1556896326 * 1000),
region: "amazonas",
temperature: 42.42,
humidity: 84.84,
},
{
timestamp: new Date(1556896327 * 1000),
region: "amazonas",
temperature: 45.89,
humidity: 77.23,
windspeed: 5.4,
},
])
db.demo.find({})
EOF
```
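
A note on the timestamps: JavaScript's `Date` constructor reads a numeric argument as milliseconds since the Unix epoch, while the value 1556896326 is an epoch time in seconds. The following sketch, using only the Python standard library, shows how far apart the two interpretations land. The helper names are made up for this illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_epoch_seconds(value: int) -> datetime:
    """Interpret `value` as seconds since the Unix epoch."""
    return EPOCH + timedelta(seconds=value)


def from_epoch_milliseconds(value: int) -> datetime:
    """Interpret `value` as milliseconds since the Unix epoch."""
    return EPOCH + timedelta(milliseconds=value)


if __name__ == "__main__":
    # Read as seconds, the value falls in May 2019.
    print(from_epoch_seconds(1556896326).isoformat())
    # Read as milliseconds, the same number falls in January 1970.
    print(from_epoch_milliseconds(1556896326).isoformat())
```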

Transfer data from MongoDB database/collection into CrateDB schema/table.
```shell
export CRATEDB_SQLALCHEMY_URL=crate://crate@localhost:4200/testdrive/demo
ctk load table mongodb://localhost:27017/testdrive/demo
```
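
The target URL in `CRATEDB_SQLALCHEMY_URL` ends in `/testdrive/demo`, which corresponds to the CrateDB schema and table named in the step above. As a small sketch, the path can be turned into the matching query statement. `target_table` and `select_all` are hypothetical helpers, and the schema/table reading of the path is an assumption based on this example.

```python
from typing import Tuple
from urllib.parse import urlparse


def target_table(url: str) -> Tuple[str, str]:
    """Return (schema, table) from a crate://.../<schema>/<table> URL."""
    schema, _, table = urlparse(url).path.lstrip("/").partition("/")
    if not schema or not table:
        raise ValueError(f"Expected /<schema>/<table> in URL path: {url}")
    return schema, table


def select_all(url: str) -> str:
    """Build a SELECT statement for the table addressed by the URL."""
    schema, table = target_table(url)
    # Identifiers are double-quoted; embedded quotes are not escaped here.
    return f'SELECT * FROM "{schema}"."{table}";'


if __name__ == "__main__":
    print(select_all("crate://crate@localhost:4200/testdrive/demo"))
```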

Query data in CrateDB.
```shell
export CRATEDB_SQLALCHEMY_URL=crate://crate@localhost:4200/testdrive/demo
ctk shell --command "SELECT * FROM testdrive.demo;"
ctk show table "testdrive.demo"
```


:::{todo}
Use `mongoimport`.
```shell
mongoimport --uri 'mongodb+srv://MYUSERNAME:[email protected]/test?retryWrites=true&w=majority'
```
:::
14 changes: 8 additions & 6 deletions cratedb_toolkit/io/mongodb/README.md → doc/io/mongodb/migr8.md
@@ -1,13 +1,15 @@
# MongoDB → CrateDB Migration Tool
(migr8)=
# migr8

## About

A utility program, called `migr8`, supporting data migrations
between MongoDB and CrateDB.

A one-stop command `ctk load table mongodb://...`, wrapping the `migr8`
steps into a complete pipeline, to facilitate convenient data transfers.


## About
:::{tip}
See also the documentation about the [](#mongodb-loader),
which provides a higher-level interface.
:::

### Details
