Simple Dockerized AsterixDB

This project aims to make it easy to get started with AsterixDB. It is based on Docker and Docker Compose. Currently, the following features are supported:

  • The sample cluster (which consists of one cluster controller and two node controllers) in a single container
  • HDFS (with name node and data nodes in their own respective containers)
  • External tables on S3

Starting AsterixDB without HDFS

If you do not need HDFS, you can use the docker image without docker-compose:

docker run --rm -it -p 19002:19002 -p 19006:19006 ingomuellernet/asterixdb
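
If you prefer to run the container in the background, the same image can be started detached. The container name asterixdb below is just an illustrative choice:

docker run --rm -d --name asterixdb -p 19002:19002 -p 19006:19006 ingomuellernet/asterixdb

Stop it again with docker stop asterixdb.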

Starting AsterixDB with HDFS

The following should be enough to bring up all required services:

docker-compose up

Varying the number of HDFS Data Nodes

To change the number of HDFS data nodes, use the --scale flag of docker-compose:

docker-compose up --scale datanode=3
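
You can check that the expected number of data node containers is running with docker-compose's ps command, filtered to the datanode service:

docker-compose ps datanode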

Building the Image Locally

The above command uses a pre-built Docker image. If you want the image to be built locally, do the following instead:

docker-compose --file docker-compose-local.yml up

If you are behind a corporate firewall, you will have to configure Maven (which is used to build part of AsterixDB) as follows before running the above command:

export MAVEN_OPTS="-Dhttp.proxyHost=your.proxy.com -Dhttp.proxyPort=3128 -Dhttps.proxyHost=your.proxy.com -Dhttps.proxyPort=3128"
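
If you later change the Dockerfile or other build inputs, you can ask docker-compose to rebuild the image before starting the services, for example:

docker-compose --file docker-compose-local.yml up --build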

Uploading Data to HDFS

The data/ folder is mounted into the HDFS namenode container, from where you can upload its contents to HDFS using the HDFS client in that container (docker-asterixdb_namenode_1 may have a different name on your machine; run docker ps to find out):

docker exec -it docker-asterixdb_namenode_1 hadoop fs -mkdir /dataset
docker exec -it docker-asterixdb_namenode_1 hadoop fs -put /data/file.parquet /dataset/
docker exec -it docker-asterixdb_namenode_1 hadoop fs -ls /dataset
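
The same container is also handy for checking the state of the HDFS cluster, e.g. after scaling the number of data nodes. A sketch, assuming the hdfs client is on the PATH in the namenode container:

docker exec -it docker-asterixdb_namenode_1 hdfs dfsadmin -report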

Running Queries

Once started, you should be able to use the server by accessing the web interface at http://localhost:19006. Alternatively, the REST API is accessible on the standard port (19002).
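
For scripted access, statements can also be sent to the REST API directly; a minimal sketch using curl against the query service endpoint:

curl -s --data-urlencode 'statement=SELECT VALUE 1;' http://localhost:19002/query/service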

Creating an External Table on HDFS

Suppose you have the following file test.json:

{"s": "hello world", "i": 42}

Upload it to /dataset/test.json on HDFS as described above. Then run the following in the web interface:

CREATE TYPE t1 AS OPEN {};

CREATE EXTERNAL DATASET TestHdfs(t1)
USING hdfs (
  ("hdfs"="hdfs://namenode:8020"),
  ("path"="/dataset/test.json"),
  ("input-format"="text-input-format"),
  ("format"="json")
);
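
Once created, the external dataset can be queried like any other dataset; the following should return the record from test.json:

SELECT VALUE t FROM TestHdfs t;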

Creating an External Table on S3

Suppose you have stored test.json from above on s3://my-bucket-name/test.json in the eu-west-1 region on S3. Then the following command creates an external table based on that file:

CREATE TYPE t1 AS OPEN {};  -- omit if the type already exists

CREATE EXTERNAL DATASET TestS3(t1)
USING S3 (
  ("accessKeyId"="YOURACCESSKEYID"),
  ("secretAccessKey"="YOURSECRETACCESSKEY"),
  ("region"="eu-west-1"),
  ("serviceEndpoint"="https://s3.eu-west-1.amazonaws.com:443"),
  ("container"="my-bucket-name"),
  ("definition"="test.json"),
  ("format"="json")
);
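
As with the HDFS dataset, you can query it right away, e.g. to count the records in the file:

SELECT COUNT(*) AS cnt FROM TestS3;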