Commit

initial formatting done

gibbscullen committed May 8, 2020
1 parent 21dc3af commit 0167857
Showing 33 changed files with 1,224 additions and 345 deletions.
6 changes: 4 additions & 2 deletions docs-beta/content/about_m3/_index.md
@@ -6,8 +6,10 @@ chapter = true
pre = "<b>2. </b>"
+++

### Section 1
### About M3

# Overview
#### Contributing to the Project
#### Glossary
#### Release notes


81 changes: 54 additions & 27 deletions docs-beta/content/about_m3/contributing.md

Large diffs are not rendered by default.

47 changes: 30 additions & 17 deletions docs-beta/content/about_m3/glossary.md
@@ -1,22 +1,35 @@
---
title: "I. Glossary"
title: "Glossary"
date: 2020-04-21T20:45:40-04:00
draft: true
---

1. **Bootstrapping:** Process by which an M3DB node is brought up. Bootstrapping consists of determining the integrity of the data that the node has, replaying writes from the commit log, and/or streaming missing data from its peers.

2. **Cardinality:** The number of unique metrics within the M3DB index. Cardinality increases with the number of unique tag/value combinations that are being emitted.

3. **Datapoint:** A single timestamp/value. Timeseries are composed of multiple datapoints and a series of tag/value pairs.

4. **Labels:** Pairs of descriptive words that give meaning to a metric. Tags and Labels are interchangeable terms.

5. **Metric:** A collection of uniquely identifiable tags.

6. **M3:** Highly scalable, distributed metrics platform that is comprised of a native, distributed time series database, a highly-dynamic and performant aggregation service, a query engine, and other supporting infrastructure.

7. **M3Coordinator:** A service within M3 that coordinates reads and writes between upstream systems, such as Prometheus, and downstream systems, such as M3DB.

8. **M3DB:** Distributed time series database influenced by Gorilla and Cassandra released as open source by Uber Technologies.

9. **M3Query:** A distributed query engine for M3DB. Unlike M3Coordinator, M3Query only provides support for reads.

10. **Namespace:** Similar to a table in other types of databases, namespaces in M3DB have a unique name and a set of configuration options, such as data retention and block size.

11. **Placement:** Map of the M3DB cluster's shard replicas to nodes. Each M3DB cluster has only one placement. Placement and Topology are interchangeable terms.

12. **Shard:** Effectively the same as a "virtual shard" in Cassandra in that it provides an arbitrary distribution of time series data via a simple hash of the series ID.

13. **Tags:** Pairs of descriptive words that give meaning to a metric. Tags and Labels are interchangeable terms.

14. **Timeseries:** A series of data points tracking a particular metric over time.

15. **Topology:** Map of the M3DB cluster's shard replicas to nodes. Each M3DB cluster has only one placement. Placement and Topology are interchangeable terms.
2 changes: 1 addition & 1 deletion docs-beta/content/about_m3/release_notes.md
@@ -1,5 +1,5 @@
---
title: "II. Release notes"
title: "Release notes"
date: 2020-04-21T20:45:33-04:00
draft: true
---
2 changes: 0 additions & 2 deletions docs-beta/content/contact/_index.md
@@ -14,8 +14,6 @@ Email

Slack

Gitter

GitHub

LinkedIn
15 changes: 11 additions & 4 deletions docs-beta/content/getting_started/docker.md
@@ -4,25 +4,30 @@ date: 2020-04-21T20:47:48-04:00
draft: true
---

### Docker & Kernel Configuration

This document lists the kernel tweaks M3DB needs to run well. If you are running on Kubernetes, you may use our sysctl-setter DaemonSet, which will set these values for you. Please read the comment in that manifest to understand the implications of applying it.

### Running with Docker
When running M3DB inside Docker, it is recommended to add the SYS_RESOURCE capability to the container (using the --cap-add argument to docker run) so that it can raise its file limits:
docker run --cap-add SYS_RESOURCE quay.io/m3/m3dbnode:latest

If M3DB is being run as a non-root user, M3's setcap images are required:
docker run --cap-add SYS_RESOURCE -u 1000:1000 quay.io/m3/m3dbnode:latest-setcap

More information on Docker's capability settings can be found here.

#### vm.max_map_count
M3DB uses a lot of mmap-ed files for performance; as a result, you might need to bump vm.max_map_count. We suggest setting this value to 3000000 so you don’t have to come back and debug issues later.
On Linux, you can increase the limits by running the following command as root:
sysctl -w vm.max_map_count=3000000

To set this value permanently, update the vm.max_map_count setting in /etc/sysctl.conf.

#### vm.swappiness
vm.swappiness controls how much the virtual memory subsystem will try to swap to disk. By default, the kernel configures this value to 60, and will try to swap out items in memory even when there is plenty of RAM available to the system.
We recommend sizing clusters so that M3DB runs on a substrate (hosts or containers) where no swapping is necessary, i.e. the process uses only 30-50% of the maximum available memory. We therefore recommend setting vm.swappiness to 1. This tells the kernel to swap as little as possible, without altogether disabling swapping.

On Linux, you can configure this by running the following as root:
sysctl -w vm.swappiness=1

@@ -41,6 +46,8 @@ sysctl -w fs.file-max=3000000
sysctl -w fs.nr_open=3000000

To set these values permanently, update the fs.file-max and fs.nr_open settings in /etc/sysctl.conf.
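
As an illustration, here is one way to make all of the values recommended on this page persistent in a single step. This is a sketch only; merge these lines with whatever already exists in your /etc/sysctl.conf rather than blindly appending:
cat <<EOF | sudo tee -a /etc/sysctl.conf
vm.max_map_count = 3000000
vm.swappiness = 1
fs.file-max = 3000000
fs.nr_open = 3000000
EOF
sudo sysctl -p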

Alternatively, if you wish to have M3DB run under systemd you can use our service example, which will set sane defaults. Keep in mind that you'll still need to configure the kernel and process limits, because systemd will not allow a process to exceed them and will silently fall back to a default value, which could cause M3DB to crash by hitting the file descriptor limit. Also note that systemd has a system.conf file and a user.conf file which may contain limits that the service-specific configuration files cannot override. Be sure to check that those files aren't configured with values lower than the value you configure at the service level.
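
For example, the kind of override involved might look like the following hypothetical drop-in for a unit named m3dbnode.service, raising the file descriptor limit to the value recommended above (the unit name and path are assumptions; the linked service example is the authoritative reference):
sudo mkdir -p /etc/systemd/system/m3dbnode.service.d
cat <<EOF | sudo tee /etc/systemd/system/m3dbnode.service.d/limits.conf
[Service]
LimitNOFILE=3000000
EOF
sudo systemctl daemon-reload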

Before running the process, make sure the limits are set; if running manually, you can raise the limit for the current user with ulimit -n 3000000.

26 changes: 15 additions & 11 deletions docs-beta/content/getting_started/kube.md
@@ -4,26 +4,30 @@ date: 2020-04-21T20:47:43-04:00
draft: true
---

### M3DB on Kubernetes

**Please note:** If possible, PLEASE USE THE OPERATOR to deploy to Kubernetes. It is a considerably more streamlined setup.

The operator leverages custom resource definitions (CRDs) to automatically handle operations such as managing cluster topology.
The guide below provides static manifests to bootstrap a cluster on Kubernetes and should be followed only if you have significant custom requirements not satisfied by the operator.

**Prerequisites**
M3DB performs better when it has access to fast disks. Every incoming write is written to a commit log, which at high volumes of writes can be sensitive to spikes in disk latency. Additionally, the random seeks into files when loading cold data benefit from lower random read latency.
Because of this, the included manifests reference a StorageClass named fast. Manifests are provided to create such a StorageClass on AWS / Azure / GCP using the respective cloud provider's premium disk class.
If you do not already have a StorageClass named fast, create one using one of the provided manifests:
#### AWS EBS (class io1)
kubectl apply -f https://raw.githubusercontent.com/m3db/m3/master/kube/storage-fast-aws.yaml

#### Azure premium LRS
kubectl apply -f https://raw.githubusercontent.com/m3db/m3/master/kube/storage-fast-azure.yaml

#### GCE Persistent SSD
kubectl apply -f https://raw.githubusercontent.com/m3db/m3/master/kube/storage-fast-gcp.yaml

If you wish to use your cloud provider's default remote disk, or another disk class entirely, you'll have to modify the manifests.
If your Kubernetes cluster spans multiple availability zones, it's important to specify a Volume Binding Mode of WaitForFirstConsumer in your StorageClass to delay the binding of the PersistentVolume until the Pod is created.
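
For example, a minimal sketch of a custom StorageClass with that binding mode might look like the following (the provisioner and disk type shown are for GCE persistent SSDs and are assumptions; substitute your cloud provider's equivalents):
cat <<EOF | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
volumeBindingMode: WaitForFirstConsumer
EOF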

**Kernel Configuration**
We provide a Kubernetes daemonset that can make setting host-level sysctls easier. Please see the kernel docs for more.
Note that our default StatefulSet spec will give the M3DB container CAP_SYS_RESOURCE so it may raise its file limits. Uncomment the securityContext on the m3db container in the StatefulSet if running with a Pod Security Policy or similar enforcement mechanism that prevents adding capabilities to containers.
Deploying
@@ -46,12 +50,12 @@ m3dbnode-1 1/1 Running 0 22m
m3dbnode-2 1/1 Running 0 22m

You can now proceed to initialize a namespace and placement for the cluster the same as you would for our other how-to guides:
#### Open a local connection to the coordinator service:
$ kubectl -n m3db port-forward svc/m3coordinator 7201
Forwarding from 127.0.0.1:7201 -> 7201
Forwarding from [::1]:7201 -> 7201

#### Create an initial cluster topology
curl -sSf -X POST localhost:7201/api/v1/placement/init -d '{
"num_shards": 1024,
"replication_factor": 3,
@@ -86,7 +90,7 @@ curl -sSf -X POST localhost:7201/api/v1/placement/init -d '{
]
}'

#### Create a namespace to hold your metrics
curl -X POST localhost:7201/api/v1/namespace -d '{
"name": "default",
"options": {
@@ -189,7 +193,7 @@ $ curl -sSf -X POST http://localhost:9003/query -d '{
"exhaustive": true
}

#### Adding nodes
You can easily scale your M3DB cluster by scaling the StatefulSet and informing the cluster topology of the change:
kubectl -n m3db scale --replicas=4 statefulset/m3dbnode
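
Once the new pod is running, add it to the placement through the coordinator. The sketch below assumes the endpoint and instance fields mirror the placement init call earlier in this guide, with hypothetical values for the fourth node; verify the exact fields against the placement API documentation:
curl -sSf -X POST localhost:7201/api/v1/placement -d '{
  "instances": [
    {
      "id": "m3dbnode-3",
      "isolation_group": "pod3",
      "zone": "embedded",
      "weight": 100,
      "endpoint": "m3dbnode-3.m3dbnode:9000",
      "hostname": "m3dbnode-3.m3dbnode",
      "port": 9000
    }
  ]
}'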

33 changes: 23 additions & 10 deletions docs-beta/content/getting_started/m3_binary.md
@@ -4,27 +4,39 @@ date: 2020-04-21T20:47:36-04:00
draft: true
---

### M3DB Cluster Deployment, Manually (The Hard Way)

#### Introduction
This document lists the manual steps involved in deploying an M3DB cluster. In practice, you'd automate this using Terraform or Kubernetes rather than doing it by hand; guides for doing so are available under the How-To section.

Primer Architecture
A quick primer on M3DB architecture. Here’s what a typical deployment looks like:

A few different things to highlight about the diagram:

**Role Type**
There are three ‘role types’ in an M3DB deployment:

**Coordinator:** m3coordinator serves to coordinate reads and writes across all hosts in the cluster. It’s a lightweight process, and does not store any data. This role would typically be run alongside a Prometheus instance, or be baked into a collector agent.

**Storage Node:** m3dbnode processes running on these hosts are the workhorses of the database; they store data and serve reads and writes.

**Seed Node:** First and foremost, these hosts are storage nodes themselves. In addition to that responsibility, they run an embedded ETCD server. This is to allow the various M3DB processes running across the cluster to reason about the topology/configuration of the cluster in a consistent manner.
Note: In very large deployments, you’d use a dedicated ETCD cluster and only use M3DB Storage and Coordinator Nodes.

#### Provisioning
Enough background, let’s get you going with a real cluster! Provision your hosts (be they VMs from AWS/GCP/etc. or bare-metal servers in your DC) with the latest and greatest flavour of Linux you favor. M3DB works on all popular distributions - Ubuntu/RHEL/CentOS; let us know if you run into issues on another platform and we’ll be happy to assist.

#### Network
If you’re using AWS or GCP it is highly advised to use static IPs, so that if you need to replace a host you don’t have to update your configuration files on all the hosts; you simply decommission the old seed node and provision a new seed node with the same host ID and static IP that the old seed node had. For AWS you can use an Elastic Network Interface on a VPC and for GCP you can simply use an internal static IP address.

In this example you will be creating three static IP addresses for the three seed nodes.
Further, we assume you have hostnames configured correctly too, i.e. running hostname on a host in the cluster returns the host ID you'll be using when specifying instance host IDs when creating the M3DB cluster placement. For example, running hostname on node m3db001 should return its host ID m3db001.

In GCP, the name of your instance when you create it will automatically be its hostname. When you create an instance, click "Management, disks, networking, SSH keys", then under "Networking" click the default interface, click the "Primary internal IP" drop-down, select "Reserve a static internal IP address", give it a name (e.g. m3db001) and a description noting that it is a seed node IP address, and use "Assign automatically".

In AWS it might be simpler to just use whatever hostname you get for the provisioned VM as your host ID when specifying the M3DB placement. Alternatively, use the environment host ID resolver and pass your host ID when launching the database process with an environment variable: set the variable to the host ID, and specify the variable name in config as envVarName: M3DB_HOST_ID if you are using an environment variable named M3DB_HOST_ID.

Relevant config snippet:
hostID:
resolver: environment
Expand All @@ -33,9 +45,10 @@ hostID:
Then start your process with:
M3DB_HOST_ID=m3db001 m3dbnode -f config.yml
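
Putting the pieces together, the host ID section of config.yml under the assumptions above (environment resolver, variable named M3DB_HOST_ID) would look roughly like:
hostID:
  resolver: environment
  envVarName: M3DB_HOST_ID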

#### Kernel
Ensure you review our recommended kernel configuration before running M3DB in production as M3DB may exceed the default limits for some default kernel values.

#### Config files
We wouldn’t feel right calling this guide "The Hard Way" without requiring you to change some configs by hand.
Note: the steps that follow assume you have the following 3 seed nodes; make the necessary adjustments if you have more or are using a dedicated ETCD cluster. Example seed nodes:
m3db001 (Region=us-east1, Zone=us-east1-a, Static IP=10.142.0.1)