[Partial][reorg] Merge libtsdb #3

Merged
merged 11 commits into from
Jul 18, 2020
22 changes: 7 additions & 15 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -1,16 +1,8 @@
# Node rules:
## Grunt intermediate storage (http://gruntjs.com/creating-plugins#storing-task-files)
.grunt
# IDE Start
.idea
.vscode
# IDE End

## Dependency directory
## Commenting this out is preferred by some people, see
## https://docs.npmjs.com/misc/faq#should-i-check-my-node_modules-folder-into-git
node_modules

# Book build output
_book

# eBook build output
*.epub
*.mobi
*.pdf
# Example Start
*.out
# Example End
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2017

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
52 changes: 15 additions & 37 deletions README.md
@@ -1,45 +1,21 @@
# Notes on Time Series Database

**Deprecated**, please see related section

These are notes about the history and implementation of [various time series databases](https://github.com/xephonhq/awesome-time-series-database).
They aim to provide deep insight into how various TSDBs work and how they evolved to their current stage.
Some related fields, like mining time series data, will also be covered.

There are two versions:
- a [Gitbook version](https://at15.gitbooks.io/notes-on-tsdb/content/)
- a [Tex version](https://github.com/xephonhq/notes-on-tsdb/blob/master/tex).

For comparing TSDBs, you may want to try [Xephon-B](https://github.com/xephonhq/xephon-b) (still work in progress).

## Roadmap

The writing roadmap is based on my personal interest and need for course projects,
so it may not be in a well organized order.

- [ ] Basic knowledge of databases
- [ ] In memory time series databases
- [ ] compress data (delta)
- [ ] concurrency in single machine
- [ ] handling meta data
- [ ] log
- [ ] multiple machine
- [ ] Query Language
- [ ] Benchmark
- [ ] work load generation
- [ ] existing tools and pitfalls
- [ ] design of [Xephon-B](https://github.com/xephonhq/xephon-b)
- [ ] Genetic and time series
- [ ] existing genetic databases
- [ ] store genetic data in time series databases
- [ ] mining genetic data (also see mining time series data)
- [ ] Mining time series data
This repo is a work-in-progress [book](book) about time series databases (TSDB) that (will) contain:

- [Survey](survey) on [various existing time series databases](https://github.com/xephonhq/awesome-time-series-database)
- How to write a (distributed) time series database from scratch.
- Related fields like distributed tracing, OLAP database.

Project layout

- [book](book) The WIP book
- [doc](doc) [Roadmap](doc/ROADMAP.md) and notes on writing notes
- [survey](survey) Survey on TSDB and related fields

## Related

- [at15/papers-i-read](https://github.com/at15/papers-i-read)
- [at15/code-i-read](https://github.com/at15/code-i-read)
- [at15/pub](https://github.com/at15/pub)
- [Awesome Time Series Database](https://github.com/xephonhq/awesome-time-series-database)
- [libtsdb](https://github.com/libtsdb)

## Acknowledgment

@@ -48,4 +24,6 @@ so it may not be in a well organized order.

## License

The book is licensed under Creative Commons. The sample code (unless specified otherwise in a comment) is licensed under MIT.

[CC BY-NC-SA 3.0](https://creativecommons.org/licenses/by-nc-sa/3.0/us/)
5 changes: 0 additions & 5 deletions SUMMARY.md

This file was deleted.

12 changes: 0 additions & 12 deletions book.json

This file was deleted.

7 changes: 7 additions & 0 deletions book/01-overview/README.md
@@ -0,0 +1,7 @@
# Chapter 01: Overview

## TODO

- [ ] what is time series data
- [ ] why do we need time series database
- [ ] a glimpse on tsdb landscape (like DDA, I can draw the map by myself and put xephon-k in the far north beyond the wall)
9 changes: 9 additions & 0 deletions book/02-basic/README.md
@@ -0,0 +1,9 @@
# Chapter 02: Basic

- [Primitive Data Type](primitive) Representation of primitive types like integer, float, string.
- [Endianness](primitive/endianness.md)
- [Integer](primitive/integer.md)

## TODO

- [ ] math
3 changes: 3 additions & 0 deletions book/02-basic/primitive/README.md
@@ -0,0 +1,3 @@
# Primitive Types

- [Endianness](endianness.md)
1 change: 1 addition & 0 deletions book/02-basic/primitive/code/c/.gitignore
@@ -0,0 +1 @@
f1.bin
20 changes: 20 additions & 0 deletions book/02-basic/primitive/code/c/convert_integer.c
@@ -0,0 +1,20 @@
/*
 * project: ntsdb
 * file: convert_integer.c
 * chapter: basic
 * section: primitive
 * description: show behavior of converting signed and unsigned integer in c
 * compile: gcc convert_integer.c
 * run: ./a.out
 */

#include <stdio.h>

int main() {
    // 1000_0000 interpreted as signed is -128, i.e. -2^7 + 0
    // NOTE: plain char may be unsigned on some platforms, so use signed char
    signed char c = -128;
    // 1000_0000 interpreted as unsigned is 128, i.e. 2^7
    unsigned char uc = (unsigned char) c;
    printf("%d %u\n", c, uc); // -128 128
    return 0;
}
33 changes: 33 additions & 0 deletions book/02-basic/primitive/code/c/endianness.c
@@ -0,0 +1,33 @@
/*
 * project: ntsdb
 * file: endianness.c
 * chapter: basic
 * section: primitive
 * description: show endianness of integer
 * compile: gcc endianness.c
 * run: ./a.out
 */

#include <stdio.h>

// NOTE: based on CSAPP Figure 2.4

// print each byte in hex, starting from the smallest memory address
void show_bytes(unsigned char* start, size_t len) {
    for (size_t i = 0; i < len; i++) {
        printf(" %.2x", start[i]);
    }
    printf("\n");
}

void show_int(int x) {
    show_bytes((unsigned char*) &x, sizeof(x));
}

int main() {
    // 01 04 00 00 (on a little-endian machine with 4 byte int)
    // 1025 is 2^10+1, which is 0x0401, print starts from smaller memory address, 01 is LSB and prints before 04
    show_int(1025);

    // 1024 2048
    printf("%d %d\n", 0x0400, 0x0400 << 1);
    return 0;
}
40 changes: 40 additions & 0 deletions book/02-basic/primitive/code/c/endianness_file.c
@@ -0,0 +1,40 @@
/*
 * project: ntsdb
 * file: endianness_file.c
 * chapter: basic
 * section: primitive
 * description: show endianness when reading/writing string/int to file
 * compile: gcc endianness_file.c
 * run: ./a.out
 */

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

// write integer using the in memory layout directly, i.e. little-endian on most machines
void write_file(const char* file_name, int a) {
    FILE *f = fopen(file_name, "wb"); // binary mode, no newline translation
    assert(f != NULL);
    size_t wrote = fwrite(&a, 1, sizeof(a), f);
    assert(wrote == sizeof(a));
    fclose(f);
}

// read raw bytes as integer, only works if it is using same layout as memory
int read_int(const char* file_name) {
    FILE *f = fopen(file_name, "rb");
    assert(f != NULL);
    int val = -1;
    size_t read = fread(&val, 1, sizeof(val), f);
    assert(read == sizeof(val));
    fclose(f);
    return val;
}

int main() {
    const char* file_name = "f1.bin";
    int val = 256 + 2;
    // 256 + 2 is 0b1_00000010, i.e. 0x0102
    write_file(file_name, val);
    printf("wrote %d got %d\n", val, read_int(file_name));
    return 0;
}
9 changes: 9 additions & 0 deletions book/02-basic/primitive/code/go/convert_integer_test.go
@@ -0,0 +1,9 @@
package primitive

import "testing"

func TestConvertInteger(t *testing.T) {
	var c int8 = -128
	// converting a variable (not a constant) keeps the bit pattern 1000_0000
	var uc = uint8(c)
	t.Logf("%d %d", c, uc) // -128 128
}
61 changes: 61 additions & 0 deletions book/02-basic/primitive/endianness.md
@@ -0,0 +1,61 @@
# Endianness

## TODO

- [ ] go example?
- [ ] mention `hexdump` `xxd`

## Overview

- `byte`, i.e. 8 bits, is the atomic unit; endianness is about the ordering of bytes within a multi-byte value (for the scope of this book)
- `big-endian` stores the MSB (Most Significant Byte) at the smallest address
- `little-endian` stores the LSB (Least Significant Byte) at the smallest address
- most common processors (e.g. x86) are `little-endian`
- network order is `big-endian`
- numeric literals are written with the MSB on the left, i.e. they look `big-endian` if memory addresses increase from left to right
- bit shift: shifting left moves bits towards the MSB, e.g. `0x0400 << 1` is `0x0800`

Example [code/c/endianness.c](code/c/endianness.c)

`2^10+1` written as a binary literal is `100_0000_0001`, or `0x0401` in hex; it requires two bytes.

```text
memory address: 0         1
big-endian    : 0000_0100 0000_0001
little-endian : 0000_0001 0000_0100
```

## File

- [ ] text format have BOM

A file can use either ordering as long as you can read back what you wrote.
Normally file content is in a well-known serialized format, e.g. a text format like plain text or JSON, or a binary format like ELF.
Serialized content is a byte array, and its layout on disk is the same as in memory.
Endianness is handled by the encoder and decoder when converting types to and from bytes.

We can write an integer to a file directly as bytes; this is a straightforward (and unsafe) serialization.
In languages like C, we can treat the memory of an int as a byte array directly ([example endianness_file.c](code/c/endianness_file.c)); the byte order in the file is then the same as the processor's.
In languages that are more type safe, (~~we can use unsafe~~) we can shift the bits to generate a byte array from an integer.

```bash
xxd f1.bin
00000000: 0201 0000
```

## Tool

You can use `hexdump` or `xxd` to inspect binary files.

```bash
xxd <file>
# output in hex format
address: word1 word2

xxd -b <file>
# output in binary format
```

## Reference

- https://en.wikipedia.org/wiki/Endianness