xephonhq · at15 · Jul 18, 2020 · May 13, 2020 · May 13, 2020 · May 21, 2020
diff --git a/.gitignore b/.gitignore
@@ -1,16 +1,8 @@
-# Node rules:
-## Grunt intermediate storage (http://gruntjs.com/creating-plugins#storing-task-files)
-.grunt
+# IDE Start
+.idea
+.vscode
+# IDE End
 
-## Dependency directory
-## Commenting this out is preferred by some people, see
-## https://docs.npmjs.com/misc/faq#should-i-check-my-node_modules-folder-into-git
-node_modules
-
-# Book build output
-_book
-
-# eBook build output
-*.epub
-*.mobi
-*.pdf
+# Example Start
+*.out
+# Example End
diff --git a/LICENSE b/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2017 
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/README.md b/README.md
@@ -1,45 +1,21 @@
 # Notes on Time Series Database
 
-**Deprecated**, please see related section
-
-This a notes about the history and implementation of [various time series databases](https://github.com/xephonhq/awesome-time-series-database).
-It aims to provide a deep insight of how various TSDBs work and how they evolve into current stage.
-Also some related fields like mining time series data would be covered.
-
-There are two versions:
-  - a [Gitbook version](https://at15.gitbooks.io/notes-on-tsdb/content/)
-  - a [Tex version](https://github.com/xephonhq/notes-on-tsdb/blob/master/tex).
-
-For comparing TSDBs, you may want to try [Xephon-B](https://github.com/xephonhq/xephon-b) (still work in progress).
-
-## Roadmap
-
-The writing roadmap is based on my personal interest and need for course projects,
-so it may not be in a well organized order.
-
-- [ ] Basic knowledge of databases
-- [ ] In memory time series databases
-  - [ ] compress data (delta)
-  - [ ] concurrency in single machine
-  - [ ] handling meta data
-  - [ ] log
-  - [ ] multiple machine
-- [ ] Query Language
-- [ ] Benchmark
-  - [ ] work load generation
-  - [ ] existing tools and pitfalls
-  - [ ] design of [Xephon-B](https://github.com/xephonhq/xephon-b)
-- [ ] Genetic and time series
-  - [ ] existing genetic databases
-  - [ ] store genetic data in time series databases
-  - [ ] mining genetic data (also see mining time series data)
-- [ ] Mining time series data
+This repo is a work-in-progress [book](book) about time series databases (TSDB) that (will) contain:
+
+- [Survey](survey) on [various existing time series databases](https://github.com/xephonhq/awesome-time-series-database)
+- How to write a (distributed) time series database from scratch.
+- Related fields like distributed tracing and OLAP databases.
+
+## Project Layout
+
+- [book](book) The WIP book
+- [doc](doc) [Roadmap](doc/ROADMAP.md) and notes on writing notes
+- [survey](survey) Survey on TSDB and related fields
 
 ## Related
 
-- [at15/papers-i-read](https://github.com/at15/papers-i-read)
-- [at15/code-i-read](https://github.com/at15/code-i-read)
-- [at15/pub](https://github.com/at15/pub)
+- [Awesome Time Series Database](https://github.com/xephonhq/awesome-time-series-database)
+- [libtsdb](https://github.com/libtsdb)
 
 ## Acknowledgment
 
@@ -48,4 +24,6 @@ so it may not be in a well organized order.
 
 ## License
 
+The book is licensed under Creative Commons (link below). The sample code (unless otherwise specified in a comment) is licensed under MIT.
+
 [CC BY-NC-SA 3.0](https://creativecommons.org/licenses/by-nc-sa/3.0/us/)
diff --git a/SUMMARY.md b/SUMMARY.md
diff --git a/book.json b/book.json
diff --git a/book/01-overview/README.md b/book/01-overview/README.md
@@ -0,0 +1,7 @@
+# Chapter 01: Overview
+
+## TODO
+
+- [ ] what is time series data
+- [ ] why do we need time series databases
+- [ ] a glimpse of the TSDB landscape (like DDA, I can draw the map by myself and put xephon-k in the far north beyond the wall)
diff --git a/book/02-basic/README.md b/book/02-basic/README.md
@@ -0,0 +1,9 @@
+# Chapter 02: Basic
+
+- [Primitive Data Type](primitive) Representation of primitive types like integer, float, and string.
+  - [Endianness](primitive/endianness.md)
+  - [Integer](primitive/integer.md)
+
+## TODO
+
+- [ ] math
diff --git a/book/02-basic/primitive/README.md b/book/02-basic/primitive/README.md
@@ -0,0 +1,3 @@
+# Primitive Types
+
+- [Endianness](endianness.md)
diff --git a/book/02-basic/primitive/code/c/.gitignore b/book/02-basic/primitive/code/c/.gitignore
@@ -0,0 +1 @@
+f1.bin
diff --git a/book/02-basic/primitive/code/c/convert_integer.c b/book/02-basic/primitive/code/c/convert_integer.c
@@ -0,0 +1,20 @@
+/*
+ * project: ntsdb
+ * file: convert_integer.c
+ * chapter: basic
+ * section: primitive
+ * description: show behavior of converting between signed and unsigned integers in C
+ * compile: gcc convert_integer.c
+ * run: ./a.out
+ */
+
+#include <stdio.h>
+
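+int main(void) {
+    // 1000_0000 interpreted as signed (two's complement) is -128, i.e. -2^7 + 0.
+    // Use `signed char`: whether plain `char` is signed is implementation-defined.
+    signed char c = -128;
+    // The same bit pattern 1000_0000 interpreted as unsigned is 128, i.e. 2^7.
+    unsigned char uc = (unsigned char) c;
+    printf("%d %u\n", c, (unsigned) uc); // prints: -128 128
+    return 0;
+}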
diff --git a/book/02-basic/primitive/code/c/endianness.c b/book/02-basic/primitive/code/c/endianness.c
@@ -0,0 +1,33 @@
+/*
+* project: ntsdb
+* file: endianness.c
+* chapter: basic
+* section: primitive
+* description: show endianness of integer
+* compile: gcc endianness.c
+* run: ./a.out
+*/
+
+#include <stdio.h>
+
+// NOTE: based on CSAPP Figure 2.4
+
+void show_bytes(unsigned char* start, size_t len) {
+    for (size_t i = 0; i < len; i++) {
+        printf(" %.2x", start[i]);
+    }
+    printf("\n");
+}
+
+void show_int(int x) {
+    show_bytes((unsigned char*) &x, sizeof(x));
+}
+
+int main() {
+    // prints 01 04 00 00 on a little-endian machine
+    // 1025 is 2^10+1, i.e. 0x0401; bytes print from the lowest address up, so the LSB 01 prints before 04
+    show_int(1025);
+
+    // prints 1024 2048: a left shift by one doubles the value, independent of byte order
+    printf("%d %d\n", 0x0400, 0x0400 << 1);
+    return 0;
+}
diff --git a/book/02-basic/primitive/code/c/endianness_file.c b/book/02-basic/primitive/code/c/endianness_file.c
@@ -0,0 +1,40 @@
+/*
+* project: ntsdb
+* file: endianness_file.c
+* chapter: basic
+* section: primitive
+* description: show endianness when reading/writing string/int to file
+* compile: gcc endianness_file.c
+* run: ./a.out
+*/
+
+#include <assert.h>
+#include <stdio.h>
+#include <stdlib.h>
+
+// write the integer using its in-memory layout directly, i.e. the host byte order (little-endian on x86)
+void write_file(const char* file_name, int a) {
+    FILE *f = fopen(file_name, "wb"); // binary mode: no newline translation
+    if (f == NULL) {
+        perror("fopen");
+        exit(EXIT_FAILURE);
+    }
+    size_t wrote = fwrite(&a, 1, sizeof(a), f);
+    assert(wrote == sizeof(a));
+    fclose(f);
+}
+
+// read raw bytes as an integer; only correct if the file uses the same layout as memory
+int read_int(const char* file_name) {
+    FILE *f = fopen(file_name, "rb");
+    if (f == NULL) {
+        perror("fopen");
+        exit(EXIT_FAILURE);
+    }
+    int val = -1;
+    size_t read = fread(&val, 1, sizeof(val), f);
+    assert(read == sizeof(val));
+    fclose(f);
+    return val;
+}
+
+int main() {
+    const char* file_name = "f1.bin";
+    int val = 256 + 2;
+    // 256 + 2 is 0b1_00000010
+    write_file(file_name, val);
+    printf("wrote %d got %d\n", val, read_int(file_name));
+    return 0;
+}
diff --git a/book/02-basic/primitive/code/go/convert_integer_test.go b/book/02-basic/primitive/code/go/convert_integer_test.go
@@ -0,0 +1,9 @@
+package primitive
+
+import "testing"
+
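+// TestConvertInteger shows that converting int8(-128), bit pattern 1000_0000,
+// to uint8 keeps the bits and yields 128.
+func TestConvertInteger(t *testing.T) {
+	var c int8 = -128
+	var uc = uint8(c)
+	t.Logf("%d %d", c, uc)
+	if uc != 128 {
+		t.Errorf("uint8(%d) = %d, want 128", c, uc)
+	}
+}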
diff --git a/book/02-basic/primitive/endianness.md b/book/02-basic/primitive/endianness.md
@@ -0,0 +1,61 @@
+# Endianness
+
+## TODO
+
+- [ ] go example?
+- [ ] mention `hexdump` `xxd`
+
+## Overview
+
+- A `byte` (8 bits) is the atomic unit; endianness describes the ordering of bytes, not bits (within the scope of this book).
+- `big-endian`: stores the MSB (Most Significant Byte) at the smallest address.
+- `little-endian`: stores the LSB (Least Significant Byte) at the smallest address.
+- Most mainstream processors (e.g. x86-64) are `little-endian`.
+- Network byte order is `big-endian`.
+- In a numeric literal the leftmost digit is the most significant, so a literal read left to right matches `big-endian` order if memory addresses grow from left to right.
+- Bit shifts act on values, not on memory layout: a left shift moves bits towards the MSB on any host, e.g. `0x0400 << 1` is `0x0800`.
+
+Example [code/c/endianness.c](code/c/endianness.c)
+
+Written as a literal, `2^10+1` (1025) is `0b100_0000_0001` in binary or `0x0401` in hex, so it needs two bytes:
+
+```text
+byte address  : 0          1
+big-endian    : 0000_0100  0000_0001
+little-endian : 0000_0001  0000_0100
+```
+
+## File
+
+- [ ] text formats may have a BOM (byte order mark)
+
+A file can use either byte order, as long as the reader uses the same order as the writer.
+Normally file content follows a well-known serialized format, e.g. a text format like plain text or JSON, or a binary format like ELF.
+Serialized content is a byte array, so its layout on disk is the same as in memory.
+Endianness is handled by the encoder and decoder when converting typed values to and from bytes.
+
+We can write an integer to a file directly as raw bytes; this is a straightforward (and unsafe, since the result depends on the host byte order) serialization.
+In C, we can reinterpret an `int` as a `char` array directly ([example endianness_file.c](code/c/endianness_file.c)), so the byte order in the file matches the processor's layout.
+In more type-safe languages, (~~we can use unsafe~~) we can shift the bits to build a byte array from an integer.
+
+Inspecting `f1.bin` written by the example on a little-endian machine shows the LSB `02` first:
+
+```bash
+xxd f1.bin
+00000000: 0201 0000
+```
+
+## Tool
+
+You can use `hexdump` or `xxd` to inspect binary files.
+
+```bash
+xxd <file>
+# output in hex: offset, bytes in groups of two, then an ASCII column
+
+xxd -b <file>
+# output in binary (bits) instead of hex
+
+hexdump -C <file>
+# canonical hex + ASCII display
+```
+
+## Reference
+
+- https://en.wikipedia.org/wiki/Endianness