Data Services

A high-performance, distributed data access layer implementing request coalescing and hash-based routing to reduce database load and prevent hot partitions.

Overview

Data Services is a middleware layer that sits between API servers and Cassandra clusters, providing request coalescing. It is designed to handle high-traffic scenarios efficiently by reducing duplicate database queries and preventing database overload. It is inspired by the architecture Discord describes in their blog post How Discord Stores Trillions of Messages.

One example use case from Discord: when a big announcement that notifies @everyone is posted on a large server (Discord group), huge numbers of users open the app and read the same message at roughly the same time, sending a flood of identical queries to the database. This is where request coalescing comes in handy: all concurrent requests for the same data are combined into a single database query, reducing load on the database and preventing hot partitions.

A simpler way to think about it: caching with a lifetime equal to the time the query takes to run. No client needs to be aware of the coalescing, because the maximum staleness is the same as if each client had run the query itself. It also requires no extra memory, because the query result falls out of scope as soon as it has been sent to all waiters.
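
A minimal sketch of this pattern in Go (names here are illustrative, not the repository's actual API): the first caller for a key runs the query, and any caller arriving while it is in flight waits on a channel for the shared result.

```go
package coalesce

import "sync"

type result struct {
	rows []string
	err  error
}

// Coalescer deduplicates concurrent queries for the same key.
type Coalescer struct {
	mu      sync.Mutex
	waiters map[string][]chan result
}

func New() *Coalescer {
	return &Coalescer{waiters: make(map[string][]chan result)}
}

// Do runs query at most once among callers that arrive while a query for
// the same key is in flight; late callers receive the in-flight result.
func (c *Coalescer) Do(key string, query func() ([]string, error)) ([]string, error) {
	c.mu.Lock()
	if _, inFlight := c.waiters[key]; inFlight {
		// A query for this key is already running: just wait for its result.
		ch := make(chan result, 1)
		c.waiters[key] = append(c.waiters[key], ch)
		c.mu.Unlock()
		r := <-ch
		return r.rows, r.err
	}
	c.waiters[key] = nil // mark the key as in flight
	c.mu.Unlock()

	rows, err := query() // the single database query

	c.mu.Lock()
	waiters := c.waiters[key]
	delete(c.waiters, key) // the result falls out of scope once delivered
	c.mu.Unlock()

	for _, ch := range waiters {
		ch <- result{rows, err}
	}
	return rows, err
}
```

The golang.org/x/sync/singleflight package provides essentially this behaviour out of the box.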

Key Features

  • Request Coalescing: Automatically combines duplicate requests for the same data into a single database query
  • Consistent Hash-based Routing: Routes related requests to the same service instance for optimal coalescing (see the routing sketch after this list)
  • Distributed Architecture: Multiple service instances working in parallel
  • High Availability: Data service nodes are stateless and can be scaled horizontally
  • Monitoring: Built-in metrics for tracking request and query counts
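
Routing only has to guarantee that all requests for a given channel land on the same data service instance, so that they can be coalesced there. A minimal consistent-hash ring in Go (illustrative only, not the repository's actual routing code; assumes at least one instance):

```go
package routing

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// Ring is a minimal consistent-hash ring: each instance is hashed to a point
// on the ring, and a key is routed to the first instance clockwise from it.
type Ring struct {
	points    []uint64
	instances map[uint64]string
}

func hash64(s string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(s))
	return h.Sum64()
}

func NewRing(instances []string) *Ring {
	r := &Ring{instances: make(map[uint64]string)}
	for _, inst := range instances {
		p := hash64(inst)
		r.points = append(r.points, p)
		r.instances[p] = inst
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// Route returns the instance responsible for a channel ID. Every request for
// the same channel maps to the same instance, which is what makes coalescing
// effective.
func (r *Ring) Route(channelID int64) string {
	k := hash64(fmt.Sprintf("channel:%d", channelID))
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= k })
	if i == len(r.points) {
		i = 0 // wrap around the ring
	}
	return r.instances[r.points[i]]
}
```

Production rings usually add several virtual nodes per instance for better balance; the consistent-hashing property also keeps most channels on the same instance when nodes are added or removed, so coalescing opportunities are not reshuffled.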

Setup

$ docker-compose up --build

Wait for the services to start up. The Cassandra cluster will be initialized with the required keyspace. Log output like the following indicates that the data service instances are ready to accept requests:

data-service1-1   | 2024/10/26 16:18:45 Connected to cassandra
data-service1-1   | 2024/10/26 16:18:45 Starting server on port 50051
data-service2-1   | 2024/10/26 16:18:45 Connected to cassandra
data-service2-1   | 2024/10/26 16:18:45 Starting server on port 50052

Run the client CLI to send test requests to the data service:

$ go run ./client -h
  -channels int
        Number of unique channels to distribute requests across (number of unique requests) (default 20)
  -requests int
        Total number of requests to send (default 10000)

Example usage:

$ go run ./client
2024/10/26 17:08:11 Unique requests: 20, Total requests: 10000, Total queries executed: 184
2024/10/26 17:08:11 Average queries per request: 0.0184
2024/10/26 17:08:11 Saved queries by coalescing: 9816
2024/10/26 17:08:11 Total time taken: 816.727364ms

Architecture

Components

  1. Data Service Nodes: gRPC servers that handle incoming requests and manage database connections
  2. Cassandra Cluster: A 3-node Cassandra cluster for data storage
  3. Client: gRPC test client for simulating high-traffic scenarios

%%{init: {'theme': 'base', 'themeVariables': { 'fontSize': '16px', 'fontFamily': 'arial', 'nodeTextSize': '16px', 'labelTextSize': '16px', 'titleTextSize': '20px' }}}%%

flowchart TB
    subgraph Users ["Multiple Users"]
        U1[User 1]
        U2[User 2]
        U3[User 3]
    end

    subgraph API ["Hash-based Routing API"]
        A1[API Server 1]
        A2[API Server 2]
    end

    subgraph DS ["Data Services Layer"]
        direction TB
        subgraph DS1 ["Data Service Instance 1"]
            direction TB
            subgraph Coalescing1 ["Request Coalescing"]
                R1["Request (Channel 1)"]
                R2["Request (Channel 1)"]
                R3["Request (Channel 1)"]
                RC1[Request Coalescer]
                DQ1["Single DB Query"]
                R1 & R2 & R3 --> RC1
                RC1 --> DQ1
            end
        end
        subgraph DS2 ["Data Service Instance 2"]
            direction TB
            subgraph Coalescing2 ["Request Coalescing"]
                R4["Request (Channel 2)"]
                R5["Request (Channel 2)"]
                R6["Request (Channel 2)"]
                RC2[Request Coalescer]
                DQ2["Single DB Query"]
                R4 & R5 & R6 --> RC2
                RC2 --> DQ2
            end
        end
    end

    subgraph DB ["Cassandra Cluster"]
        C1[Node 1] <--> C2[Node 2] <--> C3[Node 3] <--> C1
    end

    %% Connect users to API
    U1 --> A1
    U2 --> API
    U3 --> A2

    %% Connect API to Data Services
    A1 --> DS1
    A1 --> DS2
    A2 --> DS1
    A2 --> DS2

    %% Connect Data Services to Cassandra
    DS1 --> DB
    DS2 --> DB

    classDef users fill:#B3E5FC,stroke:#0277BD,color:#000000,font-size:16px
    classDef api fill:#FFB74D,stroke:#E65100,color:#000000,font-size:16px
    classDef dataservice fill:#CE93D8,stroke:#6A1B9A,color:#000000,font-size:16px
    classDef cassandra fill:#81C784,stroke:#2E7D32,color:#000000,font-size:16px
    classDef component fill:#E0E0E0,stroke:#424242,color:#000000,font-size:16px

    class U1,U2,U3 users
    class A1,A2 api
    class DS1,DS2,R1,R2,R3,R4,R5,R6,RC1,RC2,DQ1,DQ2 dataservice
    class C1,C2,C3 cassandra

Technical Learnings

Go Concurrency Patterns

  1. Channels

    • Used for async communication between goroutines
    • Each request runs in its own goroutine, with a channel for the response from the query executor goroutine
  2. Mutex Operations

    • Implemented thread-safe access to shared resources
  3. Atomic Operations

    • Used lock-free atomic counters for metrics tracking
  4. WaitGroups

    • Used for waiting on multiple goroutines to complete in the CLI client (see the sketch after this list)
  5. Context Management

    • Used context for request timeouts and cancellation
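
A condensed, illustrative version of the test client's fan-out loop showing how these patterns combine (names and numbers are hypothetical, not the repository's exact code):

```go
package main

import (
	"context"
	"log"
	"sync"
	"sync/atomic"
	"time"
)

func main() {
	const totalRequests = 100
	var (
		wg        sync.WaitGroup
		succeeded atomic.Int64 // lock-free metrics counter
	)

	for i := 0; i < totalRequests; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()

			// Every request carries a deadline so slow queries are cancelled.
			ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
			defer cancel()

			if err := sendRequest(ctx, id); err != nil {
				log.Printf("request %d: %v", id, err)
				return
			}
			succeeded.Add(1)
		}(i)
	}

	wg.Wait() // block until every in-flight request has finished
	log.Printf("succeeded: %d/%d", succeeded.Load(), totalRequests)
}

// sendRequest stands in for the gRPC call made by the real client.
func sendRequest(ctx context.Context, id int) error {
	select {
	case <-time.After(10 * time.Millisecond): // simulate network + query time
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}
```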

gRPC Implementation

  • Defined service interfaces using Protocol Buffers
  • Managed timeout handling using context (see the sketch below)
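
A minimal sketch of the per-RPC deadline pattern, assuming a client interface like the one protoc would generate from the service's .proto file (the interface and method names are hypothetical):

```go
package client

import (
	"context"
	"fmt"
	"time"
)

// MessagesClient stands in for the gRPC client interface generated from the
// service definition.
type MessagesClient interface {
	GetMessages(ctx context.Context, channelID int64) ([]string, error)
}

// fetchWithDeadline wraps an RPC in a per-call deadline so a slow data
// service (or a slow Cassandra query behind it) is cancelled instead of
// letting requests pile up.
func fetchWithDeadline(c MessagesClient, channelID int64) ([]string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	msgs, err := c.GetMessages(ctx, channelID)
	if err != nil {
		return nil, fmt.Errorf("get messages for channel %d: %w", channelID, err)
	}
	return msgs, nil
}
```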

Docker and Container Orchestration

  • Implemented health checks for service readiness
  • Managed container dependencies and startup order
  • Configured networking between services
  • Implemented volume management for data persistence

Future Improvements

  1. Monitoring & Observability

    • Add distributed tracing
  2. Scalability

    • Implement dynamic service discovery (e.g. Consul or etcd)
  3. Resilience

    • Add circuit breakers
    • Implement retry policies
    • Add rate limiting

License

This project is licensed under the MIT License.
