Vald architecture document (#366)
* add architecture overview draft

* fix

* update architecture doc

* fix

* update Vald Meta section

* fix Vald backup

* fix grammar

* fix arch doc: vald agent section

* fix

* fix Vald filter section

* draft data flow section

* edit search flow

* add vector data space explaination

* Apply suggestions from code review

Co-authored-by: Kiichiro YUKAWA <[email protected]>

* move assets to assets/doc folder

* Apply suggestions from code review

Co-authored-by: Kiichiro YUKAWA <[email protected]>

* update vector data space explaination diagram

* separate insert and search flow diagram

* fix image

* fix image

* update vector data space diagram( Tree -> Index

* change docs and images path

* Update doc path

* 🤖 Update license headers and formatting go codes

Signed-off-by: vdaas-ci <[email protected]>

* Apply suggestions from code review

Co-authored-by: Kiichiro YUKAWA <[email protected]>
Co-authored-by: Rintaro Okamura <[email protected]>

* Apply suggestions from code review

Co-authored-by: Rintaro Okamura <[email protected]>

* Revert "🤖 Update license headers and formatting go codes"

This reverts commit 0db7224.

* Apply suggestions from code review

Co-authored-by: Yusuke Kato <[email protected]>

* explain more detail in insert flow

* fix

* fix

* update searching flow

* Apply suggestions from code review

Co-authored-by: Kiichiro YUKAWA <[email protected]>

* Revert ":robot: Update license headers and formatting go codes"

This reverts commit 07ac8d9.

* Update docs/overview/architecture.md

* 🤖 Automatically add contributor

Signed-off-by: vdaas-ci <[email protected]>

Co-authored-by: Kiichiro YUKAWA <[email protected]>
Co-authored-by: vdaas-ci <[email protected]>
Co-authored-by: Rintaro Okamura <[email protected]>
Co-authored-by: Yusuke Kato <[email protected]>
5 people authored May 28, 2020
1 parent bd64452 commit 12b26e2
Showing 9 changed files with 261 additions and 0 deletions.
1 change: 1 addition & 0 deletions CONTRIBUTORS
@@ -1,2 +1,3 @@
hlts2
vankichi
kevindiu
3 changes: 3 additions & 0 deletions README.md
@@ -103,6 +103,8 @@ Write example here
<img src="./assets/image/svg/Vald Architecture Overview.svg" width="100%">
</div>

Please refer [here](./docs/overview/architecture.md) for more details of the architecture overview in the future.

## Development

Before your first commit to this repository, it is strongly recommended to run the commands below.
@@ -125,6 +127,7 @@ Please read the [contribution guide](https://github.com/vdaas/vald/blob/master/C

- [hlts2](https://github.com/hlts2)
- [vankichi](https://github.com/vankichi)
- [kevindiu](https://github.com/kevindiu)

## LICENSE

1 change: 1 addition & 0 deletions assets/docs/insert_flow.drawio
3 changes: 3 additions & 0 deletions assets/docs/insert_flow.svg
1 change: 1 addition & 0 deletions assets/docs/search_flow.drawio
3 changes: 3 additions & 0 deletions assets/docs/search_flow.svg
1 change: 1 addition & 0 deletions assets/docs/vector_data_space_explain.drawio
3 changes: 3 additions & 0 deletions assets/docs/vector_data_space_explain.svg
245 changes: 245 additions & 0 deletions docs/overview/architecture.md
@@ -0,0 +1,245 @@
# Vald Architecture <!-- omit in toc -->

This document describes the high-level architecture design of Vald and explains each component in Vald.

## Table of Contents <!-- omit in toc -->

- [Overview](#overview)
- [Data Flow](#data-flow)
  - [Insert](#insert)
  - [Search](#search)
- [Components](#components)
  - [Vald Filter](#vald-filter)
    - [Vald Ingress Filter](#vald-ingress-filter)
    - [Vald Egress Filter](#vald-egress-filter)
    - [Vald Filter Gateway](#vald-filter-gateway)
  - [Vald Metadata](#vald-metadata)
    - [Vald Meta Gateway](#vald-meta-gateway)
    - [Vald Meta](#vald-meta)
  - [Vald Backup](#vald-backup)
    - [Vald Compressor](#vald-compressor)
    - [Vald Backup Manager](#vald-backup-manager)
    - [Vald Backup Gateway](#vald-backup-gateway)
  - [Vald Load Balancing](#vald-load-balancing)
    - [Vald LB Gateway](#vald-lb-gateway)
    - [Agent Discoverer](#agent-discoverer)
  - [Vald Core Engine](#vald-core-engine)
    - [Vald Agent](#vald-agent)
    - [Vald Agent Scheduler](#vald-agent-scheduler)
    - [Vald Index Manager](#vald-index-manager)
  - [Vald Replication Manager](#vald-replication-manager)
    - [Vald Replication Manager Agent](#vald-replication-manager-agent)
    - [Vald Replication Manager Controller](#vald-replication-manager-controller)
  - [Kubernetes Components](#kubernetes-components)
    - [Kube-apiserver](#kube-apiserver)
    - [Custom Resources](#custom-resources)

## Overview

Vald uses a cloud-native architecture focusing on [Kubernetes](https://kubernetes.io/).
Some components in Vald use the Kubernetes API to control the behavior of distributed vector indexes.
Before reading this document, you should have a basic understanding of cloud-native architecture and Kubernetes.

The image below shows Vald's architecture.

<img src="../../design/Vald Future Architecture Overview.svg" />

The following sections explain this image.

## Data Flow

This section describes the data flow inside Vald and how Vald's vector indexes are stored.
This is the most important part for users to understand Vald.

### Insert

<img src="../../assets/docs/insert_flow.svg" />

When the user inserts data into Vald:

1. Vald Ingress receives the request from the user. The request includes the vector and the vector ID.
2. Vald Ingress will forward the request to the Vald Filter Gateway to pre-process the request data.
3. Vald Filter Gateway will forward the request to the user-defined Vald Ingress Filter. After the Vald Ingress Filter receives the request, it will perform the pre-processing logic defined by the user, for example, padding the vector to match the vector dimension in Vald.
4. After the request is processed by the user-defined Vald Ingress Filter, the result will be returned to the Vald Filter Gateway.
5. Vald Filter Gateway will forward the processed data to the Vald Meta Gateway. Vald Meta Gateway will generate a UUID for each vector for internal use and map the UUID to the vector ID from the user's request. UUIDs are used instead of vector IDs because a vector ID may be too long, which would increase the memory usage in Vald Agent.
6. Vald Meta Gateway will forward the request with the UUID to the Vald Backup Gateway, which will perform the backup logic in steps 14-16 to prevent data loss in Vald.
7. Vald Backup Gateway will forward the request to Vald LB Gateway. Vald LB Gateway will determine which Vald Agent(s) will process the request based on the resource usage of the nodes and pods, and the number of vector replicas.
8. Vald LB Gateway will forward the UUID and the vector data to the selected Vald Agents in parallel. Vald Agent will insert the vector and UUID into an in-memory vector queue. The vector queue will be committed to an ANN graph index by a `CreateIndex` instruction executed by the Vald Index Manager.
9. If Vald Agent successfully inserts the request data, it will return success to the Vald LB Gateway.
10. After Vald LB Gateway receives success from the selected Vald Agents, it will respond with the IP addresses of all selected Vald Agents to the Vald Backup Gateway.
11. Vald Backup Gateway returns success to Vald Meta Gateway.
12. Vald Meta Gateway will forward the UUID(s) and vector ID(s) to the Vald Meta.
13. Vald Meta will store the UUID(s) and vector ID(s) that were successfully processed by the Vald Agent(s) in a persistent layer such as Redis, Cassandra, MySQL, etc.
14. Vald Backup Gateway will asynchronously send all the inserted data (vector(s), vector ID(s), UUID(s) and IP address(es)) to the Vald Compressor. Vald Compressor will compress the vector data asynchronously to reduce its size.
15. Vald Compressor will forward the data (compressed vector(s), vector ID(s), UUID(s) and IP address(es)) to the Vald Backup Manager.
16. Vald Backup Manager will store all of the data in a persistent layer such as MySQL, Cassandra, etc., to prevent data loss in Vald.
17. Vald Meta Gateway will return success to the Vald Filter Gateway.
18. Vald Filter Gateway will return success to the Vald Ingress.
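Steps 7 to 10 amount to a parallel fan-out from the Vald LB Gateway to the selected Vald Agents. The Go sketch below illustrates that pattern only; it is not Vald's implementation. The `Agent` interface, `memAgent`, and `fanOutInsert` are hypothetical names, and the policy of failing the whole insert when any selected agent fails is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"sync"
)

// Agent is a hypothetical stand-in for a Vald Agent endpoint (not Vald's real API).
type Agent interface {
	Addr() string
	Insert(uuid string, vec []float32) error
}

// memAgent stores inserted vectors in an in-memory queue, as in step 8.
type memAgent struct {
	addr  string
	mu    sync.Mutex
	queue map[string][]float32
}

func newMemAgent(addr string) *memAgent {
	return &memAgent{addr: addr, queue: make(map[string][]float32)}
}

func (a *memAgent) Addr() string { return a.addr }

func (a *memAgent) Insert(uuid string, vec []float32) error {
	a.mu.Lock()
	defer a.mu.Unlock()
	if _, ok := a.queue[uuid]; ok {
		return errors.New("uuid already exists")
	}
	a.queue[uuid] = vec
	return nil
}

// fanOutInsert sends the vector to all selected agents in parallel (step 8) and
// returns the sorted addresses of the agents that stored it (step 10).
// Assumption: the insert fails if any selected agent fails.
func fanOutInsert(agents []Agent, uuid string, vec []float32) ([]string, error) {
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex
		addrs []string
		errs  []error
	)
	for _, a := range agents {
		wg.Add(1)
		go func(a Agent) {
			defer wg.Done()
			err := a.Insert(uuid, vec)
			mu.Lock()
			defer mu.Unlock()
			if err != nil {
				errs = append(errs, fmt.Errorf("%s: %w", a.Addr(), err))
				return
			}
			addrs = append(addrs, a.Addr())
		}(a)
	}
	wg.Wait()
	if len(errs) > 0 {
		return nil, errs[0]
	}
	sort.Strings(addrs) // goroutines finish in arbitrary order
	return addrs, nil
}

func main() {
	agents := []Agent{newMemAgent("10.0.0.2"), newMemAgent("10.0.0.1")}
	addrs, err := fanOutInsert(agents, "uuid-1", []float32{0.1, 0.2, 0.3})
	fmt.Println(addrs, err)
}
```

A production gateway would also need timeouts and a rollback or retry strategy for partial failures; those are omitted here.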

### Search

<img src="../../assets/docs/search_flow.svg" />

When the user searches for a vector in Vald:

1. Vald Ingress receives a search request from the user. Vald provides two search interfaces: the user can search by vector or by vector ID.
2. Vald Ingress will forward the request to the Vald Filter Gateway to pre-process the request data.
3. Vald Filter Gateway will forward the request to the user-defined Vald Ingress Filter. After the Vald Ingress Filter receives the request, it will perform the pre-processing logic defined by the user, for example, padding the vector to match the vector dimension in Vald.
4. After the request is processed by the user-defined Vald Ingress Filter, the result will be returned to the Vald Filter Gateway.
5. Vald Filter Gateway will forward the request to the Vald Meta Gateway. Vald Meta Gateway is used to resolve the internally used UUIDs to the user-inserted vector IDs in steps 10-11.
6. Vald Meta Gateway will forward the request to the Vald LB Gateway. Vald LB Gateway will perform the post-processing of the results in step 9 after the Vald Agent(s) return them in step 8.
7. Vald LB Gateway will forward the request to all Vald Agents in parallel. Each Vald Agent will search for the _k_ nearest neighbor vectors in its in-memory graph index.
8. Vald Agent returns the search result to the Vald LB Gateway. The search result includes the UUID, the vector distance, and the vector. The number of results will be the same as requested.
9. Vald LB Gateway will aggregate the search results from all Vald Agents, rank them by vector distance, and return the ranked result to the Vald Meta Gateway.
10. Vald Meta Gateway will forward the search result to the Vald Meta to resolve the user-defined vector IDs from the internally used UUIDs.
11. Vald Meta will look up the vector IDs based on the internally used UUIDs.
12. Vald Meta will return the vector IDs to the Vald Meta Gateway.
13. Vald Meta Gateway will combine the vectors and the vector IDs from the search result and return them to the Vald Filter Gateway.
14. Vald Filter Gateway will forward the result to the user-defined Vald Egress Filter to filter the final result. For example, the filter can exclude a specific type of result based on the vector ID.
15. Vald Egress Filter will return the filtered result to the Vald Filter Gateway.
16. Vald Filter Gateway will return the final result to the Vald Ingress.
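The aggregation in step 9 is essentially a merge of per-agent top-_k_ lists followed by a global top-_k_ cut. The Go sketch below is illustrative only: `Neighbor` and `mergeTopK` are hypothetical names, and de-duplicating a UUID returned by several replicas (keeping its smallest distance) is an assumption about how replicated vectors are handled.

```go
package main

import (
	"fmt"
	"sort"
)

// Neighbor is one result returned by an agent: an internal UUID and its distance to the query.
type Neighbor struct {
	UUID     string
	Distance float32
}

// mergeTopK aggregates per-agent results, ranks them by ascending distance,
// and keeps the k closest. A UUID returned by several agents (replicas)
// is kept once, with its smallest distance.
func mergeTopK(perAgent [][]Neighbor, k int) []Neighbor {
	if k < 0 {
		k = 0
	}
	best := make(map[string]float32)
	for _, results := range perAgent {
		for _, r := range results {
			if d, ok := best[r.UUID]; !ok || r.Distance < d {
				best[r.UUID] = r.Distance
			}
		}
	}
	merged := make([]Neighbor, 0, len(best))
	for id, d := range best {
		merged = append(merged, Neighbor{UUID: id, Distance: d})
	}
	sort.Slice(merged, func(i, j int) bool {
		if merged[i].Distance != merged[j].Distance {
			return merged[i].Distance < merged[j].Distance
		}
		return merged[i].UUID < merged[j].UUID // deterministic tie-break
	})
	if len(merged) > k {
		merged = merged[:k]
	}
	return merged
}

func main() {
	agentA := []Neighbor{{"u1", 0.10}, {"u3", 0.40}}
	agentB := []Neighbor{{"u2", 0.20}, {"u1", 0.10}}
	fmt.Println(mergeTopK([][]Neighbor{agentA, agentB}, 2))
}
```

Because each agent already returns at most _k_ results, the gateway only sorts a few candidates per agent rather than the whole dataset.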

<!-- ### Update -->

<!-- ### Delete -->

## Components

### Vald Filter

Vald Filter is an optional functionality in Vald.
Users can implement custom filtering logic and integrate it with Vald.

Vald Filter provides the following functionalities.

- Custom filter based on request query
- Custom filter for the searching result

#### Vald Ingress Filter

Vald Ingress Filter filters the incoming request before processing it.

Users can implement custom filtering logic such as changing the vectors or filtering based on user ID.
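As an example, the padding pre-processing mentioned in the data flow could look like the Go sketch below. `IngressFilter` and `padFilter` are hypothetical names; a real Vald Ingress Filter is a separate user-provided component whose interface is not described in this document.

```go
package main

import "fmt"

// IngressFilter is a hypothetical interface for a user-defined Vald Ingress Filter.
type IngressFilter interface {
	Filter(vec []float32) ([]float32, error)
}

// padFilter pads short vectors with zeros to the dimension configured in Vald
// and rejects vectors that are longer than that dimension.
type padFilter struct{ dim int }

func (p padFilter) Filter(vec []float32) ([]float32, error) {
	if len(vec) > p.dim {
		return nil, fmt.Errorf("vector dimension %d exceeds %d", len(vec), p.dim)
	}
	out := make([]float32, p.dim) // zero-initialized, so the tail is padding
	copy(out, vec)
	return out, nil
}

func main() {
	var f IngressFilter = padFilter{dim: 4}
	v, err := f.Filter([]float32{0.5, 1.5})
	fmt.Println(v, err)
}
```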

#### Vald Egress Filter

Vald Egress Filter filters the response before sending it to the user.

This component can reorder the search results from multiple Vald Agents based on a user-defined ranking.
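A user-defined egress filter might drop results by vector ID and then reorder the rest. In the hypothetical Go sketch below, `Result`, `excludePrefix`, and `rerank` are illustrative names, and the prefix rule and score function stand in for whatever logic the user defines.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Result is a search result after Vald Meta has resolved its vector ID.
type Result struct {
	ID       string
	Distance float32
}

// excludePrefix drops results whose vector ID starts with prefix,
// keeping the order of the remaining results.
func excludePrefix(results []Result, prefix string) []Result {
	out := make([]Result, 0, len(results))
	for _, r := range results {
		if !strings.HasPrefix(r.ID, prefix) {
			out = append(out, r)
		}
	}
	return out
}

// rerank reorders results in place by a user-defined score (higher first).
// SliceStable keeps the original distance order among equal scores.
func rerank(results []Result, score func(Result) float64) {
	sort.SliceStable(results, func(i, j int) bool {
		return score(results[i]) > score(results[j])
	})
}

func main() {
	results := []Result{{"ad-1", 0.1}, {"item-1", 0.2}, {"promo-1", 0.3}}
	filtered := excludePrefix(results, "ad-")
	rerank(filtered, func(r Result) float64 {
		if strings.HasPrefix(r.ID, "promo-") {
			return 1 // boost promoted items
		}
		return 0
	})
	fmt.Println(filtered)
}
```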

#### Vald Filter Gateway

Vald Filter Gateway forwards the request to the Vald Ingress Filter before processing it, and forwards the response to the Vald Egress Filter before returning the search result to the user.

### Vald Metadata

In Vald, metadata consists of the vector data and the corresponding additional data that represents the search criteria and results.

Vald Metadata includes the user-provided metadata (vector ID), the vector, and the internally generated UUID.

#### Vald Meta Gateway

The main responsibility of the Vald Meta Gateway is to process the Vald metadata and to forward the information to Vald Backup Gateway.

It will perform the following actions:

1. Return an error if the user has already inserted the same vector into Vald.
1. Generate the corresponding UUID for internal use.
1. Forward the vector ID and UUID request to the Vald Meta.
1. Forward the vector information (vector ID, vector, and UUID) to Vald Backup Gateway.

#### Vald Meta

Vald Meta is the agent that processes CRUD requests for the metadata (vector ID and UUID).
Users can configure which data source Vald Meta uses (for example, Redis or Cassandra).

### Vald Backup

To support auto-healing functionality and increase performance during disaster recovery, Vald implements the backup mechanism.

#### Vald Compressor

Vald Compressor compresses the vector data and sends it to the Vald Backup Manager to process the backup request.
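This document does not specify the compression algorithm. As an illustration only, the Go sketch below serializes a vector as little-endian `float32` values and compresses it with the standard library's gzip package; `compressVector` and `decompressVector` are hypothetical names, not Vald functions.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/binary"
	"fmt"
	"io"
)

// compressVector serializes a vector as little-endian float32 values
// and gzip-compresses the bytes.
func compressVector(vec []float32) ([]byte, error) {
	var raw bytes.Buffer
	if err := binary.Write(&raw, binary.LittleEndian, vec); err != nil {
		return nil, err
	}
	var out bytes.Buffer
	zw := gzip.NewWriter(&out)
	if _, err := zw.Write(raw.Bytes()); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil { // Close flushes the gzip footer
		return nil, err
	}
	return out.Bytes(), nil
}

// decompressVector reverses compressVector, e.g. when restoring a backup.
func decompressVector(data []byte) ([]float32, error) {
	zr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	raw, err := io.ReadAll(zr)
	if err != nil {
		return nil, err
	}
	if len(raw)%4 != 0 {
		return nil, fmt.Errorf("invalid payload length %d", len(raw))
	}
	vec := make([]float32, len(raw)/4)
	if err := binary.Read(bytes.NewReader(raw), binary.LittleEndian, vec); err != nil {
		return nil, err
	}
	return vec, nil
}

func main() {
	vec := make([]float32, 1024) // a mostly zero, padded vector compresses well
	vec[0] = 1.5
	data, err := compressVector(vec)
	if err != nil {
		panic(err)
	}
	restored, err := decompressVector(data)
	fmt.Println(len(data), len(restored), err)
}
```

Dense vectors of random-looking floats compress much less with a general-purpose codec like gzip, so the actual savings depend heavily on the data.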

#### Vald Backup Manager

Vald Backup Manager processes Create/Read/Delete backup requests and handles the compressed metadata. Users can configure which data source Vald Backup Manager uses (for example, MySQL or Cassandra).

#### Vald Backup Gateway

Vald Backup Gateway forwards the backup request to the Vald LB Gateway.
It also asynchronously forwards the request, together with its metadata, to the Vald Compressor.

### Vald Load Balancing

Load balancing is an important concept in distributed computing: it distributes a set of tasks over a set of resources to make overall processing more efficient.
Vald implements its own load balancing controller.
Vald can load balance requests based on node resources.

#### Vald LB Gateway

Vald LB Gateway load-balances user requests based on the node resource information obtained from the Agent Discoverer.
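The document does not specify the selection algorithm. The Go sketch below shows one simple policy under stated assumptions: `AgentInfo` and `selectAgents` are hypothetical, and "choose the replicas on the nodes with the lowest memory usage" is an illustrative rule, not Vald's actual algorithm.

```go
package main

import (
	"fmt"
	"sort"
)

// AgentInfo is hypothetical discovery data: an agent address plus the memory
// usage ratio (0.0 to 1.0) of the node it runs on.
type AgentInfo struct {
	Addr     string
	MemUsage float64
}

// selectAgents picks `replicas` agents whose nodes have the lowest memory usage.
func selectAgents(agents []AgentInfo, replicas int) ([]string, error) {
	if replicas <= 0 || replicas > len(agents) {
		return nil, fmt.Errorf("cannot select %d of %d agents", replicas, len(agents))
	}
	sorted := append([]AgentInfo(nil), agents...) // do not mutate the caller's slice
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].MemUsage < sorted[j].MemUsage
	})
	addrs := make([]string, replicas)
	for i := range addrs {
		addrs[i] = sorted[i].Addr
	}
	return addrs, nil
}

func main() {
	agents := []AgentInfo{{"10.0.0.1", 0.7}, {"10.0.0.2", 0.2}, {"10.0.0.3", 0.5}}
	fmt.Println(selectAgents(agents, 2))
}
```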

#### Agent Discoverer

Agent Discoverer discovers active Vald pods and the resource usage of their nodes via [kube-apiserver](https://github.com/kubernetes/kubernetes/tree/master/cmd/kube-apiserver).

### Vald Core Engine

This section describes the Vald Agent and the components that support it.

#### Vald Agent

Vald Agent provides functionalities to perform approximate nearest neighbor search.
Agent-NGT uses [yahoojapan/NGT](https://github.com/yahoojapan/NGT) as a core library.

Each Vald Agent pod has its own vector data space, because only some of the Vald Agents are selected to store the vector in a single insert/update request.

When searching for a vector in Vald, each Vald Agent returns different _k_-nearest neighbor results depending on its own index, and you get the merged result.

<img src="../../assets/docs/vector_data_space_explain.svg" />

#### Vald Agent Scheduler

Vald Agent Scheduler is the scheduler of the Vald Agent.
It implements its own custom scheduling logic to increase the scalability of the Vald Agent.

It schedules Vald Agents based on node CPU and memory usage, and the number of indexes.

#### Vald Index Manager

Vald Index Manager controls when the vectors inserted into the Vald Agent are indexed.
The index is used to improve search performance.

It retrieves the active Vald Agent pods from the Vald Discoverer and triggers the indexing action on each Vald Agent.
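The interaction between the index manager and the agents' `CreateIndex` instruction can be sketched as follows. This is a simplified model under stated assumptions: `agent`, `Indexed`, and `triggerIndexing` are hypothetical names, a plain map is a placeholder for the NGT graph index, and the timer that would schedule indexing cycles is omitted.

```go
package main

import (
	"fmt"
	"sync"
)

// agent is a hypothetical stand-in for a Vald Agent: inserted vectors wait in a
// queue and become searchable only after CreateIndex commits them.
type agent struct {
	mu    sync.Mutex
	queue map[string][]float32
	index map[string][]float32 // placeholder for the NGT graph index
}

func newAgent() *agent {
	return &agent{queue: map[string][]float32{}, index: map[string][]float32{}}
}

// Insert adds a vector to the in-memory queue (insert flow step 8).
func (a *agent) Insert(uuid string, vec []float32) {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.queue[uuid] = vec
}

// CreateIndex commits every queued vector to the index and returns how many were committed.
func (a *agent) CreateIndex() int {
	a.mu.Lock()
	defer a.mu.Unlock()
	n := len(a.queue)
	for uuid, vec := range a.queue {
		a.index[uuid] = vec
	}
	a.queue = map[string][]float32{}
	return n
}

// Indexed reports whether a UUID has been committed to the index.
func (a *agent) Indexed(uuid string) bool {
	a.mu.Lock()
	defer a.mu.Unlock()
	_, ok := a.index[uuid]
	return ok
}

// triggerIndexing calls CreateIndex on every active agent in parallel,
// as an index manager might do on each cycle, and returns the total committed.
func triggerIndexing(agents []*agent) int {
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex
		total int
	)
	for _, a := range agents {
		wg.Add(1)
		go func(a *agent) {
			defer wg.Done()
			n := a.CreateIndex()
			mu.Lock()
			total += n
			mu.Unlock()
		}(a)
	}
	wg.Wait()
	return total
}

func main() {
	a1, a2 := newAgent(), newAgent()
	a1.Insert("uuid-1", []float32{0.1, 0.2})
	a2.Insert("uuid-2", []float32{0.3, 0.4})
	fmt.Println(a1.Indexed("uuid-1"), triggerIndexing([]*agent{a1, a2}), a1.Indexed("uuid-1"))
}
```

Batching inserts into a queue and committing them periodically avoids rebuilding the graph index on every single insert.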

### Vald Replication Manager

Vald Replication Manager manages the health of the Vald Agent.
When a pod dies, Vald Replication Manager automatically recovers its cache to keep the service reliable.

#### Vald Replication Manager Agent

Vald Replication Manager Agent restores a specific backup cache to a specific Vald Agent.
It retrieves the target backup from the Vald Compressor and recovers it to the newly created Vald Agent.

#### Vald Replication Manager Controller

Vald Replication Manager Controller keeps track of the active Vald Agent pods.
When the Vald Agent is dead, it will trigger the Vald Replication Manager Agent to recover the backup cache to the auto-healed pods from the backup.
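One way a controller could detect auto-healed pods is to compare pod UIDs between reconcile loops: Kubernetes assigns a new UID when a pod is recreated. The Go sketch below is hypothetical; it assumes pods keep stable names across recreation (as with a StatefulSet), and `podsToRecover`, `reconcile`, and the `recoverFn` hook are illustrative names standing in for the call to Vald Replication Manager Agent.

```go
package main

import (
	"fmt"
	"sort"
)

// podsToRecover compares pod name -> pod UID snapshots from two reconcile loops.
// A pod whose name still exists but whose UID changed was recreated (auto-healed)
// and starts with an empty in-memory index, so its backup should be restored.
func podsToRecover(prev, curr map[string]string) []string {
	var names []string
	for name, uid := range curr {
		if old, ok := prev[name]; ok && old != uid {
			names = append(names, name)
		}
	}
	sort.Strings(names) // map iteration order is random
	return names
}

// reconcile triggers recovery for each recreated pod via the hypothetical recoverFn hook.
func reconcile(prev, curr map[string]string, recoverFn func(pod string) error) error {
	for _, pod := range podsToRecover(prev, curr) {
		if err := recoverFn(pod); err != nil {
			return fmt.Errorf("recover %s: %w", pod, err)
		}
	}
	return nil
}

func main() {
	prev := map[string]string{"vald-agent-0": "uid-a", "vald-agent-1": "uid-b"}
	curr := map[string]string{"vald-agent-0": "uid-a", "vald-agent-1": "uid-c"}
	_ = reconcile(prev, curr, func(pod string) error {
		fmt.Println("restore backup to", pod)
		return nil
	})
}
```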

### Kubernetes Components

Vald is based on the Kubernetes platform.
This section explains the Kubernetes components used in Vald and why we need them.

#### Kube-apiserver

Kube-apiserver is a component of Kubernetes.
The main responsibility of Kube-apiserver in Vald is to provide node resource information for Vald Agent scalability.

For more information about Kube-apiserver, please refer to [the official document](https://kubernetes.io/docs/reference/command-line-tools-reference/kube-apiserver/).

#### Custom Resources

Custom Resources in Vald are implemented with [Custom Resource Definitions](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/).
It provides flexibility for users to control the Vald deployment such as pod startup sequence, etc.
