Indexify is a complex system with many moving parts. To help both users and developers of Indexify build a mental model of how it works, this page documents the system architecture.
This page covers the technical details of Indexify. You do not need to understand these details to use Indexify effectively; they are documented here for those who want to learn about them without spelunking through the source code.

The architecture of Indexify is shaped by the following requirements -
- Support calling Graph endpoints millions of times and continuously running extraction and indexing pipelines in real time.
- Scale horizontally as ingested data grows, so applications don't need to switch to a heavier framework as they mature.
- Keep function execution overhead to at most a few milliseconds, so we can support LLM applications that make business decisions about the real world, where conditions change frequently.
- Run functions across various hardware platforms - GPUs, TPUs, and CPUs. Run the more compute-intensive parts of a pipeline, such as those using deep learning models, on GPUs, while running less compute-intensive steps on CPUs.
- Support auto-scaling so that compute nodes are released when there is no data to process.
- Support spot instances - if executors are removed before their work finishes, the work is restarted elsewhere on the cluster.
- Support multi-cloud and federated deployments of data planes. This allows running a pipeline across data centers with more GPU capacity while data and applications reside on another cloud.
- Local-first experience to make it easy to prototype and test applications. Let's face it: every journey starts on a developer's laptop. (A sketch of this workflow follows the list.)
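To make the local-first requirement concrete, here is a minimal sketch of prototyping a pipeline on a laptop, assuming the Python SDK's `Graph` and `indexify_function` APIs as they appear in the project's examples; the function names and values are illustrative, and exact import paths and signatures may differ across SDK versions.

```python
from typing import List

from indexify import Graph, indexify_function


@indexify_function()
def generate_numbers(count: int) -> List[int]:
    # First node of the pipeline: produce some data to process.
    return list(range(count))


@indexify_function()
def squared_sum(numbers: List[int]) -> int:
    # Downstream node: runs once its data dependency is satisfied.
    return sum(n * n for n in numbers)


# Wire the functions into a Graph and run it entirely on a laptop;
# no server or executor deployment is needed while prototyping.
g = Graph(name="example-pipeline", start_node=generate_numbers)
g.add_edge(generate_numbers, squared_sum)

invocation_id = g.run(block_until_done=True, count=10)
print(g.output(invocation_id, "squared_sum"))
```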
Indexify is composed of two main components -
- Server: Hosts the remote Graph endpoints and the task scheduler that orchestrates function execution.
- Executors: Run the Graphs' functions.
Indexify Server is composed of two primary services. They run together when Indexify runs in development mode; for high-scale production use cases, they can be deployed separately to scale out horizontally and for high availability.
The first service exposes RPC APIs for applications to call Graphs and retrieve function outputs. It also manages function outputs on blob stores.
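As an illustration, here is a hedged sketch of calling a Graph deployed on a server and retrieving its outputs, assuming the SDK's `RemoteGraph` interface; the server URL is a placeholder, and `g` is the graph from the earlier local sketch.

```python
from indexify import RemoteGraph

# Deploy the locally defined graph `g` (from the earlier sketch) to a
# running server, turning it into a remotely callable Graph endpoint.
# The URL is a placeholder for wherever the server is listening.
graph = RemoteGraph.deploy(g, server_url="http://localhost:8900")

# Each call creates an invocation; outputs are retrieved by invocation id
# and function name once the server has persisted them to blob storage.
invocation_id = graph.run(block_until_done=True, count=10)
print(graph.output(invocation_id, "squared_sum"))
```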
The second service contains a blazing-fast task scheduler that creates thousands of tasks every second as the data dependencies of functions in a Graph are met. It evaluates extraction policies and allocates tasks to executors. The server has no external dependencies; under the hood it uses a state machine built on RocksDB. This design allowed us to build an entirely reactive scheduler that evaluates thousands of data dependencies and schedules tasks.
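The following simplified Python sketch illustrates the reactive idea only - the actual scheduler is part of the server and is not implemented like this. Task creation is driven by state changes (an output arriving) rather than by polling; all names here are illustrative.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Task:
    function: str
    input_id: str


class ReactiveScheduler:
    def __init__(self, edges: dict[str, list[str]]):
        # edges: function name -> downstream function names in the Graph.
        self.edges = edges
        self.task_queue: list[Task] = []

    def on_output(self, function: str, output_id: str) -> None:
        # Called whenever an executor reports a new output. Instead of
        # polling, tasks for downstream functions are created the moment
        # their data dependency is met.
        for downstream in self.edges.get(function, []):
            self.task_queue.append(Task(function=downstream, input_id=output_id))


scheduler = ReactiveScheduler(edges={"generate_numbers": ["squared_sum"]})
scheduler.on_output("generate_numbers", output_id="inv-1/out-0")
print(scheduler.task_queue)  # [Task(function='squared_sum', input_id='inv-1/out-0')]
```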
Executors are the data plane where functions run. They can run on any hardware, and a single Indexify cluster can support tens of thousands of executors.
They communicate with the server over Server-Sent Events (SSE) and HTTP APIs. On startup, an executor registers information about its hardware and runtime, and then sends heartbeats periodically. When the server has tasks that need to run on an executor, it sends them over the heartbeat stream. The executor downloads the function's inputs from the storage system and runs the function. After a task completes, any outputs are uploaded back to the server, and the outcome is reported over the heartbeat stream.
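The lifecycle above can be summarized with an illustrative sketch. The `server`, `blob_store`, and `functions` objects are hypothetical stand-ins for the real heartbeat, blob-store, and function-registry interfaces, not Indexify APIs.

```python
import time


def executor_loop(server, blob_store, functions):
    # Advertise this machine's hardware and runtime so the scheduler can
    # place tasks appropriately (e.g. GPU-bound functions on GPU boxes).
    executor_id = server.register(hardware="gpu:a100", runtime="python3.11")

    while True:
        # The periodic heartbeat doubles as the task-delivery channel: the
        # server piggybacks pending tasks on the heartbeat stream.
        tasks = server.heartbeat(executor_id)
        for task in tasks:
            # Inputs live in blob storage, not on the heartbeat stream.
            inputs = blob_store.download(task.input_id)
            try:
                output = functions[task.function](inputs)
                blob_store.upload(task.output_id, output)
                server.report(task.id, outcome="success")
            except Exception as exc:
                server.report(task.id, outcome=f"failure: {exc}")
        time.sleep(5)
```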