WIP: Benchmarks

turbolytics edited this page Jan 6, 2025 · 4 revisions

This section provides estimates of the performance to expect from sqlflow across several use cases.

Overview

| Name                      | Throughput        | Max RSS Memory | Peak Memory Usage |
|---------------------------|-------------------|----------------|-------------------|
| Simple Aggregation Memory | 45,000 msgs / sec | 230 MiB        | 130 MiB           |
| Simple Aggregation Disk   | 36,000 msgs / sec | 256 MiB        | 102 MiB           |
| Enrichment                | 13,000 msgs / sec | 368 MiB        | 124 MiB           |
| CSV Disk Join             | 11,500 msgs / sec | 312 MiB        | 152 MiB           |
| CSV Memory Join           | 33,200 msgs / sec | 300 MiB        | 107 MiB           |
| In Memory Tumbling Window | 44,000 msgs / sec | 198 MiB        | 96 MiB            |

Methodology

Each test loads 1 million (1MM) records into Kafka, then runs the sql-flow consumer until every message has been processed. Each test records the maximum resident memory observed during the benchmark and the average message-ingestion throughput.
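The two measurements described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code the benchmark scripts actually run: `run_benchmark`, `max_rss_mib`, and the `process` callback are hypothetical names, and the sketch measures the current process rather than a separate sql-flow consumer.

```python
import resource
import sys
import time


def max_rss_mib() -> float:
    """Return this process's peak resident set size in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def run_benchmark(messages, process):
    """Process every message, then report average throughput and peak RSS.

    `messages` and `process` are hypothetical stand-ins for the Kafka
    input and the consumer's per-message work.
    """
    start = time.perf_counter()
    count = 0
    for message in messages:
        process(message)
        count += 1
    elapsed = time.perf_counter() - start
    return {
        "messages": count,
        # Average throughput = total messages / total wall-clock seconds.
        "msgs_per_sec": count / elapsed if elapsed > 0 else float("inf"),
        "max_rss_mib": max_rss_mib(),
    }


if __name__ == "__main__":
    # Stand-in workload: 1MM integers instead of Kafka messages.
    print(run_benchmark(range(1_000_000), lambda m: m * 2))
```

Throughput here is an average over the whole run, so it smooths over any warm-up or slow periods, which matches the "average throughput" described above.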

System

Hardware:
    Hardware Overview:
      Model Name: MacBook Pro
      Model Identifier: MacBookPro18,3
      Model Number: Z15G001X2LL/A
      Chip: Apple M1 Pro
      Total Number of Cores: 10 (8 performance and 2 efficiency)
      Memory: 32 GB
      Activation Lock Status: Enabled

Scenarios

Simple Aggregate - Disk / Mem

Performs a simple aggregate. Output is significantly smaller than input.

./benchmark/simple-agg-disk.sh
./benchmark/simple-agg-mem.sh

Enrichment

Performs an enrichment. Output is 1:1 records with input, but each output record is enhanced with additional information.

./benchmark/enrich.sh
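As a rough illustration of the 1:1 shape of an enrichment, the sketch below adds fields to each input record without changing the record count. The record fields (`city`, `temp`) and the `CITY_INFO` lookup table are hypothetical and are not taken from the benchmark's actual configuration.

```python
# Hypothetical reference data used to enrich each record.
CITY_INFO = {
    "Baltimore": {"state": "MD"},
    "Boston": {"state": "MA"},
}


def enrich(record, lookup=CITY_INFO):
    """Return a copy of `record` with extra fields merged in.

    Records whose city is missing from the lookup pass through unchanged,
    so the output always has exactly one record per input record.
    """
    extra = lookup.get(record.get("city"), {})
    return {**record, **extra}


records = [
    {"city": "Baltimore", "temp": 71},
    {"city": "Austin", "temp": 95},
]
enriched = [enrich(r) for r in records]
print(enriched)
```

Because every output record is at least as large as its input, an enrichment writes far more data than an aggregation over the same input.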

CSV Disk Join

./benchmark/csv.filesystem.join.yml

CSV Memory Join

./benchmark/csv.mem.join.yml

In Memory Tumbling Window

Aggregates a count of cities over a tumbling window.

./benchmark/tumbling-window.sh
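For readers unfamiliar with the term, a tumbling window splits the stream into fixed-size, non-overlapping time buckets and aggregates each bucket independently. The sketch below counts cities per window in plain Python; it assumes events arrive as `(timestamp_seconds, city)` pairs and uses a hypothetical 60-second window, neither of which is taken from the benchmark itself.

```python
from collections import Counter, defaultdict

WINDOW_SECONDS = 60  # hypothetical window size, chosen for illustration


def tumbling_city_counts(events, window_seconds=WINDOW_SECONDS):
    """Count cities per fixed, non-overlapping time window.

    `events` is an iterable of (timestamp_seconds, city) pairs.
    Returns {window_start: {city: count}} ordered by window start.
    """
    windows = defaultdict(Counter)
    for ts, city in events:
        # Each event belongs to exactly one window: the one whose start
        # is the timestamp rounded down to a multiple of the window size.
        window_start = int(ts // window_seconds) * window_seconds
        windows[window_start][city] += 1
    return {start: dict(counts) for start, counts in sorted(windows.items())}


events = [
    (0, "Baltimore"),
    (30, "Boston"),
    (59, "Baltimore"),
    (60, "Boston"),
    (125, "Austin"),
]
print(tumbling_city_counts(events))
# {0: {'Baltimore': 2, 'Boston': 1}, 60: {'Boston': 1}, 120: {'Austin': 1}}
```

Since each window only needs one counter per distinct city, state stays small even at high message rates.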