CSV benchmarks
Daniel Mican authored and committed Dec 1, 2023
1 parent e023648 commit 31e4459
Showing 2 changed files with 25 additions and 3 deletions.
README.md: 18 changes (16 additions, 2 deletions)
@@ -133,8 +133,8 @@ Hardware:
|--------------------|-------------------|------------|-------------------|
| Simple Aggregation | 36,000 msgs / sec | 256 MiB | 102 MiB |
| Enrichment | 13,000 msgs / sec | 368 MiB | 124 MiB |
| CSV Disk | | | |
| CSV Memory | | | |
| CSV Disk Join | 11,500 msgs / sec | 312 MiB | 152 MiB |
| CSV Memory Join | 33,200 msgs / sec | 300 MiB | 107 MiB |
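The throughput figures in the table can be compared directly. A small sketch, assuming the 1,000,000-message runs used in the benchmark commands and treating each reported rate as a steady-state average (an assumption; the table does not say how the rates were measured). The helper names are illustrative only.

```python
# Throughput figures copied from the benchmark table (messages per second).
THROUGHPUT = {
    "Simple Aggregation": 36_000,
    "Enrichment": 13_000,
    "CSV Disk Join": 11_500,
    "CSV Memory Join": 33_200,
}

NUM_MESSAGES = 1_000_000  # matches --num-messages in the benchmark commands


def seconds_to_drain(rate_per_sec: float, num_messages: int = NUM_MESSAGES) -> float:
    """Time to process num_messages at a constant rate (ignores startup cost)."""
    return num_messages / rate_per_sec


def speedup(fast: str, slow: str) -> float:
    """Ratio of two pipelines' reported throughput."""
    return THROUGHPUT[fast] / THROUGHPUT[slow]


if __name__ == "__main__":
    for name, rate in THROUGHPUT.items():
        print(f"{name:<20} {seconds_to_drain(rate):6.1f} s for {NUM_MESSAGES:,} msgs")
    print(f"memory vs disk join: {speedup('CSV Memory Join', 'CSV Disk Join'):.2f}x")
```

At the reported rates the in-memory join drains a million messages in roughly 30 s, versus roughly 87 s when the CSV is read from disk, a bit under 2.9x faster.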

### Simple Aggregate

@@ -158,5 +158,19 @@ python3 cmd/publish-test-data.py --num-messages=1000000 --topic="topic-enrich"
/usr/bin/time -l python3 cmd/sql-flow.py run /Users/danielmican/code/github.com/turbolytics/sql-flow/dev/config/benchmarks/enrich.yml
```
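The benchmark commands wrap each run in `/usr/bin/time -l`, the BSD/macOS form that prints a resource-usage summary to stderr (GNU `time` on Linux uses `-v` instead). Below is a minimal sketch for extracting the wall-clock time and the `maximum resident set size` line from that summary. It assumes macOS output, where the resident-set figure is reported in bytes. `parse_time_l` and the sample text are made up for illustration, and the commit does not say whether the memory columns in the table come from this line.

```python
import re

# Summary line, e.g. "       12.50 real         9.80 user         1.20 sys"
_REAL_RE = re.compile(r"^\s*([\d.]+)\s+real\b", re.MULTILINE)
# Resident-set line, e.g. "   104857600  maximum resident set size"
_RSS_RE = re.compile(r"^\s*(\d+)\s+maximum resident set size", re.MULTILINE)

MIB = 1024 * 1024


def parse_time_l(stderr_text: str) -> dict:
    """Extract wall-clock seconds and max RSS (MiB) from `/usr/bin/time -l` output.

    Assumes macOS semantics, where the resident-set figure is in bytes.
    """
    real = _REAL_RE.search(stderr_text)
    rss = _RSS_RE.search(stderr_text)
    if real is None or rss is None:
        raise ValueError("unrecognised /usr/bin/time -l output")
    return {
        "real_seconds": float(real.group(1)),
        "max_rss_mib": int(rss.group(1)) / MIB,
    }


# Made-up sample in the macOS layout; not a measured result.
SAMPLE = """\
       12.50 real         9.80 user         1.20 sys
   104857600  maximum resident set size
           0  average shared memory size
"""

if __name__ == "__main__":
    print(parse_time_l(SAMPLE))
```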

### CSV Disk Join

```
python3 cmd/publish-test-data.py --num-messages=1000000 --topic="topic-csv-filesystem-join"
SQLFLOW_STATIC_ROOT=/Users/danielmican/code/github.com/turbolytics/sql-flow/dev /usr/bin/time -l python3 cmd/sql-flow.py run /Users/danielmican/code/github.com/turbolytics/sql-flow/dev/config/examples/csv.filesystem.join.yml
```

### CSV Memory Join

```
SQLFLOW_STATIC_ROOT=/Users/danielmican/code/github.com/turbolytics/sql-flow/dev /usr/bin/time -l python3 cmd/sql-flow.py run /Users/danielmican/code/github.com/turbolytics/sql-flow/dev/config/examples/csv.mem.join.yml
python3 cmd/publish-test-data.py --num-messages=1000000 --topic="topic-csv-mem-join"
```

---
Like SQLFlow? Use SQLFlow? Feature Requests? Please let us know! [email protected]
cmd/publish-test-data.py: 10 changes (9 additions, 1 deletion)
@@ -32,6 +32,14 @@
"originalTimestamp": "2015-12-12T19:11:01.152Z"
}

cities = [
'San Fransisco',
'Baltimore',
'New York',
'Miami',
'Asheville',
]


@click.command()
@click.option('--num-messages', default=1001, type=int)
@@ -45,7 +53,7 @@ def main(num_messages, topic):
producer = Producer(conf)
for i in range(num_messages):
e = copy.deepcopy(event)
e['properties']['city'] = e['properties']['city'] + str(random.randrange(0, 1000))
e['properties']['city'] = random.choice(cities)
j_event = json.dumps(e)
producer.produce(topic, value=j_event)
if i % 1000 == 0:
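This change replaces a per-message random suffix on the city (`city + str(random.randrange(0, 1000))`) with a draw from a fixed list (`random.choice(cities)`). Presumably the goal is that city values such as `Baltimore` can match rows in the static CSV used by the join benchmarks, whereas suffixed values would not, but the CSV and join configs are not shown in this commit. The sketch below runs without Kafka and contrasts the two strategies. `TEMPLATE`, `old_city`, `new_city` and `generate` are hypothetical names, and the template is reduced to the one field this commit touches.

```python
import copy
import random
from typing import Callable

# City values copied verbatim from the commit; spelling is kept as-is
# because these strings are the values downstream consumers compare against.
CITIES = ["San Fransisco", "Baltimore", "New York", "Miami", "Asheville"]

# Hypothetical minimal template; the real `event` in publish-test-data.py
# has more fields, elided here.
TEMPLATE = {"properties": {"city": "Springfield"}}


def old_city(event: dict, rng: random.Random) -> str:
    """Pre-commit behaviour: append a random suffix 0..999 to the template city."""
    return event["properties"]["city"] + str(rng.randrange(0, 1000))


def new_city(event: dict, rng: random.Random) -> str:
    """Post-commit behaviour: pick one of a fixed set of city names.

    `event` is unused; it is accepted so both strategies share a signature.
    """
    return rng.choice(CITIES)


def generate(n: int, pick: Callable[[dict, random.Random], str], seed: int = 0) -> list:
    """Build n events from TEMPLATE, setting properties.city with `pick`."""
    rng = random.Random(seed)
    events = []
    for _ in range(n):
        e = copy.deepcopy(TEMPLATE)  # deep copy so TEMPLATE is never mutated
        e["properties"]["city"] = pick(e, rng)
        events.append(e)
    return events


if __name__ == "__main__":
    for name, pick in [("old", old_city), ("new", new_city)]:
        cities = {e["properties"]["city"] for e in generate(10_000, pick)}
        print(f"{name}: {len(cities)} distinct city values")
```

With 10,000 events the old strategy yields close to a thousand distinct values, none of which equal a plain city name; the new one yields at most five.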