chore: Benchmark db lookups #8

Closed
wants to merge 39 commits into from
04ff34e
cleanup
schpet Sep 1, 2024
28f26bd
schpet Sep 1, 2024
27e40fc
aider: build: Add condition to deploy only on 'main' branch
schpet Sep 1, 2024
a6127a0
schpet Sep 1, 2024
cef6e4e
schpet Sep 1, 2024
458da38
aider: feat: Add benchmark for comparing cool_id and bigint primary keys
schpet Sep 8, 2024
7c4536c
feat: Add database connection details to benchmark script
schpet Sep 8, 2024
4af00c5
aider: feat: Update benchmark to use 10,000 rows instead of 1 million
schpet Sep 8, 2024
f046981
aider: refactor: ensure existing data is removed before running bench…
schpet Sep 8, 2024
ff27fe1
aider: feat: generate sample data in batches of 1000 using insert_all
schpet Sep 8, 2024
c4828a3
aider: style: format code
schpet Sep 8, 2024
682e376
aider: fix: Generate cool_id for CoolIdUser records before inserting
schpet Sep 8, 2024
d2f56ab
aider: feat: add progress messages for batch inserts
schpet Sep 8, 2024
ac24b67
aider: feat: Add UUID primary key benchmark
schpet Sep 8, 2024
cba94cf
aider: feat: Reorder benchmark tests
schpet Sep 8, 2024
cb65926
aider: chore: add VACUUM operation after inserting data
schpet Sep 8, 2024
eae5b01
aider: feat: Run benchmarks twice for consistency
schpet Sep 8, 2024
c0e7d9f
aider: feat: Use bulk insert operations to speed up data generation f…
schpet Sep 8, 2024
e65139b
schpet Sep 8, 2024
df8a11c
aider: feat: add benchmark GitHub workflow and update database connec…
schpet Sep 8, 2024
cdbe8c8
Merge branch 'main' into benchmarks
schpet Sep 8, 2024
2a681e6
aider: chore: update benchmarks to use a constant for batch size
schpet Sep 8, 2024
4eefd26
aider: feat: increase batch size to 5000
schpet Sep 8, 2024
69b41bd
aider: feat: Add command-line argument for sample data size in cool_i…
schpet Sep 8, 2024
fd3c9c3
aider: feat: update github workflow to benchmark 1000000 records
schpet Sep 8, 2024
73f3a92
fix: Remove unnecessary console output from benchmark script
schpet Sep 8, 2024
ac61ac7
aider: refactor: Move ID lookups outside of measured performance
schpet Sep 8, 2024
6717109
aider: fix: Handle division by zero error in benchmark
schpet Sep 8, 2024
e631217
aider: fix: Handle division by zero error in benchmark script
schpet Sep 8, 2024
5c42a72
aider: feat: add cleanup for uuid tables
schpet Sep 8, 2024
ce3a7bc
aider: fix: Ensure iterations is not larger than sample size and at l…
schpet Sep 8, 2024
a43af8b
fix: Decrease BATCH_SIZE in cool_id_benchmark.rb
schpet Sep 8, 2024
f50a27b
aider: feat: Improve performance of generate_sample_data function
schpet Sep 8, 2024
99e24c8
aider: feat: Add option to skip existence check when generating cool_id
schpet Sep 8, 2024
4e13c78
aider: style: Fix code formatting
schpet Sep 8, 2024
28b2679
aider: feat: Add skip_existence_check option when generating sample data
schpet Sep 8, 2024
43b0e6c
aider: refactor: Load cool_id relatively
schpet Sep 8, 2024
2dd6c4a
schpet Sep 8, 2024
693af17
schpet Sep 8, 2024
49 changes: 49 additions & 0 deletions .github/workflows/benchmark.yml
@@ -0,0 +1,49 @@
name: Benchmark

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]
  workflow_dispatch:

jobs:
  benchmark:
    runs-on: ubuntu-latest

    services:
      postgres:
        image: postgres:13
        env:
          POSTGRES_PASSWORD: postgres
        ports:
          - 5432:5432
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5

    steps:
      - uses: actions/checkout@v3
      - name: Set up Ruby
        uses: ruby/setup-ruby@v1
        with:
          ruby-version: 3.1.0
          bundler-cache: true
      - name: Install dependencies
        run: |
          sudo apt-get update
          sudo apt-get install -y postgresql-client
      - name: Set up database
        run: |
          createdb -h localhost -U postgres cool_id_benchmark
        env:
          PGPASSWORD: postgres
      - name: Run benchmark
        run: |
          bundle exec ruby benchmark/cool_id_benchmark.rb 1000000
        env:
          PGHOST: localhost
          PGUSER: postgres
          PGPASSWORD: postgres
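The workflow hands its connection settings to the benchmark step through `PGHOST`, `PGUSER` and `PGPASSWORD`. As a minimal sketch of how a script can resolve those variables with local fallbacks, assuming a hypothetical helper named `benchmark_db_config` (it is not part of this PR):

```ruby
# Hypothetical helper, not part of the PR: build connection settings from an
# environment-like object (ENV or a plain Hash), falling back to local defaults.
def benchmark_db_config(env = ENV)
  {
    adapter: "postgresql",
    host: env.fetch("PGHOST", "localhost"),
    username: env.fetch("PGUSER", "postgres"),
    password: env.fetch("PGPASSWORD", "postgres"),
    database: "cool_id_benchmark"
  }
end

# Values as set by the "Run benchmark" step of the workflow.
ci_env = {"PGHOST" => "localhost", "PGUSER" => "postgres", "PGPASSWORD" => "postgres"}
p benchmark_db_config(ci_env)
```

Taking the environment as an argument keeps the helper easy to exercise without touching the real process environment.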
3 changes: 3 additions & 0 deletions Gemfile
@@ -11,4 +11,7 @@ group :development, :test do
  gem "standard", "~> 1.3"
  gem "yard", "~> 0.9.28"
  gem "sqlite3", "~> 1.4"
  gem "pg", "~> 1.2"
  gem "faker", "~> 2.18"
  gem "benchmark", "~> 0.2.0"
end
7 changes: 7 additions & 0 deletions Gemfile.lock
@@ -28,11 +28,14 @@ GEM
      tzinfo (~> 2.0, >= 2.0.5)
    ast (2.4.2)
    base64 (0.2.0)
    benchmark (0.2.1)
    bigdecimal (3.1.8)
    concurrent-ruby (1.3.4)
    connection_pool (2.4.1)
    diff-lcs (1.5.1)
    drb (2.2.1)
    faker (2.23.0)
      i18n (>= 1.8.11, < 2)
    i18n (1.14.5)
      concurrent-ruby (~> 1.0)
    json (2.7.2)
@@ -46,6 +49,7 @@ GEM
    parser (3.3.4.2)
      ast (~> 2.4.1)
      racc
    pg (1.5.8)
    racc (1.8.1)
    rainbow (3.1.1)
    rake (13.2.1)
@@ -111,7 +115,10 @@ PLATFORMS
  ruby

DEPENDENCIES
  benchmark (~> 0.2.0)
  cool_id!
  faker (~> 2.18)
  pg (~> 1.2)
  rake (~> 13.0)
  rspec (~> 3.0)
  sqlite3 (~> 1.4)
4 changes: 4 additions & 0 deletions README.md
@@ -6,6 +6,10 @@ gem for rails apps to generate string ids with a prefix, followed by a [nanoid](

### basic id generation

## usage

### basic id generation

```ruby
class User < ActiveRecord::Base
  include CoolId::Model
  # ... (remaining lines collapsed in the diff view)
```
215 changes: 215 additions & 0 deletions benchmark/cool_id_benchmark.rb
@@ -0,0 +1,215 @@
# frozen_string_literal: true

require "benchmark"
require "active_record"
require_relative "../lib/cool_id"
require "faker"

# Configure ActiveRecord to use PostgreSQL
ActiveRecord::Base.establish_connection(
  adapter: "postgresql",
  host: ENV["PGHOST"] || "localhost",
  username: ENV["PGUSER"] || "postgres",
  password: ENV["PGPASSWORD"] || "postgres",
  database: "cool_id_benchmark"
)

# Models for cool_id primary key
class CoolIdUser < ActiveRecord::Base
  include CoolId::Model
  cool_id prefix: "usr"
  has_one :cool_id_profile
end

class CoolIdProfile < ActiveRecord::Base
  belongs_to :cool_id_user
end

# Models for bigint primary key with cool_id as public_id
class BigIntUser < ActiveRecord::Base
  include CoolId::Model
  cool_id prefix: "usr", id_field: :public_id
  has_one :big_int_profile
end

class BigIntProfile < ActiveRecord::Base
  belongs_to :big_int_user
end

# Models for UUID primary key
class UuidUser < ActiveRecord::Base
  has_one :uuid_profile
end

class UuidProfile < ActiveRecord::Base
  belongs_to :uuid_user
end

# Set up database schema
ActiveRecord::Schema.define do
  create_table :cool_id_users, id: :string, force: true do |t|
    t.string :name
  end

  create_table :cool_id_profiles, force: true do |t|
    t.string :cool_id_user_id
    t.string :bio
  end

  create_table :big_int_users, force: true do |t|
    t.string :public_id
    t.string :name
  end

  create_table :big_int_profiles, force: true do |t|
    t.bigint :big_int_user_id
    t.string :bio
  end

  create_table :uuid_users, id: :uuid, default: -> { "gen_random_uuid()" }, force: true do |t|
    t.string :name
  end

  create_table :uuid_profiles, force: true do |t|
    t.uuid :uuid_user_id
    t.string :bio
  end
end

BATCH_SIZE = 1000

# Generate sample data in batches of at most BATCH_SIZE rows per table
def generate_sample_data(count)
  # Round up so a count that is not a multiple of BATCH_SIZE still gets a final batch
  total_batches = (count + BATCH_SIZE - 1) / BATCH_SIZE
  ActiveRecord::Base.transaction do
    total_batches.times do |batch|
      # The last batch is smaller when count is not a multiple of BATCH_SIZE
      batch_count = [BATCH_SIZE, count - batch * BATCH_SIZE].min
      puts "Preparing batch #{batch + 1} of #{total_batches}..."

      cool_id_users = batch_count.times.map { {id: CoolIdUser.generate_cool_id(skip_existence_check: true), name: Faker::Name.name} }
      big_int_users = batch_count.times.map { {name: Faker::Name.name, public_id: BigIntUser.generate_cool_id(skip_existence_check: true)} }
      uuid_users = batch_count.times.map { {name: Faker::Name.name} }

      # insert_all! returns the inserted primary keys, which the profiles reference
      cool_id_user_ids = CoolIdUser.insert_all!(cool_id_users).rows.flatten
      big_int_user_ids = BigIntUser.insert_all!(big_int_users).rows.flatten
      uuid_user_ids = UuidUser.insert_all!(uuid_users).rows.flatten

      cool_id_profiles = cool_id_user_ids.map { |id| {cool_id_user_id: id, bio: Faker::Lorem.paragraph} }
      big_int_profiles = big_int_user_ids.map { |id| {big_int_user_id: id, bio: Faker::Lorem.paragraph} }
      uuid_profiles = uuid_user_ids.map { |id| {uuid_user_id: id, bio: Faker::Lorem.paragraph} }

      CoolIdProfile.insert_all!(cool_id_profiles)
      BigIntProfile.insert_all!(big_int_profiles)
      UuidProfile.insert_all!(uuid_profiles)
    end
  end
end
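
When a row count is split into fixed-size batches, plain integer division (`count / batch_size`) ignores any remainder, and yields zero batches when the count is smaller than the batch size. A minimal sketch of computing per-batch sizes that keep the final partial batch, assuming a hypothetical helper `batch_sizes` (not part of this PR):

```ruby
# Hypothetical helper, not part of the PR: split `count` rows into batch sizes
# of at most `batch_size`, keeping the final partial batch.
def batch_sizes(count, batch_size)
  raise ArgumentError, "batch_size must be positive" unless batch_size.positive?

  full, remainder = count.divmod(batch_size)
  sizes = Array.new(full, batch_size)
  sizes << remainder if remainder.positive?
  sizes
end

p batch_sizes(2_500, 1_000) # => [1000, 1000, 500]
p batch_sizes(500, 1_000)   # => [500]
```

Each size can then drive one `insert_all!` call, so the total number of inserted rows equals the requested count.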

# Prepare sample IDs for benchmarking
def prepare_sample_ids(count)
  {
    big_int: BigIntUser.pluck(:id).sample(count),
    uuid: UuidUser.pluck(:id).sample(count),
    cool_id: CoolIdUser.pluck(:id).sample(count)
  }.transform_values(&:compact)
end

# Benchmark queries
def run_benchmark(iterations, sample_ids)
  Benchmark.bm(20) do |x|
    [:big_int, :uuid, :cool_id].each do |id_type|
      x.report("#{id_type.to_s.capitalize} Query:") do
        if sample_ids[id_type].empty?
          puts "No sample IDs for #{id_type}. Skipping benchmark."
        else
          iterations.times do
            case id_type
            when :big_int
              BigIntUser.joins(:big_int_profile).where(id: sample_ids[id_type].sample).first
            when :uuid
              UuidUser.joins(:uuid_profile).where(id: sample_ids[id_type].sample).first
            when :cool_id
              CoolIdUser.joins(:cool_id_profile).where(id: sample_ids[id_type].sample).first
            end
          end
        end
      end
    end
  end
end
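
`Benchmark.bm` reports the total time per label, so a per-lookup figure has to be derived by dividing by the iteration count. A minimal sketch of that calculation using `Benchmark.realtime`, with an in-memory Hash standing in for the database; `average_lookup_seconds` is a hypothetical helper and not part of this PR:

```ruby
require "benchmark"

# Hypothetical helper, not part of the PR: time `iterations` calls of `lookup`
# and return the average seconds per call. IDs are picked before timing starts
# so that only the lookup itself is measured.
def average_lookup_seconds(ids, iterations, &lookup)
  raise ArgumentError, "ids must not be empty" if ids.empty?
  raise ArgumentError, "iterations must be positive" unless iterations.positive?

  picks = Array.new(iterations) { ids.sample }
  total = Benchmark.realtime { picks.each { |id| lookup.call(id) } }
  total / iterations
end

# Stand-in for a database table: an in-memory Hash keyed by ID.
table = (1..1_000).to_h { |i| [i, "user #{i}"] }
avg = average_lookup_seconds(table.keys, 10_000) { |id| table.fetch(id) }
puts format("%.9f s per lookup", avg)
```

In the real benchmark the block would wrap the ActiveRecord query, and the number would include the database round trip rather than just a Hash access.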

# Clean up existing data
def clean_up_data
  %i[cool_id_users cool_id_profiles big_int_users big_int_profiles uuid_users uuid_profiles].each do |table|
    ActiveRecord::Base.connection.drop_table table, if_exists: true
  end
end

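# Parse command-line arguments for sample data size and iterations
sample_size = ARGV[0] ? ARGV[0].to_i : 10_000
# clamp below raises ArgumentError when its minimum exceeds its maximum, so reject sizes below 1
abort "Sample size must be a positive integer" unless sample_size.positive?
iterations = ARGV[1] ? ARGV[1].to_i : [10_000, sample_size].min

# Ensure iterations is not larger than sample size and at least 1
iterations = iterations.clamp(1, sample_size)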
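
The defaulting and clamping rules are easier to check when they live in a function that takes an argv-style array. A minimal sketch, assuming a hypothetical helper `parse_benchmark_args` (not part of this PR); unlike the script, it uses the stricter `Integer(string, 10)` and so rejects non-numeric input instead of treating it as 0:

```ruby
# Hypothetical helper, not part of the PR: derive sample size and iteration
# count from an argv-style array, using the same defaults as the script.
def parse_benchmark_args(argv, default_size: 10_000)
  sample_size = argv[0] ? Integer(argv[0], 10) : default_size
  raise ArgumentError, "sample size must be at least 1" if sample_size < 1

  iterations = argv[1] ? Integer(argv[1], 10) : [default_size, sample_size].min
  # Keep iterations within 1..sample_size.
  [sample_size, iterations.clamp(1, sample_size)]
end

p parse_benchmark_args([])              # => [10000, 10000]
p parse_benchmark_args(["1000000"])     # => [1000000, 10000]
p parse_benchmark_args(["500", "9999"]) # => [500, 500]
p parse_benchmark_args(["500", "0"])    # => [500, 1]
```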

# Main execution
clean_up_data

puts "Setting up schema..."
ActiveRecord::Schema.define do
  create_table :cool_id_users, id: :string, force: true do |t|
    t.string :name
    t.index :id, unique: true
  end

  create_table :cool_id_profiles, force: true do |t|
    t.string :cool_id_user_id
    t.string :bio
    t.index :cool_id_user_id
  end

  create_table :big_int_users, force: true do |t|
    t.string :public_id
    t.string :name
    t.index :public_id, unique: true
  end

  create_table :big_int_profiles, force: true do |t|
    t.bigint :big_int_user_id
    t.string :bio
    t.index :big_int_user_id
  end

  create_table :uuid_users, id: :uuid, default: -> { "gen_random_uuid()" }, force: true do |t|
    t.string :name
    t.index :id, unique: true
  end

  create_table :uuid_profiles, force: true do |t|
    t.uuid :uuid_user_id
    t.string :bio
    t.index :uuid_user_id
  end
end

puts "Generating sample data (#{sample_size} records)..."
generate_sample_data(sample_size)

puts "Running VACUUM..."
ActiveRecord::Base.connection.execute("VACUUM ANALYZE")

puts "Preparing sample IDs for benchmarks..."
sample_ids = prepare_sample_ids(10_000)

puts "Running benchmarks..."
run_benchmark(iterations, sample_ids)


sample_ids = prepare_sample_ids(10_000)
puts "Running benchmarks again..."
run_benchmark(iterations, sample_ids)

# Clean up
clean_up_data
11 changes: 7 additions & 4 deletions lib/cool_id.rb
@@ -69,9 +69,10 @@ def registry
 
 # Generates a unique ID based on the given configuration.
 # @param config [Config] The configuration for ID generation.
+# @param skip_existence_check [Boolean] Whether to skip the existence check (default: false).
 # @return [String] A unique ID.
 # @raise [MaxRetriesExceededError] If unable to generate a unique ID within the maximum number of retries.
-def generate_id(config)
+def generate_id(config, skip_existence_check: false)
   alphabet = config.alphabet || @alphabet
   length = config.length || @length
   max_retries = config.max_retries || @max_retries
@@ -80,7 +81,8 @@ def generate_id(config)
   loop do
     nano_id = Nanoid.generate(size: length, alphabet: alphabet)
     full_id = "#{config.prefix}#{separator}#{nano_id}"
-    if !config.model_class.exists?(id: full_id)
+
+    if skip_existence_check || !config.model_class.exists?(id: full_id)
       return full_id
     end
 
@@ -226,9 +228,10 @@ def cool_id(options)
 end
 
 # Generates a new CoolId for this model.
+# @param skip_existence_check [Boolean] Whether to skip the existence check (default: false).
 # @return [String] A new CoolId.
-def generate_cool_id
-  CoolId.generate_id(@cool_id_config)
+def generate_cool_id(skip_existence_check: false)
+  CoolId.generate_id(@cool_id_config, skip_existence_check: skip_existence_check)
 end
 
 # Enforces CoolId setup for all descendants of this model.
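The `skip_existence_check` flag relies on `||` short-circuiting: when the flag is true, the uniqueness query is never issued. A minimal, database-free sketch of that control flow follows. It is an illustration only and not the gem's implementation: `generate_prefixed_id`, the `exists` callable, the `ID_ALPHABET` constant and the use of `SecureRandom` in place of Nanoid are all assumptions made for the example.

```ruby
require "securerandom"

# Hypothetical stand-in for the gem's generator, not its actual code:
# build "<prefix>_<random>" IDs and retry while `exists` reports a collision.
ID_ALPHABET = (("0".."9").to_a + ("a".."z").to_a).freeze

def generate_prefixed_id(prefix:, exists:, length: 12, max_retries: 5, skip_existence_check: false)
  max_retries.times do
    random_part = Array.new(length) { ID_ALPHABET[SecureRandom.random_number(ID_ALPHABET.size)] }.join
    candidate = "#{prefix}_#{random_part}"
    # Short-circuit: `exists` is never called when the check is skipped.
    return candidate if skip_existence_check || !exists.call(candidate)
  end
  raise "could not generate a unique ID after #{max_retries} attempts"
end

calls = 0
# Simulates a table in which every candidate ID is already taken.
always_taken = lambda do |_id|
  calls += 1
  true
end

id = generate_prefixed_id(prefix: "usr", exists: always_taken, skip_existence_check: true)
puts id
puts "existence checks performed: #{calls}" # prints "existence checks performed: 0"
```

Skipping the check trades a guaranteed-unique ID for one fewer query per generated ID, which suits bulk inserts where the database's unique constraint still catches collisions.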
7 changes: 7 additions & 0 deletions spec/cool_id_spec.rb
@@ -131,6 +131,13 @@ class Customer < ActiveRecord::Base
    expect(user.id).to match(/^usr_[0-9a-z]{12}$/)
  end

  it "generates a cool_id without existence check" do
    allow(User).to receive(:exists?).and_return(true) # Simulate all IDs exist
    id = User.generate_cool_id(skip_existence_check: true)
    expect(id).to match(/^usr_[0-9a-z]{12}$/)
    expect(User).not_to have_received(:exists?)
  end

  it "does not overwrite an existing id" do
    user = User.create(id: "custom-id", name: "Jane Doe")
    expect(user.id).to eq("custom-id")