Background
We recently added a new command, vexplain trace, to Vitess. This command runs a query under tracing mode and provides detailed information about operator calls and data flow. We want to leverage this to create a benchmark tool that can measure the impact of planner changes.
Objective
Create a benchmark tool to measure the impact of planner changes by comparing the vexplain trace output before and after the change.
Advantages Over Traditional Time-Based Benchmarking
This approach offers several significant advantages over traditional time-based benchmarking:
Consistency: Unlike time-based benchmarks that can vary due to system load, network conditions, or other external factors, our metrics are deterministic and directly reflect the query plan's efficiency.
No Special Hardware Required: Traditional benchmarks often require dedicated, bare-metal hardware to ensure consistent results. Our tool can run on any system, making it more accessible for developers and CI/CD pipelines.
No Warm-up Time: Time-based benchmarks typically require warm-up runs to reach a steady state. Our approach provides meaningful results from a single run, saving time and resources.
Direct Insight into Planner Changes: By focusing on metrics like route calls, data sent, and memory usage, we get a clear picture of how planner changes affect query execution, rather than just seeing an overall time difference.
Proxy for Performance: While we ultimately care about query execution time, these metrics serve as reliable proxies that are much easier to measure precisely and consistently.
Easier Comparison: With standardized metrics, it's easier to compare results across different environments or Vitess versions, providing a more robust basis for decision-making.
Faster Iteration: The ability to quickly and reliably measure the impact of planner changes allows for faster development cycles and more confident optimizations.
This tool will make it significantly easier to evaluate and compare the effects of planner changes, providing a valuable complement to our existing performance measurement techniques.
Key Metrics to Measure
Data Sent from VTTablets
Measure the Route operator
Track the amount of data sent to VTGate
Count the number of calls to routes
Route Efficiency
Consider the work required by different route types (Scatter, Equal, EqualUnique)
Compare the efficiency of routes returning similar amounts of data
Memory Consumption
Measure memory usage of different operators
Focus on memory-intensive operations like memory sort and hash join
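As a rough illustration of the first two metrics, the sketch below walks an operator tree and totals route calls and rows sent to VTGate, grouped by route type. It assumes the trace is a JSON tree whose nodes carry OperatorType, Variant, NoOfCalls, AvgNumberOfRows, and Inputs; those field names and the sample trace are assumptions made for illustration and should be checked against real vexplain trace output. Memory consumption is left out because this proposal does not say how it would be reported.

```python
import json
from collections import defaultdict


def walk(node):
    """Yield every operator in the tree, depth first."""
    yield node
    for child in node.get("Inputs", []):
        yield from walk(child)


def route_metrics(trace):
    """Total route calls and rows sent to VTGate, grouped by route variant.

    Field names (OperatorType, Variant, NoOfCalls, AvgNumberOfRows, Inputs)
    are assumed for illustration, not taken from the Vitess source.
    """
    totals = defaultdict(lambda: {"calls": 0, "rows": 0})
    for op in walk(trace):
        if op.get("OperatorType") != "Route":
            continue
        calls = op.get("NoOfCalls", 0)
        bucket = totals[op.get("Variant", "Unknown")]
        bucket["calls"] += calls
        # Rows sent is estimated as calls times average rows per call.
        bucket["rows"] += round(calls * op.get("AvgNumberOfRows", 0))
    return dict(totals)


# Hypothetical trace: a join over two routes.
SAMPLE = json.loads("""
{
  "OperatorType": "Join",
  "Inputs": [
    {"OperatorType": "Route", "Variant": "Scatter",
     "NoOfCalls": 1, "AvgNumberOfRows": 100},
    {"OperatorType": "Route", "Variant": "EqualUnique",
     "NoOfCalls": 2, "AvgNumberOfRows": 25}
  ]
}
""")

if __name__ == "__main__":
    print(route_metrics(SAMPLE))
```

Grouping by variant keeps Scatter and EqualUnique totals separate, so routes that return similar amounts of data can still be compared by type.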
Implementation Details
The benchmark tool will be implemented as an extension to the existing vitess-tester tooling.
Usage
# Install Vitess version X
vitess-tester --trace myQueries-X.trace t/myQueries.test -vschema t/vschema.json
# Install Vitess version Y
vitess-tester --trace myQueries-Y.trace t/myQueries.test -vschema t/vschema.json
# Compare traces
vitess-tester --trace-compare myQueries-X.trace myQueries-Y.trace
Test File Structure
The test files will follow this general structure:
CREATE TABLE statements to set up the schema
INSERT statements to populate the data
SELECT statements to run the queries
No additional directives are needed in the test files.
Mock Output Example
Trace Comparison: myQueries-X.trace vs myQueries-Y.trace
Query 1: SELECT * FROM customers WHERE id = 1
---------------------------------------------------------------
| Metric            | Version X | Version Y | Diff | % Change |
---------------------------------------------------------------
| Route Calls       | 1         | 1         | 0    | 0%       |
| Data Sent (rows)  | 1         | 1         | 0    | 0%       |
| Route Type        | Equal     | Equal     | -    | -        |
| Memory Usage (MB) | 0.01      | 0.01      | 0    | 0%       |
---------------------------------------------------------------

Query 2: SELECT o.id, c.name FROM orders o JOIN customers c ON o.customer_id = c.id WHERE o.status = 'shipped'
---------------------------------------------------------------
| Metric            | Version X | Version Y | Diff | % Change |
---------------------------------------------------------------
| Route Calls       | 3         | 2         | -1   | -33.33%  |
| Data Sent (rows)  | 150       | 100       | -50  | -33.33%  |
| Route Type        | Scatter   | Equal     | -    | -        |
| Memory Usage (MB) | 5         | 2         | -3   | -60%     |
---------------------------------------------------------------
Summary:
- 1 out of 2 queries showed significant change
- Average change in Route Calls: -16.67%
- Average change in Data Sent: -16.67%
- Average change in Memory Usage: -30%
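
The % Change column and the summary averages in the mock output can be reproduced with a few lines of arithmetic. The sketch below is one possible way to compute them, not the tool's actual implementation. The function names are hypothetical, and treating a zero baseline as undefined is an assumption. It does not cover the "significant change" count, since the proposal does not define a threshold for it.

```python
def percent_change(before, after):
    """Relative change in percent; None when the baseline is zero."""
    if before == 0:
        return None
    return 100 * (after - before) / before


def average_change(pairs):
    """Mean percent change over (before, after) pairs, skipping undefined ones."""
    changes = [percent_change(before, after) for before, after in pairs]
    defined = [c for c in changes if c is not None]
    return sum(defined) / len(defined) if defined else None


# (Version X, Version Y) values for Query 1 and Query 2 in the mock output.
METRICS = {
    "Route Calls": [(1, 1), (3, 2)],
    "Data Sent": [(1, 1), (150, 100)],
    "Memory Usage": [(0.01, 0.01), (5, 2)],
}

if __name__ == "__main__":
    for name, pairs in METRICS.items():
        print(f"Average change in {name}: {average_change(pairs):.2f}%")
```

Rounding is applied only when printing, so the per-query percentages are averaged at full precision.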