statistics: do not depend on table information when calculating the table size #56036
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## master #56036 +/- ##
=================================================
- Coverage 72.9454% 57.0992% -15.8462%
=================================================
Files 1604 1761 +157
Lines 446749 635891 +189142
=================================================
+ Hits 325883 363089 +37206
- Misses 100805 248139 +147334
- Partials 20061 24663 +4602
Flags with carried forward coverage won't be shown.
Tested locally:
#!/usr/bin/env -S cargo +nightly -Zscript
---cargo
[dependencies]
clap = { version = "4.2", features = ["derive"] }
sqlx = { version = "0.7", features = ["runtime-tokio-rustls", "mysql"] }
tokio = { version = "1", features = ["full"] }
fake = { version = "2.5", features = ["derive"] }
---

use clap::Parser;
use fake::{Fake, Faker};
use sqlx::mysql::MySqlPoolOptions;

#[derive(Parser, Debug)]
#[clap(version)]
struct Args {
    #[clap(short, long, help = "MySQL connection string")]
    database_url: String,
}

#[derive(Debug)]
struct TableRow {
    id: i64,
    column1: String,
    column2: i32,
    column3: i32,
    column4: String,
}

#[tokio::main]
async fn main() -> Result<(), sqlx::Error> {
    let args = Args::parse();
    let pool = MySqlPoolOptions::new()
        .max_connections(5)
        .connect(&args.database_url)
        .await?;

    // Create 20 tables
    for i in 0..20 {
        let table_name = format!("t{}", i);
        let create_table_query = format!(
            "CREATE TABLE IF NOT EXISTS {} (
                id BIGINT NOT NULL PRIMARY KEY,
                column1 VARCHAR(255) NOT NULL,
                column2 INT NOT NULL,
                column3 INT NOT NULL,
                column4 VARCHAR(255) NOT NULL,
                INDEX idx_column1 (column1)
            )",
            table_name
        );
        sqlx::query(&create_table_query)
            .execute(&pool)
            .await?;
        println!("Created table: {}", table_name);

        // Insert 3000 rows into each table
        for _ in 0..3000 {
            let row = TableRow {
                id: Faker.fake::<i64>(),
                column1: Faker.fake::<String>(),
                column2: Faker.fake::<i32>(),
                column3: Faker.fake::<i32>(),
                column4: Faker.fake::<String>(),
            };
            let insert_query = format!(
                "INSERT INTO {} (id, column1, column2, column3, column4)
                VALUES (?, ?, ?, ?, ?)",
                table_name
            );
            sqlx::query(&insert_query)
                .bind(row.id)
                .bind(&row.column1)
                .bind(row.column2)
                .bind(row.column3)
                .bind(&row.column4)
                .execute(&pool)
                .await?;
        }
        println!("Successfully inserted 3000 rows into table '{}'.", table_name);
    }
    Ok(())
}
As you can see the table size is
Signed-off-by: Rustin170506 <[email protected]>
For partitioned tables:
#!/usr/bin/env -S cargo +nightly -Zscript
---cargo
[dependencies]
clap = { version = "4.2", features = ["derive"] }
sqlx = { version = "0.7", features = ["runtime-tokio-rustls", "mysql"] }
tokio = { version = "1", features = ["full"] }
fake = { version = "2.5", features = ["derive"] }
---

use clap::Parser;
use fake::{Fake, Faker};
use sqlx::mysql::MySqlPoolOptions;

#[derive(Parser, Debug)]
#[clap(version)]
struct Args {
    #[clap(short, long, help = "MySQL connection string")]
    database_url: String,
}

#[derive(Debug)]
struct TableRow {
    id: i64,
    partition_key: u32,
    column1: String,
    column2: i32,
    column3: i32,
    column4: String,
}

#[tokio::main]
async fn main() -> Result<(), sqlx::Error> {
    let args = Args::parse();
    let pool = MySqlPoolOptions::new()
        .max_connections(5)
        .connect(&args.database_url)
        .await?;

    // Create the partitioned table (21 range partitions, p0..p20) if it does not exist
    sqlx::query(
        "CREATE TABLE IF NOT EXISTS t (
            id BIGINT NOT NULL,
            partition_key INT NOT NULL,
            column1 VARCHAR(255) NOT NULL,
            column2 INT NOT NULL,
            column3 INT NOT NULL,
            column4 VARCHAR(255) NOT NULL,
            PRIMARY KEY (id, partition_key),
            index idx_column1 (column1)
        ) PARTITION BY RANGE (partition_key) (
            PARTITION p0 VALUES LESS THAN (3000),
            PARTITION p1 VALUES LESS THAN (6000),
            PARTITION p2 VALUES LESS THAN (9000),
            PARTITION p3 VALUES LESS THAN (12000),
            PARTITION p4 VALUES LESS THAN (15000),
            PARTITION p5 VALUES LESS THAN (18000),
            PARTITION p6 VALUES LESS THAN (21000),
            PARTITION p7 VALUES LESS THAN (24000),
            PARTITION p8 VALUES LESS THAN (27000),
            PARTITION p9 VALUES LESS THAN (30000),
            PARTITION p10 VALUES LESS THAN (33000),
            PARTITION p11 VALUES LESS THAN (36000),
            PARTITION p12 VALUES LESS THAN (39000),
            PARTITION p13 VALUES LESS THAN (42000),
            PARTITION p14 VALUES LESS THAN (45000),
            PARTITION p15 VALUES LESS THAN (48000),
            PARTITION p16 VALUES LESS THAN (51000),
            PARTITION p17 VALUES LESS THAN (54000),
            PARTITION p18 VALUES LESS THAN (57000),
            PARTITION p19 VALUES LESS THAN (60000),
            PARTITION p20 VALUES LESS THAN (63000)
        )"
    )
    .execute(&pool)
    .await?;

    // Insert 3000 rows into each of the 20 partitions p1..p20 (p0 stays empty)
    for partition in 1..=20 {
        // 3001, 6001, ..., 60001: each value falls inside the range of partition p{partition}
        let partition_key = partition * 3000 + 1;
        for _ in 0..3000 {
            let row = TableRow {
                id: Faker.fake::<i64>(), // Random id; collisions are unlikely but not impossible
                partition_key,           // Same key for every row in this partition
                column1: Faker.fake::<String>(),
                column2: Faker.fake::<i32>(),
                column3: Faker.fake::<i32>(),
                column4: Faker.fake::<String>(),
            };
            sqlx::query(
                "INSERT INTO t (id, partition_key, column1, column2, column3, column4)
                VALUES (?, ?, ?, ?, ?, ?)"
            )
            .bind(row.id)
            .bind(row.partition_key)
            .bind(&row.column1)
            .bind(row.column2)
            .bind(row.column3)
            .bind(&row.column4)
            .execute(&pool)
            .await?;
        }
        println!("Successfully inserted 3000 rows into partition {} of the 't' table.", partition);
    }
    Ok(())
}
🔢 Self-check (PR reviewed by myself and ready for feedback.)
/retest
Signed-off-by: Rustin170506 <[email protected]>
Tested again:
#!/usr/bin/env -S cargo +nightly -Zscript
---cargo
[dependencies]
clap = { version = "4.2", features = ["derive"] }
sqlx = { version = "0.7", features = ["runtime-tokio-rustls", "mysql"] }
tokio = { version = "1", features = ["full"] }
fake = { version = "2.5", features = ["derive"] }
---

use clap::Parser;
use fake::{Fake, Faker};
use sqlx::mysql::MySqlPoolOptions;

#[derive(Parser, Debug)]
#[clap(version)]
struct Args {
    #[clap(short, long, help = "MySQL connection string")]
    database_url: String,
}

#[derive(Debug)]
struct TableRow {
    id: i64,
    column1: String,
    column2: i32,
    column3: i32,
    column4: String,
}

#[tokio::main]
async fn main() -> Result<(), sqlx::Error> {
    let args = Args::parse();
    let pool = MySqlPoolOptions::new()
        .max_connections(5)
        .connect(&args.database_url)
        .await?;

    // Update the first table and insert new data
    update_first_table(&pool).await?;
    Ok(())
}

async fn update_first_table(pool: &sqlx::MySqlPool) -> Result<(), sqlx::Error> {
    let table_name = "t0";

    // Add a new column to the first table
    let alter_table_query = format!(
        "ALTER TABLE {} ADD COLUMN new_column VARCHAR(255)",
        table_name
    );
    sqlx::query(&alter_table_query).execute(pool).await?;
    println!("Added new_column to table {}", table_name);

    // Insert 5000 rows with the new column
    for _ in 0..5000 {
        let row = TableRow {
            id: Faker.fake::<i64>(),
            column1: Faker.fake::<String>(),
            column2: Faker.fake::<i32>(),
            column3: Faker.fake::<i32>(),
            column4: Faker.fake::<String>(),
        };
        let new_column_value: String = Faker.fake();
        let insert_query = format!(
            "INSERT INTO {} (id, column1, column2, column3, column4, new_column)
            VALUES (?, ?, ?, ?, ?, ?)",
            table_name
        );
        sqlx::query(&insert_query)
            .bind(row.id)
            .bind(&row.column1)
            .bind(row.column2)
            .bind(row.column3)
            .bind(&row.column4)
            .bind(&new_column_value)
            .execute(pool)
            .await?;
    }
    println!("Successfully inserted 5000 rows with new_column into table '{}'.", table_name);
    Ok(())
}
LGTM
/retest
Signed-off-by: Rustin170506 <[email protected]>
@Rustin170506: The following test failed, say
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: elsa0520, winoros
What problem does this PR solve?
Issue Number: ref #55906
Problem Summary:
What changed and how does it work?
In this pull request, I use table statistics to get the number of columns instead of relying on the table information schema. This removes the need to retrieve the table information schema when updating an analysis job based on the new table row count.
See: https://github.com/pingcap/tidb/pull/55889/files#r1756288302
We only need ColNum here because every time we create a table or a new column, we also create a histogram record for it. That record is then loaded into memory (if it has an update). So the value is usually the same as the column number from the table information schema.
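A minimal sketch of the idea, written in Rust like the test scripts above rather than in TiDB's own Go. Everything here is hypothetical and only for illustration: `TableStats`, `ColumnHistogram`, `col_num`, `table_size`, and `stats_with_columns` are not TiDB's real types or functions. The size formula (row count multiplied by column count) is also an assumption, not something this description states. The sketch only shows that the column count can be read from the histogram records held in the statistics, with no schema lookup.

```rust
use std::collections::HashMap;

/// Hypothetical per-column histogram record; a real one holds much more.
#[allow(dead_code)]
struct ColumnHistogram {
    distinct_values: u64,
}

/// Hypothetical, simplified view of a table's in-memory statistics.
struct TableStats {
    /// Current row count tracked by the statistics.
    realtime_count: u64,
    /// One histogram record per column, keyed by column ID.
    columns: HashMap<i64, ColumnHistogram>,
}

impl TableStats {
    /// Column count taken from the statistics themselves, with no schema lookup.
    fn col_num(&self) -> usize {
        self.columns.len()
    }

    /// Assumed size estimate: rows multiplied by columns.
    fn table_size(&self) -> f64 {
        self.realtime_count as f64 * self.col_num() as f64
    }
}

/// Builds statistics for a table with `cols` columns, each with a histogram record.
fn stats_with_columns(rows: u64, cols: i64) -> TableStats {
    let columns = (1..=cols)
        .map(|id| (id, ColumnHistogram { distinct_values: 0 }))
        .collect();
    TableStats { realtime_count: rows, columns }
}

fn main() {
    // Same shape as the test tables: 5 columns, 3000 rows.
    let mut stats = stats_with_columns(3000, 5);
    println!("size before ADD COLUMN: {}", stats.table_size());

    // ADD COLUMN also creates a histogram record for the new column,
    // so the column count in the statistics follows the schema change.
    stats.columns.insert(6, ColumnHistogram { distinct_values: 0 });
    stats.realtime_count += 5000;
    println!("size after ADD COLUMN: {}", stats.table_size());
}
```

If a histogram record has not been loaded into memory yet, `col_num` in this sketch would undercount, which is why the description says the two values are usually, not always, the same.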