Skip to content

Commit

Permalink
[BACKPORT 2024.1][#23521] YSQL: Cost YB Bitmap Table Scan remote filters
Browse files Browse the repository at this point in the history
Summary:
Original commit: 4ea354b / D36484
Prior to this diff, Bitmap Scan CBO did not handle remote filter costs. Evaluating a remote filter was given no cost, so it was effectively "free". Therefore AND conditions were seen as better to use remote filters for, rather than Bitmap And, because Bitmap And nodes add cost but remote filters added no cost.

To fix this, I
1. determine which quals will be evaluated by the index (as an index cond or a remote filter) in `yb_get_bitmap_index_quals`. This means that the previous change to `create_bitmap_subplan` that collected pushdown expressions can be reverted. The new function has a simpler interface, and allows for better comparing the given qual against the quals evaluated by the plan tree - now some redundant filters can be skipped. (more on this below)
2. determine which quals remain to be addressed by the Bitmap Table Scan node.
3. Account for the cost of evaluating those filters in the same manner that yb_cost_seqscan does, using `cost_qual_eval`. This allows for Bitmap And plans to be a reasonable alternative to remote filter plans.

**Moving the discourage modifier:** I also changed the DISCOURAGE_MODIFIER to affect the startup and runtime costs of Bitmap Table Scan instead of the index scans. Now we no longer incentivise Bitmap Scan plans with more work done in the Bitmap Table Scan node.

**Removing redundant tests: ** Removed some unstable tests from the yb_bitmap_scan_flags regress tests. They failed with the addition of this change, but actually don't provide anything meaningful. They answer the question: "what will the planner choose when both options are disabled?" which depends on exact details of how each plan is costed. That does not belong in a regress test.

**Removing redundant remote filters**: the planner identifies which nodes are resposible for evaluating the given conditions. The Bitmap Table scan node should evaluate anything that's not already implied by the conditions on the Bitmap Index Scans. However, with the addition of a storage index filters on Bitmap Index Scans, this logic was slightly faulty. Two lists of clauses were created, one for index conditions and one for index remote filters. Consider the conditons: `a < 10 OR (b < 10 AND remote_fllter(b))`. This might create a plan like:
```
Bitmap Table Scan
   Storage Filter: xxx
   -> Bitmap Or
        -> Bitmap Index Scan
           Index Cond: a < 10
        -> Bitmap Index Scan
           Index Cond: b < 10
           Storage Index Filter: remote_flter(b))
```
The list of index conditions was: `OR(a < 10, b < 10)`. The list of index filters was: `remote_filter(b)`. Neither of these indiviudally implied the original condition, so the planner decided it was necessary to apply the remote filters.

With this change, a single unified list of index conditions and storage index filters is returned. In this case, the list is `OR(a < 10, AND(b < 10, remote_filter(b))`. Since the original condition is implied by (actually equivalent to) this expression, the planner can now easily determine that the original conditions are already met, so there is no need to add a storage filter.
Jira: DB-12438

Test Plan:
```
./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressYbBitmapScans'
./yb_build.sh --java-test 'org.yb.pgsql.TestPgCostModelSeekNextEstimation'
```

TAQO report: {F279017}

Bitmap scan is often the most optimal plan for these queries, but less frequently the default plan with this change because of the movement of the discourage factor and the addition of the remote filter cost. While these are regressions, the feature is not yet in EA and further CBO changes are incoming, including [[ #23565 | #23565 ]] to remove the discourage factor entirely. This is a helpful step in the right direction.

Reviewers: gkukreja, mtakahara

Reviewed By: mtakahara

Subscribers: yql, dsherstobitov

Differential Revision: https://phorge.dev.yugabyte.com/D37623
  • Loading branch information
timothy-e committed Aug 30, 2024
1 parent 1131977 commit 697ecb5
Show file tree
Hide file tree
Showing 14 changed files with 634 additions and 172 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -2,10 +2,10 @@

import static org.junit.Assert.assertTrue;

import java.math.BigDecimal;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.List;
import java.math.BigDecimal;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
Expand All @@ -26,6 +26,7 @@ public class ExplainAnalyzeUtils {
public static final String NODE_AGGREGATE = "Aggregate";
public static final String NODE_BITMAP_INDEX_SCAN = "Bitmap Index Scan";
public static final String NODE_BITMAP_OR = "BitmapOr";
public static final String NODE_BITMAP_AND = "BitmapAnd";
public static final String NODE_FUNCTION_SCAN = "Function Scan";
public static final String NODE_GATHER = "Gather";
public static final String NODE_GATHER_MERGE = "Gather Merge";
Expand Down
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
package org.yb.pgsql;

import static org.junit.Assume.assumeTrue;
import static org.yb.pgsql.ExplainAnalyzeUtils.NODE_BITMAP_AND;
import static org.yb.pgsql.ExplainAnalyzeUtils.NODE_BITMAP_INDEX_SCAN;
import static org.yb.pgsql.ExplainAnalyzeUtils.NODE_BITMAP_OR;
import static org.yb.pgsql.ExplainAnalyzeUtils.NODE_INDEX_SCAN;
Expand Down Expand Up @@ -40,6 +41,9 @@ public class TestPgCostModelSeekNextEstimation extends BasePgSQLTest {
private static final String T3_INDEX_NAME = T3_NAME + "_pkey";
private static final String T4_NAME = "t4";
private static final String T4_INDEX_NAME = T4_NAME + "_pkey";
private static final String T_NO_PKEY_NAME = "t_no_pkey";
private static final String T_K1_INDEX_NAME = "t_k1_idx";
private static final String T_K2_INDEX_NAME = "t_k2_idx";
private static final double SEEK_FAULT_TOLERANCE_OFFSET = 1;
private static final double SEEK_FAULT_TOLERANCE_RATE = 0.2;
private static final double SEEK_LOWER_BOUND_FACTOR = 1 - SEEK_FAULT_TOLERANCE_RATE;
Expand Down Expand Up @@ -613,6 +617,73 @@ public void testSeekNextEstimationBitmapScanWithOr() throws Exception {
}
}

// There's a middle ground for queries with AND. If few rows are to be
// selected, then it's cheap to apply remote filters. If a lot of rows are to
// be selected, then it's pointless to gather a large portion of the table for
// each Bitmap Index Scan under a BitmapAnd.
@Test
public void testSeekNextEstimationBitmapScanWithAnd() throws Exception {
try (Statement stmt = this.connection2.createStatement()) {
stmt.execute(String.format("CREATE TABLE %s (k1 INT, k2 INT)", T_NO_PKEY_NAME));
stmt.execute(String.format("CREATE INDEX %s ON %s (k1 ASC)",
T_K1_INDEX_NAME, T_NO_PKEY_NAME));
stmt.execute(String.format("CREATE INDEX %s ON %s (k2 ASC)",
T_K2_INDEX_NAME, T_NO_PKEY_NAME));
stmt.execute(String.format("INSERT INTO %s SELECT s1, s2 FROM generate_series(1, 100) s1, "
+ "generate_series(1, 100) s2", T_NO_PKEY_NAME));
stmt.execute(String.format("ANALYZE %s", T_NO_PKEY_NAME));

final String query = "/*+ BitmapScan(t) */ SELECT * FROM %s AS t WHERE %s";
testSeekAndNextEstimationBitmapScanHelper(stmt,
String.format(query, T_NO_PKEY_NAME, "k1 <= 1 AND k2 <= 1"),
T_NO_PKEY_NAME, 100, 300, 10,
makePlanBuilder().nodeType(NODE_BITMAP_INDEX_SCAN).build());

testSeekAndNextEstimationBitmapScanHelper(stmt,
String.format(query, T_NO_PKEY_NAME, "k1 <= 2 AND k2 <= 2"),
T_NO_PKEY_NAME, 200, 600, 10,
makePlanBuilder().nodeType(NODE_BITMAP_INDEX_SCAN).build());

testSeekAndNextEstimationBitmapScanHelper(stmt,
String.format(query, T_NO_PKEY_NAME, "k1 <= 5 AND k2 <= 5"),
T_NO_PKEY_NAME, 25, 75, 10,
makePlanBuilder().nodeType(NODE_BITMAP_AND).build());

testSeekAndNextEstimationBitmapScanHelper(stmt,
String.format(query, T_NO_PKEY_NAME, "k1 <= 10 AND k2 <= 10"),
T_NO_PKEY_NAME, 100, 300, 10,
makePlanBuilder().nodeType(NODE_BITMAP_AND).build());

testSeekAndNextEstimationBitmapScanHelper(stmt,
String.format(query, T_NO_PKEY_NAME, "k1 <= 20 AND k2 <= 20"),
T_NO_PKEY_NAME, 400, 1200, 10,
makePlanBuilder().nodeType(NODE_BITMAP_AND).build());

testSeekAndNextEstimationBitmapScanHelper(stmt,
String.format(query, T_NO_PKEY_NAME, "k1 <= 40 AND k2 <= 40"),
T_NO_PKEY_NAME, 4000, 14000, 10,
makePlanBuilder().nodeType(NODE_BITMAP_INDEX_SCAN).build());

testSeekAndNextEstimationBitmapScanHelper(stmt,
String.format(query, T_NO_PKEY_NAME, "k1 <= 80 AND k2 <= 80"),
T_NO_PKEY_NAME, 8000, 24000, 10,
makePlanBuilder().nodeType(NODE_BITMAP_INDEX_SCAN).build());

// If the two sets of ybctids are not similar sizes, it doesn't make sense
// to collect both sets. Instead, we collect the smaller and filter out
// the larger.
testSeekAndNextEstimationBitmapScanHelper(stmt,
String.format(query, T_NO_PKEY_NAME, "k1 <= 20 AND k2 <= 40"),
T_NO_PKEY_NAME, 2000, 6000, 10,
makePlanBuilder().nodeType(NODE_BITMAP_INDEX_SCAN).indexName(T_K1_INDEX_NAME).build());

testSeekAndNextEstimationBitmapScanHelper(stmt,
String.format(query, T_NO_PKEY_NAME, "k1 <= 20 AND k2 <= 10"),
T_NO_PKEY_NAME, 1000, 3000, 10,
makePlanBuilder().nodeType(NODE_BITMAP_INDEX_SCAN).indexName(T_K2_INDEX_NAME).build());
}
}

/* These scans exceed work_mem, so their YB Bitmap Table Scan just performs a
* sequential scan. The seeks and nexts of the Table Scan should be equivalent
* to the seeks and nexts of the Sequential Scan.
Expand Down
179 changes: 153 additions & 26 deletions src/postgres/src/backend/optimizer/path/costsize.c
Original file line number Diff line number Diff line change
Expand Up @@ -615,11 +615,7 @@ cost_index(IndexPath *path, PlannerInfo *root, double loop_count,
* We don't bother to save indexStartupCost or indexCorrelation, because a
* bitmap scan doesn't care about either.
*/
if (IsYugaByteEnabled() && baserel->is_yb_relation)
// TODO(#20573): YB Bitmap cost
path->indextotalcost = indexTotalCost * YB_BITMAP_DISCOURAGE_MODIFIER;
else
path->indextotalcost = indexTotalCost;
path->indextotalcost = indexTotalCost;
path->indexselectivity = indexSelectivity;

/* all costs for touching index itself included here */
Expand Down Expand Up @@ -1107,7 +1103,9 @@ cost_bitmap_heap_scan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
run_cost += path->pathtarget->cost.per_tuple * path->rows;

path->startup_cost = startup_cost;
path->total_cost = startup_cost + run_cost;
path->total_cost = (IsYugaByteEnabled() && baserel->is_yb_relation)
? (startup_cost + run_cost) * YB_BITMAP_DISCOURAGE_MODIFIER
: startup_cost + run_cost;
}

/*
Expand Down Expand Up @@ -7205,9 +7203,8 @@ yb_cost_index(IndexPath *path, PlannerInfo *root, double loop_count,
(index_ybctid_num_nexts * per_next_cost);

path->indextotalcost =
(yb_local_latency_cost + startup_cost + run_cost +
index_ybctid_transfer_cost + index_ybctid_paging_seek_next_costs) *
YB_BITMAP_DISCOURAGE_MODIFIER;
yb_local_latency_cost + startup_cost + run_cost +
index_ybctid_transfer_cost + index_ybctid_paging_seek_next_costs;
path->indexselectivity = index_selectivity;
path->ybctid_width = index_ybctid_width;

Expand Down Expand Up @@ -7402,6 +7399,85 @@ yb_cost_index(IndexPath *path, PlannerInfo *root, double loop_count,
yb_parallel_cost((Path *) path);
}


/*
* yb_get_bitmap_index_quals
* Return the list of quals used by the Bitmap Index Scans. This includes the
* index conditions themselves, and any additional remote filters that are
* applied by each Bitmap Index Scan.
*
* 'bitmapqual' is a tree of IndexPaths, BitmapAndPaths, and BitmapOrPaths
* 'scan_clauses' is a list of all conditions given to the bitmap path
*/
List *
yb_get_bitmap_index_quals(PlannerInfo *root, Path *bitmapqual,
List *scan_clauses)
{
if (IsA(bitmapqual, BitmapAndPath))
{
BitmapAndPath *apath = (BitmapAndPath *) bitmapqual;
List *subindexquals = NIL;
ListCell *l;

foreach(l, apath->bitmapquals)
{
List *subindexqual = yb_get_bitmap_index_quals(
root, (Path *) lfirst(l), scan_clauses);

subindexquals = list_concat_unique(subindexquals, subindexqual);
}

return subindexquals;
}
else if (IsA(bitmapqual, BitmapOrPath))
{
BitmapOrPath *opath = (BitmapOrPath *) bitmapqual;
List *subindexquals = NIL;
ListCell *l;

foreach(l, opath->bitmapquals)
{
List *subindexqual = yb_get_bitmap_index_quals(
root, (Path *) lfirst(l), scan_clauses);

subindexquals = lappend(subindexquals,
make_ands_explicit(subindexqual));
}

if (list_length(subindexquals) <= 1)
return subindexquals;
return list_make1(make_orclause(subindexquals));
}
else if (IsA(bitmapqual, IndexPath))
{
IndexPath *ipath = (IndexPath *) bitmapqual;
IndexScan *iscan;
List *indexqual;

/* Use the regular indexscan plan build machinery... */
iscan = castNode(IndexScan,
create_indexscan_plan(root, ipath,
NIL, scan_clauses,
false /* indexonly */,
true /* bitmapindex */));

indexqual = get_actual_clauses(ipath->indexquals);

if (iscan->yb_idx_pushdown.quals)
indexqual = lappend(indexqual,
make_ands_explicit(iscan->yb_idx_pushdown.quals));

pfree(iscan);

return indexqual;
}
else
{
elog(ERROR, "unrecognized node type: %d", nodeTag(bitmapqual));
}
}


/*
* yb_cost_bitmap_table_scan
* Determines and returns the cost of scanning a relation using a YB bitmap
Expand Down Expand Up @@ -7438,11 +7514,15 @@ yb_cost_bitmap_table_scan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
Cost per_seek_cost = 0.0;
Cost per_next_cost = 0.0;
List *local_clauses = NIL;
double remote_filtered_rows;
List *non_index_clauses = NIL;
int num_nexts;
int num_seeks;
int docdb_result_width;
double tuples_fetched;
double tuples_scanned;
QualCost qual_cost;
List *indexquals;
ListCell *l;

/* Should only be applied to Yugabyte base relations */
Assert(IsA(baserel, RelOptInfo));
Expand Down Expand Up @@ -7485,24 +7565,65 @@ yb_cost_bitmap_table_scan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
return;
}

/* TODO: account for local and remote quals */
tuples_fetched =
clamp_row_est(baserel->tuples *
clauselist_selectivity(root, baserel->baserestrictinfo,
baserel->relid, JOIN_INNER, NULL));
indexquals = yb_get_bitmap_index_quals(root, bitmapqual,
baserel->baserestrictinfo);

/* Determine what clauses remain to be checked by this node */
foreach(l, baserel->baserestrictinfo)
{
RestrictInfo *rinfo = lfirst_node(RestrictInfo, l);
Node *clause = (Node *) rinfo->clause;

if (rinfo->pseudoconstant)
continue; /* we may drop pseudoconstants here */
if (list_member(indexquals, clause))
continue; /* simple duplicate */
if (!contain_mutable_functions(clause) &&
predicate_implied_by(list_make1(clause), indexquals, false))
continue; /* provably implied by indexquals */
non_index_clauses = lappend(non_index_clauses, rinfo);
}

/* Determine remote and local quals */
List *local_quals = NIL;
List *rel_remote_quals = NIL;
List *rel_colrefs = NIL;

extract_pushdown_clauses(non_index_clauses, NULL /* index_info */,
false /* bitmapindex */, &local_quals,
&rel_remote_quals, &rel_colrefs,
NULL /* idx_remote_quals */,
NULL /* idx_colrefs */);

tuples_scanned = clamp_row_est(baserel->tuples *
clauselist_selectivity(root, indexquals, baserel->relid,
JOIN_INNER, NULL));

cost_qual_eval(&qual_cost, rel_remote_quals, root);
startup_cost += qual_cost.startup;
run_cost +=
(qual_cost.per_tuple + list_length(rel_remote_quals) *
yb_docdb_remote_filter_overhead_cycles *
cpu_operator_cost) *
tuples_scanned;

tuples_fetched = clamp_row_est(baserel->tuples *
clauselist_selectivity(root,
list_union(indexquals, rel_remote_quals),
baserel->relid, JOIN_INNER, NULL));

/* Block fetch cost from disk */
num_blocks = ceil(tuples_fetched * tuple_width / YB_DEFAULT_DOCDB_BLOCK_SIZE);
num_blocks = ceil(tuples_scanned * tuple_width / YB_DEFAULT_DOCDB_BLOCK_SIZE);
run_cost += yb_random_block_cost * num_blocks;

/* DocDB costs for merging key-value pairs to form tuples */
per_merge_cost = num_key_value_pairs_per_tuple *
yb_docdb_merge_cpu_cycles * cpu_operator_cost;

/* Seek to first key cost */
if (tuples_fetched > 0)
if (tuples_scanned > 0)
{
per_seek_cost = yb_get_lsm_seek_cost(tuples_fetched,
per_seek_cost = yb_get_lsm_seek_cost(tuples_scanned,
num_key_value_pairs_per_tuple,
num_sst_files) +
per_merge_cost;
Expand All @@ -7512,9 +7633,6 @@ yb_cost_bitmap_table_scan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
per_next_cost = (yb_docdb_next_cpu_cycles * cpu_operator_cost) +
per_merge_cost;

/* TODO: account for table pushdown quals */
remote_filtered_rows = tuples_fetched;

/* tlist eval costs are paid per output row, not per tuple scanned */
startup_cost += path->pathtarget->cost.startup;
run_cost += path->pathtarget->cost.per_tuple * path->rows;
Expand All @@ -7528,8 +7646,8 @@ yb_cost_bitmap_table_scan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
baserel, reloid);
path->yb_plan_info.estimated_docdb_result_width = docdb_result_width;

num_seeks = tuples_fetched;
num_nexts = (max_nexts_to_avoid_seek + 1) * tuples_fetched;
num_seeks = tuples_scanned;
num_nexts = (max_nexts_to_avoid_seek + 1) * tuples_scanned;

path->yb_plan_info.estimated_num_nexts = num_nexts;
path->yb_plan_info.estimated_num_seeks = num_seeks;
Expand All @@ -7539,12 +7657,21 @@ yb_cost_bitmap_table_scan(Path *path, PlannerInfo *root, RelOptInfo *baserel,

/* Network latency cost is added to startup cost */
startup_cost += yb_local_latency_cost;
run_cost += yb_compute_result_transfer_cost(remote_filtered_rows,
run_cost += yb_compute_result_transfer_cost(tuples_fetched,
docdb_result_width);
/* Local filter costs */
cost_qual_eval(&qual_cost, local_clauses, root);
startup_cost += qual_cost.startup;
run_cost += qual_cost.per_tuple * tuples_fetched;

path->rows =
clamp_row_est(baserel->tuples *
clauselist_selectivity(root, baserel->baserestrictinfo,
baserel->relid, JOIN_INNER, NULL));

path->rows = tuples_fetched;
path->startup_cost = startup_cost;
path->total_cost = startup_cost + run_cost;
path->startup_cost = startup_cost * YB_BITMAP_DISCOURAGE_MODIFIER;
path->total_cost = (startup_cost + run_cost) * YB_BITMAP_DISCOURAGE_MODIFIER;
path->yb_plan_info.estimated_num_nexts = num_nexts;
path->yb_plan_info.estimated_num_seeks = num_seeks;
yb_parallel_cost(path);
Expand Down
3 changes: 3 additions & 0 deletions src/postgres/src/backend/optimizer/path/indxpath.c
Original file line number Diff line number Diff line change
Expand Up @@ -2154,6 +2154,9 @@ yb_bitmap_scan_cost_est(PlannerInfo *root, RelOptInfo *rel, Path *ipath)
bpath.path.pathkeys = NIL;
bpath.bitmapqual = ipath;

/* required for yb_parallel_cost */
bpath.path.parallel_aware = false;

/*
* Check the cost of temporary path without considering parallelism.
* Parallel bitmap heap path will be considered at later stage.
Expand Down
Loading

0 comments on commit 697ecb5

Please sign in to comment.