Skip to content

Commit

Permalink
[#21198] YSQL: Pushdown Aggregates on YB Bitmap Table Scans
Browse files Browse the repository at this point in the history
Summary:
Following the same pattern as 55eba85 / D27427, support aggregate pushdown for YB Bitmap Table Scans.

Aggregates cannot be pushed down at the YB Bitmap Index Scan level, because we need to be sure that we only accumulate each row one, but a row might exist in each Bitmap Index Scan.

Normal index scans can't pushdown the aggregate if there are recheck conditions, because those conditions are implicitly rechecked locally. However, if the condition of a Bitmap Index Scan needs to be rechecked, it is rechecked at the Bitmap Table Scan level. Usually this can happen with a remote filter. The only times we can't push down an aggregate is if we have local filters of any type:
* basic local quals are accounted for by the existing aggregate pushdown framework.
* recheck local quals are only an issue if recheck is required.
* local quals used for when we exceed work_mem are only an issue if we exceed work_mem. However, since its impossible to know if we will or not until the bitmap scans complete, we just avoid aggregate pushdown if there are any local quals of this variety.

Determining if recheck is required can be done at the Init phase. For aggregate workflows, this works, because aggregates do not occur inside loops, so we are not concerned about recheck. For non-aggregate workflows, we need to recompute if recheck is required for each rescan. Some tests were added to validate this.

To access the `recheck_required` field at init time, we need a `YbGetBitmapScanRecheckRequired` that traverses the Bitmap tree. We might as well use the same approach to determine if recheck is required by any of the rescans that occurred. This allows `recheck` to be removed from `YbTIDBitmap`.
Jira: DB-10128

Test Plan:
```
./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressYbBitmapScans'
./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressAggregates'
```

Tested with randgen using a slightly modified grammar to increase the number of aggregates produced.

Reviewers: amartsinchyk, jason

Reviewed By: jason

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D35787
  • Loading branch information
timothy-e committed Jul 3, 2024
1 parent e2e9f93 commit b8fedf6
Show file tree
Hide file tree
Showing 21 changed files with 1,664 additions and 187 deletions.
4 changes: 2 additions & 2 deletions src/postgres/src/backend/access/index/indexam.c
Original file line number Diff line number Diff line change
Expand Up @@ -817,7 +817,7 @@ index_getbitmap(IndexScanDesc scan, TIDBitmap *bitmap)
* ----------------
*/
int64
yb_index_getbitmap(IndexScanDesc scan, YbTIDBitmap *ybtbm, bool recheck)
yb_index_getbitmap(IndexScanDesc scan, YbTIDBitmap *ybtbm)
{
int64 ntids;

Expand All @@ -827,7 +827,7 @@ yb_index_getbitmap(IndexScanDesc scan, YbTIDBitmap *ybtbm, bool recheck)
/*
* have the am's getbitmap proc do all the work.
*/
ntids = scan->indexRelation->rd_amroutine->yb_amgetbitmap(scan, ybtbm, recheck);
ntids = scan->indexRelation->rd_amroutine->yb_amgetbitmap(scan, ybtbm);

pgstat_count_index_tuples(scan->indexRelation, ntids);

Expand Down
8 changes: 1 addition & 7 deletions src/postgres/src/backend/access/yb_access/yb_lsm.c
Original file line number Diff line number Diff line change
Expand Up @@ -319,7 +319,7 @@ ybcinmightrecheck(Relation heap, Relation index, bool xs_want_itup,
}

static int64
ybcgetbitmap(IndexScanDesc scan, YbTIDBitmap *ybtbm, bool recheck)
ybcgetbitmap(IndexScanDesc scan, YbTIDBitmap *ybtbm)
{
size_t new_tuples = 0;
SliceVector ybctids;
Expand All @@ -334,15 +334,9 @@ ybcgetbitmap(IndexScanDesc scan, YbTIDBitmap *ybtbm, bool recheck)
if (!ybscan->is_exec_done)
pgstat_count_index_scan(scan->indexRelation);

/* Special case: aggregate pushdown. */
if (scan->yb_aggrefs)
elog(ERROR, "TODO: Handle aggregate pushdown");

if (ybscan->quit_scan || ybtbm->work_mem_exceeded)
return 0;

ybtbm->recheck |= recheck;

HandleYBStatus(YBCPgRetrieveYbctids(ybscan->handle, ybscan->exec_params,
ybscan->target_desc->natts, &ybctids, &new_tuples,
&exceeded_work_mem));
Expand Down
33 changes: 27 additions & 6 deletions src/postgres/src/backend/executor/nodeAgg.c
Original file line number Diff line number Diff line change
Expand Up @@ -1573,6 +1573,7 @@ yb_agg_pushdown_supported(AggState *aggstate)
if (!(IsA(outerPlanState(aggstate), ForeignScanState) ||
IsA(outerPlanState(aggstate), IndexOnlyScanState) ||
IsA(outerPlanState(aggstate), IndexScanState) ||
IsA(outerPlanState(aggstate), YbBitmapTableScanState) ||
IsA(outerPlanState(aggstate), YbSeqScanState)))
return;
ss = (ScanState *) outerPlanState(aggstate);
Expand Down Expand Up @@ -1601,6 +1602,25 @@ yb_agg_pushdown_supported(AggState *aggstate)
if (ioss->yb_ioss_might_recheck)
return;
}
else if (IsA(ss, YbBitmapTableScanState))
{
YbBitmapTableScanState *btss = castNode(YbBitmapTableScanState, ss);

/*
* We can pushdown recheck conditions, so the only time we can't
* pushdown the aggregate is if we have local recheck conditions
*/
if (btss->recheck_required && btss->recheck_local_quals)
return;

/*
* We can't pushdown the aggregate if we'd have local conditions to
* check if we exceed work_mem. We don't yet know whether or not
* work_mem will be exceeded, but we must assume the worst.
*/
if (btss->fallback_local_quals)
return;
}

check_outer_plan = false;

Expand Down Expand Up @@ -2606,16 +2626,17 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
outerPlan = outerPlan(node);

/*
* For YB IndexScan/IndexOnlyScan outer plan, we need to collect recheck
* information, so set that eflag. Ideally, the flag is only set for YB
* relations since, later on, agg pushdown is disabled anyway for non-YB
* relations, but we don't have that information at this point: the
* relation is opened in the IndexScan/IndexOnlyScan node. So set the flag
* For YB IndexScan/IndexOnlyScan/BitmapScan outer plan, we need to collect
* recheck information, so set that eflag. Ideally, the flag is only set
* for YB relations since, later on, agg pushdown is disabled anyway for
* non-YB relations, but we don't have that information at this point: the
* relation is opened in the child node. So set the flag
* in all cases, and move the YB-relation check down there.
*/
int yb_eflags = 0;
if (IsYugaByteEnabled() &&
(IsA(outerPlan, IndexScan) || IsA(outerPlan, IndexOnlyScan)))
(IsA(outerPlan, IndexScan) || IsA(outerPlan, IndexOnlyScan) ||
IsA(outerPlan, YbBitmapTableScan)))
yb_eflags |= EXEC_FLAG_YB_AGG_PARENT;

outerPlanState(aggstate) = ExecInitNode(outerPlan, estate,
Expand Down
47 changes: 35 additions & 12 deletions src/postgres/src/backend/executor/nodeYbBitmapIndexscan.c
Original file line number Diff line number Diff line change
Expand Up @@ -88,8 +88,6 @@ MultiExecYbBitmapIndexScan(YbBitmapIndexScanState *node)
IndexScanDesc scandesc;
double nTuples = 0;
bool doscan;
bool recheck;
YbScanDesc ybscan;
EState *estate = node->ss.ps.state;

/* must provide our own instrumentation support */
Expand Down Expand Up @@ -157,18 +155,12 @@ MultiExecYbBitmapIndexScan(YbBitmapIndexScanState *node)
bitmap = yb_tbm_create(work_mem * 1024L);
}

ybscan = (YbScanDesc) scandesc->opaque;
recheck = YbPredetermineNeedsRecheck(ybscan->relation, ybscan->index,
true /* xs_want_itup */,
node->biss_ScanKeys,
node->biss_NumScanKeys);

/*
* Get TIDs from index and insert into bitmap
*/
while (doscan)
{
nTuples += (double) yb_index_getbitmap(scandesc, bitmap, recheck);
nTuples += (double) yb_index_getbitmap(scandesc, bitmap);

CHECK_FOR_INTERRUPTS();

Expand Down Expand Up @@ -236,6 +228,24 @@ ExecReScanYbBitmapIndexScan(YbBitmapIndexScanState *node)
yb_init_bitmap_index_scandesc(node);
}

/*
* Bitmap Index aggregate pushdown currently cannot support recheck,
* and this should have been prevented by earlier logic.
*/
Assert(!node->biss_ScanDesc->yb_aggrefs);

/*
* Rescans have different scan keys, so must check again if recheck is
* required. For example, comparing an int4 and an int8 works if the int8
* fits into an int4, but requires recheck if it doesn't.
*/
node->biss_requires_recheck |= YbPredetermineNeedsRecheck(
node->biss_ScanDesc->heapRelation,
node->biss_RelationDesc,
true /* xs_want_itup */,
node->biss_ScanKeys,
node->biss_NumScanKeys);

/* reset index scan */
if (node->biss_RuntimeKeysReady)
index_rescan(node->biss_ScanDesc,
Expand Down Expand Up @@ -301,6 +311,7 @@ ExecInitYbBitmapIndexScan(YbBitmapIndexScan *node, EState *estate, int eflags)
YbBitmapIndexScanState *indexstate;
bool relistarget;
Relation index;
Relation relation;

/* check for unsupported flags */
Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
Expand Down Expand Up @@ -345,8 +356,11 @@ ExecInitYbBitmapIndexScan(YbBitmapIndexScan *node, EState *estate, int eflags)
* If we are just doing EXPLAIN (ie, aren't going to run the plan), stop
* here. This allows an index-advisor plugin to EXPLAIN a plan containing
* references to nonexistent indexes.
*
* However, if we are doing an aggregate, we need to collect more
* information to determine if the Bitmap Index Scan will require a recheck.
*/
if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
if (eflags & EXEC_FLAG_EXPLAIN_ONLY && !(eflags & EXEC_FLAG_YB_AGG_PARENT))
return indexstate;

/*
Expand All @@ -359,8 +373,8 @@ ExecInitYbBitmapIndexScan(YbBitmapIndexScan *node, EState *estate, int eflags)
*/
relistarget = ExecRelationIsTargetRelation(estate, node->scan.scanrelid);
index = index_open(node->indexid, relistarget ? NoLock : AccessShareLock);
indexstate->ss.ss_currentRelation =
ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
relation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
indexstate->ss.ss_currentRelation = relation;
Assert(IsYBRelation(index));

/*
Expand All @@ -385,6 +399,15 @@ ExecInitYbBitmapIndexScan(YbBitmapIndexScan *node, EState *estate, int eflags)
&indexstate->biss_ArrayKeys,
&indexstate->biss_NumArrayKeys);

indexstate->biss_requires_recheck = YbPredetermineNeedsRecheck(relation, index,
true /* xs_want_itup */,
indexstate->biss_ScanKeys,
indexstate->biss_NumScanKeys);

/* Got the info for aggregate pushdown. EXPLAIN can return now. */
if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
return indexstate;

/*
* If we have runtime keys or array keys, we need an ExprContext to
* evaluate them. We could just create a "standard" plan node exprcontext,
Expand Down
28 changes: 26 additions & 2 deletions src/postgres/src/backend/executor/nodeYbBitmapTablescan.c
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,7 @@
#include "access/relscan.h"
#include "executor/executor.h"
#include "executor/nodeYbBitmapTablescan.h"
#include "nodes/nodeFuncs.h"
#include "utils/rel.h"
#include "utils/tqual.h"

Expand Down Expand Up @@ -73,8 +74,26 @@ YbBitmapTableNext(YbBitmapTableScanState *node)
node->initialized = true;
node->work_mem_exceeded = ybtbm->work_mem_exceeded;
node->average_ybctid_bytes = yb_tbm_get_average_bytes(ybtbm);
node->recheck_required |= ybtbm->recheck;
node->skipped_tuples = 0;
node->recheck_required =
YbGetBitmapScanRecheckRequired(outerPlanState(node));

if (node->aggrefs)
{
/*
* For aggregate pushdown, we read just the aggregates from DocDB
* and pass that up to the aggregate node (agg pushdown wouldn't be
* enabled if we needed to read more than that). Set up a dummy
* scan slot to hold as many attributes as there are pushed
* aggregates.
*/
TupleDesc tupdesc = CreateTemplateTupleDesc(
list_length(node->aggrefs), false /* hasoid */);
ExecInitScanTupleSlot(node->ss.ps.state, &node->ss, tupdesc);

/* Refresh the local pointer. */
slot = node->ss.ss_ScanTupleSlot;
}
}

if (!node->ss.ss_currentScanDesc)
Expand Down Expand Up @@ -253,7 +272,7 @@ CreateYbBitmapTableScanDesc(YbBitmapTableScanState *scanstate)
(Scan *) &plan /* pg_scan_plan */,
yb_pushdown /* rel_pushdown */,
NULL /* idx_pushdown */,
NULL /* aggrefs */,
scanstate->aggrefs /* aggrefs */,
0 /* distinct_prefixlen */,
&scanstate->ss.ps.state->yb_exec_params,
true /* is_internal_scan */,
Expand Down Expand Up @@ -471,6 +490,11 @@ ExecInitYbBitmapTableScan(YbBitmapTableScan *node, EState *estate, int eflags)

scanstate->ss.ss_currentRelation = currentRelation;

/*
* We can already tell if we need to recheck index qual conditions.
*/
scanstate->recheck_required = YbGetBitmapScanRecheckRequired(outerPlanState(scanstate));

/*
* all done.
*/
Expand Down
2 changes: 1 addition & 1 deletion src/postgres/src/backend/executor/nodeYbSeqscan.c
Original file line number Diff line number Diff line change
Expand Up @@ -84,7 +84,7 @@ YbSeqNext(YbSeqScanState *node)
* For aggregate pushdown, we read just the aggregates from DocDB
* and pass that up to the aggregate node (agg pushdown wouldn't be
* enabled if we needed to read more than that). Set up a dummy
* scan slot to hold that as many attributes as there are pushed
* scan slot to hold as many attributes as there are pushed
* aggregates.
*/
TupleDesc tupdesc =
Expand Down
3 changes: 1 addition & 2 deletions src/postgres/src/backend/executor/ybc_fdw.c
Original file line number Diff line number Diff line change
Expand Up @@ -409,8 +409,7 @@ ybcSetupScanTargets(ForeignScanState *node)
* For aggregate pushdown, we read just the aggregates from DocDB
* and pass that up to the aggregate node (agg pushdown wouldn't be
* enabled if we needed to read more than that). Set up a dummy
* scan slot to hold that as many attributes as there are pushed
* aggregates.
* scan slot to hold as many attributes as there are pushed aggregates.
*/
TupleDesc tupdesc =
CreateTemplateTupleDesc(list_length(node->yb_fdw_aggrefs),
Expand Down
32 changes: 32 additions & 0 deletions src/postgres/src/backend/nodes/nodeFuncs.c
Original file line number Diff line number Diff line change
Expand Up @@ -4303,7 +4303,39 @@ YbPlanStateTryGetAggrefs(PlanState *ps)
return &castNode(IndexScanState, ps)->yb_iss_aggrefs;
case T_YbSeqScanState:
return &castNode(YbSeqScanState, ps)->aggrefs;
case T_YbBitmapTableScanState:
return &castNode(YbBitmapTableScanState, ps)->aggrefs;
default:
return NULL;
}
}

bool
YbGetBitmapScanRecheckRequired(PlanState *ps)
{
switch (nodeTag(ps))
{
case T_BitmapOrState:
{
BitmapOrState *bos = castNode(BitmapOrState, ps);
for (int i = 0; i < bos->nplans; i++)
if (YbGetBitmapScanRecheckRequired(bos->bitmapplans[i]))
return true;
return false;
}
case T_BitmapAndState:
{
BitmapAndState *bas = castNode(BitmapAndState, ps);
for (int i = 0; i < bas->nplans; i++)
if (YbGetBitmapScanRecheckRequired(bas->bitmapplans[i]))
return true;
return false;
}
case T_YbBitmapIndexScanState:
return castNode(YbBitmapIndexScanState, ps)->biss_requires_recheck;
default:
elog(ERROR, "unrecognized node type: %d",
(int) nodeTag(ps));
break;
}
}
2 changes: 1 addition & 1 deletion src/postgres/src/backend/utils/misc/guc.c
Original file line number Diff line number Diff line change
Expand Up @@ -7741,7 +7741,7 @@ set_config_option(const char *name, const char *value,
if (source == YSQL_CONN_MGR)
Assert(YbIsClientYsqlConnMgr());

/*
/*
* role_oid and session_authorization_oid are provisions made for YSQL
* Connection Manager to handle scenarios around "ALTER ROLE RENAME"
* queries as it only caches the previously used role by that client.
Expand Down
3 changes: 1 addition & 2 deletions src/postgres/src/include/access/amapi.h
Original file line number Diff line number Diff line change
Expand Up @@ -171,8 +171,7 @@ typedef int64 (*amgetbitmap_function) (IndexScanDesc scan,

/* YB: fetch all valid tuples */
typedef int64 (*yb_amgetbitmap_function) (IndexScanDesc scan,
YbTIDBitmap *ybtbm,
bool recheck);
YbTIDBitmap *ybtbm);

typedef void (*yb_ambindschema_function) (YBCPgStatement handle,
struct IndexInfo *indexInfo,
Expand Down
3 changes: 1 addition & 2 deletions src/postgres/src/include/access/genam.h
Original file line number Diff line number Diff line change
Expand Up @@ -170,8 +170,7 @@ extern ItemPointer index_getnext_tid(IndexScanDesc scan,
extern HeapTuple index_fetch_heap(IndexScanDesc scan);
extern HeapTuple index_getnext(IndexScanDesc scan, ScanDirection direction);
extern int64 index_getbitmap(IndexScanDesc scan, TIDBitmap *bitmap);
extern int64 yb_index_getbitmap(IndexScanDesc scan, YbTIDBitmap *bitmap,
bool recheck);
extern int64 yb_index_getbitmap(IndexScanDesc scan, YbTIDBitmap *bitmap);

extern IndexBulkDeleteResult *index_bulk_delete(IndexVacuumInfo *info,
IndexBulkDeleteResult *stats,
Expand Down
2 changes: 2 additions & 0 deletions src/postgres/src/include/nodes/execnodes.h
Original file line number Diff line number Diff line change
Expand Up @@ -1520,6 +1520,7 @@ typedef struct YbBitmapIndexScanState
ExprContext *biss_RuntimeContext;
Relation biss_RelationDesc;
IndexScanDesc biss_ScanDesc;
bool biss_requires_recheck;
} YbBitmapIndexScanState;

/* ----------------
Expand Down Expand Up @@ -1644,6 +1645,7 @@ typedef struct YbBitmapTableScanState
bool work_mem_exceeded;
size_t average_ybctid_bytes;
int skipped_tuples;
List *aggrefs; /* aggregate pushdown information */
} YbBitmapTableScanState;

/* ----------------
Expand Down
2 changes: 2 additions & 0 deletions src/postgres/src/include/nodes/nodeFuncs.h
Original file line number Diff line number Diff line change
Expand Up @@ -84,4 +84,6 @@ extern bool planstate_tree_walker(struct PlanState *planstate, bool (*walker) ()
/* YB additions. */
extern List **YbPlanStateTryGetAggrefs(struct PlanState *planstate);

extern bool YbGetBitmapScanRecheckRequired(struct PlanState *planstate);

#endif /* NODEFUNCS_H */
1 change: 0 additions & 1 deletion src/postgres/src/include/nodes/ybtidbitmap.h
Original file line number Diff line number Diff line change
Expand Up @@ -49,7 +49,6 @@ typedef struct YbTIDBitmap
YbTBMIteratingState iterating PG_USED_FOR_ASSERTS_ONLY;
/* yb_tbm_begin_iterate called? */
size_t bytes_consumed; /* sum of the size of the ybctids */
bool recheck; /* recheck all entries in this set? */
bool work_mem_exceeded; /* if bytes_consumed exceeds work_mem */
} YbTIDBitmap;

Expand Down
Loading

0 comments on commit b8fedf6

Please sign in to comment.