-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do not allocate task memory for VALUES/metadata-only queries #19027
Conversation
9a36a82
to
aae893b
Compare
} | ||
|
||
// If fragment source is not reading any external tables or only accesses information_schema assume it does not need significant amount of memory. | ||
// Allow scheduling even if whole server memory is pre allocated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<3
to be precise, some memory is allocated (list of tables, columns of view definitions) |
core/trino-main/src/main/java/io/trino/execution/scheduler/BinPackingNodeAllocatorService.java
Outdated
Show resolved
Hide resolved
if (!fragment.getRemoteSourceNodes().stream().flatMap(node -> node.getSourceFragmentIds().stream()) | ||
.allMatch(sourceFragmentId -> stageExecutions.get(getStageId(sourceFragmentId)).isNoMemoryFragment())) { | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why remote sources matter?
shouldn't isNoMemoryFragment(PlanFragment)
be a local assessment about the fragment?
is this is about whole execution subtree, let's rename the method to indicate that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is about local fragment. It just uses this bit of information for source stages as a shortcut for "source will produce negligible amount of data". Which is a simplification - but not that much given logic we use to set this bit. I will add a comment.
...-main/src/main/java/io/trino/execution/scheduler/EventDrivenFaultTolerantQueryScheduler.java
Outdated
Show resolved
Hide resolved
} | ||
|
||
// If fragment source is not reading any external tables or only accesses information_schema assume it does not need significant amount of memory. | ||
// Allow scheduling even if whole server memory is pre allocated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will we schedule unbounded number of information_schema queries? there must be some sort of a limit for them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do not have resource queues in place yes. But one can argue this is a mis-setup anyway - and very much possible in streaming mode right now.
StageExecution execution = new StageExecution( | ||
queryStateMachine, | ||
taskDescriptorStorage, | ||
stage, | ||
taskSource, | ||
sinkPartitioningScheme, | ||
exchange, | ||
memoryEstimatorFactory.createPartitionMemoryEstimator(), | ||
noMemoryFragment, | ||
noMemoryFragment ? new NoMemoryPartitionMemoryEstimator() : memoryEstimatorFactory.createPartitionMemoryEstimator(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can the standard estimator be modified to more adequatly estimate memory usage of information_schema queries?
like
- if source stage and this is information_schema assume 0 memory need and ~0 (negligible) number of rows produced
- for intermediate stage, and sources produce ~0 rows, assume 0 memory need
(actually we should get rid of intermediate stages for DESCRIBE, it's nonsense)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having smarter memory estimation in general is a planned item - but much more complex than this fix. And will require some of a restructuring of interfaces. I think this is fine to have this as a stopgap for sake of metadata queries which are a very common outlier even for clusters which tend to run massing ETL workloads.
Or maybe I did not understand your suggestion here. Let's chat tomorrow.
Future<?> nonMetadataQueryFuture = backgroundExecutor.submit(() -> { | ||
query("select count(*) from nation"); | ||
}); | ||
Thread.sleep(2000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we do without sleep (7s total)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to test if query already started introspecting query runner?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed most (just 1s left)
aae893b
to
44abc47
Compare
...-main/src/main/java/io/trino/execution/scheduler/EventDrivenFaultTolerantQueryScheduler.java
Outdated
Show resolved
Hide resolved
If query does not scan any tables or only accesses information_schema do assume its tasks do not requrie any memory when allocating node for task execution. That way metadata only queries can execute even if cluster is fully pre allocated.
44abc47
to
dd851a7
Compare
If query does not scan any tables or only accesses information_schema do assume its tasks do not requrie any memory when allocating node for task execution.
That way metadata only queries can execute even if cluster is fully pre allocated.
Description
Additional context and related issues
Release notes
(x) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text: