Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a new metric type: Gauge + CurrentMemoryUsage to metrics #1682

Merged
merged 4 commits into from
Jan 26, 2022

Conversation

yjshen
Copy link
Member

@yjshen yjshen commented Jan 26, 2022

Which issue does this PR close?

Closes #.

Rationale for this change

A gauge metric is useful for reporting purposes by returning a value each time, and a gauge suits well for memory consumption reports.

What changes are included in this PR?

  1. a new kind of metric: Gauge
  2. add CurrentMemoryUsage to BaselineMetrics
  3. use gauge in sort for memory tracking.

Are there any user-facing changes?

No.

@github-actions github-actions bot added the datafusion Changes in the datafusion crate label Jan 26, 2022
@yjshen yjshen changed the title Add gauge Add a new metric type: Gauge Jan 26, 2022
impl AggregatedMetricsSet {
/// Create a new aggregated set
pub(crate) fn new() -> Self {
pub fn new() -> Self {
Copy link
Member Author

@yjshen yjshen Jan 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To create a memory-managed version of ShuffleWriter, we need to expose these four APIs for dependent crate usage.
I'm not sure this should be in its own PR, or I can have it here since this PR is also metric-related. I can remove this if needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is fine to be in this PR

/// Returns the current memory usage for this stream.
fn mem_used(&self) -> usize {
0
}
Copy link
Member Author

@yjshen yjshen Jan 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line adds a mem_used method in our essential RecordBatchStream trait.

A baby step to tracking Non-Limited-Operators' memory usage since I think SendableRecordBatchStream is the fundamental entity that holds memory during execution. However, I didn't quite find a way to register these streams generated during async execute to our memory manager.

I would love to hear your thoughts.

If considered not appropriate, I will remove it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After re-consider this for a while. I cannot think of a solution to register RecordBatchStream somewhere else since each stream is used through mutable reference. I don't think there's a way sharing it except for Arc<dyn RecordBatchStream ....>, which we have discussed earlier and is not acceptable.

I'm going to revert this last commit.

@alamb alamb changed the title Add a new metric type: Gauge Add a new metric type: Gauge + CurrentMemoryUsage to metrics Jan 26, 2022
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me -- thanks @yjshen

impl AggregatedMetricsSet {
/// Create a new aggregated set
pub(crate) fn new() -> Self {
pub fn new() -> Self {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is fine to be in this PR

use std::sync::Arc;
use tempfile::NamedTempFile;
use tokio::sync::mpsc::{Receiver as TKReceiver, Sender as TKSender};
use tokio::sync::mpsc::{Receiver, Sender};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@alamb
Copy link
Contributor

alamb commented Jan 26, 2022

And thank you for the review @liukun4515

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datafusion Changes in the datafusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants