feat(subscription): add metrics for cursor #18052

Merged
merged 13 commits into from
Sep 20, 2024

Conversation

xxhZs
Contributor

@xxhZs xxhZs commented Aug 15, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

#17992

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

add

add

add
@xxhZs xxhZs force-pushed the xxh/add-mertric-for-cursor branch from f036461 to 5966c6c Compare August 15, 2024 07:08
@xxhZs xxhZs requested a review from hzxa21 August 19, 2024 07:03
grafana/risingwave-dev-dashboard.dashboard.py (outdated, resolved)
src/frontend/src/session/cursor_manager.rs (outdated, resolved)
grafana/risingwave-dev-dashboard.dashboard.py (outdated, resolved)
grafana/risingwave-dev-dashboard.dashboard.py (outdated, resolved)
grafana/risingwave-dev-dashboard.dashboard.py (outdated, resolved)
src/frontend/src/session/cursor_manager.rs (outdated, resolved)
src/frontend/src/session/cursor_manager.rs (outdated, resolved)
src/frontend/src/session/cursor_manager.rs (outdated, resolved)
@@ -452,6 +486,7 @@ impl SubscriptionCursor {
}
}
}
self.last_fetch = Instant::now();
Collaborator

I think we will miss the update when next returns early at L449 and L449.

Contributor Author

Since this cursor should be closed after L449 returns, setting last_fetch doesn't seem to make sense.

Collaborator

Since this cursor should be closed after L449 returns, setting last_fetch doesn't seem to make sense.

I think the cursor will not be closed; it will be set to an invalid state and remain in the cursor map. Actually, after walking through the code again, I found that if initiate_query or try_refill_remaining_rows returns an error, the cursor state will not be set to invalid. So I am wondering whether we should set the state to invalid in next instead of next_row, to make sure all errors are covered.
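
A minimal sketch of that suggestion, under stated assumptions: `SketchCursor`, `State`, `fetch_rows`, and the scripted `pending` results are hypothetical stand-ins, not the real `SubscriptionCursor`. It only illustrates handling errors in the outer `next`, so that a failure from any inner call invalidates the cursor and every exit path updates `last_fetch`.

```rust
use std::time::Instant;

/// Hypothetical, simplified cursor state (the real cursor has more states).
#[derive(Debug, PartialEq)]
enum State {
    Valid,
    Invalid,
}

#[allow(dead_code)]
struct SketchCursor {
    state: State,
    last_fetch: Instant,
    /// Scripted results standing in for calls such as `initiate_query`
    /// or `try_refill_remaining_rows` (assumption for illustration).
    pending: Vec<Result<u64, String>>,
}

impl SketchCursor {
    fn new(mut results: Vec<Result<u64, String>>) -> Self {
        results.reverse(); // so that `pop` yields results in the given order
        Self { state: State::Valid, last_fetch: Instant::now(), pending: results }
    }

    /// Inner step: may fail, and deliberately does not touch `state`.
    fn next_row(&mut self) -> Result<Option<u64>, String> {
        self.pending.pop().transpose()
    }

    /// Collects up to `count` rows, propagating the first error.
    fn fetch_rows(&mut self, count: usize) -> Result<Vec<u64>, String> {
        let mut rows = Vec::new();
        while rows.len() < count {
            match self.next_row()? {
                Some(row) => rows.push(row),
                None => break,
            }
        }
        Ok(rows)
    }

    /// Outer entry point: any error, from any inner call, marks the cursor
    /// invalid, and `last_fetch` is updated on success and failure alike.
    fn next(&mut self, count: usize) -> Result<Vec<u64>, String> {
        if self.state == State::Invalid {
            return Err("cursor is invalid".to_string());
        }
        let result = self.fetch_rows(count);
        self.last_fetch = Instant::now();
        if result.is_err() {
            self.state = State::Invalid;
        }
        result
    }
}
```

With this shape, early returns inside the inner calls no longer bypass the bookkeeping, because the bookkeeping lives in exactly one place.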

Collaborator

Also, I think we can further improve the error handling via one of the following ideas:

  1. Auto close the cursor if the cursor is in an invalid state.
  2. Auto retry/recreate the query stream / RPC when meeting errors
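
The two ideas above could look roughly like the following sketch. Everything here is hypothetical (`FlakyCursor`, `CursorMap`, `max_retries`); it combines both ideas only for illustration, retrying a bounded number of times and then dropping the cursor from the map if it still fails.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a cursor whose `fetch` fails the first
/// `failures_left` times and succeeds afterwards.
struct FlakyCursor {
    failures_left: u32,
}

impl FlakyCursor {
    fn fetch(&mut self) -> Result<&'static str, String> {
        if self.failures_left > 0 {
            self.failures_left -= 1;
            Err("stream broken".to_string())
        } else {
            Ok("row")
        }
    }
}

/// Hypothetical cursor map keyed by cursor name.
struct CursorMap {
    cursors: HashMap<String, FlakyCursor>,
    max_retries: u32,
}

impl CursorMap {
    /// Idea 2: retry up to `max_retries` extra times.
    /// Idea 1: if the cursor still fails, remove it from the map instead of
    /// leaving an invalid entry behind.
    fn fetch(&mut self, name: &str) -> Result<&'static str, String> {
        let cursor = self
            .cursors
            .get_mut(name)
            .ok_or_else(|| format!("cursor {name} not found"))?;
        let mut last_err = String::new();
        for _ in 0..=self.max_retries {
            match cursor.fetch() {
                Ok(row) => return Ok(row),
                Err(e) => last_err = e,
            }
        }
        self.cursors.remove(name); // auto close
        Err(last_err)
    }
}
```

In a real implementation the retry would presumably recreate the query stream or RPC rather than call the same method again; that part is left out here.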

Contributor Author

will do it in next pr

Comment on lines 89 to 95
pub valid_subsription_cursor_nums: IntGauge,
pub invalid_subsription_cursor_nums: IntGauge,
pub subscription_cursor_error_count: GenericCounter<AtomicU64>,
pub subscription_cursor_query_duration: HistogramVec,
pub subscription_cursor_declare_duration: HistogramVec,
pub subscription_cursor_fetch_duration: HistogramVec,
pub subscription_cursor_last_fetch_duration: HistogramVec,
Collaborator

I think it is simpler and less error prone if we use Collector to periodically report the following metrics instead of updating them in cursor manager in an ad-hoc manner:

  • valid_subsription_cursor_nums
  • invalid_subsription_cursor_nums
  • subscription_cursor_last_fetch_duration

You can refer to the StateStoreCollector as an example.
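
A rough, std-only sketch of the collector idea, assuming a shared list of cursors that is read at report time. The `Collect` trait here is a hypothetical stand-in for a metrics library's collector hook, not the actual API used by `StateStoreCollector`, and `max_idle_secs` is a simplification of the last-fetch duration metric.

```rust
use std::sync::{Arc, Mutex};
use std::time::Instant;

/// Hypothetical snapshot of the three values listed above.
#[derive(Debug, Default, PartialEq)]
struct CursorSnapshot {
    valid: usize,
    invalid: usize,
    max_idle_secs: f64,
}

/// Hypothetical per-cursor bookkeeping kept by the cursor manager.
struct CursorInfo {
    valid: bool,
    last_fetch: Instant,
}

/// Stand-in for a collector hook: invoked whenever metrics are reported.
trait Collect {
    fn collect(&self) -> CursorSnapshot;
}

struct CursorCollector {
    cursors: Arc<Mutex<Vec<CursorInfo>>>,
}

impl Collect for CursorCollector {
    /// Derives all values from the current cursor list, so the cursor
    /// manager never has to increment or decrement gauges by hand.
    fn collect(&self) -> CursorSnapshot {
        let cursors = self.cursors.lock().unwrap();
        let mut snap = CursorSnapshot::default();
        for c in cursors.iter() {
            if c.valid {
                snap.valid += 1;
            } else {
                snap.invalid += 1;
            }
            let idle = c.last_fetch.elapsed().as_secs_f64();
            snap.max_idle_secs = snap.max_idle_secs.max(idle);
        }
        snap
    }
}
```

Because the numbers are recomputed from the source of truth on each report, a missed update on some error path cannot leave a gauge permanently wrong.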

@xxhZs xxhZs requested a review from chenzl25 August 27, 2024 09:15
src/frontend/src/monitor/stats.rs (outdated, resolved)
Cursor::Subscription(cursor) => cursor.next(count, handle_args, formats).await,
Cursor::Subscription(cursor) => {
cursor.next(count, handle_args, formats).await.map_err(|e| {
cursor.cursor_metrics.subscription_cursor_error_count.inc();
Collaborator

How about updating the metric in fetch_cursor.rs just like what we did in declare_cursor.rs?

Contributor Author

Since we don't distinguish between a regular cursor and a subscription cursor until next(), we can only do it here. In fetch_cursor.rs we cannot correctly add to the subscription_cursor_fetch_err counts.
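
To illustrate the point, here is a small sketch with hypothetical types (`Cursor`, `Metrics`, and scripted row vectors are assumptions, not the real definitions). The cursor kind is only known inside the `match`, so that is the one place where an error can be attributed to a subscription cursor.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical cursor kinds; each holds scripted results for illustration.
enum Cursor {
    Subscription(Vec<Result<u32, String>>),
    Query(Vec<Result<u32, String>>),
}

/// Hypothetical metrics holder with a plain atomic counter.
struct Metrics {
    subscription_cursor_error_count: AtomicU64,
}

impl Cursor {
    /// The caller only sees `Cursor`; the variant is resolved here, so the
    /// subscription-specific counter is bumped in this match arm only.
    fn next(&mut self, metrics: &Metrics) -> Result<Option<u32>, String> {
        match self {
            Cursor::Subscription(rows) => rows.pop().transpose().map_err(|e| {
                metrics
                    .subscription_cursor_error_count
                    .fetch_add(1, Ordering::Relaxed);
                e
            }),
            // Errors from other cursor kinds are not counted.
            Cursor::Query(rows) => rows.pop().transpose(),
        }
    }
}
```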

let row = self.next_row(&handle_args, formats).await?;
self.cursor_metrics
.subscription_cursor_fetch_duration
Collaborator

The metric here is not measuring the whole fetch duration. How about doing it in fetch_cursor.rs?

Contributor Author

We may fetch n rows at a time, so measuring in fetch_cursor.rs may lead to inconsistent metrics. The methods other than next_row() also add little overhead.
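
The trade-off between the two placements can be sketched as follows. `Timings`, `next_row`, and `fetch` are hypothetical stand-ins: a per-row timer yields one observation per row regardless of the requested count, while a whole-fetch timer yields one observation whose size grows with the count.

```rust
use std::time::{Duration, Instant};

/// Hypothetical sink standing in for the two histogram placements.
#[derive(Default)]
struct Timings {
    per_row: Vec<Duration>,
    per_fetch: Vec<Duration>,
}

/// Stand-in for the real row fetch.
fn next_row(i: u32) -> u32 {
    i * 2
}

/// Fetches `count` rows and records both kinds of measurement.
fn fetch(count: u32, timings: &mut Timings) -> Vec<u32> {
    let fetch_start = Instant::now();
    let mut rows = Vec::new();
    for i in 0..count {
        let row_start = Instant::now();
        rows.push(next_row(i));
        // One observation per row: comparable across FETCH sizes.
        timings.per_row.push(row_start.elapsed());
    }
    // One observation per call: covers the whole fetch, but depends on `count`.
    timings.per_fetch.push(fetch_start.elapsed());
    rows
}
```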

src/frontend/src/session/cursor_manager.rs (outdated, resolved)
@xxhZs xxhZs enabled auto-merge September 19, 2024 10:43
@xxhZs xxhZs force-pushed the xxh/add-mertric-for-cursor branch from 032c181 to 704e6e4 Compare September 20, 2024 04:07
fix ci

fix
@xxhZs xxhZs force-pushed the xxh/add-mertric-for-cursor branch from 704e6e4 to 4cbb0e1 Compare September 20, 2024 06:57
@xxhZs xxhZs added this pull request to the merge queue Sep 20, 2024
Merged via the queue into main with commit 3c74390 Sep 20, 2024
29 of 30 checks passed
@xxhZs xxhZs deleted the xxh/add-mertric-for-cursor branch September 20, 2024 07:53
2 participants