-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HBASE-25881: Create a chore to update age related metrics #3518
base: master
Are you sure you want to change the base?
Conversation
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
Still have to look into test failure. @shahrs87 @anoopsjohn wondering if you guys can have a look and provide feedback about the approach if any. Thanks |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
@virajjasani @apurtell @anoopsjohn Looking for review on this PR. Thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did u test it in a real write heavy replicating cluster setup? How the metric reporting ? Provided now we will update the metric in every 1 min (default). On write heavy, this may look slower?
*/ | ||
@InterfaceAudience.Private | ||
@InterfaceStability.Evolving | ||
public class MetricsReplicationSourceRefresherChore extends ScheduledChore { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ReplicationSourceMetricsRefresherChore a better name?
*/ | ||
private long getOldestWalTimestamp() { | ||
long oldestWalTimestamp = Long.MAX_VALUE; | ||
for (Map.Entry<String, PriorityBlockingQueue<Path>> entry : this.replicationSource.getQueues() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Iterate over valueSet
|
||
private MetricsSource metrics; | ||
|
||
public static final String DURATION = "hbase.metrics.replication.source.refresher.duration"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here also..
better name would be hbase.replication.source.metrics.refresher.duration
Or even
hbase.replication.source.metrics.refresher.period ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rahulLiving Thank you for the patch. Added some comments.
public class MetricsReplicationSourceRefresherChore extends ScheduledChore { | ||
|
||
private ReplicationSource replicationSource; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: extra lines.
@@ -224,6 +224,10 @@ public void init(Configuration conf, FileSystem fs, ReplicationSourceManager man | |||
this.abortOnError = this.conf.getBoolean("replication.source.regionserver.abort", | |||
true); | |||
|
|||
int duration = this.conf.getInt(MetricsReplicationSourceRefresherChore.DURATION, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we creating a chore for every replication source ? If we have 2-3 peers, this will create that many chores. Instead is it possible to create just 1 chore and initialize with ReplicationSourceManager which has list of all replicationSources and just iterate over them every time this chore wakes up. Thoughts ?
/* | ||
* Returns the age of oldest wal. | ||
*/ | ||
long getOldestWalAge() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rahulLiving There are many age
related metrics in Replication subsytem. We want to update all the metrics from this Chore. I don't think it is a good idea to move the implementation from ReplicationSourceLogQueue to here.
What I had in my mind is: Compute oldestWalTimestamp
(not the age, just the timestamp) in ReplicationSourceLogQueue
and keep it as a class variable. Don't calculate the age there. Instead calculate the age in this chore. So for some reason if Replication thread is stuck, it will not update oldestWalTimestamp
and since age is calculated in this chore, it will be updated. Does that make sense ?
@@ -662,12 +666,16 @@ public void testAgeOfOldestWal() throws Exception { | |||
manualEdge.setValue(10); | |||
// Diff of current time (10) and log-walgroup-a.8 timestamp will be 2. | |||
source.enqueueLog(log1); | |||
// Sleep for chore to update WAL age | |||
Thread.sleep(1000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use Waiter.waitFor method to wait on some condition. That way if the jenkins machine are slow, it will not create flaky tests.
@rahulLiving Checking whether you are still interested in working on this patch ? If not, then I can carry forward the work that you have done until now. Thank you ! |
Hi @shahrs87 , sorry about not being able to prioritise the above review comments. Unfortunately, won't be able to focus on this PR for current month at least. Please feel free to take it up if this has to be prioritised. Thanks |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
No description provided.