-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[HBASE-25539] Add age of oldest wal metric #2945
Conversation
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
@@ -0,0 +1,93 @@ | |||
package org.apache.hadoop.hbase.replication.regionserver; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
License
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@saintstack This is still in draft mode. Wanted to run tests first before I submit for review. Thank you !
|
||
@InterfaceAudience.Private | ||
@InterfaceStability.Evolving | ||
public class ReplicationLogQueue { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Class comment on what this thing does
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@saintstack This is still in draft mode. Wanted to run tests first before I submit for review. Thank you !
@@ -87,9 +86,10 @@ | |||
private static final Logger LOG = LoggerFactory.getLogger(ReplicationSource.class); | |||
// Queues of logs to process, entry in format of walGroupId->queue, | |||
// each presents a queue for one wal group | |||
private Map<String, PriorityBlockingQueue<Path>> queues = new HashMap<>(); | |||
//private Map<String, PriorityBlockingQueue<Path>> queues = new HashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove rather than comment out (This is a draft so maybe you are just trying stuff -- if so, ignore this comment)
e9acf92
to
af1f76e
Compare
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
@saintstack This is ready for review now. Please review. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
@bharathv @gjacoby126 could you guys also help reviewing this change. Thank you ! |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice refactoring, have a few minor comments.
...in/java/org/apache/hadoop/hbase/replication/regionserver/MetricsReplicationSourceSource.java
Show resolved
Hide resolved
import org.slf4j.LoggerFactory; | ||
|
||
/* | ||
Class that does enqueueing/dequeueing of wal at one place so that we can update the metrics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: dequeuing typo
// WARN threshold for the number of queued logs, defaults to 2 | ||
private int logQueueWarnThreshold; | ||
private ReplicationSource source; | ||
private static final Logger LOG = LoggerFactory.getLogger(ReplicationSource.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: static finals to the top of the class.
// the shipper may quit immediately | ||
queue.put(wal); | ||
queues.put(walPrefix, queue); | ||
boolean queueAlreadyExisted = logQueue.enqueueLog(wal, walPrefix); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: s/queueExists
*/ | ||
public boolean enqueueLog(Path wal, String walGroupId) { | ||
boolean exists = false; | ||
PriorityBlockingQueue<Path> queue = queues.get(walGroupId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit concerned that this is not thread-safe. It wasn't before the patch too but this seems prone to weird concurrent modification issues. Fix while we are here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shahrs87 Missed this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed queues
implementation from HashMap
to ConcurrentHashMap
. I thought your concern was regarding queues data structure. But reading the comment again, think you were concerned about PriorityBlockingQueue
within map ?
private static final Logger LOG = LoggerFactory.getLogger(ReplicationSource.class); | ||
|
||
|
||
public ReplicationLogQueue(Configuration conf, MetricsSource metrics, ReplicationSource source) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: thinking out loud, should we rename it to ReplicationSourceLogQueue to better convey that this is per source across all walGroups?
|
||
/* | ||
Get the oldest wal timestamp from all the queues. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: indent
private void setOldestWalAge() { | ||
long now = EnvironmentEdgeManager.currentTime(); | ||
long timestamp = getOldestWalTimestamp(); | ||
// TODO: Should we handle the case where getOldestWalTimestamp returns Long.MAX_VALUE ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
think we should? Otherwise on empty queue we get false alarms?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAICT there would be atleast one source with active wal at the head of the queue always. In any case add code to handle Long.MAX_VALUE scenario.
*/ | ||
private long getOldestWalTimestamp() { | ||
long oldestWalTimestamp = Long.MAX_VALUE; | ||
for (Map.Entry<String, PriorityBlockingQueue<Path>> entry : queues.entrySet()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is an O(n) loop but should be ok because the no. of walGroups is typically in a few 100s max?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. That is the reason I was ok with O(n) loop.
...er/src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestReplicationSource.java
Show resolved
Hide resolved
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
@bharathv Tried to address all of your comments. Could you please review again ? Thank you ! |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
@bharathv could you please review once again ? Thank you ! |
Testing enqueue and dequeuing of wal and check age of oldest wal. | ||
*/ | ||
@Test | ||
public void testEnqueueDequeue() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this test includes everything tested by testAgeOfOldestWal() above, get rid of the above one or club them? (unless I'm missing something)..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TestReplicationSource#testAgeOfOldestWal
brings up an actual cluster and starts a new ReplicationSource so I thought to keep it but enqueuing and dequeuing happens at separate places so it was hard to test dequeue logic in TestReplicationSource.
TestReplicationSourceLogQueue
is an unit test where enqueuing and dequeuing is easy to test but it doesn't bring up any cluster. So I thought to keep both. I don't have any strong opinions if you want me to delete one test. I would prefer to delete TestReplicationSource#testAgeOfOldestWal
since we are testing more stuff in TestReplicationSourceLogQueue.
*/ | ||
public boolean enqueueLog(Path wal, String walGroupId) { | ||
boolean exists = false; | ||
PriorityBlockingQueue<Path> queue = queues.get(walGroupId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shahrs87 Missed this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shahrs87 I think you missed a comment and one test nit, otherwise lgtm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@saintstack Do you have any more comments on this?
@shahrs87 There is a minor conflict on this one, mind rebasing and then I can commit. Thanks. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
Ran failing test
|
Signed-off-by: Bharath Vissapragada <[email protected]>
Signed-off-by: Bharath Vissapragada <[email protected]>
https://issues.apache.org/jira/browse/HBASE-25539