-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HBASE-26913 Replication Observability Framework #4556
Conversation
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments. PTAL.
In general I think the approach is OK, as it will not affect the normal write/replication flow too much.
There are several general questions:
- How to deal with multi wal?
- The named queue service is not designed for storing critical data, it is OK for us to drop all the records in it. This is the initial design of this stuff. Is it also OK for our usage here?
Thanks.
@@ -1552,6 +1552,14 @@ public enum OperationStatusCode { | |||
"hbase.regionserver.slowlog.systable.enabled"; | |||
public static final boolean DEFAULT_SLOW_LOG_SYS_TABLE_ENABLED_KEY = false; | |||
|
|||
@Deprecated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better not adding more things to HConstants, please move then to a more specific location.
And why adding a deprecated field here? Just remove it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And why adding a deprecated field here? Just remove it?
We are replacing hbase.slowlog.systable.chore.duration
with hbase.regionserver.named.queue.chore.duration
. Since the old config property already exists, we will have to follow deprecation cycle.
Better not adding more things to HConstants, please move then to a more specific location.
Sounds good.
@@ -57,6 +58,7 @@ public interface NamedQueueService { | |||
/** | |||
* Add all in memory queue records to system table. The implementors can use system table or | |||
* direct HDFS file or ZK as persistence system. | |||
* @param connection connection |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nits: remove useless param block
hbase-server/src/main/java/org/apache/hadoop/hbase/namequeues/NamedQueueServiceChore.java
Show resolved
Hide resolved
public final class ReplicationSinkTrackerTableCreator { | ||
private static final Logger LOG = | ||
LoggerFactory.getLogger(ReplicationSinkTrackerTableCreator.class); | ||
private static final Long TTL = TimeUnit.DAYS.toSeconds(365); // 1 year in seconds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Long -> long
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are using TTL.intValue below and that doesn't work with long primitive. We can always have the following statement but using TimeUnit util methods is more readable. Let me know if you want me to change to below.
private static final long TTL = 365 * 24 * 60; // 1 year in seconds
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The return value of the toSeconds method is long, not Long? For TTL.intValue, just cast it to int is enough?
public static final byte[] OFFSET_COLUMN = Bytes.toBytes("offset"); | ||
|
||
/** Will create {@link #REPLICATION_SINK_TRACKER_TABLE_NAME_STR} table if this conf is enabled **/ | ||
public static final String REPLICATION_SINK_TRACKER_ENABLED_KEY = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we have this config off at master but on at region server?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To use this end-to-end feature, we will need 2 releases. First release will enable this config property in whole cluster, hmaster and region server.
In the second release, we will need to enable ReplicationMarkerChore which will create the special marker rows.
I have also add this note in the hbase book.
There will be a problem if hbase.regionserver.replication.sink.tracker.enabled
is enabled on regionserver and disabled on hmaster and ReplicationMarkerChore is enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then better mention this in the ref guide. For example, 'when enabling this feature, you should upgrade master first, and then region servers'. Something like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the first phase/release, it doesn't matter whether you upgrade master or regionserver first as long as the chore that is creating these marker rows ReplicationMarkerChore
is enabled in next phase/release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Apache9 Do you still think we need changes in the ref guide or can we resolve this conversation ?
LOG.trace("Creating replication marker edit."); | ||
} | ||
try { | ||
WALUtil.writeReplicationMarkerAndSync(wal, MVCC, REGION_INFO, rowKey, timeStamp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So here we use a hard coded region info. There are mainly two problems:
- Ned to make sure that the system table is never split. Checked the creator code above, seems we do not set a special split policy for the system table?
- If we enable multi wal, then here we can only add markers to one WAL group, the WAL files in other group will not have markers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ned to make sure that the system table is never split. Checked the creator code above, seems we do not set a special split policy for the system table?
Didn't understand this comment. We don't persist this marker row in the source cluster. We replicate this marker row and persist it in the sink cluster. On the sink side (ReplicationSink.java), we just extract the table name from this REGION_INFO object.
If we enable multi wal, then here we can only add markers to one WAL group, the WAL files in other group will not have markers.
I haven't tested this feature with multi wal implementation. Can we work on enabling this feature for multi wal once we merge this PR ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we only use table name in this method, then better just pass a table name in. The use of a hard coded REGION_INFO implies that there is only one region for this table. In HBase, we hard coded the meta region info, RegionInfoBuilder.FIRST_META_REGIONINFO. So if the table is splittable, then here let's not use hard coded region info.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And for the multi wal support, we do not need to implement it in this PR. But let's at least discuss about the design and make sure that we do not need to do breaking changes when we want to add the support later. And also, if the feature will be released before we add the multi wal support, we need to find a way to make sure that user can not enable multi wal and this feature together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Apache9 thanks for your inputs on this thread. Could you provide an example of what you think would not be considered a hack for the current design? Do you think everything about the marker (not bound to a region) needs to change or something specific here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have created a new class called NoRegionWALEdit
to create an edit which is not bound to any region. Within WALCoprocessorHost#preWrite
and WALCoprocessorHost#postWrite
methods we can check if the edit is an instance of NoRegionWALEdit
. If yes then we can bypass these coproc hooks and we can add a this check in custom co-procs also. See commit here.
@Apache9 Would this be sufficient to move this PR forward. If not, then please provide an actionable feedback. I would like to make progress on this PR. Thank you !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest we go back to the start of the tech design part. Here the problem is we write a WAL edit which belongs to a region which is not on the region server, we can hardly say this is a valid operation.
For me, I think there are two choices.
First, we could try to make HBase accept WAL edits which do not belong to any regions. For example, we introduce a new WAL type called 'ServerLevelMarker', give an overall design on how we plan to deal with these WAL edits in HBase, and then we use this 'ServerLevelMarker' to implement the feature here.
Second, we could try to attach our markers to valid WAL edits, resue the METAFAMILY mechanism to filter them out and not let them mess up the normal WAL processing.
Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you you @Apache9 for your reply. I was busy with some other high priority work at my day job so couldn't find time to reply here or update the patch.
Second, we could try to attach our markers to valid WAL edits, resue the METAFAMILY mechanism to filter them out and not let them mess up the normal WAL processing.
I like this suggestion where ReplicationMarkerChore will choose one region at random among the regions it hosts. It creates a replication marker edit with that region. We reuse the METAFAMILY mechanism to filter them out at various places (like ReplicationSourceWALActionsListener and replication related code) and replicate it to the sink cluster. Added a new commit here d497720
Can you please review if this approach is clean ? Thank you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Apache9 Can you please review the above approach? Thank you.
...r/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationMarkerChore.java
Show resolved
Hide resolved
...c/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceWALReader.java
Show resolved
Hide resolved
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
@Apache9 Do you think you might have some bandwidth sometime this week or so to review this PR? |
Will take a look soon. And do we have a new design doc? IIRC, the previous problem is how to make our WAL accept an entry which is not belonged to any regions, or we attach the special WAL entry to an existing entry. Thanks. |
@Apache9 I have updated this design doc especially #2 in the design proposal. |
@Apache9 Can you please review the above design proposal? Thank you. |
Let me take a look~ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM, using random region and skip it when splitting seems OK.
Please fill follow on issue to add support for multi WAL.
@@ -138,11 +143,13 @@ public void run() { | |||
} | |||
batch = tryAdvanceStreamAndCreateWALBatch(entryStream); | |||
if (batch == null) { | |||
LOG.info("RSS batch null"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for debug?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed in latest commit.
private final Configuration conf; | ||
private final RegionServerServices rsServices; | ||
private WAL wal; | ||
Random random = new Random(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just use ThreadLocalRandom?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to ThreadLocalRandom in latest commit.
protected void chore() { | ||
if (wal == null) { | ||
try { | ||
wal = rsServices.getWAL(null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So there is a TODO for adding support for multi WAL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a TODO in the code and will open a followup jira.
/** | ||
* Creates a rowkey with region server name and timestamp. | ||
* @param serverName region server name | ||
* @param timestamp timestamp n |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Delete the last n? It should a bug of our old spotless rule...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed in latest commit
Thank you @Apache9 for your review.
Created HBASE-27461 for tracking this. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
I've posted my +1 on the mailing list. Please close the vote and I will merge this PR to master. And I believe you guys also want this feature on branch-2? Then please start the backport work :) Thanks. |
Thank you @Apache9 for the feedback and review. Thank you @virajjasani @apurtell for the review. |
No description provided.