Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HBASE-28680 BackupLogCleaner should clean up archived HMaster logs #6006

Merged
merged 1 commit into from
Jun 21, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,7 @@
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseInterfaceAudience;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.backup.BackupInfo;
Expand All @@ -36,6 +37,7 @@
import org.apache.hadoop.hbase.master.HMaster;
import org.apache.hadoop.hbase.master.MasterServices;
import org.apache.hadoop.hbase.master.cleaner.BaseLogCleanerDelegate;
import org.apache.hadoop.hbase.master.region.MasterRegionFactory;
import org.apache.hadoop.hbase.net.Address;
import org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore;
import org.apache.hadoop.hbase.wal.AbstractFSWALProvider;
Expand Down Expand Up @@ -123,27 +125,8 @@ public Iterable<FileStatus> getDeletableFiles(Iterable<FileStatus> files) {
return Collections.emptyList();
}
for (FileStatus file : files) {
String fn = file.getPath().getName();
if (fn.startsWith(WALProcedureStore.LOG_PREFIX)) {
Comment on lines -126 to -127
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logic was already in place for marking fresh HMaster WALs as deletable — I believe we just forgot to consider archived ones too

if (canDeleteFile(addressToLastBackupMap, file.getPath())) {
filteredFiles.add(file);
continue;
}

try {
Address walServerAddress =
Address.fromString(BackupUtils.parseHostNameFromLogFile(file.getPath()));
long walTimestamp = AbstractFSWALProvider.getTimestamp(file.getPath().getName());

if (
!addressToLastBackupMap.containsKey(walServerAddress)
|| addressToLastBackupMap.get(walServerAddress) >= walTimestamp
) {
filteredFiles.add(file);
}
} catch (Exception ex) {
LOG.warn(
"Error occurred while filtering file: {} with error: {}. Ignoring cleanup of this log",
file.getPath(), ex.getMessage());
}
Comment on lines -132 to 130
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved this logic to a separate method to improve testability

}

Expand Down Expand Up @@ -176,4 +159,39 @@ public void stop(String why) {
public boolean isStopped() {
return this.stopped;
}

protected static boolean canDeleteFile(Map<Address, Long> addressToLastBackupMap, Path path) {
if (isHMasterWAL(path)) {
return true;
}

try {
String hostname = BackupUtils.parseHostNameFromLogFile(path);
if (hostname == null) {
LOG.warn(
"Cannot parse hostname from RegionServer WAL file: {}. Ignoring cleanup of this log",
path);
return false;
}
Address walServerAddress = Address.fromString(hostname);
long walTimestamp = AbstractFSWALProvider.getTimestamp(path.getName());

if (
!addressToLastBackupMap.containsKey(walServerAddress)
|| addressToLastBackupMap.get(walServerAddress) >= walTimestamp
) {
return true;
}
} catch (Exception ex) {
LOG.warn("Error occurred while filtering file: {}. Ignoring cleanup of this log", path, ex);
return false;
}
Comment on lines +168 to +188
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic is largely the same as what was removed above, with two key differences:

  1. Of course, it will now return archived HMaster WALs as deletable too
  2. Better handling of the possible null from BackupUtils#parseHostNameFromLogFile. It was difficult to reason about this error without a patch in our fork because, from the user's perspective, we were previously just hitting an NPE when constructing an Address

return false;
}

private static boolean isHMasterWAL(Path path) {
String fn = path.getName();
return fn.startsWith(WALProcedureStore.LOG_PREFIX)
|| fn.endsWith(MasterRegionFactory.ARCHIVED_WAL_SUFFIX);
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -20,10 +20,12 @@
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseClassTestRule;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.backup.BackupType;
Expand Down Expand Up @@ -132,4 +134,21 @@ public void testBackupLogCleaner() throws Exception {
conn.close();
}
}

@Test
public void testCleansUpHMasterWal() {
Path path = new Path("/hbase/MasterData/WALs/hmaster,60000,1718808578163");
assertTrue(BackupLogCleaner.canDeleteFile(Collections.emptyMap(), path));
}

@Test
public void testCleansUpArchivedHMasterWal() {
Path normalPath =
new Path("/hbase/oldWALs/hmaster%2C60000%2C1716224062663.1716247552189$masterlocalwal$");
assertTrue(BackupLogCleaner.canDeleteFile(Collections.emptyMap(), normalPath));

Path masterPath = new Path(
"/hbase/MasterData/oldWALs/hmaster%2C60000%2C1716224062663.1716247552189$masterlocalwal$");
assertTrue(BackupLogCleaner.canDeleteFile(Collections.emptyMap(), masterPath));
Comment on lines +146 to +152
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it is necessary to check both paths for backwards compatibility — I'm not totally sure when the directory changed, but the book tells me that it was likely around 2.3. Anyway, there's no additional business logic necessary to flag both paths, but I thought it was worth explicitly testing both

}
}