ReadOnlyBookieTest is flaky #2665

Closed
dlg99 opened this issue Mar 29, 2021 · 2 comments
dlg99 commented Mar 29, 2021

BUG REPORT

Describe the bug

ReadOnlyBookieTest is flaky on the current master branch.

To Reproduce

```
mvn clean install -DskipTests
cd bookkeeper-server/
# run a few times:
mvn test -Dtest=org.apache.bookkeeper.test.ReadOnlyBookieTest
```

```
[INFO] Running org.apache.bookkeeper.test.ReadOnlyBookieTest
[ERROR] Tests run: 6, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 31.14 s <<< FAILURE! - in org.apache.bookkeeper.test.ReadOnlyBookieTest
[ERROR] org.apache.bookkeeper.test.ReadOnlyBookieTest.testBookieContinueWritingIfMultipleLedgersPresent  Time elapsed: 4.151 s  <<< ERROR!
org.apache.bookkeeper.client.BKException$BKNotEnoughBookiesException: Not enough non-faulty bookies available
	at org.apache.bookkeeper.client.SyncCallbackUtils.finish(SyncCallbackUtils.java:83)
	at org.apache.bookkeeper.client.SyncCallbackUtils$SyncAddCallback.addComplete(SyncCallbackUtils.java:251)
	at org.apache.bookkeeper.client.AsyncCallback$AddCallback.addCompleteWithLatency(AsyncCallback.java:92)
	at org.apache.bookkeeper.client.PendingAddOp.submitCallback(PendingAddOp.java:431)
	at org.apache.bookkeeper.client.LedgerHandle.errorOutPendingAdds(LedgerHandle.java:1799)
	at org.apache.bookkeeper.client.LedgerHandle$5.safeRun(LedgerHandle.java:574)
	at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:834)

[INFO]
[INFO] Results:
[INFO]
[WARNING] Flakes:
[WARNING] org.apache.bookkeeper.test.ReadOnlyBookieTest.testBookieContinueWritingIfMultipleLedgersPresent
[ERROR]   Run 1: ReadOnlyBookieTest.testBookieContinueWritingIfMultipleLedgersPresent » BKNotEnoughBookies
[INFO]   Run 2: PASS
```
dlg99 added the type/bug label Mar 29, 2021
hangc0276 pushed a commit that referenced this issue Jul 26, 2022
### Motivation

Fixes issue #2665 

When the disk is full, `InterleavedLedgerStorage` creates a new entry log and index file: the old index content is copied by `FileInfo` to a new file (newFile) in the new directory, and the old index file is then deleted.

Deleting the old file marks the `FileInfo` as `deleted`, even though its content has already been copied to the newFile in the new directory, so the deleted flag should be restored. If it is not reset to false, `checkOpen` throws `FileInfoDeletedException` when the fence status of the `ledgerHandle` is checked while adding an entry, and the write fails. The client then tries to change the `ensemble` and fails with `Not enough non-faulty bookies available`, because the test runs only two bookies and cannot satisfy the write set.
```java
    public synchronized boolean isFenced() throws IOException {
        // Throws FileInfoDeletedException if this FileInfo is still marked deleted.
        checkOpen(false);
        return (stateBits & STATE_FENCED_BIT) == STATE_FENCED_BIT;
    }

    private synchronized void checkOpen(boolean create, boolean openBeforeClose)
            throws IOException {
        if (deleted) {
            throw new FileInfoDeletedException();
        }
        // ...
    }
```

### Changes
When `FileInfo` is moved to a new directory by `moveToNewLocation`, delete the old index file and then reset the `deleted` flag, because the new index file is ready.


Also set the disk check interval to `Integer.MAX_VALUE`:
```java
newConf.setDiskCheckInterval(Integer.MAX_VALUE);
```
This keeps the following assertion stable; otherwise, once the test thrashes, the disk check thread restores the full disk to a writable state and the assertion fails:
```java
assertEquals("writable dirs should have one dir", 1, ledgerDirsManager
        .getWritableLedgerDirs().size());
```
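
For illustration, here is a minimal, self-contained sketch of the idea behind this change. The class, field, and helper names (`IndexFileSketch`, `lf`, `copyContents`) are invented for the example and are not the actual `FileInfo` code; only the order of copy, delete, and flag reset follows the description above.
```java
import java.io.File;
import java.io.IOException;

// Hypothetical sketch, not the real org.apache.bookkeeper.bookie.FileInfo:
// it only shows why the deleted flag must be reset after the index file moves.
class IndexFileSketch {
    private File lf;            // current index file
    private boolean deleted;    // set by delete(), checked by checkOpen()/isFenced()

    synchronized void moveToNewLocation(File newFile) throws IOException {
        copyContents(newFile);  // copy the index data to the new (writable) directory
        delete();               // remove the old index file; this sets deleted = true

        // The fix: the index now lives in newFile, so this handle is not logically
        // deleted. Without resetting the flag, the next checkOpen() would throw
        // FileInfoDeletedException and the add would fail.
        deleted = false;

        lf = newFile;           // switch to the new index file
    }

    private void copyContents(File newFile) throws IOException {
        // elided: stream the old index contents into newFile
    }

    private void delete() {
        if (lf != null) {
            lf.delete();        // remove the old file from disk
        }
        deleted = true;
    }
}
```
The actual change applies the same reset inside `FileInfo.moveToNewLocation` once the new index file is in place.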
wenbingshen (Member) commented
@hangc0276 Could you help close this issue? The related PR #3421 has fixed it and has been merged. Thanks.

zymap pushed a commit that referenced this issue Aug 2, 2022
hangc0276 pushed a commit to hangc0276/bookkeeper that referenced this issue Nov 5, 2022
hangc0276 pushed a commit to hangc0276/bookkeeper that referenced this issue Nov 7, 2022
nicoloboschi pushed a commit to datastax/bookkeeper that referenced this issue Jan 11, 2023
shoothzj (Member) commented
@wenbingshen Thanks, I have closed this issue.

Ghatage pushed a commit to sijie/bookkeeper that referenced this issue Jul 12, 2024