Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Panic (index out of range) on writeable tx rollback with db.NoFreelistSync #152

Closed
kozlovic opened this issue Mar 11, 2019 · 11 comments · Fixed by containerd/containerd#3519

Comments

@kozlovic
Copy link

If the db is opened with db.NoFreelistSync = true, if a writeable transaction is rolled-back, we get following panic:

=== RUN   TestPanicOnTxTollbackWithNoFreelistSync
--- FAIL: TestPanicOnTxTollbackWithNoFreelistSync (0.05s)
panic: runtime error: index out of range [recovered]
	panic: runtime error: index out of range

goroutine 52 [running]:
testing.tRunner.func1(0xc0001a4200)
	/usr/local/go/src/testing/testing.go:830 +0x68a
panic(0x1b3d700, 0x2431150)
	/usr/local/go/src/runtime/panic.go:522 +0x1b5
github.com/nats-io/nats-streaming-server/vendor/go.etcd.io/bbolt.(*DB).page(...)
	/Users/ivan/dev/go/src/github.com/nats-io/nats-streaming-server/vendor/go.etcd.io/bbolt/db.go:880
github.com/nats-io/nats-streaming-server/vendor/go.etcd.io/bbolt.(*Tx).rollback(0xc0002000e0)
	/Users/ivan/dev/go/src/github.com/nats-io/nats-streaming-server/vendor/go.etcd.io/bbolt/tx.go:267 +0x24b
github.com/nats-io/nats-streaming-server/vendor/go.etcd.io/bbolt.(*Tx).Rollback(0xc0002000e0, 0xc00016e990, 0x1)
	/Users/ivan/dev/go/src/github.com/nats-io/nats-streaming-server/vendor/go.etcd.io/bbolt/tx.go:257 +0x75
github.com/nats-io/nats-streaming-server/server.TestPanicOnTxTollbackWithNoFreelistSync(0xc0001a4200)
	/Users/ivan/dev/go/src/github.com/nats-io/nats-streaming-server/server/raft_log_test.go:459 +0x5b2
testing.tRunner(0xc0001a4200, 0x1c50670)
	/usr/local/go/src/testing/testing.go:865 +0x164
created by testing.(*T).Run
	/usr/local/go/src/testing/testing.go:916 +0x69a
exit status 2
FAIL	github.com/nats-io/nats-streaming-server/server	0.080s
Error: Tests failed.

Here is the test to produce such panic:

func TestPanicOnTxTollbackWithNoFreelistSync(t *testing.T) {
	os.Remove("bolt.db")
	defer os.Remove("bolt.db")

	db, err := bolt.Open("bolt.db", 0600, nil)
	if err != nil {
		t.Fatalf("Error opening db: %v", err)
	}
	db.NoFreelistSync = true
	db.FreelistType = bolt.FreelistMapType

	tx, err := db.Begin(true)
	if err != nil {
		t.Fatalf("Error starting tx: %v", err)
	}
	bucket := []byte("mybucket")
	if _, err := tx.CreateBucket(bucket); err != nil {
		t.Fatalf("Error creating bucket: %v", err)
	}
	if err := tx.Commit(); err != nil {
		t.Fatalf("Error on commit: %v", err)
	}

	tx, err = db.Begin(true)
	if err != nil {
		t.Fatalf("Error starting tx: %v", err)
	}
	b := tx.Bucket(bucket)
	if err := b.Put([]byte("k"), []byte("v")); err != nil {
		t.Fatalf("Error on put: %v", err)
	}
	// Imagine there is an error and tx needs to be rolled-back
	if err := tx.Rollback(); err != nil {
		t.Fatalf("Error on rollback: %v", err)
	}

	tx, err = db.Begin(false)
	if err != nil {
		t.Fatalf("Error starting tx: %v", err)
	}
	b = tx.Bucket(bucket)
	if v := b.Get([]byte("k")); v != nil {
		t.Fatalf("Value for k should not have been stored")
	}
	if err := tx.Rollback(); err != nil {
		t.Fatalf("Error on rollback: %v", err)
	}
}

Commenting out db.NoFreelistSync = true solves the issue.

I wonder if you should check for NoFreelistSync here

Something like this instead:

	if tx.writable && !tx.db.NoFreelistSync {
		tx.db.freelist.rollback(tx.meta.txid)
		tx.db.freelist.reload(tx.db.page(tx.db.meta().freelist))
	}
	tx.close()
@xiang90
Copy link
Contributor

xiang90 commented Mar 11, 2019

@kozlovic I think you are very close to figure out the root cause. Can you help us to get this fixed?

/cc @WIZARD-CXY

@WIZARD-CXY
Copy link
Contributor

WIZARD-CXY commented Mar 12, 2019

Thanks for reporting this @kozlovic , let me take a closer look at it today.

@WIZARD-CXY
Copy link
Contributor

It is indeed a bug. if we have tx.db.NoFreelistSync=true. then tx.db.meta().freelist= pgidNoFreelist pgid = 0xffffffffffffffff which will cause the tx.db.page() panic for sure.

@WIZARD-CXY
Copy link
Contributor

WIZARD-CXY commented Mar 12, 2019

@kozlovic I think the right fix is follow, WDYT

if tx.writable {
		tx.db.freelist.rollback(tx.meta.txid)
		if !tx.db.hasSyncedFreelist() {
			tx.db.freelist.reload(tx.db.page(tx.db.meta().freelist))
		}
	}

rollback must be done no matter sync freelist or not

@kozlovic
Copy link
Author

kozlovic commented Mar 12, 2019

@WIZARD-CXY Yes, expect that it should be:

if tx.db.hasSyncedFreelist() {
    tx.db.freelist.reload(tx.db.page(tx.db.meta().freelist))
}

Notice that I removed the !. You need to reload the free list if it is actually sync'ed on disk :-)

@kozlovic
Copy link
Author

@xiang90 @WIZARD-CXY Any chance you guys will be able to get a release after the fix is in? I need to do a release of https://github.com/nats-io/nats-streaming-server and this is holding me up. If you can't, I will have to remove use of NoFreelistSync for now (https://github.com/nats-io/nats-streaming-server/pull/766/files)

@jingyih
Copy link

jingyih commented Mar 12, 2019

cc @jpbetz

@kozlovic
Copy link
Author

@xiang90 @WIZARD-CXY Just realized that someone reported that back in Nov 2017: #70 ...

@WIZARD-CXY
Copy link
Contributor

@kozlovic my mistake, should remove !, as you proposed.

@WIZARD-CXY
Copy link
Contributor

I will fire up a pr to fix this asap.

@Roasbeef
Copy link

Roasbeef commented Jul 8, 2019

Seems this can be closed now that #153 has been merged.

@xiang90 xiang90 closed this as completed Jul 8, 2019
thaJeztah added a commit to thaJeztah/containerd that referenced this issue Aug 9, 2019
this brings the dependency back to a released version:

- go.etcd.io/bbolt etcd-io/bbolt@2eb7227...v1.3.3
  - etcd-io/bbolt#153 fix rollback panic bug
    - fixes etcd-io/bbolt#152 Panic (index out of range) on writeable tx rollback with db.NoFreelistSync

Signed-off-by: Sebastiaan van Stijn <[email protected]>
tussennet pushed a commit to tussennet/containerd that referenced this issue Sep 11, 2020
this brings the dependency back to a released version:

- go.etcd.io/bbolt etcd-io/bbolt@2eb7227...v1.3.3
  - etcd-io/bbolt#153 fix rollback panic bug
    - fixes etcd-io/bbolt#152 Panic (index out of range) on writeable tx rollback with db.NoFreelistSync

Signed-off-by: Sebastiaan van Stijn <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging a pull request may close this issue.

5 participants