Conversation
if err != nil {
	return nil, err
}
db.retrievalIndex, err = db.shed.NewIndex("Hash->StoredTimestamp|AccessTimestamp|Data", shed.IndexFuncs{
I think disk IO will suffer from this if we write 4k chunks every time we update the access time. Shall we just have a separate hash->AccessTimestamp index until we have a smarter way?
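For illustration, the suggested separate index might look roughly like this. This is only a sketch: the accessIndex field name is hypothetical, the IndexFuncs field names and signatures follow the ones used elsewhere in this PR, and binary is encoding/binary.

db.accessIndex, err = db.shed.NewIndex("Hash->AccessTimestamp", shed.IndexFuncs{
	// Key is just the chunk hash.
	EncodeKey: func(fields shed.Item) (key []byte, err error) {
		return fields.Address, nil
	},
	DecodeKey: func(key []byte) (e shed.Item, err error) {
		e.Address = key
		return e, nil
	},
	// Value is only the 8-byte access timestamp, so updating it
	// never rewrites the chunk data.
	EncodeValue: func(fields shed.Item) (value []byte, err error) {
		b := make([]byte, 8)
		binary.BigEndian.PutUint64(b, uint64(fields.AccessTimestamp))
		return b, nil
	},
	DecodeValue: func(keyItem shed.Item, value []byte) (e shed.Item, err error) {
		e.AccessTimestamp = int64(binary.BigEndian.Uint64(value))
		return e, nil
	},
})
if err != nil {
	return nil, err
}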
We can measure both cases, and also the third case that you thought of, with a more complex index structure where two or more key/value pairs are created with prefixes to optimize seeks and writes. I still think that it is not hard to create such an index type.
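As a purely illustrative sketch of the prefixed variant: two key/value pairs per chunk could share the hash under distinct one-byte prefixes, so data and access time sit adjacent on disk but are separately writable. All names and the layout below are made up:

// Illustrative only: one-byte prefixes separating chunk data from its
// access timestamp, so a timestamp update never rewrites the data.
const (
	prefixData   = 0x00 // prefixData + hash -> StoredTimestamp|Data
	prefixAccess = 0x01 // prefixAccess + hash -> AccessTimestamp
)

func dataKey(hash []byte) []byte {
	return append([]byte{prefixData}, hash...)
}

func accessKey(hash []byte) []byte {
	return append([]byte{prefixAccess}, hash...)
}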
Looks very promising 👍
Thanks @zelig, comments are updated.
// is provided to the function.
ErrInvalidMode = errors.New("invalid mode")
// ErrDBClosed is returned when database is closed.
ErrDBClosed = errors.New("db closed")
I think this is not used.
swarm/storage/localstore/gc_test.go (Outdated)
t.Run("gc uncounted hashes index count", newItemsCountTest(db.gcUncountedHashesIndex, 0)) | ||
} | ||
|
||
func testStoredGCSize(t *testing.T, db *DB, want uint64) { |
This helper appears not to be used.
@janos I haven't reviewed this fully, but I think it would be good to have some instrumentation in this code, so that we can monitor its behaviour when we deploy. Right now it will be very difficult to understand if this is working as expected if we merge.
@nonsense I am not sure if you were in the meeting when I did a code walkthrough; I would be glad to do it again and explain the details. This code is not integrated into swarm, the current localstore is still used, so I tried to create tests that provide some insight into the expected behaviour. If you have any suggestions to make the code more readable, I would be glad to hear them. This code is based on this document: https://hackmd.io/ffBjSu8RTyKikvRO7bYrzA.
@janos yes, all clear ;) I see you are not updating any existing lines of code, so I figured this is not integrated. Still, I think my comment applies - tests and benchmarks are great, but we also want to have visibility when we run with production workloads, so we should add some logging and metrics.
@nonsense Yes, logging and metrics at runtime are missing. I just did not think of them at this stage of development. They certainly must be added before the code is used in production. We can identify the most critical parts and add logging and metrics there.
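As a sketch of the kind of instrumentation discussed here, counters could be registered with go-ethereum's metrics package; the metric names and placement below are made up for illustration:

import "github.com/ethereum/go-ethereum/metrics"

// Hypothetical counters; actual names and placement would be decided
// when instrumentation is added.
var (
	gcRunCounter       = metrics.NewRegisteredCounter("localstore.gc.runs", nil)
	gcCollectedCounter = metrics.NewRegisteredCounter("localstore.gc.collected", nil)
)

// For example, in collectGarbage:
//   gcRunCounter.Inc(1)
//   gcCollectedCounter.Inc(int64(collectedCount))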
- I asked this at the first (high-level) walkthrough and don't remember the answer: this is not integrated with the stream package yet. Will that be part of the same PR? Then it will become a very big one. Nevertheless, I would oppose NOT migrating the stream package right away along with these changes, because otherwise we don't have a clear idea of how, and how well, it works together (smoke and other tests).
- How is #1031 (localstore/db: how to insert items in GC index) related? I can't see that a clear decision has been taken on that ticket; which suggestion has been implemented here? Or will it be implemented in the future? How will it change this PR?
- How does this implementation of GC work with insured chunks in the future? Will it have to change again? Will it just not be added to the GC index?
- How could different GC algorithms be plugged in, or adaptations be made to GC in the future? What if we need very flexible solutions due to crypto-economic requirements?
swarm/storage/localstore/doc.go (Outdated)
Getters, Putters and Setters accept different get, put and set modes
to perform different actions. For example, ModeGet has two different
variables ModeGetRequest and ModeGetSync and dwo different Getters
s/dwo/two
swarm/storage/localstore/gc.go (Outdated)
// in range (0,1]. For example, with 0.9 value,
// garbage collection will leave 90% of defined capacity
// in database after its run. This prevents frequent
// garbage collection runt.
s/runt/runs
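In other words, with capacity C and ratio 0.9, a run evicts chunks until roughly 0.9*C remain. A helper computing that target could be as simple as the following sketch (the function name is hypothetical):

// gcTarget returns the number of chunks a garbage collection run
// should leave in the database, given the capacity and a target
// ratio in range (0,1].
func gcTarget(capacity uint64, ratio float64) uint64 {
	return uint64(float64(capacity) * ratio)
}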
}

gcSize := db.getGCSize()
if gcSize-collectedCount <= target {
Is there not the danger of the following here:
- We reach the GC target
- GC kicks in
- Global lock goes on
- GC runs
- In the meanwhile new syncing happens; new chunks arrive and wait
- GC terminates
- The new chunks from above now have to be written, then GC runs again
- If the amount of new chunks exceeds the target, GC runs again and the "loop" repeats
What happens if we reach the target while new chunks wait to be written? Is the write interrupted so that GC runs? What if the DB is at 90% and then new chunks arrive exceeding capacity (say 105%), will the write be interrupted?
I assume that you are referring only to a global lock.
You can see that every chunk is saved in its own batch, so this cannot happen, as GC should kick in. There would be one difference with the global lock: the GC size counting would be much simpler and could be included in the chunk write batch. But this applies only to the global lock, which is here only for BenchmarkPutUpload, until we decide whether we want to use it or stick with the address lock.
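A rough sketch of the per-chunk batch flow described here, with hypothetical names (putChunk, triggerGarbageCollection, db.capacity); retrievalIndex, getGCSize and the shed field are taken from the snippets in this PR, and leveldb is github.com/syndtr/goleveldb/leveldb:

func (db *DB) putChunk(item shed.Item) error {
	// Each chunk gets its own write batch, so a long GC run never
	// blocks an individual chunk write for its whole duration.
	batch := new(leveldb.Batch)
	db.retrievalIndex.PutInBatch(batch, item)
	if err := db.shed.WriteBatch(batch); err != nil {
		return err
	}
	// GC can kick in between two single-chunk batches.
	if db.getGCSize() > db.capacity {
		db.triggerGarbageCollection() // hypothetical async trigger
	}
	return nil
}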
// schema name of loaded data
schemaName shed.StringField
// filed that stores number of intems in gc index
file that stores number of items?
Oops, that is a bad typo, it should be "field".
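For reference, this kind of bookkeeping could live in one of shed's typed fields rather than a raw key. A sketch, assuming shed's Uint64Field and an illustrative field name:

// In the DB struct:
//   storedGCSize shed.Uint64Field
// During initialization:
db.storedGCSize, err = db.shed.NewUint64Field("gc-size")
if err != nil {
	return nil, err
}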
swarm/storage/localstore/mode_get.go (Outdated)
return storage.NewChunk(out.Address, out.Data), nil
}

// get returns Item with from the retrieval index
get returns Item with...from?
'With' is the word that is not needed.
swarm/storage/localstore/mode_get.go (Outdated)
return err
}
if item.AccessTimestamp == 0 {
	// chunk is not yes synced
not yet
swarm/storage/localstore/mode_put.go (Outdated)
const (
	// ModePutRequest: when a chunk is received as a result of retrieve request and delivery, it is put only in
	ModePutRequest ModePut = iota
	// ModePutSync: when a chunk is received via syncing in it is put in
"in it is put in" sounds strange
Thanks, these comments should be corrected.
}
if item.AccessTimestamp != 0 {
	// delete current entry from the gc index
	db.gcIndex.DeleteInBatch(batch, item)
looks like GC logic is dispersed over different places...
Any ideas how to organize it better? My current one is to use a global lock, which would simplify GC counting. But this part of the code that you are referring to just needs to remove the old item from the gc index and put a new one with the new access timestamp. Any ideas how to simplify it?
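Concretely, the sequence that part of the code performs is a delete-then-put within the same batch, following the snippet's names; now() stands in for whatever clock helper the package uses:

// Replace the chunk's gc index entry so that its position in the
// index reflects the new access timestamp.
db.gcIndex.DeleteInBatch(batch, item) // entry keyed by the old AccessTimestamp
item.AccessTimestamp = now()          // hypothetical clock helper
db.gcIndex.PutInBatch(batch, item)    // re-insert under the new timestamp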
Thanks @holisticode for a very detailed code review. It raises some good questions about the complexity, and many parts of the code are still too convoluted and unclear. I would ask for a discussion about the global lock, based on the BenchmarkPutUpload results and its impact on performance and code clarity.
This PR should contain only additions in the localstore package. It is under review because it is getting large, and integration with the stream package would make it even larger.
Issue #1031 is related but still under discussion, while the GC implementation in the new localstore is functional, with an algorithm similar to the one in the current ldbstore. It may be better to have a separate PR for the new GC implementation only. This PR should not contain work based on #1031. Would you suggest a different approach?
These are great questions, but I think that they are better asked in #1031. Could you raise your concerns there?
We should implement that. Do you have ideas how we can have better storage foundations for it? I tried to build localstore with indexes as abstractions for this reason. I would like to hear ideas on how it can be improved with different generalizations, and what the tradeoffs would be in terms of performance.
@holisticode I tried to elaborate the reasoning behind the garbage collection index size counting in this comment: 40432d9. Does this address some of the complexities that you raised in your comments? Would you write it differently, or maybe change the implementation?
Submitted upstream as ethereum/go-ethereum#19015.
Implementation of localstore using the shed package, based on the https://hackmd.io/ffBjSu8RTyKikvRO7bYrzA document.