-
Notifications
You must be signed in to change notification settings - Fork 165
tx/address indexing as optional feature, with separate db #475
Conversation
- api: debug_getAddressTransactions(<0xaddress>, <startBlockN>, <endBlockN>, <to|from|>) - cli cmd: 'geth atxi-build [--start=number] [--stop=number]' - cli flag: 'geth [--atxi|--add-tx-index] The api returns an array of 0x-prefixed transactions hashes. Creates a new database <chaindir>/indexes which holds key indexes of the form txa-<common.Address (address)><8-byte block number uint64><t|f (to|from)><common.Hash (tx hash)> Lookups by address use a prefix iterator on address, eg. txa-<my address>... Known issues currently: - Case not handled: delete/reorg blocks. If there is a chain reorg, and a once-canonical block is relegated to side chain, we should remove associated atxi's, which is to say the atx index should only reflect the known canonical chain. - ethdb.Database interface doesn't include iterator, so storage testing with MemDatabase will require changes to Database interface or some other kind of stub - Build the indexes performance might be improved significantly by using a memory cache with Batch writes.
core/blockchain.go
Outdated
return err | ||
} | ||
} | ||
if block.NumberU64()%10000 == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Un-hardcore this, use step
parameter.
And maybe introduce a processed block counter, instead of using block numbers? This is only my pedantic suggestion ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea -- you're right. Will do. Just a relic of fast-and-furious sketching.
core/blockchain.go
Outdated
return putBatch.Write() | ||
} | ||
|
||
func WriteBlockAddTxIndexes(indexDb ethdb.Database, block *types.Block) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be batched.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually we may extract a function that adds all transactions from block to a batch, and then use it here, and inside the loop in AddTxIndexesBatch
above.
func WriteBlockAddTxIndexes(indexDb ethdb.Database, block *types.Block) error {
putBatch := indexDb.NewBatch()
addTransactions(block, putBatch)
return putBatch.Write()
}
func (self *BlockChain) AddTxIndexesBatch(indexDb ethdb.Database, startBlockN, stopBlockN uint64) (err error) {
...
for block != nil && block.NumberU64() <= stopBlockN {
addTransactions(block, putBatch)
// write each "step"...
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it.
core/database_util.go
Outdated
} | ||
|
||
// GetAddrTxs gets the indexed transactions for a given account address. | ||
func GetAddrTxs(db ethdb.Database, address common.Address, blockStartN uint64, blockEndN uint64, toFromOrBoth string) []string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe rename toFromOrBoth
to direction
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, I like it. Will do.
core/database_util.go
Outdated
} | ||
} | ||
if blockEndN > 0 { | ||
txaI := new(big.Int).SetUint64(binary.LittleEndian.Uint64(blockNum)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we just use plain uint64
s and compare directly with blockStartN
and blockEndN
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
⚡️
core/database_util.go
Outdated
continue | ||
} | ||
} | ||
if toFromOrBoth == "to" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be simplified a bit: before the loop, map "to" -> 't', "from" -> 'f', ""/"both" -> 'b'
var direction byte = 'b'
if len(toFromOrBoth) > 0 {
direction = toFromOrBoth[0]
}
and then just do:
if direction != 'b' && direction != torf[0] {
continue
}
Gopkg.lock
Outdated
@@ -13,6 +13,12 @@ | |||
revision = "2f1ce7a837dcb8da3ec595b1dac9d0632f0f99e8" | |||
version = "v1.3.1" | |||
|
|||
[[projects]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't see the usages of newly vendored packages, and many seems very unrelated to this task.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea -- I used a progress bar for a while for the atxi-build
command, then removed it and haven't pushed the commit removed the deps (oops)
cmd/geth/build_atxi_cmd.go
Outdated
|
||
indexDb := MakeIndexDatabase(ctx) | ||
if indexDb == nil { | ||
panic("indexdb is nil") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't you use glog.Fatal
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep.
core/database_util.go
Outdated
} | ||
|
||
// resolveAddrTxBytes resolves the index key to individual []byte values | ||
func resolveAddrTxBytes(key []byte) (address, blockNumber, toOrFrom, txhash []byte) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first returned value is used only in tests ;) But from logical point of view, returning everything seems right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, I noticed that too, but agree that it seems right to have it. And it's worth it if only for tests.
makes compatible with --fast
- use step param for batch atxi fn - use blockprocessed counter for batch atxi fn instead of block number - use glog.Fatal instead of panic
It seems kind of awkward to put the api in |
Also, what do you think about adding another parameter implemented in a7dbbb5 |
g bug was with the address field for 'to' indexes being messed up cuz unnecessary pointer
solution: use key instead of tx hash (typo)
solution: rename keep -> diff because it returns the difference, see usage in bv.reorg
... with shared but postponed txs (same tx, later block) solution: rm all txs from old chain, add all txs for new chain in case of reorg. atxi should only reflect canonical chain.
solution: write 'em
core/database_util.go
Outdated
if err != nil || v == nil { | ||
return 0 | ||
} | ||
s := string(v) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no big deal, but could use binary package Uint64 instead
solution: implement that nonexclusively cc @pyskell
Please re-format files, as there are some inconsistent parts. Reviewed 1 of 135 files at r1, 1 of 127 files at r3, 1 of 8 files at r6, 2 of 6 files at r7, 6 of 9 files at r8, 4 of 4 files at r9. cmd/geth/build_atxi_cmd.go, line 3 at r9 (raw file):
Formatting required. cmd/geth/build_atxi_cmd.go, line 16 at r9 (raw file):
I don't like this hardcode. Do you think that introducing separate commandline parameter for indexing cache is reasonable? cmd/geth/flag.go, line 818 at r9 (raw file):
I need clarification - you are using same cache size parameter for both databases, but set 'cache ratio' to 0.5 for both of them, to keep the global memory consumption within limits? common/types.go, line 128 at r9 (raw file):
Is this function necessary? It seems that there is only one usage, just below. core/blockchain.go, line 798 at r9 (raw file):
I don't like the core/database_util.go, line 220 at r7 (raw file): Previously, whilei (ia) wrote…
SGTM core/database_util.go, line 156 at r9 (raw file):
Maybe it's the premature optimization, but we can pre-allocate slice ( core/database_util.go, line 217 at r9 (raw file):
Is it possible, that to eth/backend.go, line 231 at r9 (raw file):
Those arbitrary numbers are different in different places of the code. Comments from Reviewable |
Ah, now I understand why you split cache differently. For building index, it's big (for performance), and for normal "appending" it's small, because you don't want to waste memory that can be used by chain data? Am I right? |
I have one more idea (definitely for another PR) - parallelize! |
cmd/geth/build_atxi_cmd.go, line 16 at r9 (raw file): Previously, tzdybal (Tomasz Zdybał) wrote…
It think it may be reasonable, though hesitant because I think it may also be overkill, since global Open to further configuration/un-hardcoding, but would suggest to delegate to follow-up PR as a nice-to-have. Comments from Reviewable |
cmd/geth/flag.go, line 818 at r9 (raw file): Previously, tzdybal (Tomasz Zdybał) wrote…
Yes -- just to keep the system adherent to global Comments from Reviewable |
cmd/geth/flag.go, line 818 at r9 (raw file): Previously, whilei (ia) wrote…
And the 50/50 ratio is arbitrary and based only on my own empirical experiments. Comments from Reviewable |
solution: refactor to use common addr == zero-value fn
solution: removeRemovals -> deleteRemovalsFn
solution: implement pagination and reverse params for atxi api This seems like a useful thing for most applications... - reverse because most relevant transactions are latest txs - pagination because some account have tens- or hundreds- of thousands of txs
Reviewed 42 of 42 files at r10. cmd/geth/build_atxi_cmd.go, line 16 at r9 (raw file): Previously, whilei (ia) wrote…
This is quite complex - leave it as is for now.
core/blockchain.go, line 798 at r9 (raw file): Previously, tzdybal (Tomasz Zdybał) wrote…
Actually I didn't like the conflict between Comments from Reviewable |
solution: implement sort.Sort interface for custom sortable struct
I like the pagination
Reviewed 6 of 6 files at r11. Comments from Reviewable |
solution: sort.Sort interface Less method should use be backwards since we want higher block numbers first
@tzdybal One last open question -- do we want to stuck with Or maybe establish a new |
This creates a new public RPC/JS API module 'geth_'.
geth_getAddressTransactions
Returns transactions for an address.
Usage requires address-transaction indexes using
geth --atxi
to enable and create indexes during chain sync/import, optionally usinggeth atxi-build
to index pre-existing chain data.Parameters
DATA
, 20 Bytes - address to check for transactionsQUANTITY
- integer block number to filter transactions floorQUANTITY
- integer block number to filter transactions ceilingSTRING
-[t|f|tf|]
, uset
for transactions to the address,f
for from, ortf
/''
for bothSTRING
-[s|c|sc|]
, uses
for standard transactions,c
for contracts, orsc
/``''` for bothQUANTITY
- integer of index to begin pagination. Using-1
equivalent to0
.QUANTITY
- integer of index to end pagination. Using-1
equivalent to last transaction n.BOOL
- whether to return transactions in order of oldest first. By defaultfalse
returns transaction hashes ordered by newest transactions first.Returns
Array
- Array of transaction hashes, or an empty array if no transactions foundExample
geth atxi-build
Builds address-transactions indexes for existing chaindata. The command is idempotent.
Running a complete chain index from block 0 -> 5220000 on a 2009 Macbook Pro with 1GB cache took 3h30m, averaging ~300 blocks/second and ~1800 txs/second, and the
/indexes
database is 2.0GB.Parameters
--start=QUANTITY
- custom floor at which to begin indexing. If unset, the build command will use it's persistent placeholder if the command has been run before.--stop=QUANTITY
- custom ceiling at which to finish indexing. If unset, default value is blockchain head.--step=QUANTITY
- custom increment for batching writes to db and setting persistent progress placeholder. Default value is 10000.geth --atxi
Flag required to enable address-transaction indexing during sync and import, and to enable associated API for a geth instance.
Implementation
Creates a new database
<chaindir>/indexes
which holds key indexes of the form below, where each unit is a definite length.txa-
= 4 bytesaddress
= 20 bytesblockNumber uint64
= 8 bytesdirection
= 1 bytekindof
= 1 bytetxhash
= 32 bytesThe key index can then be resolved to individual values using these known lengths, eg. address =
key[4:24]
Lookups by address use a prefix iterator on address.
[WIP] - Known issues currently:
Case not handled: delete/reorg blocks. If there is a chain reorg, and a once-canonical
block is relegated to side chain, we should remove associated atxi's, which is to
say the atx index should only reflect the known canonical chain.
ethdb.Database interface doesn't include iterator, so storage testing with MemDatabase
will require changes to Database interface or some other kind of stub. Use level db in tmp dir
Build the indexes performance might be improved significantly by using a memory cache with Batch writes.
Does not use a bloom filter.
Undecided.Unnecessary.