core, eth/filters, miner, xeth: Optimised log filtering #1899

obscuren · 2015-10-12T20:20:33Z

Log filtering now uses a MIPmap-like approach where addresses of
logs are added to a mapped bloom bin. The current levels for the MIP are
in ranges of 1.000.000, 500.000, 100.000, 50.000, 1.000. Logs are
therefore filtered in batches of 1.000.

Closes #1895

robotally · 2015-10-12T20:20:34Z

Vote	Count	Reviewers
👍	1	@karalabe
👎	0

Updated: Fri Oct 16 16:18:49 UTC 2015

obscuren · 2015-10-12T20:31:15Z

This PR still requires a database update strategy to add all receipt log addresses to the mip mapped blooms.

codecov-io · 2015-10-12T20:31:59Z

Current coverage is `48.00%`

Merging #1899 into develop will increase coverage by +0.20% as of 4de1469

Powered by Codecov. Updated on successful CI builds.

obscuren · 2015-10-12T20:33:13Z

benchmark              old ns/op      new ns/op     delta
BenchmarkMipmaps-4     3786039581     102561295     -97.29%

obscuren · 2015-10-12T21:59:10Z

@karalabe the full-node PR is going to need some changes for this because we should still be able to search through past receipts. Wherever in the code receipts are being added, it's going to require an additional core.WriteMipmapBloom(db, blockNumber, receipts) call so that the receipts are included in the mip mapped bloom bins.

karalabe · 2015-10-13T06:56:54Z

@obscuren Sure, just ping me after this gets merged, or if the other way around, I'll tell you where to insert this (i.e. BlockChain.InsertReceiptChain).

Btw, just a question with assigning receipts to block numbers (haven't looked at your code yet). I see you assign it to block numbers. Can you handle cases where there are multiple blocks/receipts with the same number? Currently in the database everything (apart from the canonical mapping) is mapped to hashes, so scenarios like this don't occur. Just trying to make sure this doesn't go misinterpreted :)

obscuren · 2015-10-13T07:43:59Z

@karalabe nothing gets assigned to anything actually. There are general bloom bins for several levels (0 to 1m-1, 0 to 500k-1, 500k to 1m-1, 0 to 100k-1, etc.). The keys for these are generated as N / L * L (integer division), where N is the block number and L is the level at which you are storing.

All of this cares nothing for hashes. There are potentially higher rates of false positives due to chain re-orgs but I think that's quite alright.
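The key rule above can be sketched in a few lines of Go. The level values are the ones listed in the PR description; the `binStart` helper and the `mipmapLevels` variable are names made up for this illustration and are not taken from the PR's code.

```go
package main

import "fmt"

// mipmapLevels lists the bin sizes named in the PR description.
var mipmapLevels = []uint64{1000000, 500000, 100000, 50000, 1000}

// binStart returns the first block number of the bin that holds block n at
// the given level. Integer division truncates, so n / level * level rounds n
// down to a multiple of level. (Hypothetical helper, for illustration only.)
func binStart(n, level uint64) uint64 {
	return n / level * level
}

func main() {
	const n = 1234567
	for _, level := range mipmapLevels {
		start := binStart(n, level)
		fmt.Printf("level %7d: block %d falls in bin [%d, %d]\n",
			level, n, start, start+level-1)
	}
}
```

Because the key depends only on the block number and the level, two competing blocks at the same height land in the same bin, which is where the extra false positives after a re-org would come from.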

frozeman · 2015-10-13T08:50:07Z

I just tried your branch; it doesn't give me the past logs, even though it returns fast.
Does it also map previous logs, or do I need to re-sync my whole chain to make it work?

obscuren · 2015-10-13T09:01:02Z

@frozeman yes, a full resync until I get the database merge feature in (see first comment)

obscuren · 2015-10-13T11:13:45Z

Added upgrade strategy and tests for upgrading

karalabe · 2015-10-13T15:30:57Z

eth/filters/filter_test.go

+}
+
+func TestFilters(t *testing.T) {
+	const dbname = "/tmp/mipmap"


This is a bit dangerous: it can overwrite stuff, may use leftovers from previous runs, will not be interpreted well on Windows, etc. A better solution would be to let Go create a temp folder for it (dir, err := ioutil.TempDir("", "mipmap")) and also make sure it's cleaned up afterwards (defer os.RemoveAll(dir)).

Ah yes, that was only temporary. Good catch, will change that to TempDir
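A minimal sketch of the suggested pattern, written as a standalone program rather than as the PR's actual test. It uses os.MkdirTemp, the current standard-library counterpart of ioutil.TempDir; the `withTempDir` and `exists` helpers are hypothetical names, and writing a marker file stands in for opening the test database.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// withTempDir creates a fresh, uniquely named directory under the system temp
// location, hands it to fn, and removes it again once fn returns.
func withTempDir(prefix string, fn func(dir string) error) error {
	dir, err := os.MkdirTemp("", prefix)
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir) // clean up even if fn fails
	return fn(dir)
}

// exists reports whether path can be stat'ed.
func exists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

func main() {
	var used string
	err := withTempDir("mipmap", func(dir string) error {
		used = dir
		// Stand-in for creating the test database inside dir.
		return os.WriteFile(filepath.Join(dir, "marker"), []byte("ok"), 0o600)
	})
	fmt.Println("error:", err)
	fmt.Println("still exists after cleanup:", exists(used))
}
```

Each run gets its own directory, so nothing is overwritten and no leftovers from an earlier run are picked up.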

obscuren · 2015-10-13T16:49:01Z

@karalabe PTAL, changed to TempDir

karalabe · 2015-10-14T07:42:44Z

core/blockchain.go

@@ -666,6 +666,8 @@ func (self *BlockChain) InsertChain(chain types.Blocks) (int, error) {
 			PutTransactions(self.chainDb, block, block.Transactions())
 			// store the receipts
 			PutReceipts(self.chainDb, receipts)
+			// Write mipmap bloom filters
+			WriteMipmapBloom(self.chainDb, block.NumberU64(), receipts)


I'm not sure this is correct here. This branch is only accessed if you insert a block that is the new head of a canonical chain. If the block you just imported is currently on a side chain, then you'll never write the bloom filter. For the above two, PutTransactions and PutReceipts, this is not a problem because the reorg takes care of invoking these ops when blocks get reorged. I think the correct place to put this is a bit up, right after PutBlockReceipts (https://github.com/obscuren/go-ethereum/blob/mipmap-bloom/core/blockchain.go#L649). That would ensure that, regardless of the "canonical status" of the new block, the bloom filters will be updated.

While I agree it would yield better results it's still not right. I'll give it some thought

It should happen during canonical insertion only (like the others).

We need a helper method that does a few things:

* write canon hash
* write block receipts
* write mip

If you put it here, you can lose data.

Eg: You have an existing empty chain with no receipts: A-B-C-D, and a fork comes in rooted at B: A-B-1-X-Y. The 1 contains a receipt, none of the others do. When you insert 1, this bloom write doesn't get called, since 1 is only a side chain at that point. Then you insert X and Y, your chain gets reorged, but the bloom is never updated any more. So now you've lost knowledge about the receipt in 1.

Hmm... github only showed your first response, but not the second... Anyway, that solution would also work, but you either have to always write out the blooms, or link it to canonical changes to not lose data.

FYI canonical insertion would also happen during chain reorg

You mean canonical number mapping, right? If yes, then yes, that's also a correct place to put this, but in its current position here, it's wrong. At the least, it should also be added here https://github.com/obscuren/go-ethereum/blob/mipmap-bloom/core/blockchain.go#L759, if you want to go down the route of only marking upon canonical status.
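The A-B-C-D versus A-B-1-X-Y scenario discussed in this thread can be replayed with a toy model. This is only a sketch under loud assumptions: chain weight is approximated by length (the real client compares total difficulty), and `block` and `indexedBlocks` are hypothetical names that do not exist in go-ethereum.

```go
package main

import "fmt"

// block is a toy block: a name plus whether it carries a receipt with logs.
type block struct {
	name    string
	hasLogs bool
}

// indexedBlocks replays the insertion of fork blocks on top of a shared
// prefix of forkBase blocks while a competing canonical chain of canonLen
// blocks exists. It returns the names of the log-carrying blocks that end up
// in the bloom index. With writeOnReorg=false only blocks inserted directly
// as the canonical head are indexed; with writeOnReorg=true the blocks that
// become canonical through the reorg are indexed as well.
func indexedBlocks(fork []block, canonLen, forkBase int, writeOnReorg bool) []string {
	var written []string
	canonical := false // has the fork overtaken the old chain yet?
	for i, b := range fork {
		forkLen := forkBase + i + 1
		if !canonical && forkLen <= canonLen {
			continue // side-chain block: the head-only write is skipped
		}
		if !canonical {
			canonical = true
			if writeOnReorg {
				// Reorg: index the fork blocks that were skipped earlier.
				for _, old := range fork[:i] {
					if old.hasLogs {
						written = append(written, old.name)
					}
				}
			}
		}
		if b.hasLogs {
			written = append(written, b.name)
		}
	}
	return written
}

func main() {
	// Existing chain A-B-C-D (4 blocks); fork 1-X-Y is rooted at B (prefix A-B).
	fork := []block{{"1", true}, {"X", false}, {"Y", false}}
	fmt.Println("head-only writes:", indexedBlocks(fork, 4, 2, false))
	fmt.Println("writes on reorg: ", indexedBlocks(fork, 4, 2, true))
}
```

In the head-only variant the receipt in block 1 never reaches the index, which is the data loss described above; tying the write to canonical changes recovers it.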

ethernomad · 2015-10-16T08:05:04Z

@obscuren Gav said that cpp-ethereum already has indexing for searching logs, but in #1895 you say this would take up too much disk space. Maybe there is just confusion over definition of "indexing"

karalabe · 2015-10-16T08:34:21Z

Indexing is a fairly expensive and complex operation. You need to maintain a separate "index" database (or dataset at least), which you need to keep in sync with block insertions, chain reorganisations, etc. I don't know exactly what the performance or storage hit would be, but probably non-negligible.

However, if you think about it, all indexing would need to do is create a presence list: is something present or not at a certain block. A full-fledged index is much more powerful than that (hence the associated costs). This PR, on the other hand, uses a different (and imho quite brilliant) idea: creating bloom filters at various resolutions that simply state whether a given log has a chance of being present in some section of the chain (e.g. from 100K to 110K). The bloom filters can produce false positives but will never produce false negatives (i.e. if one says a log isn't present in the interval, it cannot be present). So you can quickly zoom in on the sections of the chain that may feature a certain log, and then iterate over the blocks there (which are few in number) to find it.

PS: I've no idea how it's implemented in cpp.
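To make the "zoom in" idea concrete, here is a rough sketch of a multi-resolution bloom lookup. It is not the PR's implementation: the toy `bloom` type, its FNV-based bit selection, the in-memory `index` map and the `record`/`candidates` names are all assumptions made for illustration. Only the level sizes come from the PR description.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bloom is a toy 2048-bit filter (not go-ethereum's types.Bloom).
type bloom [256]byte

// positions derives two bit positions from an FNV-1a hash of the item.
func positions(item string) [2]uint32 {
	h := fnv.New64a()
	h.Write([]byte(item))
	sum := h.Sum64()
	return [2]uint32{uint32(sum) % 2048, uint32(sum>>32) % 2048}
}

func (b *bloom) add(item string) {
	for _, p := range positions(item) {
		b[p/8] |= 1 << (p % 8)
	}
}

func (b *bloom) mayContain(item string) bool {
	for _, p := range positions(item) {
		if b[p/8]&(1<<(p%8)) == 0 {
			return false // a missing bit means "definitely not present"
		}
	}
	return true
}

// levels are the bin sizes from the PR description, coarsest first.
var levels = []uint64{1000000, 500000, 100000, 50000, 1000}

// index maps (level, first block of bin) to the bloom for that bin.
type index map[[2]uint64]*bloom

// record adds addr to the bin covering block at every level.
func (ix index) record(block uint64, addr string) {
	for _, l := range levels {
		key := [2]uint64{l, block / l * l}
		if ix[key] == nil {
			ix[key] = new(bloom)
		}
		ix[key].add(addr)
	}
}

// candidates returns the first block of every finest-level bin in [from, to]
// that may contain a log from addr, descending only into bins whose coarser
// bloom did not rule the address out.
func (ix index) candidates(addr string, from, to uint64) []uint64 {
	var out []uint64
	var walk func(li int, from, to uint64)
	walk = func(li int, from, to uint64) {
		l := levels[li]
		for start := from / l * l; start <= to; start += l {
			b := ix[[2]uint64{l, start}]
			if b == nil || !b.mayContain(addr) {
				continue // skip the whole bin without touching any block
			}
			if li == len(levels)-1 {
				out = append(out, start)
				continue
			}
			lo, hi := start, start+l-1
			if lo < from {
				lo = from
			}
			if hi > to {
				hi = to
			}
			walk(li+1, lo, hi)
		}
	}
	walk(0, from, to)
	return out
}

func main() {
	ix := index{}
	ix.record(1234567, "0xabc")
	ix.record(42, "0xdef")
	fmt.Println(ix.candidates("0xabc", 0, 2000000))
}
```

Only the 1.000-block bins returned by `candidates` would then need a block-by-block scan; any of them may still turn out to be a false positive.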

obscuren · 2015-10-16T08:50:03Z

@drupalnomad I've no idea how C++ is doing it, in fact I wasn't even aware C++ had something like indexing (or some other technique).

ethernomad · 2015-10-16T08:57:27Z

Gav said this:

however for cpp, at least, there's no reason we can't apply the same scalability to lightclient logic and provide very fast topic searches over millions of blocks
the topic limit is actually 4 (3 indexed parameters in solidity + the indexed event name)

ethernomad · 2015-10-16T08:58:46Z

Thanks for the info @karalabe

karalabe · 2015-10-16T08:59:06Z

core/chain_util.go

+// parameters. For available levels see MIPMapLevels.
+func GetMipmapBloom(db ethdb.Database, number, level uint64) types.Bloom {
+	bloomDat, _ := db.Get(mipmapKey(number, level))
+	return types.BytesToBloom(bloomDat)


What happens here if the bloom filter cannot be found?

* Panic
* Creates an always negative filter
* Creates an always positive filter

If it can't be found it creates an empty bloom filter (all negative)
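The "all negative" behaviour follows from how a zero-valued filter behaves, which the sketch below illustrates with stand-ins. Everything here is an assumption for demonstration: the map-backed `store` replaces the database, `bytesToBloom` is a simplified copy-based conversion rather than go-ethereum's types.BytesToBloom, and SHA-256 is used for bit selection purely because it is in the standard library.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// bloom is a toy 2048-bit filter.
type bloom [256]byte

// bitPositions picks three bit positions from a SHA-256 digest of the item.
func bitPositions(item []byte) [3]uint16 {
	sum := sha256.Sum256(item)
	var out [3]uint16
	for i := range out {
		out[i] = binary.BigEndian.Uint16(sum[2*i:]) % 2048
	}
	return out
}

func (b *bloom) add(item []byte) {
	for _, p := range bitPositions(item) {
		b[p/8] |= 1 << (p % 8)
	}
}

func (b bloom) mayContain(item []byte) bool {
	for _, p := range bitPositions(item) {
		if b[p/8]&(1<<(p%8)) == 0 {
			return false
		}
	}
	return true
}

// bytesToBloom copies raw bytes into a bloom; nil input leaves all bits zero.
func bytesToBloom(data []byte) bloom {
	var b bloom
	copy(b[:], data)
	return b
}

// store is a map-backed stand-in for the key/value database.
type store map[string][]byte

// loadBloom ignores a missing key, as the reviewed code ignores the Get error.
func loadBloom(db store, key string) bloom {
	return bytesToBloom(db[key])
}

func main() {
	var b bloom
	b.add([]byte("some address"))
	db := store{"present": b[:]}

	fmt.Println("stored bin :", loadBloom(db, "present").mayContain([]byte("some address")))
	fmt.Println("missing bin:", loadBloom(db, "missing").mayContain([]byte("some address")))
}
```

A missing entry therefore makes the lookup skip that bin entirely, which is only safe if every bin that should exist has actually been written.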

obscuren · 2015-10-16T09:02:39Z

@drupalnomad Ah, indexed is a keyword in Solidity. The indexed keyword tells the compiler that the argument should be "searchable" through filtering; e.g. event T(uint indexed i), which, when called, internally in the VM calls LOG1(value). This value then becomes filterable.

If we had an event T(uint i), the i argument wouldn't be filterable but would instead be chucked into the extra data field.

Logs are triplets in the form of [address, [topics], data]. All indexed parameters are considered topics with a maximum of 4.
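As a rough illustration of the triplet and of what "filterable" means, the sketch below models a log and a deliberately simplified matcher. The `logEntry` type and the `matches` and `contains` helpers are hypothetical; the real filter's topic-matching rules are richer than the "all requested topics must be present" rule assumed here.

```go
package main

import "fmt"

// logEntry mirrors the [address, [topics], data] triplet described above.
type logEntry struct {
	Address string
	Topics  []string // indexed values, at most 4
	Data    []byte   // non-indexed arguments; not searchable
}

// contains reports whether s occurs in list.
func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

// matches reports whether the log comes from one of addrs (any address if
// addrs is empty) and carries every topic in topics. Data is never inspected.
func matches(l logEntry, addrs, topics []string) bool {
	if len(addrs) > 0 && !contains(addrs, l.Address) {
		return false
	}
	for _, t := range topics {
		if !contains(l.Topics, t) {
			return false
		}
	}
	return true
}

func main() {
	l := logEntry{
		Address: "0xcontract",
		Topics:  []string{"0xindexedValue"},
		Data:    []byte("non-indexed payload"),
	}
	fmt.Println(matches(l, []string{"0xcontract"}, []string{"0xindexedValue"}))
	fmt.Println(matches(l, nil, []string{"0xother"}))
}
```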

karalabe · 2015-10-16T09:05:25Z

eth/filters/filter.go

-func (self *Filter) SetLatestBlock(latest int64) {
-	self.latest = latest
+func (self *Filter) SetEndBlock(end int64) {
+	self.end = end
 }

 func (self *Filter) SetAddress(addr []common.Address) {


Might as well rename this to plural too.

karalabe · 2015-10-16T14:49:16Z

core/chain_util.go

+
+// returns a formatted MIP mapped key by adding prefix, canonical number and level
+//
+// ex. fn(98, 1000) = (prefix || 1000 || 1000)


Isn't that last 1000 supposed to be a 0? :P

karalabe · 2015-10-16T16:18:49Z

👍 LGTM now, but note, I never used logs, so I do not know how to properly live test this. @frozeman Please check out this current version and play with it and report any issues.

obscuren added type:feature review labels Oct 12, 2015

obscuren added this to the 1.3.0 milestone Oct 12, 2015

obscuren force-pushed the mipmap-bloom branch from 1629d3b to 5b5be89 Compare October 12, 2015 20:24

obscuren added in progress and removed review labels Oct 12, 2015

obscuren mentioned this pull request Oct 12, 2015

eth_getLogs needs to perform an indexed search #1895

Closed

obscuren force-pushed the mipmap-bloom branch from 5b5be89 to 78c68c6 Compare October 13, 2015 11:13

obscuren added review and removed in progress labels Oct 13, 2015

karalabe reviewed Oct 13, 2015

obscuren force-pushed the mipmap-bloom branch 2 times, most recently from dc1d78c to b8a0e88 Compare October 13, 2015 16:47

karalabe reviewed Oct 14, 2015

obscuren force-pushed the mipmap-bloom branch from 227ddcc to 8c1ee94 Compare October 16, 2015 08:56

karalabe reviewed Oct 16, 2015

obscuren force-pushed the mipmap-bloom branch 3 times, most recently from a8a948e to e2e6889 Compare October 16, 2015 10:21

karalabe reviewed Oct 16, 2015

obscuren force-pushed the mipmap-bloom branch from e2e6889 to 9df5a5c Compare October 16, 2015 15:22

obscuren force-pushed the mipmap-bloom branch from 9df5a5c to 2aa1a6b Compare October 16, 2015 16:18

obscuren force-pushed the mipmap-bloom branch from 2aa1a6b to 6dc1478 Compare October 16, 2015 19:29

obscuren added a commit that referenced this pull request Oct 16, 2015

Merge pull request #1899 from obscuren/mipmap-bloom

10ed107

core, eth/filters, miner, xeth: Optimised log filtering

obscuren merged commit 10ed107 into ethereum:develop Oct 16, 2015

obscuren removed the review label Oct 16, 2015

obscuren deleted the mipmap-bloom branch October 16, 2015 19:35

obscuren modified the milestones: 2.0.0, 1.3.0 Oct 28, 2015

maoueh pushed a commit to streamingfast/go-ethereum that referenced this pull request Nov 1, 2023

cmd: add tests for init-network (ethereum#1899)

6932673

* cmd: add tests for init-network command * cmd: add setup function
