Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core, eth/filters, miner, xeth: Optimised log filtering #1899

Merged
merged 2 commits into from
Oct 16, 2015

Conversation

obscuren
Copy link
Contributor

Log filtering is now using a MIPmap like approach where addresses of
logs are added to a mapped bloom bin. The current levels for the MIP are
in ranges of 1.000.000, 500.000, 100.000, 50.000, 1.000. Logs are
therefor filtered in batches of 1.000.

Closes #1895

@obscuren obscuren added this to the 1.3.0 milestone Oct 12, 2015
@robotally
Copy link

Vote Count Reviewers
👍 1 @karalabe
👎 0

Updated: Fri Oct 16 16:18:49 UTC 2015

@obscuren
Copy link
Contributor Author

This PR still requires a database update strategy to add all receipt log addresses to the mip mapped blooms.

@codecov-io
Copy link

Current coverage is 48.00%

Merging #1899 into develop will increase coverage by +0.20% as of 4de1469

Powered by Codecov. Updated on successful CI builds.

@obscuren
Copy link
Contributor Author

benchmark              old ns/op      new ns/op     delta
BenchmarkMipmaps-4     3786039581     102561295     -97.29%

@obscuren
Copy link
Contributor Author

@karalabe the full-node PR is going to need some changes for this because we should still be able to search through past receipts. Wherever in the code receipts are being added it's going to require an additional core.WriteMipmapBloom(db, blockNumber, receipts) this so that receipts can be included to the mip mapped bloom bins.

@karalabe
Copy link
Member

@obscuren Sure, just ping me after this gets merged, or if the other way around, I'll tell you where to insert this (i.e. BlockChain.InsertReceiptChain).

Btw, just a question with assigning receipts to block numbers (haven't looked at your code yet). I see you assign it to block numbers. Can you handle cases where there are multiple blocks/receipts with the same number? Currently in the database everything (apart from the canonical mapping) is mapped to hashes, so scenarios like this don't occur. Just trying to make sure this doesn't go misinterpreted :)

@obscuren
Copy link
Contributor Author

@karalabe nothing gets assigned to anything actually. There are general bloom bins for several levels (0-1m-1, 0-500k-1, 500k-1m-1, 0-100k-1, etc). The keys for these are generated as N / L • L where L is the level in which you are storing.

All of this cares nothing for hashes. There are potentially higher rates of false positives due to chain re-orgs but I think that's quite alright.

@frozeman
Copy link
Contributor

I just tried your branch, it doesn't give me the past logs, even though it returns fast.
Does it map also previous logs, or do i need to re-sync my whole chain to make it work?

@obscuren
Copy link
Contributor Author

@frozeman yes full resynx until I get the database merge feature in (see first comment)

@obscuren
Copy link
Contributor Author

Added upgrade strategy and tests for upgrading

}

func TestFilters(t *testing.T) {
const dbname = "/tmp/mipmap"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit dangerous as it can overwrite stuff, may use leftovers from previous runs, on windows this will not be interpreted well, etc. A better solution would be to let Go create a temp folder for it (dir, err := ioutil.TempDir("", "mipmap")) and also make sure it's cleaned up afterwards (defer os.RemoveAll(dir)).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, that was only temporary. Good catch, well change that to TempDir

@obscuren obscuren force-pushed the mipmap-bloom branch 2 times, most recently from dc1d78c to b8a0e88 Compare October 13, 2015 16:47
@obscuren
Copy link
Contributor Author

@karalabe PTAL changed the TempDir

@@ -666,6 +666,8 @@ func (self *BlockChain) InsertChain(chain types.Blocks) (int, error) {
PutTransactions(self.chainDb, block, block.Transactions())
// store the receipts
PutReceipts(self.chainDb, receipts)
// Write map map bloom filters
WriteMipmapBloom(self.chainDb, block.NumberU64(), receipts)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is correct here. This branch is only accessed if you insert a block that is the new head of a canonical chain. If the block you just imported is currently a side chain, then you'll never write the bloom filter. For the above two PutTransactions and PutReceipts this is not a problem because the reorg takes care of invoking these ops when blocks get reorged. I think the correct place to put this is a bit up, right after PutBlockReceipts (https://github.com/obscuren/go-ethereum/blob/mipmap-bloom/core/blockchain.go#L649). That would ensure that irrelevant of the "canonical status" of the new block, the bloom filters will be updated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I agree it would yield better results it's still not right. I'll give it some thought

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should happen during canonical insertion only (like the others).

We need a helper method that does a few things:

  • write canon hash
  • write block receipts
  • write mip

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you put it here, you can lose data.

Eg: You have an existing empty chain with no receipts: A-B-C-D, and a fork comes in rooted at B: A-B-1-X-Y. The 1 contains a receipt, none of the others do. When you insert 1, this bloom write doesn't get called, since 1 is only a side chain at that point. Then you insert X and Y, your chain gets reorged, but the bloom is never updated any more. So now you've lost knowledge about the receipt in 1.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... github only showed your first response, but not the second... Anyway, that solution would also work, but you either have to always write out the blooms, or link it to canonical changes to not lose data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI canonical insertion would also happen during chain reorg

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean canonical number mapping, right? If yes, the yes, that's also a correct place to pt this, but in its current position here, it's wrong. The least it should also be added here https://github.com/obscuren/go-ethereum/blob/mipmap-bloom/core/blockchain.go#L759, if you want to go down the route of only marking upon canonical status.

@ethernomad
Copy link
Contributor

@obscuren Gav said that cpp-ethereum already has indexing for searching logs, but in #1895 you say this would take up too much disk space. Maybe there is just confusion over definition of "indexing"

@karalabe
Copy link
Member

Indexing is a fairly expensive and complex operation. You need to maintain a separate "index" database (or dataset at least), which you need to keep in sync with block insertions, chain reorganisations, etc. I don't know exactly what the performance or storage hit would be, but probably non negligible.

However, if you think about it, all indexing would need to do is create a presence list. Is something present or not at a certain block. A full fledged index is much more powerful that that (hence the associated costs). This PR on the other hand uses a different (and imho quite a brilliant idea), of creating bloom filters at various resolutions, that simply state whether a given log has or has not a chance of being present at some section of the chain (e.g. from 100K to 110K). The bloom filters can provide false positives, but will never produce false negatives (i.e. if it says a log isn't present in the interval, no way can it be present). So you can quickly zoom in on sections of the chain that feature a certain log, and then iterate the blocks to find it (which are few by count).

PS: I've no idea how it's implemented in cpp.

@obscuren
Copy link
Contributor Author

@drupalnomad I've no idea how C++ is doing it, in fact I wasn't even aware C++ had something like indexing (or some other technique).

@ethernomad
Copy link
Contributor

Gav said this:

however for cpp, at least, there's no reason we can't apply the same scalability to lightclient logic and provide very fast topic searches over millions of blocks
the topic limit is actually 4 (3 indexed parameters in solidity + the indexed event name)

@ethernomad
Copy link
Contributor

Thanks for the info @karalabe

// parameters. For available levels see MIPMapLevels.
func GetMipmapBloom(db ethdb.Database, number, level uint64) types.Bloom {
bloomDat, _ := db.Get(mipmapKey(number, level))
return types.BytesToBloom(bloomDat)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens here if the bloom filter cannot be found?

  • Panic
  • Creates an always negative filter
  • Creates an always positive filter

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it can't be found it creates an empty bloom filter (all negative)

@obscuren
Copy link
Contributor Author

@drupalnomad Ah, the indexed is a keyword in solidity. The indexed keywords indicates the compiler that the argument is something that is to be "searchable" through filtering; e.g. event T(i indexed uint) and when called, internally in the VM, calls LOG1(value). This value then becomes filterable.

If we had an event T(i uint) the i argument won't be filterable but instead is chucked in the extra data field.

Logs are triplets in the form of [address, [topics], data]. All indexed parameters are considered topics with a maximum of 4.

func (self *Filter) SetLatestBlock(latest int64) {
self.latest = latest
func (self *Filter) SetEndBlock(end int64) {
self.end = end
}

func (self *Filter) SetAddress(addr []common.Address) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might as well rename this to plural too.

@obscuren obscuren force-pushed the mipmap-bloom branch 3 times, most recently from a8a948e to e2e6889 Compare October 16, 2015 10:21

// returns a formatted MIP mapped key by adding prefix, canonical number and level
//
// ex. fn(98, 1000) = (prefix || 1000 || 1000)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't that last 1000 supposed to be a 0? :P

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🖕

@karalabe
Copy link
Member

👍 LGTM now, but note, I never used logs, so I do not know how to properly live test this. @frozeman Please check out this current version and play with it and report any issues.

Log filtering is now using a MIPmap like approach where addresses of
logs are added to a mapped bloom bin. The current levels for the MIP are
in ranges of 1.000.000, 500.000, 100.000, 50.000, 1.000. Logs are
therefor filtered in batches of 1.000.
obscuren added a commit that referenced this pull request Oct 16, 2015
core, eth/filters, miner, xeth: Optimised log filtering
@obscuren obscuren merged commit 10ed107 into ethereum:develop Oct 16, 2015
@obscuren obscuren removed the review label Oct 16, 2015
@obscuren obscuren deleted the mipmap-bloom branch October 16, 2015 19:35
@obscuren obscuren modified the milestones: 2.0.0, 1.3.0 Oct 28, 2015
maoueh pushed a commit to streamingfast/go-ethereum that referenced this pull request Nov 1, 2023
* cmd: add tests for init-network command

* cmd: add setup function
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants