core, eth/filters, miner, xeth: Optimised log filtering #1899
Conversation
Updated: Fri Oct 16 16:18:49 UTC 2015
Force-pushed from 1629d3b to 5b5be89
This PR still requires a database update strategy to add all receipt log addresses to the mip mapped blooms.
@karalabe the full-node PR is going to need some changes for this because we should still be able to search through past receipts. Wherever in the code receipts are being added it's going to require an additional …
@obscuren Sure, just ping me after this gets merged, or if the other way around, I'll tell you where to insert this (i.e. …). Btw, just a question about assigning receipts to block numbers (haven't looked at your code yet): I see you assign them to block numbers. Can you handle cases where there are multiple blocks/receipts with the same number? Currently in the database everything (apart from the canonical mapping) is mapped to hashes, so scenarios like this don't occur. Just trying to make sure this doesn't get misinterpreted :)
@karalabe nothing gets assigned to anything, actually. There are general bloom bins for several levels (0-…). All of this cares nothing for hashes. There are potentially higher rates of false positives due to chain re-orgs, but I think that's quite alright.
I just tried your branch; it doesn't give me the past logs, even though it returns fast.
@frozeman yes, full resync until I get the database merge feature in (see first comment)
Force-pushed from 5b5be89 to 78c68c6
Added upgrade strategy and tests for upgrading
}

func TestFilters(t *testing.T) {
	const dbname = "/tmp/mipmap"
This is a bit dangerous as it can overwrite stuff, may use leftovers from previous runs, will not be interpreted well on Windows, etc. A better solution would be to let Go create a temp folder for it (dir, err := ioutil.TempDir("", "mipmap")) and also make sure it's cleaned up afterwards (defer os.RemoveAll(dir)).
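A minimal sketch of the suggested pattern (the test name and body are illustrative, not the actual TestFilters):

```go
package filters

import (
	"io/ioutil"
	"os"
	"path/filepath"
	"testing"
)

func TestFiltersTempDir(t *testing.T) {
	// Let Go pick a unique, per-run temporary directory instead of a
	// hard-coded /tmp path that may clash with leftovers or Windows paths.
	dir, err := ioutil.TempDir("", "mipmap")
	if err != nil {
		t.Fatal(err)
	}
	defer os.RemoveAll(dir) // clean up even if the test fails

	dbname := filepath.Join(dir, "filterdb")
	_ = dbname // open the test database here instead of /tmp/mipmap
}
```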
Ah yes, that was only temporary. Good catch, will change that to TempDir
Force-pushed from dc1d78c to b8a0e88
@karalabe PTAL, changed the …
@@ -666,6 +666,8 @@ func (self *BlockChain) InsertChain(chain types.Blocks) (int, error) {
	PutTransactions(self.chainDb, block, block.Transactions())
	// store the receipts
	PutReceipts(self.chainDb, receipts)
	// Write mipmap bloom filters
	WriteMipmapBloom(self.chainDb, block.NumberU64(), receipts)
I'm not sure this is correct here. This branch is only accessed if you insert a block that is the new head of a canonical chain. If the block you just imported is currently a side chain, then you'll never write the bloom filter. For the above two, PutTransactions and PutReceipts, this is not a problem because the reorg takes care of invoking these ops when blocks get reorged. I think the correct place to put this is a bit up, right after PutBlockReceipts (https://github.com/obscuren/go-ethereum/blob/mipmap-bloom/core/blockchain.go#L649). That would ensure that, irrespective of the "canonical status" of the new block, the bloom filters will be updated.
While I agree it would yield better results, it's still not right. I'll give it some thought.
It should happen during canonical insertion only (like the others).
We need a helper method that does a few things (roughly sketched below):
- write canon hash
- write block receipts
- write mip
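A rough, hypothetical sketch of what such a helper could look like; the types and function names below are placeholders for illustration, not the actual go-ethereum API:

```go
package sketch

// Minimal stand-ins for ethdb.Database, a block and its receipts.
type (
	Database interface{ Put(key, value []byte) error }
	Block    struct {
		Number uint64
		Hash   [32]byte
	}
	Receipts []struct{}
)

// Stubs for the three writes listed above; the real helpers live in core.
func writeCanonHash(db Database, number uint64, hash [32]byte) error  { return nil }
func writeBlockReceipts(db Database, hash [32]byte, r Receipts) error { return nil }
func writeMipmapBloom(db Database, number uint64, r Receipts) error   { return nil }

// insertCanonical groups the writes that must happen together whenever a
// block becomes canonical, so a reorg can never update the receipts without
// also updating the number-keyed mipmap blooms.
func insertCanonical(db Database, block *Block, receipts Receipts) error {
	if err := writeCanonHash(db, block.Number, block.Hash); err != nil {
		return err
	}
	if err := writeBlockReceipts(db, block.Hash, receipts); err != nil {
		return err
	}
	return writeMipmapBloom(db, block.Number, receipts)
}
```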
If you put it here, you can lose data.
Eg: You have an existing empty chain with no receipts: A-B-C-D, and a fork comes in rooted at B: A-B-1-X-Y. The 1 contains a receipt, none of the others do. When you insert 1, this bloom write doesn't get called, since 1 is only a side chain at that point. Then you insert X and Y, your chain gets reorged, but the bloom is never updated any more. So now you've lost knowledge about the receipt in 1.
Hmm... GitHub only showed your first response, but not the second... Anyway, that solution would also work, but you either have to always write out the blooms, or link it to canonical changes to not lose data.
FYI, canonical insertion would also happen during a chain reorg.
You mean the canonical number mapping, right? If yes, then yes, that's also a correct place to put this, but in its current position here, it's wrong. At the least it should also be added here: https://github.com/obscuren/go-ethereum/blob/mipmap-bloom/core/blockchain.go#L759, if you want to go down the route of only marking upon canonical status.
Indexing is a fairly expensive and complex operation. You need to maintain a separate "index" database (or dataset at least), which you need to keep in sync with block insertions, chain reorganisations, etc. I don't know exactly what the performance or storage hit would be, but it's probably non-negligible. However, if you think about it, all indexing would need to do is create a presence list: is something present or not at a certain block. A full-fledged index is much more powerful than that (hence the associated costs). This PR on the other hand uses a different (and imho quite brilliant) idea of creating bloom filters at various resolutions, which simply state whether a given log has or has not a chance of being present in some section of the chain (e.g. from 100K to 110K). The bloom filters can produce false positives, but will never produce false negatives (i.e. if it says a log isn't present in the interval, no way can it be present). So you can quickly zoom in on sections of the chain that feature a certain log, and then iterate the blocks (which are few by count) to find it. PS: I've no idea how it's implemented in cpp.
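To make the zoom-in idea concrete, here is a small, self-contained sketch (the level values mirror the PR description; the bloomLookup callback and candidateBatches function are illustrative stand-ins, not the actual eth/filters code):

```go
package sketch

// Coarse-to-fine block ranges, one bloom filter per bucket at each level.
var levels = []uint64{1000000, 500000, 100000, 50000, 1000}

// bloomLookup stands in for "load the stored bloom covering the bucket that
// the block number falls into at `level` and test it for `addr`". False
// positives are possible, false negatives are not.
type bloomLookup func(level, number uint64, addr []byte) bool

// candidateBatches returns the 1,000-block batches between start and end
// that may contain logs for addr. Any negative bloom at any resolution
// definitively rules a batch out, so only the surviving batches need to be
// scanned block by block.
func candidateBatches(start, end uint64, addr []byte, mayContain bloomLookup) []uint64 {
	var batches []uint64
	finest := levels[len(levels)-1] // 1,000-block resolution
	for batch := start - start%finest; batch <= end; batch += finest {
		possible := true
		for _, level := range levels {
			if !mayContain(level, batch, addr) {
				possible = false // definitely not in this range
				break
			}
		}
		if possible {
			batches = append(batches, batch)
		}
	}
	return batches
}
```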
@drupalnomad I've no idea how C++ is doing it; in fact, I wasn't even aware C++ had something like indexing (or some other technique).
Force-pushed from 227ddcc to 8c1ee94
Gav said this: …
Thanks for the info @karalabe
// parameters. For available levels see MIPMapLevels.
func GetMipmapBloom(db ethdb.Database, number, level uint64) types.Bloom {
	bloomDat, _ := db.Get(mipmapKey(number, level))
	return types.BytesToBloom(bloomDat)
What happens here if the bloom filter cannot be found?
- Panic
- Creates an always negative filter
- Creates an always positive filter
If it can't be found, it creates an empty bloom filter (all negative).
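For illustration, a minimal sketch of why a missing entry behaves as an always-negative filter (the Bloom type and method names here are simplified stand-ins, not the exact types package API): db.Get returns no data, the bloom built from empty bytes has no bits set, and so every membership test fails.

```go
package sketch

// Bloom mirrors a 256-byte bloom filter; the zero value has no bits set.
type Bloom [256]byte

// bytesToBloom pads the (possibly empty) data into a Bloom. When the
// database lookup found nothing, data is nil and the filter stays all-zero.
func bytesToBloom(data []byte) Bloom {
	var b Bloom
	copy(b[len(b)-len(data):], data)
	return b
}

// mayContain stands in for the bloom membership test: with an all-zero
// filter none of the required bits can be set, so it always answers
// "definitely not present".
func (b Bloom) mayContain(requiredBits []uint) bool {
	for _, bit := range requiredBits {
		idx, mask := len(b)-1-int(bit/8), byte(1)<<(bit%8)
		if b[idx]&mask == 0 {
			return false
		}
	}
	return true
}
```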
@drupalnomad Ah, the … If we had an … Logs are triplets in the form of …
-func (self *Filter) SetLatestBlock(latest int64) {
-	self.latest = latest
+func (self *Filter) SetEndBlock(end int64) {
+	self.end = end
 }

 func (self *Filter) SetAddress(addr []common.Address) {
Might as well rename this to plural too.
Force-pushed from a8a948e to e2e6889
// returns a formatted MIP mapped key by adding prefix, canonical number and level
//
// ex. fn(98, 1000) = (prefix || 1000 || 1000)
Isn't that last 1000 supposed to be a 0? :P
🖕
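For what it's worth, a hedged sketch of the key derivation being described (mipmapPre and the byte layout are illustrative): at level 1000, block 98 falls into the bucket starting at 0, so the key should encode the bucket start rather than the raw number, i.e. fn(98, 1000) = (prefix || 1000 || 0).

```go
package sketch

import "encoding/binary"

var mipmapPre = []byte("mipmap-log-bloom-") // illustrative key prefix

// mipmapKey builds the database key of the bloom bucket covering `num` at
// the given level: prefix || level || bucket-start. For example
// mipmapKey(98, 1000) yields (prefix || 1000 || 0), because block 98 lives
// in the 0..999 bucket at the 1000-block level.
func mipmapKey(num, level uint64) []byte {
	lvl, bucket := make([]byte, 8), make([]byte, 8)
	binary.BigEndian.PutUint64(lvl, level)
	binary.BigEndian.PutUint64(bucket, num/level*level) // round down to bucket start

	key := make([]byte, 0, len(mipmapPre)+16)
	key = append(key, mipmapPre...)
	key = append(key, lvl...)
	return append(key, bucket...)
}
```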
Force-pushed from e2e6889 to 9df5a5c
👍 LGTM now, but note, I've never used logs, so I don't know how to properly live test this. @frozeman Please check out this current version, play with it and report any issues.
Force-pushed from 9df5a5c to 2aa1a6b
Log filtering is now using a MIPmap-like approach where addresses of logs are added to a mapped bloom bin. The current levels for the MIP are in ranges of 1.000.000, 500.000, 100.000, 50.000, 1.000. Logs are therefore filtered in batches of 1.000.
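A hedged sketch of the write side just described (the database plumbing is elided; addToBloom is a stand-in for loading the stored filter for a bucket, OR-ing the address in, and writing it back):

```go
package sketch

// mipLevels mirrors the ranges above: one bloom per bucket at each level.
var mipLevels = []uint64{1000000, 500000, 100000, 50000, 1000}

// addToBloom stands in for "load the bloom stored for (level, bucketStart),
// add addr to it, and persist it again".
type addToBloom func(level, bucketStart uint64, addr []byte)

// writeMipmapBloom records every logging address of a block at every
// resolution, so a later search can rule out whole ranges in one lookup.
func writeMipmapBloom(number uint64, logAddresses [][]byte, add addToBloom) {
	for _, level := range mipLevels {
		bucket := number / level * level // start of this block's bucket at this level
		for _, addr := range logAddresses {
			add(level, bucket, addr)
		}
	}
}
```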
Force-pushed from 2aa1a6b to 6dc1478
core, eth/filters, miner, xeth: Optimised log filtering
Log filtering is now using a MIPmap-like approach where addresses of
logs are added to a mapped bloom bin. The current levels for the MIP are
in ranges of 1.000.000, 500.000, 100.000, 50.000, 1.000. Logs are
therefore filtered in batches of 1.000.
Closes #1895