-
Notifications
You must be signed in to change notification settings - Fork 233
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor out routing system #655
base: master
Are you sure you want to change the base?
Conversation
72f8ab7
to
eca6287
Compare
// GetClosestPeersSeeded is the Kademlia 'node lookup' operation | ||
func (dht *IpfsDHT) GetClosestPeersSeeded(ctx context.Context, key string, seedPeers []peer.ID, useRTPeers bool) (<-chan peer.ID, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the first of these double functions, it's a bit unfortunate that we have to expose 2x as many functions for the same thing. We could move them into a new struct, but I'm not sure if it's worth it. Thoughts?
If we're going to leave these functions on the main struct we should have a consistent naming scheme for these. Extended
feeds pretty general and something to do with seeding the query or being a continuable query seems a little specific. I'm up for suggestions, otherwise I'll just use [OriginalName]Exteneded
everywhere.
Also, while we're rewriting this it would be great to return []peer.ID instead of chan peer.ID, but some tests may have to be modified.
) | ||
|
||
var ( | ||
logger = logging.Logger("dht.routing") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we just use the dht
logger here?
type Processor interface { | ||
Process(interface{}, func()) (interface{}, error) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is pretty unfortunate and is related to Go's lack of generics. We could also do RecordProcessor
and ProviderProcessor
here and just duplicate more of the code. Both options seems similarly gross to me, but having a single interface makes some of this a bit simpler (e.g. less routing options, sharing the CountStopper, etc.).
Also, once we move to a unified record system this should hopefully go away 🙏
// FindProviders searches for the providers corresponding to given Key and streams the results. | ||
func FindProviders(ctx context.Context, key multihash.Multihash, findProvsFn findProvsFn, processors []Processor, cfg *routing.Options) (<-chan peer.AddrInfo, <-chan []peer.ID, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is basically just a duplicated version of the SearchValue
function, but with added type safety. Not sure if it's worth it, but seems reasonable.
outChSize := maxRequestedRecords | ||
if outChSize == 0 { | ||
outChSize = 1 | ||
} | ||
out := make(chan peer.AddrInfo, outChSize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code has previously existed and for a requested provider record count > 0 the channel buffer = requested record count. I don't know if this is really necessary anymore, it seems like we could just set the channel size to 1.
While this doesn't really hurt too much on the memory front it does feel a little weird and certainly doesn't align with the SearchValue setup.
Thoughts on just removing this and setting the channel size to 1?
// | ||
// Deprecated: use github.com/libp2p/go-libp2p-kad-dht/routing.Quorum |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe moving the routing options into it's own package was overkill. Thoughts?
case <-ctx.Done(): | ||
return false | ||
} | ||
processors = []dhtrouting.Processor{validation, quorum, bestValue} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current code actually does quorum -> bestValue. The faithful representation of this here would be quorum -> validation -> bestValue (will fix in the next PR update). It's unfortunate that we end up validating twice (once in the getValues function and once in the pipeline) we can optimize this as well.
We should also figure out what (if anything) we want to do with the quorum function for SearchValue. I think requiring a certain number of the latest records to be equal would be reasonable, and would allow us to throw out invalid records and make things a little easier. Alternatively, we could just drop the quorum function entirely.
MaxCount: dhtrouting.GetQuorum(&cfg), | ||
} | ||
|
||
processors = []dhtrouting.Processor{newValuesOnly, quorum} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: While SearchValue does quorum -> bestValue, here we do newValuesOnly -> quorum. I'd like some comments explaining the pipeline order for each function.
// If we have enough peers locally, don't bother with remote RPC | ||
// TODO: is this a DOS vector? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If people start querying with count = 0 then this isn't a problem, so may be we should just emphasize that?
package routing | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More comments needed in this file. Also, is this separate package currently a good idea?
eca6287
to
a97bb8f
Compare
… return the closest peers used in the query
a97bb8f
to
961c097
Compare
In order to support #616 we need to have more configurability of how records received from the DHT are processed (i.e. everything we do aside from the Kademlia logic itself).
This PR takes a stab at that by introducing a set of processing functions that operate on the received records and then making those configurable via routing options.
One thing that's currently missing from the PR is the ability to configure who updating puts are sent to when we do a GetValue/SearchValue query.
A next step to examine is extracting the networking/RPC code from the routing + query logic so that they are attached to separate objects (if not packages) to make exposing additional functionality less overwhelming and confusing to new users.
Still a WIP, but looking for some feedback @aarshkshah1992