Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ERROR in fullrt/dht.go: key not found in map #1001

Open
lidel opened this issue Nov 18, 2024 · 2 comments
Open

ERROR in fullrt/dht.go: key not found in map #1001

lidel opened this issue Nov 18, 2024 · 2 comments
Labels
need/triage Needs initial labeling and prioritization

Comments

@lidel
Copy link
Member

lidel commented Nov 18, 2024

Testing Kubo 0.32.1 with go-libp2p-kad-dht v0.28.1 revealed some ERROR messages in log:

Nov 18 16:01:36 kubo-staging-us-east-02 ipfs[573811]: 2024-11-18T16:01:36.870Z        ERROR        fullrtdht        fullrt/dht.go:450        key not found in map
Nov 18 16:01:36 kubo-staging-us-east-02 ipfs[573811]: 2024-11-18T16:01:36.870Z        ERROR        fullrtdht        fullrt/dht.go:450        key not found in map
Nov 18 16:01:36 kubo-staging-us-east-02 ipfs[573811]: 2024-11-18T16:01:36.905Z        ERROR        fullrtdht        fullrt/dht.go:450        key not found in map
Nov 18 16:01:36 kubo-staging-us-east-02 ipfs[573811]: 2024-11-18T16:01:36.906Z        ERROR        fullrtdht        fullrt/dht.go:450        key not found in map
Nov 18 16:01:36 kubo-staging-us-east-02 ipfs[573811]: 2024-11-18T16:01:36.952Z        ERROR        fullrtdht        fullrt/dht.go:450        key not found in map
Nov 18 16:01:36 kubo-staging-us-east-02 ipfs[573811]: 2024-11-18T16:01:36.953Z        ERROR        fullrtdht        fullrt/dht.go:450        key not found in map
Nov 18 16:01:36 kubo-staging-us-east-02 ipfs[573811]: 2024-11-18T16:01:36.953Z        ERROR        fullrtdht        fullrt/dht.go:450        key not found in map
Nov 18 16:01:36 kubo-staging-us-east-02 ipfs[573811]: 2024-11-18T16:01:36.963Z        ERROR        fullrtdht        fullrt/dht.go:450        key not found in map
Nov 18 16:01:36 kubo-staging-us-east-02 ipfs[573811]: 2024-11-18T16:01:36.963Z        ERROR        fullrtdht        fullrt/dht.go:450        key not found in map
Nov 18 16:01:36 kubo-staging-us-east-02 ipfs[573811]: 2024-11-18T16:01:36.963Z        ERROR        fullrtdht        fullrt/dht.go:450        key not found in map
Nov 18 16:01:36 kubo-staging-us-east-02 ipfs[573811]: 2024-11-18T16:01:36.963Z        ERROR        fullrtdht        fullrt/dht.go:450        key not found in map
Nov 18 16:01:36 kubo-staging-us-east-02 ipfs[573811]: 2024-11-18T16:01:36.966Z        ERROR        fullrtdht        fullrt/dht.go:450        key not found in map
Nov 18 16:01:36 kubo-staging-us-east-02 ipfs[573811]: 2024-11-18T16:01:36.966Z        ERROR        fullrtdht        fullrt/dht.go:450        key not found in map
Nov 18 16:01:36 kubo-staging-us-east-02 ipfs[573811]: 2024-11-18T16:01:36.966Z        ERROR        fullrtdht        fullrt/dht.go:450        key not found in map
Nov 18 16:01:37 kubo-staging-us-east-02 ipfs[573811]: 2024-11-18T16:01:37.077Z        ERROR        fullrtdht        fullrt/dht.go:450        key not found in map
Nov 18 16:01:37 kubo-staging-us-east-02 ipfs[573811]: 2024-11-18T16:01:37.086Z        ERROR        fullrtdht        fullrt/dht.go:450        key not found in map

Filling this issue so others can +1 if they also experienced it.
I'll monitor things for next few weeks and post update if we see it again in our infra.

Analysis

ERROR comes from:

func (dht *FullRT) GetClosestPeers(ctx context.Context, key string) ([]peer.ID, error) {
_, span := internal.StartSpan(ctx, "FullRT.GetClosestPeers", trace.WithAttributes(internal.KeyAsAttribute("Key", key)))
defer span.End()
kbID := kb.ConvertKey(key)
kadKey := kadkey.KbucketIDToKey(kbID)
dht.rtLk.RLock()
closestKeys := kademlia.ClosestN(kadKey, dht.rt, dht.bucketSize)
dht.rtLk.RUnlock()
peers := make([]peer.ID, 0, len(closestKeys))
for _, k := range closestKeys {
dht.kMapLk.RLock()
p, ok := dht.keyToPeerMap[string(k)]
if !ok {
logger.Errorf("key not found in map")
}
dht.kMapLk.RUnlock()
dht.peerAddrsLk.RLock()
peerAddrs := dht.peerAddrs[p]
dht.peerAddrsLk.RUnlock()
dht.h.Peerstore().AddAddrs(p, peerAddrs, peerstore.TempAddrTTL)
peers = append(peers, p)
}
return peers, nil
}

Two problems:

  • it seems this should not be error, but a debug log message
  • also, the fact we just log, and still process this non-existing peer, means the code will try to access peerAddrs for a non-existent peer, which could lead to unexpected behavior. It looks like a bug to me, or a very anti-pattern way of signaling there is no info for a peer.

Potential fix

Did not test it, but my intuition would be to log to debug + skip appending nil peer?

+ dht.kMapLk.RUnlock()
  if !ok {
-   logger.Errorf("key not found in map: %v", k)
+   logger.Debugf("key not found in map: %v", k)

+   continue
  }
- dht.kMapLk.RUnlock()
@lidel lidel added the need/triage Needs initial labeling and prioritization label Nov 18, 2024
@guillaumemichel
Copy link
Contributor

What you suggest makes sense @lidel

However, IIUC this reveals a deeper problem in the fullrt implementation, since all keys that are in the trie (dht.rt) are expected to have a peer id mapping.

@aschmahmann
Copy link
Contributor

aschmahmann commented Nov 21, 2024

  1. Yes all keys in the trie are expected to have mappings to their peerIDs (and addresses) so this is surprising
  2. That's also why this logs an error. Logging debug wouldn't tell us if there was an issue since almost no one looks at debug logs, and people have been running the client for a while now so I suspect we'd have seen complaints if this was happening all the time and errors were clogging up people's consoles

Adding the continue seems reasonable, but also this is likely a race condition related to having separate locks for the trie and the map and probably fixable by having them share a lock.

Does this happen frequently? Any conditions that make it happen more frequently, and what's your confidence that if we roll out a fix you'll notice it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
need/triage Needs initial labeling and prioritization
Projects
None yet
Development

No branches or pull requests

3 participants