This repository has been archived by the owner on Jun 26, 2023. It is now read-only.

Datastore based pinner #4

Merged
merged 29 commits into from
Nov 30, 2020


gammazero

This PR provides two separate pinner implementations:

  1. The original mdag-based pinner
  2. New datastore-based pinner

The new datastore pinner stores pins in the datastore as individual key-value items. This is faster than the mdag pinner, which stored all pins in a single DAG that had to be completely rewritten every time a pin was added or removed. The new pinner also provides a secondary indexing mechanism that can be used to index any data associated with a pin.

Benchmarks are provided to compare the performance of the old and new pinners.

Other features / changes of datastore pinner:

  • Functions to import pins from, and export pins to, the mdag pinner, to support migration.
  • Do not keep pinned CID sets in memory (no cache).
  • Keep separate recursive and direct CID indexes. This allows searching for direct or recursive CIDs without having to load pins to check the mode.
  • CBOR-encoded pin data.
  • On load, load pins and rebuild indexes if the dirty flag indicates that index repair may be needed.
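A sketch of how per-mode CID indexes and a dirty flag can work together, under stated assumptions: the `pin` record, the map-based indexes, and the method names are hypothetical, CBOR encoding is omitted, and the dirty flag is an in-memory field here, although a real datastore-backed pinner would persist it so that it survives a crash.

```go
package main

import "fmt"

// pin is a hypothetical, simplified pin record.
type pin struct {
	ID, Cid, Mode string // Mode is "direct" or "recursive"
}

// store keeps one CID index per pin mode plus a dirty flag.
type store struct {
	pins      map[string]pin             // pin ID -> record
	direct    map[string]map[string]bool // CID -> IDs of direct pins
	recursive map[string]map[string]bool // CID -> IDs of recursive pins
	dirty     bool
}

func newStore() *store {
	return &store{
		pins:      map[string]pin{},
		direct:    map[string]map[string]bool{},
		recursive: map[string]map[string]bool{},
	}
}

func (s *store) indexFor(mode string) map[string]map[string]bool {
	if mode == "recursive" {
		return s.recursive
	}
	return s.direct
}

func (s *store) addToIndex(p pin) {
	idx := s.indexFor(p.Mode)
	if idx[p.Cid] == nil {
		idx[p.Cid] = map[string]bool{}
	}
	idx[p.Cid][p.ID] = true
}

// add sets the dirty flag before writing the pin and its index entry and
// clears it only once both writes are done.
func (s *store) add(p pin) {
	s.dirty = true
	s.pins[p.ID] = p
	s.addToIndex(p)
	s.dirty = false
}

// Mode-specific lookups read a single index and never load pin records.
func (s *store) isPinnedDirect(cid string) bool    { return len(s.direct[cid]) > 0 }
func (s *store) isPinnedRecursive(cid string) bool { return len(s.recursive[cid]) > 0 }

// load rebuilds both indexes from the pin records, but only when the dirty
// flag says a previous update may not have finished.
func (s *store) load() {
	if !s.dirty {
		return
	}
	s.direct = map[string]map[string]bool{}
	s.recursive = map[string]map[string]bool{}
	for _, p := range s.pins {
		s.addToIndex(p)
	}
	s.dirty = false
}

func main() {
	s := newStore()
	s.add(pin{ID: "p1", Cid: "QmA", Mode: "recursive"})
	fmt.Println(s.isPinnedRecursive("QmA"), s.isPinnedDirect("QmA")) // true false
}
```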

aschmahmann and others added 9 commits September 30, 2020 10:57
Pins are stored in the datastore as separate key-value items. This allows pins to be saved (flushed) without having to hash the entire pin set into a hierarchical DAG on each flush. It also means that internal pins are no longer needed to pin the blocks used to store the pin DAG.

Secondary indexes are also supported, allowing pins to be searched for using keys other than the primary key. This supports multiple pins for the same CID, as well as searching by different pin attributes once those features become available.
- Keep separate recursive and direct CID indexes. This allows searching for direct or recursive CIDs without having to load pins to check the mode.
- Only load pins if dirty flag indicates index repair may be needed
- Improved benchmarks
gammazero requested a review from aschmahmann on October 29, 2020 06:57
This includes moving the pin conversion logic into the pinconv package.
- Indexer functions take context
- SyncIndex is not part of Indexer interface
- Test corrupt index by adding index with no pin
- and more...
Base36-encode the index strings so that they can contain any characters without interfering with the datastore key path. Base36 was chosen because it is slightly more compact than Base32, more portable than Base58, Base64, etc., and because it has a very fast implementation.

dspinner can now use cid.KeyString() to store the raw byte string as an index. This avoids having to encode the CID every time it is used as an index.
- Change naming from "index" and "key" to "key" and "value"
- Use wrapped datastore instead of using index key directly
- Fix typo in comment

aschmahmann left a comment


Looks great. Left a few tiny fixup suggestions

aschmahmann merged commit 4c92071 into master on Nov 30, 2020
gammazero deleted the feat/pin-datastore branch on December 1, 2020 00:04
aschmahmann mentioned this pull request on Feb 18, 2021
73 tasks

4 participants