-
Notifications
You must be signed in to change notification settings - Fork 1k
gps: source cache: protobuf integration #1127
Conversation
|
||
[[constraint]] | ||
name = "github.com/jmank88/nuts" | ||
version = "0.2.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have to vendor this if we don't want to pollute. Usages are few enough to copy in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm fine with bringing it in
internal/gps/source_cache.proto
Outdated
} | ||
Type type = 1; | ||
string value = 2; | ||
//TODO strongly typed Semver field |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're set up to drop whatever sort of Semver message type we come up with into here (and in a forward compatible way, if we want to commit first).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool cool, ok - but right now, things are inoperable with this missing, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is operable - as much as the tests validate anyways. It is using string serialization into the value field for now.
Codeclimate does not like protobuf |
CI checks are blocked by golang/go#21804 right now, but we can likely work around it by redefining some methods as functions, if we need to. |
if we feel the methods are better than the funcs for our design, then i'd prefer we specify an exception to staticcheck for this particular error. if you look back over previous versions of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
couple quick things, not a comprehensive review
|
||
[[constraint]] | ||
name = "github.com/jmank88/nuts" | ||
version = "0.2.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm fine with bringing it in
internal/gps/source_cache.proto
Outdated
} | ||
Type type = 1; | ||
string value = 2; | ||
//TODO strongly typed Semver field |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool cool, ok - but right now, things are inoperable with this missing, right?
internal/gps/source_cache.pb.go
Outdated
} | ||
|
||
// LockedProjectMsg is a serializable representation of LockedProject. | ||
type LockedProjectMsg struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and this has to be exported, right? can we rearrange to an internal subpackage or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't be a problem to put it in an internal subpackage. That might indirectly dodge the go/types problem too.
What are the license header rules regarding generated files? Should I find a way to inject a header into This file does not have a license header, whatever that means: https://github.com/golang/protobuf/blob/master/proto/testdata/test.pb.go |
@@ -8,5 +8,5 @@ | |||
set -e | |||
|
|||
go build ./hack/licenseok | |||
find . -path ./vendor -prune -o -type f -name "*.go"\ | |||
find . -path ./vendor -prune -o -regex ".+\.pb\.go$" -prune -o -type f -regex ".*\.\(go\|proto\)$"\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intention was to include *.proto
files and exclude the generated *.pb.go
files (but not pb.go
!). I stumbled my way into this, but perhaps there is a simpler or more readable way to do so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i have no immediate ideas about better ways of doing this; i'm fine with a slightly wonky regex.
@@ -15,3 +15,4 @@ exclude_paths: | |||
- internal/gps/_testdata | |||
- cmd/dep/testdata | |||
- testdata | |||
- internal/gps/internal/pb |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like **.pb.go
(if legal here) would be more general and continue to work as things change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a few more incremental notes here
@@ -73,31 +74,31 @@ func (c *boltCache) close() error { | |||
// | |||
// 1) Versions buckets hold version keys with revision values: | |||
// | |||
// Bucket: "versions:<timestamp>" | |||
// Keys: "branch:<branch>", "defaultBranch:<branch>", "ver:<version>" | |||
// Bucket: "v<timestamp>" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it may be worth not having sub-buckets for each of these: boltdb/bolt#422 (comment)
for now i tend to prefer uniformity, but it's something we could explore with benchmarks later on, if we feel so inclined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The primary motivation was being able to delete entire buckets in a single op, rather than prefix scanning and deleting each key. Another was avoiding repeated, long, complex key-prefixes. Perhaps the Revision-versions
buckets have few enough entries to yield speed up from prefixes, but most other buckets are quite large and/or would have keys much longer than their values.
} | ||
} | ||
var ( | ||
cacheKeyC = []byte("c") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's fine to use the abbreviated key for efficiency reasons, but let's use full words rather than abbreviations in the consts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They don't map 1-1, which is why I opted to name them what they are, not how they are used.
Is this better?
var (
cacheKeyComment = []byte("c")
cacheKeyConstraint = cacheKeyComment
cacheKeyError = []byte("e")
cacheKeyHash = []byte("h")
cacheKeyIgnored = []byte("i")
cacheKeyImport = cacheKeyIgnored
cacheKeyLock = []byte("l")
cacheKeyName = []byte("n")
cacheKeyOverride = []byte("o")
cacheKeyPTree = []byte("p")
cacheKeyRequired = []byte("r")
cacheKeyRevision = cacheKeyRequired
cacheKeyTestImport = []byte("t")
cacheRevision = byte('r')
cacheVersion = byte('v')
)
I wish this all could have dropped into a separate cache package to avoid all the annoying prefixes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Regarding splitting packages, IIRC it would require at least (maybe only?) pushing version stuff into a separate gps/version
package to avoid cycles)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah yeah, i see. in this case, though, a bit of duplication for the reader's sake is acceptable.
(Regarding splitting packages, IIRC it would require at least (maybe only?) pushing version stuff into a separate gps/version package to avoid cycles)
i've considered such a thing quite a bit, but i don't think we can currently manage it because of all the reliance on hidden properties. i'd be open to refactoring that and just exporting everything - i was probably defending against a bogeyman by not exporting it - but i don't want to make that a yak we have to shave for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I started a separate (from master
) experimental conversion, leveraging aliases quite heavily for now to minimize churn, and it's not actually too bad - plus some of the API ends up cleaner. I dropped in a few exported helper functions, rather than exporting all the internals. Not that helper functions are necessarily a better solution than exporting the types, but it proved the concept without too much refactoring.
This can all be considered separately from this PR and caching in general though.
@@ -203,9 +174,13 @@ func cacheGetManifest(b *bolt.Bucket) (RootManifest, error) { | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
premature optimization question: does a cheap call exist that allows us to get the number of items in each of the buckets, so that we can provide a capacity hint on the make
calls?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are all sorts of BucketStats
available via Stats()
but they have to be computed so I didn't explore that path.
For the ones using sequential integer keys (behaving like sets), we could theoretically inspect the last key to get the total items. I don't know how the costs of jumping the cursor around, or scanning backwards, compares to re-allocating the slice.
More generally, and perhaps simply, we could just store size hints in additional top level keys.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
size hints as additional top-level keys is interesting. seems like a basically zero-cost way of doing things - that is, as long as the use pattern continues to be that these buckets tend to get dropped and wholly regenerated, rather than incrementally updated. if incremental updates become more the norm, then i'd be concerned that we're create a write amplifier. that might not be that costly, especially if the workflow is heavily read-oriented (which, of course it should be), but...well, with all these considerations, we'd need benchmarks!
probably best to leave it as-is for now, though, and put a big TODO on it noting the possible future direction to explore.
ok yeah, add that TODO then we can merge |
pre-emptively approving |
What does this do / why do we need it?
This change replaces most of the custom bolt cache encoding with Protocol Buffers, and includes some other optimizations as well (smaller and re-usable constant keys, smaller 'set' keys,
Batch
updates to bolt).What should your reviewer look out for in this PR?
Optimizations. Correctness.
Which issue(s) does this PR fix?
Follow up from #1098 in the series towards #431