-
Notifications
You must be signed in to change notification settings - Fork 688
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Embed key into dict entry #541
Conversation
Signed-off-by: Harkrishn Patro <[email protected]>
0a5503b
to
2610832
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## unstable #541 +/- ##
============================================
+ Coverage 70.26% 70.32% +0.06%
============================================
Files 110 111 +1
Lines 60108 60286 +178
============================================
+ Hits 42234 42396 +162
- Misses 17874 17890 +16
|
@valkey-io/core-team Could you provide some clarity on this ? |
My thought captured under #394 (comment) We've made changes in the past which can benefit the users in a short term and have redone the implementation in the following major/minor version. key embedding in dictEntry is a pretty small change and we can easily get rid of it with dictEntry removal. I feel the small gain is worth it with this minimal change (7 bytes) for Valkey 8.0 and invest on the kvpair object structure with open addressing in 9.0 |
Yeah, I've thought about this many times. The complexity is not too bad. The technique of embedding an sds string can later be reused in other places. Although the embedding is abstracted, it only every makes sense for sds strings. That's fine though. Dict doesn't include "sds.h" so it's decoupled. I'm in favor. (I'll add some review comments later, just minor things.) The reason I've been skeptical before is that I'd rather like that we invest in embedding key in robj, since that'd be beneficial in the future redesign (#169), but since this PR is already ready and the robj work is not started, I think we can merge this for Valkey 8. |
@hpatro @madolson @zuiderkwast Is this change targeted for Valkey 8? If so, can we add it to Valkey 8 project board? |
I was waiting on performance numbers from Hari before officially adding it. |
@hpatro Do you have performance numbers? Can you please add it to this issue to move forward with next steps? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a risky PR IMO. I am concerned about the mixed use of various dict entry types while using dictEntry
as the "universal" pointer. I don't think it is feasible for me to examine every use of dictEntry
, which should've been compiler's job. I am happy to help out on the refactoring if needed. Let me know.
@@ -509,6 +544,8 @@ dictEntry *dictInsertAtPosition(dict *d, void *key, void *position) { | |||
/* Allocate an entry without value. */ | |||
entry = createEntryNoValue(key, *bucket); | |||
} | |||
} else if (d->type->embedded_entry) { | |||
entry = createEmbeddedEntry(key, *bucket, d->type); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way dictEntry
is used has no type safety. Admittedly, this is not a new issue but the addition of embeddedDictEntry
is making it worse. The following code path looks problematic to me. can you please double check?
setGenericComamd() -> setKey() -> dbAdd() -> dbAddInsternal() -> kvstoreDictSetKey() -> dictSetKey()
I don't seedictSetKey
getting patched to handle this new embedded dict entry.
I would suggest making dictEntry
an opaque struct next and force every function go through an inline accessor function/macro. I think this is the only certain way to ensure we don't accidentally use the wrong data type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the dictEntry already opaque?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not in dict.c.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I don't agree it should be opaque in dict.c. It seems like a very small number of actual touch points we actually have to make sure are correct. We could even reduce the number. I'm not sure making it opaque will help all that much.
Discussed offline, Ping has a separate proposal for adding guardrails that he will publish.
Thanks for the review @PingXie and @zuiderkwast . I will shortly address them. @madolson I've posted the benchmarking results on the top comment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think overall this is a good iterative improvement before we can make more changes later. I would like to see us embed the key into the robj in the future, so this has a lot of the same primitives as that.
@zuiderkwast @PingXie @madolson @valkey-io/core-team If we all are aligned on accepting this change in for 8.0, I will look into addressing the comments and polishing the PR further. Let me know. |
it looks like used memory decreases 10% without changing on qps. |
@hpatro AFAIK most of the folks are inclined to accept it for 8, so I would ask you to follow up if you can. I'll throw it on our Monday agenda as well to make sure we close on it quickly if we can. |
@hpatro, I like the idea! :-) The only reason I marked this PR as "changes required" is because of the amount of type casting in dict.c (and I understand that it didn't start with this PR). This is a great improvement to be had for Valkey 8.0 but I do think we need to make some potentially painful changes to get dict.c back in shape. If you don't mind, I would love to get my hands dirty and help out with the refactoring too. |
Thanks for the help @PingXie. Let me publish the changes which I have done tomorrow and we can go from there. Do we want to do further refactoring as a follow up PR or along with this? |
Signed-off-by: Harkrishn Patro <[email protected]>
Thanks @hpatro.
I am inclined to reduce as much type casting as possible in this PR and reason being that I don't trust myself doing the compiler job (of type checking the data access). I am concerned about the potential memory corruption issues. Will dedicate some time later this week for a deep-dive. |
Signed-off-by: Harkrishn Patro <[email protected]>
Signed-off-by: Harkrishn Patro <[email protected]>
Couple of things which remain
|
Signed-off-by: Harkrishn Patro <[email protected]>
Signed-off-by: Harkrishn Patro <[email protected]>
@madolson I can't see any other builds apart from DCO check. Are we throttled/out of credits? |
Signed-off-by: Harkrishn Patro <[email protected]>
I see 4/6 people directionally inclined, so going to throw on the directionally approved tag. |
Signed-off-by: Madelyn Olson <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just some minor documentation changes that would make some stuff clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the added docs. I have a few follow-up comments on those.
Signed-off-by: Harkrishn Patro <[email protected]>
Signed-off-by: Harkrishn Patro <[email protected]>
@zuiderkwast / @madolson Shall we ship it ? 🚀 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, LGTM.
Awesome! Thanks all. |
Kicked off a full-run to stress this code a bit more before merging: https://github.com/valkey-io/valkey/actions/workflows/daily.yml |
return required_keylen; | ||
} | ||
assert(buf_len >= required_keylen); | ||
memcpy(buf, sdsAllocPtr(s), required_keylen); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you need this double information about the header size? you can extract it from the "sds", right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An sds s
is a pointer to where the string content starts, so it can be used as a C string. It does not point to the start of the allocation. The header is store before the char data in the same allocation, i.e. at s[-1]
, s[-2]
, etc. The header is backwards-encoded in some way, so the byte at s[-1]
says which kind of header it is and how large the header is.
sds-----.
|
allocation v
+-----------+----------------------+
| hdr |string contents \0 |
+-----------+----------------------+
When we store this embedded, we want to be able to restore the sds pointer, so we store the size of the header size in the first. When we restore the sds pointer, we can find it using S (the offset to the string contents).
+-+-----------+----------------------+
|S| hdr |string contents \0 |
+-+-----------+----------------------+
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean the callers of this function can call sdsHdrSize(s[-1])
themselves? I'm not sure if it's public.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, right agree. I am not so supportive of the desire to keep something as sds when it's not, but that's already in my overall comment, maybe I'll elaborate more there. Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
key
is being used throughout the engine as a sds
so changing that would be even more touchpoint. We could dynamically build it on the dictGetKey
call but that would be additional penalty. Hence, storing it as a sds made the most sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have the same question, the low 3bit in sdshdr.flag have told us the header length, and exposing function sdsHdrSize
will not lead to coupling between dict.c and sds.c, I don't see any harm in it.
sdshdr sds
│ │
│ │
▼─────┬─────┬─────┌▼───────────────────────────────┬──┐
│ len │ alloc │ │ │
│ │ │flags│ │\0│
└─────┼─────┼─────└────────────────────────────────┴──┘
This change makes sense to me overall and tradeoffs are good. My concern is with that conceptually it's not a good idea that we have 2 conflicting patterns:
I feel this has very big potential for very hard to track bugs |
Trying to arrange my thoughts on this. The above is just an example for a wider issue In the code design. IMO there are two types of 'sds' usages 1) A way to carry metadata about a string buffer through IO flow. 2) A compact way to encode string buffer metadata in memory with good locality and low memory overhead. It is clear to me why use 'sds' for #2. For #1 I feel it could be wrong choice and this choice also has performance costs as it forces memory access to unpack the info many times in IO flow. For example sdsfree access the memory on free, even though it could have been avoided. This has high costs when memory access is a factor. |
@zvi-code Thanks for sharing your thought. Would you mind filling a separate issue about this? Given we are planning to rehaul the hashtable implementation (#169) in 9.0. It will be good to capture some of these points. |
Actual test run I kicked off: https://github.com/valkey-io/valkey/actions/runs/9764931105 |
This pr will bring us considerable cost benefits, thank you @hpatro . I have doubt about the benchmark test results, this PR should improve the cpu cache hit, but why hasn't the performance improved? |
Yeah, I was also expecting more gain in throughput as well but saw a tiny gain. Anyway we were happy with the memory savings (without paying any cost). @judeng Do you have any recommendation to perform benchmarking to be able to observe the improvement in performance from CPU cache locality ? |
This PR incorporates changes related to key embedding described in the redis/redis#12216
With this change there will be no
key
pointer and embedded the key within thedictEntry
. 1 byte is used for additional bookkeeping. Overall the saving would be 7 bytes.Key changes:
New dict entry type introduced, which is now used as an entry for the main dictionary:
One new function has been added to the dictType:
Change is opt-in per dict type, hence sets, hashes and other types that are using dictionary are not impacted.
With this change main dictionary now owns the data, so copy on insert in dbAdd is no longer needed.
Benchmarking results
TLDR; Around 9-10% memory usage reduction in overall memory usage for scenario with key of 16 bytes and value of 8 bytes and 16 bytes. The throughput per second varies but is similar or greater in most of the run(s) with the changes against unstable (ae2d421).
SET performance
GET performance