Test #1

madolson · 2024-03-22T02:56:28Z

No description provided.

@kjmph

The user @kjmph provided excellent ideas to improve speed of ZUNIONSTORE (in certain cases by many order of magnitude), together with an implementation of the ideas. While the ideas were sounding, the implementation could be improved both in terms of speed and clearness, so that's my attempt at reimplementing the speedup proposed, trying to improve by directly using just a dictionary with an embedded score inside, and reusing the single-pass aggregate + order-later approach. Note that you can't apply this commit without applying the previous commit in this branch that adds a double in the dictEntry value union. Issue #1786.

…is missed cases to redis-server. (#12322) Observed that the sanitizer reported memory leak as clean up is not done before the process termination in negative/following cases: **- when we passed '--invalid' as option to redis-server.** ``` -vm:~/mem-leak-issue/redis$ ./src/redis-server --invalid *** FATAL CONFIG FILE ERROR (Redis 255.255.255) *** Reading the configuration file, at line 2 >>> 'invalid' Bad directive or wrong number of arguments ================================================================= ==865778==ERROR: LeakSanitizer: detected memory leaks Direct leak of 8 byte(s) in 1 object(s) allocated from: #0 0x7f0985f65867 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145 valkey-io#1 0x558ec86686ec in ztrymalloc_usable_internal /home/ubuntu/mem-leak-issue/redis/src/zmalloc.c:117 valkey-io#2 0x558ec86686ec in ztrymalloc_usable /home/ubuntu/mem-leak-issue/redis/src/zmalloc.c:135 valkey-io#3 0x558ec86686ec in ztryrealloc_usable_internal /home/ubuntu/mem-leak-issue/redis/src/zmalloc.c:276 valkey-io#4 0x558ec86686ec in zrealloc /home/ubuntu/mem-leak-issue/redis/src/zmalloc.c:327 valkey-io#5 0x558ec865dd7e in sdssplitargs /home/ubuntu/mem-leak-issue/redis/src/sds.c:1172 valkey-io#6 0x558ec87a1be7 in loadServerConfigFromString /home/ubuntu/mem-leak-issue/redis/src/config.c:472 valkey-io#7 0x558ec87a13b3 in loadServerConfig /home/ubuntu/mem-leak-issue/redis/src/config.c:718 valkey-io#8 0x558ec85e6f15 in main /home/ubuntu/mem-leak-issue/redis/src/server.c:7258 valkey-io#9 0x7f09856e5d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 SUMMARY: AddressSanitizer: 8 byte(s) leaked in 1 allocation(s). ``` **- when we pass '--port' as option and missed to add port number to redis-server.** ``` vm:~/mem-leak-issue/redis$ ./src/redis-server --port *** FATAL CONFIG FILE ERROR (Redis 255.255.255) *** Reading the configuration file, at line 2 >>> 'port' wrong number of arguments ================================================================= ==865846==ERROR: LeakSanitizer: detected memory leaks Direct leak of 8 byte(s) in 1 object(s) allocated from: #0 0x7fdcdbb1f867 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145 valkey-io#1 0x557e8b04f6ec in ztrymalloc_usable_internal /home/ubuntu/mem-leak-issue/redis/src/zmalloc.c:117 valkey-io#2 0x557e8b04f6ec in ztrymalloc_usable /home/ubuntu/mem-leak-issue/redis/src/zmalloc.c:135 valkey-io#3 0x557e8b04f6ec in ztryrealloc_usable_internal /home/ubuntu/mem-leak-issue/redis/src/zmalloc.c:276 valkey-io#4 0x557e8b04f6ec in zrealloc /home/ubuntu/mem-leak-issue/redis/src/zmalloc.c:327 valkey-io#5 0x557e8b044d7e in sdssplitargs /home/ubuntu/mem-leak-issue/redis/src/sds.c:1172 valkey-io#6 0x557e8b188be7 in loadServerConfigFromString /home/ubuntu/mem-leak-issue/redis/src/config.c:472 valkey-io#7 0x557e8b1883b3 in loadServerConfig /home/ubuntu/mem-leak-issue/redis/src/config.c:718 valkey-io#8 0x557e8afcdf15 in main /home/ubuntu/mem-leak-issue/redis/src/server.c:7258 valkey-io#9 0x7fdcdb29fd8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 Indirect leak of 10 byte(s) in 1 object(s) allocated from: #0 0x7fdcdbb1fc18 in __interceptor_realloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:164 valkey-io#1 0x557e8b04f9aa in ztryrealloc_usable_internal /home/ubuntu/mem-leak-issue/redis/src/zmalloc.c:287 valkey-io#2 0x557e8b04f9aa in ztryrealloc_usable /home/ubuntu/mem-leak-issue/redis/src/zmalloc.c:317 valkey-io#3 0x557e8b04f9aa in zrealloc_usable /home/ubuntu/mem-leak-issue/redis/src/zmalloc.c:342 valkey-io#4 0x557e8b033f90 in _sdsMakeRoomFor /home/ubuntu/mem-leak-issue/redis/src/sds.c:271 valkey-io#5 0x557e8b033f90 in sdsMakeRoomFor /home/ubuntu/mem-leak-issue/redis/src/sds.c:295 valkey-io#6 0x557e8b033f90 in sdscatlen /home/ubuntu/mem-leak-issue/redis/src/sds.c:486 valkey-io#7 0x557e8b044e1f in sdssplitargs /home/ubuntu/mem-leak-issue/redis/src/sds.c:1165 valkey-io#8 0x557e8b188be7 in loadServerConfigFromString /home/ubuntu/mem-leak-issue/redis/src/config.c:472 valkey-io#9 0x557e8b1883b3 in loadServerConfig /home/ubuntu/mem-leak-issue/redis/src/config.c:718 valkey-io#10 0x557e8afcdf15 in main /home/ubuntu/mem-leak-issue/redis/src/server.c:7258 valkey-io#11 0x7fdcdb29fd8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 SUMMARY: AddressSanitizer: 18 byte(s) leaked in 2 allocation(s). ``` As part analysis found that the sdsfreesplitres is not called when this condition checks are being hit. Output after the fix: ``` vm:~/mem-leak-issue/redis$ ./src/redis-server --invalid *** FATAL CONFIG FILE ERROR (Redis 255.255.255) *** Reading the configuration file, at line 2 >>> 'invalid' Bad directive or wrong number of arguments vm:~/mem-leak-issue/redis$ =========================================== vm:~/mem-leak-issue/redis$ ./src/redis-server --jdhg *** FATAL CONFIG FILE ERROR (Redis 255.255.255) *** Reading the configuration file, at line 2 >>> 'jdhg' Bad directive or wrong number of arguments --------------------------------------------------------------------------- vm:~/mem-leak-issue/redis$ ./src/redis-server --port *** FATAL CONFIG FILE ERROR (Redis 255.255.255) *** Reading the configuration file, at line 2 >>> 'port' wrong number of arguments ``` Co-authored-by: Oran Agra <[email protected]>

## Issues and solutions from #12817 1. Touch ProcessingEventsWhileBlocked and calling moduleCount() without GIL in afterSleep() - Introduced: Version: 7.0.0 PR: #9963 - Harm Level: Very High If the module thread calls `RM_Yield()` before the main thread enters afterSleep(), and modifies `ProcessingEventsWhileBlocked`(+1), it will cause the main thread to not wait for GIL, which can lead to all kinds of unforeseen problems, including memory data corruption. - Initial / Abandoned Solution: * Added `__thread` specifier for ProcessingEventsWhileBlocked. `ProcessingEventsWhileBlocked` is used to protect against nested event processing, but event processing in the main thread and module threads should be completely independent and unaffected, so it is safer to use TLS. * Adding a cached module count to keep track of the current number of modules, to avoid having to use `dictSize()`. - Related Warnings: ``` WARNING: ThreadSanitizer: data race (pid=1136) Write of size 4 at 0x0001045990c0 by thread T4 (mutexes: write M0): #0 processEventsWhileBlocked networking.c:4135 (redis-server:arm64+0x10006d124) valkey-io#1 RM_Yield module.c:2410 (redis-server:arm64+0x10018b66c) valkey-io#2 bg_call_worker <null>:83232836 (blockedclient.so:arm64+0x16a8) Previous read of size 4 at 0x0001045990c0 by main thread: #0 afterSleep server.c:1861 (redis-server:arm64+0x100024f98) valkey-io#1 aeProcessEvents ae.c:408 (redis-server:arm64+0x10000fd64) valkey-io#2 aeMain ae.c:496 (redis-server:arm64+0x100010f0c) valkey-io#3 main server.c:7220 (redis-server:arm64+0x10003f38c) ``` 2. aeApiPoll() is not thread-safe When using RM_Yield to handle events in a module thread, if the main thread has not yet entered `afterSleep()`, both the module thread and the main thread may touch `server.el` at the same time. - Introduced: Version: 7.0.0 PR: #9963 - Old / Abandoned Solution: Adding a new mutex to protect timing between after beforeSleep() and before afterSleep(). Defect: If the main thread enters the ae loop without any IO events, it will wait until the next timeout or until there is any event again, and the module thread will always hang until the main thread leaves the event loop. - Related Warnings: ``` SUMMARY: ThreadSanitizer: data race ae_kqueue.c:55 in addEventMask ================== ================== WARNING: ThreadSanitizer: data race (pid=14682) Write of size 4 at 0x000100b54000 by thread T9 (mutexes: write M0): #0 aeApiPoll ae_kqueue.c:175 (redis-server:arm64+0x100010588) valkey-io#1 aeProcessEvents ae.c:399 (redis-server:arm64+0x10000fb84) valkey-io#2 processEventsWhileBlocked networking.c:4138 (redis-server:arm64+0x10006d3c4) valkey-io#3 RM_Yield module.c:2410 (redis-server:arm64+0x10018b66c) valkey-io#4 bg_call_worker <null>:16042052 (blockedclient.so:arm64+0x169c) Previous write of size 4 at 0x000100b54000 by main thread: #0 aeApiPoll ae_kqueue.c:175 (redis-server:arm64+0x100010588) valkey-io#1 aeProcessEvents ae.c:399 (redis-server:arm64+0x10000fb84) valkey-io#2 aeMain ae.c:496 (redis-server:arm64+0x100010da8) valkey-io#3 main server.c:7238 (redis-server:arm64+0x10003f51c) ``` ## The final fix as the comments: redis/redis#12817 (comment) Optimized solution based on the above comment: First, we add `module_gil_acquring` to indicate whether the main thread is currently in the acquiring GIL state. When the module thread starts to yield, there are two possibilities(we assume the caller keeps the GIL): 1. The main thread is in the mid of beforeSleep() and afterSleep(), that is, `module_gil_acquring` is not 1 now. At this point, the module thread will wake up the main thread through the pipe and leave the yield, waiting for the next yield when the main thread may already in the acquiring GIL state. 2. The main thread is in the acquiring GIL state. The module thread release the GIL, yielding CPU to give the main thread an opportunity to start event processing, and then acquire the GIL again until the main thread releases it. This is what redis/redis#12817 (comment) mentioned direction. --------- Co-authored-by: Oran Agra <[email protected]>

We are updating this variable in the main thread, and the child threads can printing the logs at the same time. This generating a warning in SANITIZER=thread: ``` WARNING: ThreadSanitizer: data race (pid=74208) Read of size 4 at 0x000102875c10 by thread T3: #0 serverLogRaw <null>:52173615 (valkey-server:x86_64+0x10003c556) #1 _serverLog <null>:52173615 (valkey-server:x86_64+0x10003ca89) #2 bioProcessBackgroundJobs <null>:52173615 (valkey-server:x86_64+0x1001402c9) Previous write of size 4 at 0x000102875c10 by main thread (mutexes: write M0): #0 afterSleep <null>:52173615 (valkey-server:x86_64+0x10004989b) #1 aeProcessEvents <null>:52173615 (valkey-server:x86_64+0x100031e52) #2 main <null>:52173615 (valkey-server:x86_64+0x100064a3c) #3 start <null>:52173615 (dyld:x86_64+0xfffffffffff5c365) #4 start <null>:52173615 (dyld:x86_64+0xfffffffffff5c365) ``` The refresh of daylight_active is not real time, we update it in aftersleep, so we don't need a strong synchronization, so using memory_order_relaxed. But also noted we are doing load/store operations only for daylight_active, which is an aligned 32-bit integer, so using memory_order_relaxed will not provide more consistency than what we have today. So this is just a cleanup that to clear the warning. Signed-off-by: Binbin <[email protected]>

We are updating this variable in the main thread, and the child threads can printing the logs at the same time. This generating a warning in SANITIZER=thread: ``` WARNING: ThreadSanitizer: data race (pid=74208) Read of size 4 at 0x000102875c10 by thread T3: #0 serverLogRaw <null>:52173615 (valkey-server:x86_64+0x10003c556) valkey-io#1 _serverLog <null>:52173615 (valkey-server:x86_64+0x10003ca89) valkey-io#2 bioProcessBackgroundJobs <null>:52173615 (valkey-server:x86_64+0x1001402c9) Previous write of size 4 at 0x000102875c10 by main thread (mutexes: write M0): #0 afterSleep <null>:52173615 (valkey-server:x86_64+0x10004989b) valkey-io#1 aeProcessEvents <null>:52173615 (valkey-server:x86_64+0x100031e52) valkey-io#2 main <null>:52173615 (valkey-server:x86_64+0x100064a3c) valkey-io#3 start <null>:52173615 (dyld:x86_64+0xfffffffffff5c365) valkey-io#4 start <null>:52173615 (dyld:x86_64+0xfffffffffff5c365) ``` The refresh of daylight_active is not real time, we update it in aftersleep, so we don't need a strong synchronization, so using memory_order_relaxed. But also noted we are doing load/store operations only for daylight_active, which is an aligned 32-bit integer, so using memory_order_relaxed will not provide more consistency than what we have today. So this is just a cleanup that to clear the warning. Signed-off-by: Binbin <[email protected]> Signed-off-by: mwish <[email protected]>

We are updating this variable in the main thread, and the child threads can printing the logs at the same time. This generating a warning in SANITIZER=thread: ``` WARNING: ThreadSanitizer: data race (pid=74208) Read of size 4 at 0x000102875c10 by thread T3: #0 serverLogRaw <null>:52173615 (valkey-server:x86_64+0x10003c556) #1 _serverLog <null>:52173615 (valkey-server:x86_64+0x10003ca89) #2 bioProcessBackgroundJobs <null>:52173615 (valkey-server:x86_64+0x1001402c9) Previous write of size 4 at 0x000102875c10 by main thread (mutexes: write M0): #0 afterSleep <null>:52173615 (valkey-server:x86_64+0x10004989b) #1 aeProcessEvents <null>:52173615 (valkey-server:x86_64+0x100031e52) #2 main <null>:52173615 (valkey-server:x86_64+0x100064a3c) #3 start <null>:52173615 (dyld:x86_64+0xfffffffffff5c365) #4 start <null>:52173615 (dyld:x86_64+0xfffffffffff5c365) ``` The refresh of daylight_active is not real time, we update it in aftersleep, so we don't need a strong synchronization, so using memory_order_relaxed. But also noted we are doing load/store operations only for daylight_active, which is an aligned 32-bit integer, so using memory_order_relaxed will not provide more consistency than what we have today. So this is just a cleanup that to clear the warning. Signed-off-by: Binbin <[email protected]>

antirez added 3 commits June 6, 2014 15:32

Add double field in dict.c entry value union.

fe85d06

Remove useless var and check in zunionInterGenericCommand().

8b9d9d8

madolson requested a review from PingXie March 22, 2024 03:01

madolson closed this Mar 22, 2024

madolson deleted the zunion branch April 4, 2024 19:15

PingXie mentioned this pull request Jun 14, 2024

Significance of absolute accurate used_memory? #467

Open

hpatro mentioned this pull request Jun 24, 2024

Embed key into dict entry #541

Merged

zvi-code mentioned this pull request Jul 2, 2024

code usege of sds (and maybe other data types) #739

Open

PingXie mentioned this pull request Oct 24, 2024

[NEW] Disable LUA integration during build #1204

Open

allenss-amazon mentioned this pull request Nov 26, 2024

[NEW] OBSERVE command for enhanced observability in Valkey: Data Collection #1167

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Test #1

Test #1

madolson commented Mar 22, 2024

Test #1

Test #1

Conversation

madolson commented Mar 22, 2024