0.8.7:test marginal_path_double_failed_time parameter generated a coredump #73

Guanqinm · 2023-09-14T14:13:24Z

Hi
Here we met a question. When we were testing the marginal parameters, a coredump was generated, which is shown below:

(gdb) bt
#0 __pthread_kill_implementation (threadid=281473695391360, signo=signo@entry=6, no_tid=no_tid@entry=0)
at pthread_kill.c:44
#1 0x0000ffffb5073304 in __pthread_kill_internal (signo=, threadid=)
at pthread_kill.c:78
#2 0x0000ffffb502ed7c in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3 0x0000ffffb501cd2c in __GI_abort () at abort.c:79
#4 0x0000ffffb50673ec in __libc_message (action=action@entry=do_abort,
fmt=fmt@entry=0xffffb5141e20 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#5 0x0000ffffb507d26c in malloc_printerr (str=str@entry=0xffffb513f0c8 "corrupted double-linked list")
at malloc.c:5671
#6 0x0000ffffb507db44 in unlink_chunk (p=p@entry=0xffffa00012d0, av=0xffffa0000030) at malloc.c:1637
#7 0x0000ffffb507ec58 in _int_free (av=0xffffa0000030, p=0xffffa00012d0, have_lock=)
at malloc.c:4609
#8 0x0000ffffb5081370 in __GI___libc_free (mem=) at malloc.c:3393
#9 0x0000ffffb5081424 in tcache_thread_shutdown () at malloc.c:3229
#10 __malloc_arena_thread_freeres () at arena.c:1010
#11 0x0000ffffb50833e8 in __libc_thread_freeres () at thread-freeres.c:44
#12 0x0000ffffb50716a8 in start_thread (arg=0x0) at pthread_create.c:457
#13 0x0000ffffb50d7d5c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:79
(gdb) f 6
#6 0x0000ffffb507db44 in unlink_chunk (p=p@entry=0xffffa00012d0, av=0xffffa0000030) at malloc.c:1637
1637 malloc_printerr ("corrupted double-linked list");
(gdb) p p->fd
$8 = (struct malloc_chunk *) 0xffffa0001ff0
(gdb) p p->fd->fd
$9 = (struct malloc_chunk *) 0x0
(gdb) x/36xg 0xffffa0001ff0
0xffffa0001ff0: 0x0000000000000000 0x0000000000002c91
0xffffa0002000: 0x0000000000000000 0x0000000000000000
0xffffa0002010: 0x0000000000000000 0x0000000000000000
0xffffa0002020: 0x0000000000000000 0x0000000000000000
0xffffa0002030: 0x0000000000000000 0x0000000000000000
0xffffa0002040: 0x0000000000000000 0x0000000000000000
0xffffa0002050: 0x0000000000000000 0x0000000000000000
0xffffa0002060: 0x0000000000000000 0x0000000000000000
0xffffa0002070: 0x0000000000000000 0x0000000000000000
0xffffa0002080: 0x0000000000000000 0x0000000000000000
0xffffa0002090: 0x0000000000000000 0x0000000000000000
0xffffa00020a0: 0x0000000000000000 0x0000000000000000
0xffffa00020b0: 0x0000000000000000 0x0000000000000000
0xffffa00020c0: 0x0000000000000000 0x0000000000000000
0xffffa00020d0: 0x0000000000000000 0x0000000000000000
0xffffa00020e0: 0x0000000000000000 0x0000000000000000
0xffffa00020f0: 0x0000000000000000 0x0000000000000000
0xffffa0002100: 0x0000000000000000 0x0000000000000000
(gdb)

The logic of the test case is to detect whether the path is recovery after the path is set to marginal and the marginal_path_double_failed_time has elapsed, and then stop the multi-path guardian process.

When we examine the thread exit, the malloc_printerr function detects a linked list of memory that has been freed during the process's execution. It is found that this memory has been written with an abnormal value of 0, resulting an error. We don't know how to analyze this problem and we don't know where the memory is being written abnormally. Can you tell us how to analyze and solve this problem?

mwilck · 2023-09-14T15:00:29Z

malloc corruption is always difficult to debug. The easiest way is to run multipathd under valgrind, but that may be too slow to reproduce the error (just try).

If that doesn't work, you can try using the compiler flag -fsanitize=address, which may give you additional insight when the crash occurs.

mwilck · 2023-09-14T15:01:35Z

But first you may want to update multipath-tools and see if the error is still present. 0.8.7 is 2 years old.

lixiaokeng · 2023-09-16T11:28:18Z

This cause when multipathd exit. According to LWP, I guess it was when uvloop exited and released resources that the content corresponding to the chunk->fd address was abnormal.

8912 bytes area （0xffffa0002000 to 0xffffa0003f28） is zero in p->fd.

Is there any action like memset(p,0,xxx) (xxx may 4096 / 8192 or other) may lead this?

lixiaokeng · 2023-09-16T11:30:02Z

But first you may want to update multipath-tools and see if the error is still present. 0.8.7 is 2 years old.

There only cause once in 0.8.7. Now we use asan test but no progress yet

mwilck · 2023-09-18T07:23:24Z

Is there any action like memset(p,0,xxx) (xxx may 4096 / 8192 or other) may lead this?

It's possible if the memset would clear an area outside the allocated memory for p.

lixiaokeng · 2023-09-25T01:04:43Z

Here is some new info. This coredump cased when mutlipathd stop and after free_io_err_stat_path.

mwilck · 2023-09-25T09:09:44Z

Sorry, I can't parse your comment. Are you saying the core dump happened after the call to free_io_err_pathvec() in stop_io_err_stat_thread()?

I've just reviewed the locking in that code once more, and I found no obvious race condition, all accesses to io_err_path are properly protected with the io_err_pathvec_lock.

Question: Does this crash occur always? Does it only occur if there are actual marginal path events?

lixiaokeng · 2023-09-25T10:42:27Z

multipath.conf:
defaults{
marginal_path_double_failed_time 30
marginal_path_err_sample_time 120
marginal_path_err_rate_threshold 10
marginal_path_err_recheck_gap_time 600
}

test.sh

while true
do
begin="$(date +'%Y-%m-%d %H:%M:%S')"
echo $begin
multipathd fail path /dev/sdc
sleep 0.5
multipathd reinstate path /dev/sdc
sleep 0.5
multipathd fail path /dev/sdc
sleep 0.5
multipathd reinstate path /dev/sdc
journalctl --since="${begin}" |grep "path is now marginal"
while [ $? -ne 0 ]
do
sleep 0.1
journalctl --since="${begin}" |grep "path is now marginal"
done
systemctl restart multipathd
sleep 3
done

This problem can be reproduced by this test.sh.

lixiaokeng · 2023-09-25T11:00:18Z

Sorry, I can't parse your comment. Are you saying the core dump happened after the call to free_io_err_pathvec() in stop_io_err_stat_thread()?

I've just reviewed the locking in that code once more, and I found no obvious race condition, all accesses to io_err_path are properly protected with the io_err_pathvec_lock.

Yes， I don't see any competition either.

Question: Does this crash occur always? Does it only occur if there are actual marginal path events?
Yes，it only occur if there are actual marginal path events.

I add printing p->dio_ctx_array->buf in free_io_err_stat_path before and after free().
I find the buf is freed and the content is malloc_chunk->fd. But buf is zero in coredump.

lixiaokeng · 2023-09-27T08:10:23Z

I think the reason is that io_cancel fails and the buf is UAF by io in kernel.
Do you think it is a good idea move pthread_testcancel after io_getevents in process_async_ios_event?

mwilck · 2023-09-27T10:11:34Z

AFAIR, io_cancel() always fails; even if the request can be cancelled (which is almost never the case), it returns -EINPROGRESS. It's possible for a user space application to drop the aio buffers before the aio finishes. In that case, anything can happen; the kernel will write to free()d memory, overwriting malloc meta data.

IMO the bottom line is that you can't free aio memory unless the kernel before the kernel has completed them.

Wrt pthread_testcancel(), it might be a good idea to comment out calls to this function in the io_err_stat code altogether for testing.

mwilck · 2023-09-27T10:42:34Z

Please check the patch from this draft PR. It's against my latest code base, but you should be able to apply it on 0.8.7, too.

lixiaokeng · 2023-09-28T08:39:12Z

I review the code and I think it can solve the problem. But The National Day holiday is coming. I will test it after holiday.

Guanqinm · 2023-10-08T02:55:56Z

We have verified the patch from this draft PR in the environment and confirmed that the patch can solve the problem.

mwilck · 2023-10-09T15:22:35Z

Thanks, I'll make an official submission asap.

It is wrong to assume that aio data structures can be reused or freed after io_cancel(). io_cancel() will almost always return -EINPROGRESS, anyway. Use the io_starttime field to indicate whether an io event has been completed by the kernel. Make sure no in-flight buffers are freed. Fixes opensvc#73. Signed-off-by: Martin Wilck <[email protected]> Cc: Li Xiao Keng <[email protected]> Cc: Miao Guanqin <[email protected]>

mwilck · 2023-10-10T20:03:30Z

The directio checker has the same problem (even though we've seen no crashes there). I made a patch for it, too, but now the unit tests are failing :-/ I need to have another look.

mwilck · 2023-10-11T15:04:25Z

@Guanqinm, @lixiaokeng: I just realize that there might be a simpler solution to this. Please revert the previous patch and just try this instead:

diff --git a/libmultipath/io_err_stat.c b/libmultipath/io_err_stat.c
index c474c34..1982915 100644
--- a/libmultipath/io_err_stat.c
+++ b/libmultipath/io_err_stat.c
@@ -751,6 +751,6 @@ void stop_io_err_stat_thread(void)
                pthread_cancel(io_err_stat_thr);
 
        pthread_join(io_err_stat_thr, NULL);
-       free_io_err_pathvec();
        io_destroy(ioctx);
+       free_io_err_pathvec();
 }

The io_destroy(2) man page says:

"The io_destroy() system call will attempt to cancel all outstanding asynchronous I/O operations against ctx_id, will block on the completion of all operations that could not be canceled, and will destroy the ctx_id."

Thus after calling io_destroy(), it should be impossible that any outstanding I/O completes and overwrites application memory.

mwilck · 2023-10-11T20:13:54Z

Pulling in also @bmarzins, as he has done most of the aio work for the directio checker. I have experimented a bit.

it's definitely wrong to free any iocb resources after io_cancel(), even if it succeds (which will not with recent kernels, it always returns -EINPROGRESS).
the kernel can't cancel an iocb once submitted. So if the device doesn't respond, all we can do is wait for the request to eventually time out, and error handling to finish. The alternative is to just forget about the allocated memory and create a new request.
io_destroy() actually waits for all outstanding IO to finish. Memory can be freed after it returns. The downside of this is that io_destroy() may hang for a long time. Some time ago we have put considerable effort in an attempt to make multipathd shut down quickly; if io_destroy() hangs, this won't work any more. OTOH, if we don't call io_destroy(), the kernel will do it when our main process exits, but by that time we'd have released all our other resources (in particular the unix socket), and a new multipathd instance could be started, which would likely not work if io_destroy() was hung before the socket was freed.
The directio checker calls io_destroy() during runtime if a certain number of requests have stalled, which may cause the checker to block for a considerable amount of time. This kind of contradicts the purpose of using aio for the checker.

Guanqinm · 2023-10-16T01:53:29Z

We have verified the patch and confirmed that the patch can solve this issue. @mwilck

diff --git a/libmultipath/io_err_stat.c b/libmultipath/io_err_stat.c
index c474c34..1982915 100644
--- a/libmultipath/io_err_stat.c
+++ b/libmultipath/io_err_stat.c
@@ -751,6 +751,6 @@ void stop_io_err_stat_thread(void)
                pthread_cancel(io_err_stat_thr);
 
        pthread_join(io_err_stat_thr, NULL);
-       free_io_err_pathvec();
        io_destroy(ioctx);
+       free_io_err_pathvec();
 }

mwilck · 2023-10-16T08:02:17Z

Thanks! The trivial patch alone is not correct by itself because the code would still re-use iocbs before they have completed. But it clarifies things.

Guanqinm · 2023-10-16T09:14:18Z

Hello, may I ask if the repair patch is the same as the previous patch (this draft PR)？

mwilck · 2023-10-16T13:37:57Z

I haven't finished this yet. For now you can use the "draft PR", or a combination of both patches.

It is wrong to assume that aio data structures can be reused or freed after io_cancel(). io_cancel() will almost always return -EINPROGRESS, anyway. Use the io_starttime field to indicate whether an io event has been completed by the kernel. Make sure no in-flight buffers are freed. Fixes opensvc#73. Signed-off-by: Martin Wilck <[email protected]> Cc: Li Xiao Keng <[email protected]> Cc: Miao Guanqin <[email protected]>

mwilck · 2023-10-24T16:50:22Z

I can't finish my bigger project right now. I have sent a small patch set based on the patches from this issue to dm-devel.

lixiaokeng · 2023-10-25T10:09:46Z

libmultipath: reduce log level of directio messages
libmultipath: directio: don't reset ct->running after io_cancel()
libmultipath: io_err_stat: don't free aio memory before completion
libmultipath: io_err_stat: call io_destroy() before
free_io_err_pathvec()
multipath-tools: systemd: require modprobe@dm_multipath.service
libmpathutil: remove systemd_service_enabled()

Are these patches intended to fix this issue?

mwilck · 2023-10-25T10:44:39Z

Yes. It's the same patches that you verified here.
I just added the directio patches which fix the same issue for the directio checker.

mwilck · 2023-10-25T10:45:08Z

You're of course very welcome to test that patch set in your environment.

It is wrong to assume that aio data structures can be reused or freed after io_cancel(). io_cancel() will almost always return -EINPROGRESS, anyway. Use the io_starttime field to indicate whether an io event has been completed by the kernel. Make sure no in-flight buffers are freed. Fixes opensvc#73. Signed-off-by: Martin Wilck <[email protected]> Cc: Li Xiao Keng <[email protected]> Cc: Miao Guanqin <[email protected]> Cc: Guan Junxiong <[email protected]>

mwilck · 2023-10-27T15:45:55Z

@lixiaokeng @Guanqinm: as you saw I posted an updated patch series to the dm-devel ML yesterday. Test / feedback in your test bed would be highly appreciated.

lixiaokeng · 2023-10-28T08:26:29Z

@lixiaokeng @Guanqinm: as you saw I posted an updated patch series to the dm-devel ML yesterday. Test / feedback in your test bed would be highly appreciated.

I have give a test.sh. Do you have test it？We have test "libmultipath: io_err_stat: don't free aio memory before completion" and "libmultipath: io_err_stat: call io_destroy() before free_io_err_pathvec()". There is no coredump. I think our test can't test other patches.

It is wrong to assume that aio data structures can be reused or freed after io_cancel(). io_cancel() will almost always return -EINPROGRESS, anyway. Use the io_starttime field to indicate whether an io event has been completed by the kernel. Make sure no in-flight buffers are freed. Fixes opensvc#73. Signed-off-by: Martin Wilck <[email protected]> Reviewed-by: Benjamin Marzinski <[email protected]> Cc: Li Xiao Keng <[email protected]> Cc: Miao Guanqin <[email protected]> Cc: Guan Junxiong <[email protected]>

mwilck mentioned this issue Nov 17, 2023

multipath-tools 0.9.7 #77

Merged

cvaroqui closed this as completed in #77 Nov 21, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

0.8.7:test marginal_path_double_failed_time parameter generated a coredump #73

0.8.7:test marginal_path_double_failed_time parameter generated a coredump #73

Guanqinm commented Sep 14, 2023 •

edited

Loading

mwilck commented Sep 14, 2023

mwilck commented Sep 14, 2023

lixiaokeng commented Sep 16, 2023 •

edited

Loading

lixiaokeng commented Sep 16, 2023

mwilck commented Sep 18, 2023

lixiaokeng commented Sep 25, 2023

mwilck commented Sep 25, 2023

lixiaokeng commented Sep 25, 2023 •

edited

Loading

lixiaokeng commented Sep 25, 2023 •

edited

Loading

lixiaokeng commented Sep 27, 2023

mwilck commented Sep 27, 2023

mwilck commented Sep 27, 2023

lixiaokeng commented Sep 28, 2023

Guanqinm commented Oct 8, 2023

mwilck commented Oct 9, 2023

mwilck commented Oct 10, 2023

mwilck commented Oct 11, 2023

mwilck commented Oct 11, 2023

Guanqinm commented Oct 16, 2023 •

edited

Loading

mwilck commented Oct 16, 2023

Guanqinm commented Oct 16, 2023

mwilck commented Oct 16, 2023

mwilck commented Oct 24, 2023

lixiaokeng commented Oct 25, 2023

mwilck commented Oct 25, 2023

mwilck commented Oct 25, 2023

mwilck commented Oct 27, 2023

lixiaokeng commented Oct 28, 2023 •

edited

Loading

0.8.7:test marginal_path_double_failed_time parameter generated a coredump #73

0.8.7:test marginal_path_double_failed_time parameter generated a coredump #73

Comments

Guanqinm commented Sep 14, 2023 • edited Loading

mwilck commented Sep 14, 2023

mwilck commented Sep 14, 2023

lixiaokeng commented Sep 16, 2023 • edited Loading

lixiaokeng commented Sep 16, 2023

mwilck commented Sep 18, 2023

lixiaokeng commented Sep 25, 2023

mwilck commented Sep 25, 2023

lixiaokeng commented Sep 25, 2023 • edited Loading

lixiaokeng commented Sep 25, 2023 • edited Loading

lixiaokeng commented Sep 27, 2023

mwilck commented Sep 27, 2023

mwilck commented Sep 27, 2023

lixiaokeng commented Sep 28, 2023

Guanqinm commented Oct 8, 2023

mwilck commented Oct 9, 2023

mwilck commented Oct 10, 2023

mwilck commented Oct 11, 2023

mwilck commented Oct 11, 2023

Guanqinm commented Oct 16, 2023 • edited Loading

mwilck commented Oct 16, 2023

Guanqinm commented Oct 16, 2023

mwilck commented Oct 16, 2023

mwilck commented Oct 24, 2023

lixiaokeng commented Oct 25, 2023

mwilck commented Oct 25, 2023

mwilck commented Oct 25, 2023

mwilck commented Oct 27, 2023

lixiaokeng commented Oct 28, 2023 • edited Loading

Guanqinm commented Sep 14, 2023 •

edited

Loading

lixiaokeng commented Sep 16, 2023 •

edited

Loading

lixiaokeng commented Sep 25, 2023 •

edited

Loading

lixiaokeng commented Sep 25, 2023 •

edited

Loading

Guanqinm commented Oct 16, 2023 •

edited

Loading

lixiaokeng commented Oct 28, 2023 •

edited

Loading