Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

0.8.7:test marginal_path_double_failed_time parameter generated a coredump #73

Closed
Guanqinm opened this issue Sep 14, 2023 · 28 comments · Fixed by #77
Closed

0.8.7:test marginal_path_double_failed_time parameter generated a coredump #73

Guanqinm opened this issue Sep 14, 2023 · 28 comments · Fixed by #77

Comments

@Guanqinm
Copy link

Guanqinm commented Sep 14, 2023

Hi
Here we met a question. When we were testing the marginal parameters, a coredump was generated, which is shown below:

(gdb) bt
#0 __pthread_kill_implementation (threadid=281473695391360, signo=signo@entry=6, no_tid=no_tid@entry=0)
at pthread_kill.c:44
#1 0x0000ffffb5073304 in __pthread_kill_internal (signo=, threadid=)
at pthread_kill.c:78
#2 0x0000ffffb502ed7c in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3 0x0000ffffb501cd2c in __GI_abort () at abort.c:79
#4 0x0000ffffb50673ec in __libc_message (action=action@entry=do_abort,
fmt=fmt@entry=0xffffb5141e20 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#5 0x0000ffffb507d26c in malloc_printerr (str=str@entry=0xffffb513f0c8 "corrupted double-linked list")
at malloc.c:5671
#6 0x0000ffffb507db44 in unlink_chunk (p=p@entry=0xffffa00012d0, av=0xffffa0000030) at malloc.c:1637
#7 0x0000ffffb507ec58 in _int_free (av=0xffffa0000030, p=0xffffa00012d0, have_lock=)
at malloc.c:4609
#8 0x0000ffffb5081370 in __GI___libc_free (mem=) at malloc.c:3393
#9 0x0000ffffb5081424 in tcache_thread_shutdown () at malloc.c:3229
#10 __malloc_arena_thread_freeres () at arena.c:1010
#11 0x0000ffffb50833e8 in __libc_thread_freeres () at thread-freeres.c:44
#12 0x0000ffffb50716a8 in start_thread (arg=0x0) at pthread_create.c:457
#13 0x0000ffffb50d7d5c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:79
(gdb) f 6
#6 0x0000ffffb507db44 in unlink_chunk (p=p@entry=0xffffa00012d0, av=0xffffa0000030) at malloc.c:1637
1637 malloc_printerr ("corrupted double-linked list");
(gdb) p p->fd
$8 = (struct malloc_chunk *) 0xffffa0001ff0
(gdb) p p->fd->fd
$9 = (struct malloc_chunk *) 0x0
(gdb) x/36xg 0xffffa0001ff0
0xffffa0001ff0: 0x0000000000000000 0x0000000000002c91
0xffffa0002000: 0x0000000000000000 0x0000000000000000
0xffffa0002010: 0x0000000000000000 0x0000000000000000
0xffffa0002020: 0x0000000000000000 0x0000000000000000
0xffffa0002030: 0x0000000000000000 0x0000000000000000
0xffffa0002040: 0x0000000000000000 0x0000000000000000
0xffffa0002050: 0x0000000000000000 0x0000000000000000
0xffffa0002060: 0x0000000000000000 0x0000000000000000
0xffffa0002070: 0x0000000000000000 0x0000000000000000
0xffffa0002080: 0x0000000000000000 0x0000000000000000
0xffffa0002090: 0x0000000000000000 0x0000000000000000
0xffffa00020a0: 0x0000000000000000 0x0000000000000000
0xffffa00020b0: 0x0000000000000000 0x0000000000000000
0xffffa00020c0: 0x0000000000000000 0x0000000000000000
0xffffa00020d0: 0x0000000000000000 0x0000000000000000
0xffffa00020e0: 0x0000000000000000 0x0000000000000000
0xffffa00020f0: 0x0000000000000000 0x0000000000000000
0xffffa0002100: 0x0000000000000000 0x0000000000000000
(gdb)

The logic of the test case is to detect whether the path is recovery after the path is set to marginal and the marginal_path_double_failed_time has elapsed, and then stop the multi-path guardian process.

When we examine the thread exit, the malloc_printerr function detects a linked list of memory that has been freed during the process's execution. It is found that this memory has been written with an abnormal value of 0, resulting an error. We don't know how to analyze this problem and we don't know where the memory is being written abnormally. Can you tell us how to analyze and solve this problem?

@mwilck
Copy link
Contributor

mwilck commented Sep 14, 2023

malloc corruption is always difficult to debug. The easiest way is to run multipathd under valgrind, but that may be too slow to reproduce the error (just try).

If that doesn't work, you can try using the compiler flag -fsanitize=address, which may give you additional insight when the crash occurs.

@mwilck
Copy link
Contributor

mwilck commented Sep 14, 2023

But first you may want to update multipath-tools and see if the error is still present. 0.8.7 is 2 years old.

@lixiaokeng
Copy link
Contributor

lixiaokeng commented Sep 16, 2023

This cause when multipathd exit. According to LWP, I guess it was when uvloop exited and released resources that the content corresponding to the chunk->fd address was abnormal.
image

8912 bytes area (0xffffa0002000 to 0xffffa0003f28) is zero in p->fd.
image
image

Is there any action like memset(p,0,xxx) (xxx may 4096 / 8192 or other) may lead this?

@lixiaokeng
Copy link
Contributor

But first you may want to update multipath-tools and see if the error is still present. 0.8.7 is 2 years old.

There only cause once in 0.8.7. Now we use asan test but no progress yet

@mwilck
Copy link
Contributor

mwilck commented Sep 18, 2023

Is there any action like memset(p,0,xxx) (xxx may 4096 / 8192 or other) may lead this?

It's possible if the memset would clear an area outside the allocated memory for p.

@lixiaokeng
Copy link
Contributor

Here is some new info. This coredump cased when mutlipathd stop and after free_io_err_stat_path.

@mwilck
Copy link
Contributor

mwilck commented Sep 25, 2023

Sorry, I can't parse your comment. Are you saying the core dump happened after the call to free_io_err_pathvec() in stop_io_err_stat_thread()?

I've just reviewed the locking in that code once more, and I found no obvious race condition, all accesses to io_err_path are properly protected with the io_err_pathvec_lock.

Question: Does this crash occur always? Does it only occur if there are actual marginal path events?

@lixiaokeng
Copy link
Contributor

lixiaokeng commented Sep 25, 2023

multipath.conf:
defaults{
marginal_path_double_failed_time 30
marginal_path_err_sample_time 120
marginal_path_err_rate_threshold 10
marginal_path_err_recheck_gap_time 600
}

test.sh

while true
do
begin="$(date +'%Y-%m-%d %H:%M:%S')"
echo $begin
multipathd fail path /dev/sdc
sleep 0.5
multipathd reinstate path /dev/sdc
sleep 0.5
multipathd fail path /dev/sdc
sleep 0.5
multipathd reinstate path /dev/sdc
journalctl --since="${begin}" |grep "path is now marginal"
while [ $? -ne 0 ]
do
sleep 0.1
journalctl --since="${begin}" |grep "path is now marginal"
done
systemctl restart multipathd
sleep 3
done

This problem can be reproduced by this test.sh.

@lixiaokeng
Copy link
Contributor

lixiaokeng commented Sep 25, 2023

Sorry, I can't parse your comment. Are you saying the core dump happened after the call to free_io_err_pathvec() in stop_io_err_stat_thread()?

I've just reviewed the locking in that code once more, and I found no obvious race condition, all accesses to io_err_path are properly protected with the io_err_pathvec_lock.

Yes, I don't see any competition either.

Question: Does this crash occur always? Does it only occur if there are actual marginal path events?
Yes,it only occur if there are actual marginal path events.

I add printing p->dio_ctx_array->buf in free_io_err_stat_path before and after free().
I find the buf is freed and the content is malloc_chunk->fd. But buf is zero in coredump.

@lixiaokeng
Copy link
Contributor

image

I think the reason is that io_cancel fails and the buf is UAF by io in kernel.
Do you think it is a good idea move pthread_testcancel after io_getevents in process_async_ios_event?

@mwilck
Copy link
Contributor

mwilck commented Sep 27, 2023

AFAIR, io_cancel() always fails; even if the request can be cancelled (which is almost never the case), it returns -EINPROGRESS. It's possible for a user space application to drop the aio buffers before the aio finishes. In that case, anything can happen; the kernel will write to free()d memory, overwriting malloc meta data.

IMO the bottom line is that you can't free aio memory unless the kernel before the kernel has completed them.

Wrt pthread_testcancel(), it might be a good idea to comment out calls to this function in the io_err_stat code altogether for testing.

@mwilck
Copy link
Contributor

mwilck commented Sep 27, 2023

Please check the patch from this draft PR. It's against my latest code base, but you should be able to apply it on 0.8.7, too.

@lixiaokeng
Copy link
Contributor

I review the code and I think it can solve the problem. But The National Day holiday is coming. I will test it after holiday.

@Guanqinm
Copy link
Author

Guanqinm commented Oct 8, 2023

We have verified the patch from this draft PR in the environment and confirmed that the patch can solve the problem.

@mwilck
Copy link
Contributor

mwilck commented Oct 9, 2023

Thanks, I'll make an official submission asap.

mwilck added a commit to openSUSE/multipath-tools that referenced this issue Oct 10, 2023
It is wrong to assume that aio data structures can be reused or freed
after io_cancel(). io_cancel() will almost always return -EINPROGRESS,
anyway. Use the io_starttime field to indicate whether an io event
has been completed by the kernel. Make sure no in-flight buffers are freed.

Fixes opensvc#73.

Signed-off-by: Martin Wilck <[email protected]>
Cc: Li Xiao Keng <[email protected]>
Cc: Miao Guanqin <[email protected]>
@mwilck
Copy link
Contributor

mwilck commented Oct 10, 2023

The directio checker has the same problem (even though we've seen no crashes there). I made a patch for it, too, but now the unit tests are failing :-/ I need to have another look.

@mwilck
Copy link
Contributor

mwilck commented Oct 11, 2023

@Guanqinm, @lixiaokeng: I just realize that there might be a simpler solution to this. Please revert the previous patch and just try this instead:

diff --git a/libmultipath/io_err_stat.c b/libmultipath/io_err_stat.c
index c474c34..1982915 100644
--- a/libmultipath/io_err_stat.c
+++ b/libmultipath/io_err_stat.c
@@ -751,6 +751,6 @@ void stop_io_err_stat_thread(void)
                pthread_cancel(io_err_stat_thr);
 
        pthread_join(io_err_stat_thr, NULL);
-       free_io_err_pathvec();
        io_destroy(ioctx);
+       free_io_err_pathvec();
 }

The io_destroy(2) man page says:

"The io_destroy() system call will attempt to cancel all outstanding asynchronous I/O operations against ctx_id, will block on the completion of all operations that could not be canceled, and will destroy the ctx_id."

Thus after calling io_destroy(), it should be impossible that any outstanding I/O completes and overwrites application memory.

@mwilck
Copy link
Contributor

mwilck commented Oct 11, 2023

Pulling in also @bmarzins, as he has done most of the aio work for the directio checker. I have experimented a bit.

  • it's definitely wrong to free any iocb resources after io_cancel(), even if it succeds (which will not with recent kernels, it always returns -EINPROGRESS).
  • the kernel can't cancel an iocb once submitted. So if the device doesn't respond, all we can do is wait for the request to eventually time out, and error handling to finish. The alternative is to just forget about the allocated memory and create a new request.
  • io_destroy() actually waits for all outstanding IO to finish. Memory can be freed after it returns. The downside of this is that io_destroy() may hang for a long time. Some time ago we have put considerable effort in an attempt to make multipathd shut down quickly; if io_destroy() hangs, this won't work any more. OTOH, if we don't call io_destroy(), the kernel will do it when our main process exits, but by that time we'd have released all our other resources (in particular the unix socket), and a new multipathd instance could be started, which would likely not work if io_destroy() was hung before the socket was freed.
  • The directio checker calls io_destroy() during runtime if a certain number of requests have stalled, which may cause the checker to block for a considerable amount of time. This kind of contradicts the purpose of using aio for the checker.

@Guanqinm
Copy link
Author

Guanqinm commented Oct 16, 2023

We have verified the patch and confirmed that the patch can solve this issue. @mwilck

diff --git a/libmultipath/io_err_stat.c b/libmultipath/io_err_stat.c
index c474c34..1982915 100644
--- a/libmultipath/io_err_stat.c
+++ b/libmultipath/io_err_stat.c
@@ -751,6 +751,6 @@ void stop_io_err_stat_thread(void)
                pthread_cancel(io_err_stat_thr);
 
        pthread_join(io_err_stat_thr, NULL);
-       free_io_err_pathvec();
        io_destroy(ioctx);
+       free_io_err_pathvec();
 }

@mwilck
Copy link
Contributor

mwilck commented Oct 16, 2023

Thanks! The trivial patch alone is not correct by itself because the code would still re-use iocbs before they have completed. But it clarifies things.

@Guanqinm
Copy link
Author

Hello, may I ask if the repair patch is the same as the previous patch (this draft PR)?

@mwilck
Copy link
Contributor

mwilck commented Oct 16, 2023

I haven't finished this yet. For now you can use the "draft PR", or a combination of both patches.

mwilck added a commit to openSUSE/multipath-tools that referenced this issue Oct 24, 2023
It is wrong to assume that aio data structures can be reused or freed
after io_cancel(). io_cancel() will almost always return -EINPROGRESS,
anyway. Use the io_starttime field to indicate whether an io event
has been completed by the kernel. Make sure no in-flight buffers are freed.

Fixes opensvc#73.

Signed-off-by: Martin Wilck <[email protected]>
Cc: Li Xiao Keng <[email protected]>
Cc: Miao Guanqin <[email protected]>
@mwilck
Copy link
Contributor

mwilck commented Oct 24, 2023

I can't finish my bigger project right now. I have sent a small patch set based on the patches from this issue to dm-devel.

@lixiaokeng
Copy link
Contributor

libmultipath: reduce log level of directio messages
libmultipath: directio: don't reset ct->running after io_cancel()
libmultipath: io_err_stat: don't free aio memory before completion
libmultipath: io_err_stat: call io_destroy() before
free_io_err_pathvec()
multipath-tools: systemd: require modprobe@dm_multipath.service
libmpathutil: remove systemd_service_enabled()

Are these patches intended to fix this issue?

@mwilck
Copy link
Contributor

mwilck commented Oct 25, 2023

Yes. It's the same patches that you verified here.
I just added the directio patches which fix the same issue for the directio checker.

@mwilck
Copy link
Contributor

mwilck commented Oct 25, 2023

You're of course very welcome to test that patch set in your environment.

mwilck added a commit to openSUSE/multipath-tools that referenced this issue Oct 26, 2023
It is wrong to assume that aio data structures can be reused or freed
after io_cancel(). io_cancel() will almost always return -EINPROGRESS,
anyway. Use the io_starttime field to indicate whether an io event
has been completed by the kernel. Make sure no in-flight buffers are freed.

Fixes opensvc#73.

Signed-off-by: Martin Wilck <[email protected]>
Cc: Li Xiao Keng <[email protected]>
Cc: Miao Guanqin <[email protected]>
Cc: Guan Junxiong <[email protected]>
@mwilck
Copy link
Contributor

mwilck commented Oct 27, 2023

@lixiaokeng @Guanqinm: as you saw I posted an updated patch series to the dm-devel ML yesterday. Test / feedback in your test bed would be highly appreciated.

@lixiaokeng
Copy link
Contributor

lixiaokeng commented Oct 28, 2023

@lixiaokeng @Guanqinm: as you saw I posted an updated patch series to the dm-devel ML yesterday. Test / feedback in your test bed would be highly appreciated.

I have give a test.sh. Do you have test it?We have test "libmultipath: io_err_stat: don't free aio memory before completion" and "libmultipath: io_err_stat: call io_destroy() before free_io_err_pathvec()". There is no coredump. I think our test can't test other patches.

mwilck added a commit to openSUSE/multipath-tools that referenced this issue Oct 30, 2023
It is wrong to assume that aio data structures can be reused or freed
after io_cancel(). io_cancel() will almost always return -EINPROGRESS,
anyway. Use the io_starttime field to indicate whether an io event
has been completed by the kernel. Make sure no in-flight buffers are freed.

Fixes opensvc#73.

Signed-off-by: Martin Wilck <[email protected]>
Reviewed-by: Benjamin Marzinski <[email protected]>
Cc: Li Xiao Keng <[email protected]>
Cc: Miao Guanqin <[email protected]>
Cc: Guan Junxiong <[email protected]>
mwilck added a commit to openSUSE/multipath-tools that referenced this issue Feb 27, 2024
It is wrong to assume that aio data structures can be reused or freed
after io_cancel(). io_cancel() will almost always return -EINPROGRESS,
anyway. Use the io_starttime field to indicate whether an io event
has been completed by the kernel. Make sure no in-flight buffers are freed.

Fixes opensvc#73.

Signed-off-by: Martin Wilck <[email protected]>
Reviewed-by: Benjamin Marzinski <[email protected]>
Cc: Li Xiao Keng <[email protected]>
Cc: Miao Guanqin <[email protected]>
Cc: Guan Junxiong <[email protected]>
mwilck added a commit to openSUSE/multipath-tools that referenced this issue Feb 27, 2024
It is wrong to assume that aio data structures can be reused or freed
after io_cancel(). io_cancel() will almost always return -EINPROGRESS,
anyway. Use the io_starttime field to indicate whether an io event
has been completed by the kernel. Make sure no in-flight buffers are freed.

Fixes opensvc#73.

Signed-off-by: Martin Wilck <[email protected]>
Reviewed-by: Benjamin Marzinski <[email protected]>
Cc: Li Xiao Keng <[email protected]>
Cc: Miao Guanqin <[email protected]>
Cc: Guan Junxiong <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants