UML #21

asamy · 2012-08-24T02:44:33Z

No description provided.

…d reasons We've had some reports of a deadlock where rpciod ends up with a stack trace like this: PID: 2507 TASK: ffff88103691ab40 CPU: 14 COMMAND: "rpciod/14" #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9 #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs] #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8 #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs] #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs] torvalds#6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670 torvalds#7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271 torvalds#8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638 torvalds#9 [ffff8810343bf818] shrink_zone at ffffffff8112788f torvalds#10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e torvalds#11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f torvalds#12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad torvalds#13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942 torvalds#14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a torvalds#15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9 torvalds#16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b torvalds#17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808 torvalds#18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c torvalds#19 [ffff8810343bfce8] inet_create at ffffffff81483ba6 torvalds#20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7 torvalds#21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc] torvalds#22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc] torvalds#23 [ffff8810343bfe38] worker_thread at ffffffff810887d0 torvalds#24 [ffff8810343bfee8] kthread at ffffffff8108dd96 torvalds#25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca rpciod is trying to allocate memory for a new socket to talk to the server. The VM ends up calling ->releasepage to get more memory, and it tries to do a blocking commit. That commit can't succeed however without a connected socket, so we deadlock. Fix this by setting PF_FSTRANS on the workqueue task prior to doing the socket allocation, and having nfs_release_page check for that flag when deciding whether to do a commit call. Also, set PF_FSTRANS unconditionally in rpc_async_schedule since that function can also do allocations sometimes. Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Trond Myklebust <[email protected]> Cc: [email protected]

…d reasons commit 5cf02d0 upstream. We've had some reports of a deadlock where rpciod ends up with a stack trace like this: PID: 2507 TASK: ffff88103691ab40 CPU: 14 COMMAND: "rpciod/14" #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9 #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs] #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8 #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs] #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs] #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670 #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271 torvalds#8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638 torvalds#9 [ffff8810343bf818] shrink_zone at ffffffff8112788f torvalds#10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e torvalds#11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f torvalds#12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad torvalds#13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942 torvalds#14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a torvalds#15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9 torvalds#16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b torvalds#17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808 torvalds#18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c torvalds#19 [ffff8810343bfce8] inet_create at ffffffff81483ba6 torvalds#20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7 torvalds#21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc] torvalds#22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc] torvalds#23 [ffff8810343bfe38] worker_thread at ffffffff810887d0 torvalds#24 [ffff8810343bfee8] kthread at ffffffff8108dd96 torvalds#25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca rpciod is trying to allocate memory for a new socket to talk to the server. The VM ends up calling ->releasepage to get more memory, and it tries to do a blocking commit. That commit can't succeed however without a connected socket, so we deadlock. Fix this by setting PF_FSTRANS on the workqueue task prior to doing the socket allocation, and having nfs_release_page check for that flag when deciding whether to do a commit call. Also, set PF_FSTRANS unconditionally in rpc_async_schedule since that function can also do allocations sometimes. Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Trond Myklebust <[email protected]> Signed-off-by: Ben Hutchings <[email protected]>

When hot-adding a CPU, the system outputs following messages since node_to_cpumask_map[2] was not allocated memory. Booting Node 2 Processor 32 APIC 0xc0 node_to_cpumask_map[2] NULL Pid: 0, comm: swapper/32 Tainted: G A 3.3.5-acd torvalds#21 Call Trace: [<ffffffff81048845>] debug_cpumask_set_cpu+0x155/0x160 [<ffffffff8105e28a>] ? add_timer_on+0xaa/0x120 [<ffffffff8150665f>] numa_add_cpu+0x1e/0x22 [<ffffffff815020bb>] identify_cpu+0x1df/0x1e4 [<ffffffff815020d6>] identify_econdary_cpu+0x16/0x1d [<ffffffff81504614>] smp_store_cpu_info+0x3c/0x3e [<ffffffff81505263>] smp_callin+0x139/0x1be [<ffffffff815052fb>] start_secondary+0x13/0xeb The reason is that the bit of node 2 was not set at numa_nodes_parsed. numa_nodes_parsed is set by only acpi_numa_processor_affinity_init / acpi_numa_x2apic_affinity_init. Thus even if hot-added memory which is same PXM as hot-added CPU is written in ACPI SRAT Table, if the hot-added CPU is not written in ACPI SRAT table, numa_nodes_parsed is not set. But according to ACPI Spec Rev 5.0, it says about ACPI SRAT table as follows: This optional table provides information that allows OSPM to associate processors and memory ranges, including ranges of memory provided by hot-added memory devices, with system localities / proximity domains and clock domains. It means that ACPI SRAT table only provides information for CPUs present at boot time and for memory including hot-added memory. So hot-added memory is written in ACPI SRAT table, but hot-added CPU is not written in it. Thus numa_nodes_parsed should be set by not only acpi_numa_processor_affinity_init / acpi_numa_x2apic_affinity_init but also acpi_numa_memory_affinity_init for the case. Additionally, if system has cpuless memory node, acpi_numa_processor_affinity_init / acpi_numa_x2apic_affinity_init cannot set numa_nodes_parseds since these functions cannot find cpu description for the node. In this case, numa_nodes_parsed needs to be set by acpi_numa_memory_affinity_init. Signed-off-by: Yasuaki Ishimatsu <[email protected]> Acked-by: David Rientjes <[email protected]> Acked-by: KOSAKI Motohiro <[email protected]> Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] [ merged it ] Signed-off-by: Ingo Molnar <[email protected]>

The warning below triggers on AMD MCM packages because physical package IDs on the cores of a _physical_ socket are the same. I.e., this field says which CPUs belong to the same physical package. However, the same two CPUs belong to two different internal, i.e. "logical" nodes in the same physical socket which is reflected in the CPU-to-node map on x86 with NUMA. Which makes this check wrong on the above topologies so circumvent it. [ 0.444413] Booting Node 0, Processors #1 #2 #3 #4 #5 Ok. [ 0.461388] ------------[ cut here ]------------ [ 0.465997] WARNING: at arch/x86/kernel/smpboot.c:310 topology_sane.clone.1+0x6e/0x81() [ 0.473960] Hardware name: Dinar [ 0.477170] sched: CPU torvalds#6's mc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency. [ 0.486860] Booting Node 1, Processors torvalds#6 [ 0.491104] Modules linked in: [ 0.494141] Pid: 0, comm: swapper/6 Not tainted 3.4.0+ #1 [ 0.499510] Call Trace: [ 0.501946] [<ffffffff8144bf92>] ? topology_sane.clone.1+0x6e/0x81 [ 0.508185] [<ffffffff8102f1fc>] warn_slowpath_common+0x85/0x9d [ 0.514163] [<ffffffff8102f2b7>] warn_slowpath_fmt+0x46/0x48 [ 0.519881] [<ffffffff8144bf92>] topology_sane.clone.1+0x6e/0x81 [ 0.525943] [<ffffffff8144c234>] set_cpu_sibling_map+0x251/0x371 [ 0.532004] [<ffffffff8144c4ee>] start_secondary+0x19a/0x218 [ 0.537729] ---[ end trace 4eaa2a86a8e2da22 ]--- [ 0.628197] torvalds#7 torvalds#8 torvalds#9 torvalds#10 torvalds#11 Ok. [ 0.807108] Booting Node 3, Processors torvalds#12 torvalds#13 torvalds#14 torvalds#15 torvalds#16 torvalds#17 Ok. [ 0.897587] Booting Node 2, Processors torvalds#18 torvalds#19 torvalds#20 torvalds#21 torvalds#22 torvalds#23 Ok. [ 0.917443] Brought up 24 CPUs We ran a topology sanity check test we have here on it and it all looks ok... hopefully :). Signed-off-by: Borislav Petkov <[email protected]> Cc: Andreas Herrmann <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>

…d reasons commit 5cf02d0 upstream. We've had some reports of a deadlock where rpciod ends up with a stack trace like this: PID: 2507 TASK: ffff88103691ab40 CPU: 14 COMMAND: "rpciod/14" #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9 #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs] #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8 #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs] #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs] torvalds#6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670 torvalds#7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271 torvalds#8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638 torvalds#9 [ffff8810343bf818] shrink_zone at ffffffff8112788f torvalds#10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e torvalds#11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f torvalds#12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad torvalds#13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942 torvalds#14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a torvalds#15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9 torvalds#16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b torvalds#17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808 torvalds#18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c torvalds#19 [ffff8810343bfce8] inet_create at ffffffff81483ba6 torvalds#20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7 torvalds#21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc] torvalds#22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc] torvalds#23 [ffff8810343bfe38] worker_thread at ffffffff810887d0 torvalds#24 [ffff8810343bfee8] kthread at ffffffff8108dd96 torvalds#25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca rpciod is trying to allocate memory for a new socket to talk to the server. The VM ends up calling ->releasepage to get more memory, and it tries to do a blocking commit. That commit can't succeed however without a connected socket, so we deadlock. Fix this by setting PF_FSTRANS on the workqueue task prior to doing the socket allocation, and having nfs_release_page check for that flag when deciding whether to do a commit call. Also, set PF_FSTRANS unconditionally in rpc_async_schedule since that function can also do allocations sometimes. Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Trond Myklebust <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

…d reasons commit 5cf02d0 upstream. We've had some reports of a deadlock where rpciod ends up with a stack trace like this: PID: 2507 TASK: ffff88103691ab40 CPU: 14 COMMAND: "rpciod/14" #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9 #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs] #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8 #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs] #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs] #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670 #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271 #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638 #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f torvalds#12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad torvalds#13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942 torvalds#14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a torvalds#15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9 torvalds#16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b torvalds#17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808 torvalds#18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c torvalds#19 [ffff8810343bfce8] inet_create at ffffffff81483ba6 torvalds#20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7 torvalds#21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc] torvalds#22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc] torvalds#23 [ffff8810343bfe38] worker_thread at ffffffff810887d0 torvalds#24 [ffff8810343bfee8] kthread at ffffffff8108dd96 torvalds#25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca rpciod is trying to allocate memory for a new socket to talk to the server. The VM ends up calling ->releasepage to get more memory, and it tries to do a blocking commit. That commit can't succeed however without a connected socket, so we deadlock. Fix this by setting PF_FSTRANS on the workqueue task prior to doing the socket allocation, and having nfs_release_page check for that flag when deciding whether to do a commit call. Also, set PF_FSTRANS unconditionally in rpc_async_schedule since that function can also do allocations sometimes. Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Trond Myklebust <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

…d reasons commit 5cf02d0 upstream. We've had some reports of a deadlock where rpciod ends up with a stack trace like this: PID: 2507 TASK: ffff88103691ab40 CPU: 14 COMMAND: "rpciod/14" #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9 #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs] #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8 #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs] #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs] torvalds#6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670 torvalds#7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271 torvalds#8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638 torvalds#9 [ffff8810343bf818] shrink_zone at ffffffff8112788f torvalds#10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e torvalds#11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f torvalds#12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad torvalds#13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942 torvalds#14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a torvalds#15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9 torvalds#16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b torvalds#17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808 torvalds#18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c torvalds#19 [ffff8810343bfce8] inet_create at ffffffff81483ba6 torvalds#20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7 torvalds#21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc] torvalds#22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc] torvalds#23 [ffff8810343bfe38] worker_thread at ffffffff810887d0 torvalds#24 [ffff8810343bfee8] kthread at ffffffff8108dd96 torvalds#25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca rpciod is trying to allocate memory for a new socket to talk to the server. The VM ends up calling ->releasepage to get more memory, and it tries to do a blocking commit. That commit can't succeed however without a connected socket, so we deadlock. Fix this by setting PF_FSTRANS on the workqueue task prior to doing the socket allocation, and having nfs_release_page check for that flag when deciding whether to do a commit call. Also, set PF_FSTRANS unconditionally in rpc_async_schedule since that function can also do allocations sometimes. Signed-off-by: Jeff Layton <jlayton at redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust at netapp.com> Signed-off-by: Ben Hutchings <ben at decadent.org.uk>

…d reasons commit 5cf02d0 upstream. We've had some reports of a deadlock where rpciod ends up with a stack trace like this: PID: 2507 TASK: ffff88103691ab40 CPU: 14 COMMAND: "rpciod/14" #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9 #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs] #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8 #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs] #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs] torvalds#6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670 torvalds#7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271 torvalds#8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638 torvalds#9 [ffff8810343bf818] shrink_zone at ffffffff8112788f torvalds#10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e torvalds#11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f torvalds#12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad torvalds#13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942 torvalds#14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a torvalds#15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9 torvalds#16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b torvalds#17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808 torvalds#18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c torvalds#19 [ffff8810343bfce8] inet_create at ffffffff81483ba6 torvalds#20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7 torvalds#21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc] torvalds#22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc] torvalds#23 [ffff8810343bfe38] worker_thread at ffffffff810887d0 torvalds#24 [ffff8810343bfee8] kthread at ffffffff8108dd96 torvalds#25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca rpciod is trying to allocate memory for a new socket to talk to the server. The VM ends up calling ->releasepage to get more memory, and it tries to do a blocking commit. That commit can't succeed however without a connected socket, so we deadlock. Fix this by setting PF_FSTRANS on the workqueue task prior to doing the socket allocation, and having nfs_release_page check for that flag when deciding whether to do a commit call. Also, set PF_FSTRANS unconditionally in rpc_async_schedule since that function can also do allocations sometimes. Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Trond Myklebust <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

…d reasons commit 5cf02d0 upstream. We've had some reports of a deadlock where rpciod ends up with a stack trace like this: PID: 2507 TASK: ffff88103691ab40 CPU: 14 COMMAND: "rpciod/14" #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9 #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs] #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8 #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs] #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs] #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670 #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271 #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638 #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f torvalds#10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e torvalds#11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f torvalds#12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad torvalds#13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942 torvalds#14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a torvalds#15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9 torvalds#16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b torvalds#17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808 torvalds#18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c torvalds#19 [ffff8810343bfce8] inet_create at ffffffff81483ba6 torvalds#20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7 torvalds#21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc] torvalds#22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc] torvalds#23 [ffff8810343bfe38] worker_thread at ffffffff810887d0 torvalds#24 [ffff8810343bfee8] kthread at ffffffff8108dd96 torvalds#25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca rpciod is trying to allocate memory for a new socket to talk to the server. The VM ends up calling ->releasepage to get more memory, and it tries to do a blocking commit. That commit can't succeed however without a connected socket, so we deadlock. Fix this by setting PF_FSTRANS on the workqueue task prior to doing the socket allocation, and having nfs_release_page check for that flag when deciding whether to do a commit call. Also, set PF_FSTRANS unconditionally in rpc_async_schedule since that function can also do allocations sometimes. Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Trond Myklebust <[email protected]> Signed-off-by: Ben Hutchings <[email protected]>

commit a3f83ab upstream. At a boot time I observed following bug: BUG: unable to handle kernel paging request at ffff8800a4244000 IP: [<ffffffff81275b5b>] memcpy+0xb/0x120 PGD 1816063 PUD 1fe7d067 PMD 1ff9f067 PTE 80000000a4244160 Oops: 0000 [#1] SMP DEBUG_PAGEALLOC CPU 0 Modules linked in: btusb bluetooth brcmsmac brcmutil crc8 cordic b43 radeon(+) mac80211 cfg80211 ttm ohci_hcd drm_kms_helper rfkill drm ssb agpgart mmc_core sp5100_tco video battery ac thermal processor rtc_cmos thermal_sys snd_hda_codec_hdmi joydev snd_hda_codec_conexant button bcma pcmcia snd_hda_intel snd_hda_codec snd_hwdep snd_pcm shpchp pcmcia_core k8temp snd_timer atl1c snd psmouse hwmon i2c_piix4 i2c_algo_bit soundcore evdev i2c_core ehci_hcd sg serio_raw snd_page_alloc loop btrfs Pid: 1008, comm: modprobe Not tainted 3.3.0-rc1 torvalds#21 LENOVO 20046 /AMD CRB RIP: 0010:[<ffffffff81275b5b>] [<ffffffff81275b5b>] memcpy+0xb/0x120 RSP: 0018:ffff8800aa72db00 EFLAGS: 00010246 RAX: ffff8800a4150000 RBX: 0000000000001000 RCX: 0000000000000087 RDX: 0000000000000000 RSI: ffff8800a4244000 RDI: ffff8800a4150bc8 RBP: ffff8800aa72db78 R08: 0000000000000010 R09: ffffffff8174bbec R10: ffffffff812ee010 R11: 0000000000000001 R12: 0000000000001000 R13: 0000000000010000 R14: ffff8800a4140000 R15: ffff8800aaba1800 FS: 00007ff9a3bd4720(0000) GS:ffff8800afa00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffff8800a4244000 CR3: 00000000a9c18000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process modprobe (pid: 1008, threadinfo ffff8800aa72c000, task ffff8800aa0e4000) Stack: ffffffffa04e7c7b 0000000000000001 0000000000010000 ffff8800aa72db28 ffffffff00000001 0000000000001000 ffffffff8113cbef 0000000000000020 ffff8800a4243420 ffff880000000002 ffff8800aa72db08 ffff8800a9d42000 Call Trace: [<ffffffffa04e7c7b>] ? radeon_atrm_get_bios_chunk+0x8b/0xd0 [radeon] [<ffffffff8113cbef>] ? kmalloc_order_trace+0x3f/0xb0 [<ffffffffa04a9298>] radeon_get_bios+0x68/0x2f0 [radeon] [<ffffffffa04c7a30>] rv770_init+0x40/0x280 [radeon] [<ffffffffa047d740>] radeon_device_init+0x560/0x600 [radeon] [<ffffffffa047ef4f>] radeon_driver_load_kms+0xaf/0x170 [radeon] [<ffffffffa043cdde>] drm_get_pci_dev+0x18e/0x2c0 [drm] [<ffffffffa04e7e95>] radeon_pci_probe+0xad/0xb5 [radeon] [<ffffffff81296c5f>] local_pci_probe+0x5f/0xd0 [<ffffffff81297418>] pci_device_probe+0x88/0xb0 [<ffffffff813417aa>] ? driver_sysfs_add+0x7a/0xb0 [<ffffffff813418d8>] really_probe+0x68/0x180 [<ffffffff81341be5>] driver_probe_device+0x45/0x70 [<ffffffff81341cb3>] __driver_attach+0xa3/0xb0 [<ffffffff81341c10>] ? driver_probe_device+0x70/0x70 [<ffffffff813400ce>] bus_for_each_dev+0x5e/0x90 [<ffffffff8134172e>] driver_attach+0x1e/0x20 [<ffffffff81341298>] bus_add_driver+0xc8/0x280 [<ffffffff813422c6>] driver_register+0x76/0x140 [<ffffffff812976d6>] __pci_register_driver+0x66/0xe0 [<ffffffffa043d021>] drm_pci_init+0x111/0x120 [drm] [<ffffffff8133c67a>] ? vga_switcheroo_register_handler+0x3a/0x60 [<ffffffffa0229000>] ? 0xffffffffa0228fff [<ffffffffa02290ec>] radeon_init+0xec/0xee [radeon] [<ffffffff810002f2>] do_one_initcall+0x42/0x180 [<ffffffff8109d8d2>] sys_init_module+0x92/0x1e0 [<ffffffff815407a9>] system_call_fastpath+0x16/0x1b Code: 58 2a 43 50 88 43 4e 48 83 c4 08 5b c9 c3 66 90 e8 cb fd ff ff eb e6 90 90 90 90 90 90 90 90 90 48 89 f8 89 d1 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 c3 20 48 83 ea 20 4c 8b 06 4c 8b 4e 08 4c RIP [<ffffffff81275b5b>] memcpy+0xb/0x120 RSP <ffff8800aa72db00> CR2: ffff8800a4244000 ---[ end trace fcffa1599cf56382 ]--- Call to acpi_evaluate_object() not always returns 4096 bytes chunks, on my system it can return 2048 bytes chunk, so pass the length of retrieved chunk to memcpy(), not the length of the recieving buffer. Signed-off-by: Igor Murzov <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Dave Airlie <[email protected]> Signed-off-by: Ben Hutchings <[email protected]>

Fixes issue torvalds#21 on amery/linux-allwinner

Fix build using O= (issue torvalds#21) and inline build on CM9

…d reasons commit 5cf02d0 upstream. We've had some reports of a deadlock where rpciod ends up with a stack trace like this: PID: 2507 TASK: ffff88103691ab40 CPU: 14 COMMAND: "rpciod/14" #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9 #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs] #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8 #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs] #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs] #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670 #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271 #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638 #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f torvalds#10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e torvalds#11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f torvalds#12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad torvalds#13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942 torvalds#14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a torvalds#15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9 torvalds#16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b torvalds#17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808 torvalds#18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c torvalds#19 [ffff8810343bfce8] inet_create at ffffffff81483ba6 torvalds#20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7 torvalds#21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc] torvalds#22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc] torvalds#23 [ffff8810343bfe38] worker_thread at ffffffff810887d0 torvalds#24 [ffff8810343bfee8] kthread at ffffffff8108dd96 torvalds#25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca rpciod is trying to allocate memory for a new socket to talk to the server. The VM ends up calling ->releasepage to get more memory, and it tries to do a blocking commit. That commit can't succeed however without a connected socket, so we deadlock. Fix this by setting PF_FSTRANS on the workqueue task prior to doing the socket allocation, and having nfs_release_page check for that flag when deciding whether to do a commit call. Also, set PF_FSTRANS unconditionally in rpc_async_schedule since that function can also do allocations sometimes. Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Trond Myklebust <[email protected]> Signed-off-by: Ben Hutchings <[email protected]>

commit a3f83ab upstream. At a boot time I observed following bug: BUG: unable to handle kernel paging request at ffff8800a4244000 IP: [<ffffffff81275b5b>] memcpy+0xb/0x120 PGD 1816063 PUD 1fe7d067 PMD 1ff9f067 PTE 80000000a4244160 Oops: 0000 [#1] SMP DEBUG_PAGEALLOC CPU 0 Modules linked in: btusb bluetooth brcmsmac brcmutil crc8 cordic b43 radeon(+) mac80211 cfg80211 ttm ohci_hcd drm_kms_helper rfkill drm ssb agpgart mmc_core sp5100_tco video battery ac thermal processor rtc_cmos thermal_sys snd_hda_codec_hdmi joydev snd_hda_codec_conexant button bcma pcmcia snd_hda_intel snd_hda_codec snd_hwdep snd_pcm shpchp pcmcia_core k8temp snd_timer atl1c snd psmouse hwmon i2c_piix4 i2c_algo_bit soundcore evdev i2c_core ehci_hcd sg serio_raw snd_page_alloc loop btrfs Pid: 1008, comm: modprobe Not tainted 3.3.0-rc1 torvalds#21 LENOVO 20046 /AMD CRB RIP: 0010:[<ffffffff81275b5b>] [<ffffffff81275b5b>] memcpy+0xb/0x120 RSP: 0018:ffff8800aa72db00 EFLAGS: 00010246 RAX: ffff8800a4150000 RBX: 0000000000001000 RCX: 0000000000000087 RDX: 0000000000000000 RSI: ffff8800a4244000 RDI: ffff8800a4150bc8 RBP: ffff8800aa72db78 R08: 0000000000000010 R09: ffffffff8174bbec R10: ffffffff812ee010 R11: 0000000000000001 R12: 0000000000001000 R13: 0000000000010000 R14: ffff8800a4140000 R15: ffff8800aaba1800 FS: 00007ff9a3bd4720(0000) GS:ffff8800afa00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffff8800a4244000 CR3: 00000000a9c18000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process modprobe (pid: 1008, threadinfo ffff8800aa72c000, task ffff8800aa0e4000) Stack: ffffffffa04e7c7b 0000000000000001 0000000000010000 ffff8800aa72db28 ffffffff00000001 0000000000001000 ffffffff8113cbef 0000000000000020 ffff8800a4243420 ffff880000000002 ffff8800aa72db08 ffff8800a9d42000 Call Trace: [<ffffffffa04e7c7b>] ? radeon_atrm_get_bios_chunk+0x8b/0xd0 [radeon] [<ffffffff8113cbef>] ? kmalloc_order_trace+0x3f/0xb0 [<ffffffffa04a9298>] radeon_get_bios+0x68/0x2f0 [radeon] [<ffffffffa04c7a30>] rv770_init+0x40/0x280 [radeon] [<ffffffffa047d740>] radeon_device_init+0x560/0x600 [radeon] [<ffffffffa047ef4f>] radeon_driver_load_kms+0xaf/0x170 [radeon] [<ffffffffa043cdde>] drm_get_pci_dev+0x18e/0x2c0 [drm] [<ffffffffa04e7e95>] radeon_pci_probe+0xad/0xb5 [radeon] [<ffffffff81296c5f>] local_pci_probe+0x5f/0xd0 [<ffffffff81297418>] pci_device_probe+0x88/0xb0 [<ffffffff813417aa>] ? driver_sysfs_add+0x7a/0xb0 [<ffffffff813418d8>] really_probe+0x68/0x180 [<ffffffff81341be5>] driver_probe_device+0x45/0x70 [<ffffffff81341cb3>] __driver_attach+0xa3/0xb0 [<ffffffff81341c10>] ? driver_probe_device+0x70/0x70 [<ffffffff813400ce>] bus_for_each_dev+0x5e/0x90 [<ffffffff8134172e>] driver_attach+0x1e/0x20 [<ffffffff81341298>] bus_add_driver+0xc8/0x280 [<ffffffff813422c6>] driver_register+0x76/0x140 [<ffffffff812976d6>] __pci_register_driver+0x66/0xe0 [<ffffffffa043d021>] drm_pci_init+0x111/0x120 [drm] [<ffffffff8133c67a>] ? vga_switcheroo_register_handler+0x3a/0x60 [<ffffffffa0229000>] ? 0xffffffffa0228fff [<ffffffffa02290ec>] radeon_init+0xec/0xee [radeon] [<ffffffff810002f2>] do_one_initcall+0x42/0x180 [<ffffffff8109d8d2>] sys_init_module+0x92/0x1e0 [<ffffffff815407a9>] system_call_fastpath+0x16/0x1b Code: 58 2a 43 50 88 43 4e 48 83 c4 08 5b c9 c3 66 90 e8 cb fd ff ff eb e6 90 90 90 90 90 90 90 90 90 48 89 f8 89 d1 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 c3 20 48 83 ea 20 4c 8b 06 4c 8b 4e 08 4c RIP [<ffffffff81275b5b>] memcpy+0xb/0x120 RSP <ffff8800aa72db00> CR2: ffff8800a4244000 ---[ end trace fcffa1599cf56382 ]--- Call to acpi_evaluate_object() not always returns 4096 bytes chunks, on my system it can return 2048 bytes chunk, so pass the length of retrieved chunk to memcpy(), not the length of the recieving buffer. Signed-off-by: Igor Murzov <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Dave Airlie <[email protected]> Signed-off-by: Ben Hutchings <[email protected]>

…d reasons commit 5cf02d0 upstream. We've had some reports of a deadlock where rpciod ends up with a stack trace like this: PID: 2507 TASK: ffff88103691ab40 CPU: 14 COMMAND: "rpciod/14" #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9 #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs] #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8 #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs] #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs] #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670 #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271 #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638 #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f torvalds#10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e torvalds#11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f torvalds#12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad torvalds#13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942 torvalds#14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a torvalds#15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9 torvalds#16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b torvalds#17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808 torvalds#18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c torvalds#19 [ffff8810343bfce8] inet_create at ffffffff81483ba6 torvalds#20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7 torvalds#21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc] torvalds#22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc] torvalds#23 [ffff8810343bfe38] worker_thread at ffffffff810887d0 torvalds#24 [ffff8810343bfee8] kthread at ffffffff8108dd96 torvalds#25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca rpciod is trying to allocate memory for a new socket to talk to the server. The VM ends up calling ->releasepage to get more memory, and it tries to do a blocking commit. That commit can't succeed however without a connected socket, so we deadlock. Fix this by setting PF_FSTRANS on the workqueue task prior to doing the socket allocation, and having nfs_release_page check for that flag when deciding whether to do a commit call. Also, set PF_FSTRANS unconditionally in rpc_async_schedule since that function can also do allocations sometimes. Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Trond Myklebust <[email protected]> Signed-off-by: Ben Hutchings <[email protected]>

commit a3f83ab upstream. At a boot time I observed following bug: BUG: unable to handle kernel paging request at ffff8800a4244000 IP: [<ffffffff81275b5b>] memcpy+0xb/0x120 PGD 1816063 PUD 1fe7d067 PMD 1ff9f067 PTE 80000000a4244160 Oops: 0000 [#1] SMP DEBUG_PAGEALLOC CPU 0 Modules linked in: btusb bluetooth brcmsmac brcmutil crc8 cordic b43 radeon(+) mac80211 cfg80211 ttm ohci_hcd drm_kms_helper rfkill drm ssb agpgart mmc_core sp5100_tco video battery ac thermal processor rtc_cmos thermal_sys snd_hda_codec_hdmi joydev snd_hda_codec_conexant button bcma pcmcia snd_hda_intel snd_hda_codec snd_hwdep snd_pcm shpchp pcmcia_core k8temp snd_timer atl1c snd psmouse hwmon i2c_piix4 i2c_algo_bit soundcore evdev i2c_core ehci_hcd sg serio_raw snd_page_alloc loop btrfs Pid: 1008, comm: modprobe Not tainted 3.3.0-rc1 torvalds#21 LENOVO 20046 /AMD CRB RIP: 0010:[<ffffffff81275b5b>] [<ffffffff81275b5b>] memcpy+0xb/0x120 RSP: 0018:ffff8800aa72db00 EFLAGS: 00010246 RAX: ffff8800a4150000 RBX: 0000000000001000 RCX: 0000000000000087 RDX: 0000000000000000 RSI: ffff8800a4244000 RDI: ffff8800a4150bc8 RBP: ffff8800aa72db78 R08: 0000000000000010 R09: ffffffff8174bbec R10: ffffffff812ee010 R11: 0000000000000001 R12: 0000000000001000 R13: 0000000000010000 R14: ffff8800a4140000 R15: ffff8800aaba1800 FS: 00007ff9a3bd4720(0000) GS:ffff8800afa00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffff8800a4244000 CR3: 00000000a9c18000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process modprobe (pid: 1008, threadinfo ffff8800aa72c000, task ffff8800aa0e4000) Stack: ffffffffa04e7c7b 0000000000000001 0000000000010000 ffff8800aa72db28 ffffffff00000001 0000000000001000 ffffffff8113cbef 0000000000000020 ffff8800a4243420 ffff880000000002 ffff8800aa72db08 ffff8800a9d42000 Call Trace: [<ffffffffa04e7c7b>] ? radeon_atrm_get_bios_chunk+0x8b/0xd0 [radeon] [<ffffffff8113cbef>] ? kmalloc_order_trace+0x3f/0xb0 [<ffffffffa04a9298>] radeon_get_bios+0x68/0x2f0 [radeon] [<ffffffffa04c7a30>] rv770_init+0x40/0x280 [radeon] [<ffffffffa047d740>] radeon_device_init+0x560/0x600 [radeon] [<ffffffffa047ef4f>] radeon_driver_load_kms+0xaf/0x170 [radeon] [<ffffffffa043cdde>] drm_get_pci_dev+0x18e/0x2c0 [drm] [<ffffffffa04e7e95>] radeon_pci_probe+0xad/0xb5 [radeon] [<ffffffff81296c5f>] local_pci_probe+0x5f/0xd0 [<ffffffff81297418>] pci_device_probe+0x88/0xb0 [<ffffffff813417aa>] ? driver_sysfs_add+0x7a/0xb0 [<ffffffff813418d8>] really_probe+0x68/0x180 [<ffffffff81341be5>] driver_probe_device+0x45/0x70 [<ffffffff81341cb3>] __driver_attach+0xa3/0xb0 [<ffffffff81341c10>] ? driver_probe_device+0x70/0x70 [<ffffffff813400ce>] bus_for_each_dev+0x5e/0x90 [<ffffffff8134172e>] driver_attach+0x1e/0x20 [<ffffffff81341298>] bus_add_driver+0xc8/0x280 [<ffffffff813422c6>] driver_register+0x76/0x140 [<ffffffff812976d6>] __pci_register_driver+0x66/0xe0 [<ffffffffa043d021>] drm_pci_init+0x111/0x120 [drm] [<ffffffff8133c67a>] ? vga_switcheroo_register_handler+0x3a/0x60 [<ffffffffa0229000>] ? 0xffffffffa0228fff [<ffffffffa02290ec>] radeon_init+0xec/0xee [radeon] [<ffffffff810002f2>] do_one_initcall+0x42/0x180 [<ffffffff8109d8d2>] sys_init_module+0x92/0x1e0 [<ffffffff815407a9>] system_call_fastpath+0x16/0x1b Code: 58 2a 43 50 88 43 4e 48 83 c4 08 5b c9 c3 66 90 e8 cb fd ff ff eb e6 90 90 90 90 90 90 90 90 90 48 89 f8 89 d1 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 c3 20 48 83 ea 20 4c 8b 06 4c 8b 4e 08 4c RIP [<ffffffff81275b5b>] memcpy+0xb/0x120 RSP <ffff8800aa72db00> CR2: ffff8800a4244000 ---[ end trace fcffa1599cf56382 ]--- Call to acpi_evaluate_object() not always returns 4096 bytes chunks, on my system it can return 2048 bytes chunk, so pass the length of retrieved chunk to memcpy(), not the length of the recieving buffer. Signed-off-by: Igor Murzov <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Dave Airlie <[email protected]> Signed-off-by: Ben Hutchings <[email protected]>

…rq() workaround for clockevent Timer request_irq() for TIMER0 failing on CPU1 Ideally we want to use the request_percpu_irq( ) / enable_percpu_irq() calls from GENERIC_IRQ framework, however that seems to be faltering even on the boot cpu at the time of first interrupt. Until that is resolved (with Thomas G), we need to pretend that TIMER0 is IRQF_SHARED. This also requires yet another hack of explicitly unmasking the IRQ on that CPU. Query sent to Thomas Gleixner ======================>8==================================== In a SMP setup, each ARC700 CPU has a in-core TIMER, hooked up to private IRQ 3 of respective CPU and would serve as the local clock_event_device. request_irq( ) for my first CPU which succeeds, looks roughly as follows: void __cpuinit arc_clockevent_init(void) { int rc; unsigned int cpu = smp_processor_id(); struct clock_event_device *evt = &per_cpu(arc_clockevent_device, cpu); .... rc = request_irq(TIMER0_INT, timer_irq_handler, IRQF_TIMER | IRQF_DISABLED | IRQF_PERCPU, "Timer0 (clock-evt-dev)", evt); .... The exact same call, when done from 2nd CPU fails, as it wants to see IRQF_SHARED which is semantically not correct, since IRQ is not really shared, it is a private instance (albeit same value), per cpu. I figured that the right APIs for our case is the pair: (request|enable)_percpu_irq to be called for both CPUs, with a prior one time call to irq_set_percpu_devid(). Is that correct? Assuming it is, the trouble now is that, even on the first CPU, handle_level_irq( ) is bailing out w/o calling handle_irq_event() because irqd_irq_disabled( ) is true. This in turn happens because, irq_set_percpu_devid(), our much needed init routine, sets IRQ_NOAUTOEN causing __setup_irq( ) to skip calling irq_startup() => irq_enable() which would have cleared IRQD_IRQ_DISABLED. While enable_percpu_irq( ), could have fixed this, it only seems to be unmasking IRQ at device level, it is not clearing the above flag. I tried calling enable_irq( ) right after, but that doesn't seem to help either. What API am I missing here, to enable the irqd machinery, or am I seeing a bug where enable_percpu_irq( ) call-chain should somehow be doing it. ======================>8==================================== This needs to be reverted and replaced with right calls once ThomasG responds to my query. Signed-off-by: Vineet Gupta <[email protected]>

…equest_irq() workaround for clockevent Timer" This reverts commit 2985184. Next commit uses the correct APIs, so we no longer need this hack

…d reasons commit 5cf02d0 upstream. We've had some reports of a deadlock where rpciod ends up with a stack trace like this: PID: 2507 TASK: ffff88103691ab40 CPU: 14 COMMAND: "rpciod/14" #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9 #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs] #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8 #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs] #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs] #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670 #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271 #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638 #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f torvalds#10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e torvalds#11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f torvalds#12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad torvalds#13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942 torvalds#14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a torvalds#15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9 torvalds#16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b torvalds#17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808 torvalds#18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c torvalds#19 [ffff8810343bfce8] inet_create at ffffffff81483ba6 torvalds#20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7 torvalds#21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc] torvalds#22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc] torvalds#23 [ffff8810343bfe38] worker_thread at ffffffff810887d0 torvalds#24 [ffff8810343bfee8] kthread at ffffffff8108dd96 torvalds#25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca rpciod is trying to allocate memory for a new socket to talk to the server. The VM ends up calling ->releasepage to get more memory, and it tries to do a blocking commit. That commit can't succeed however without a connected socket, so we deadlock. Fix this by setting PF_FSTRANS on the workqueue task prior to doing the socket allocation, and having nfs_release_page check for that flag when deciding whether to do a commit call. Also, set PF_FSTRANS unconditionally in rpc_async_schedule since that function can also do allocations sometimes. Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Trond Myklebust <[email protected]> Signed-off-by: Ben Hutchings <[email protected]>

commit a3f83ab upstream. At a boot time I observed following bug: BUG: unable to handle kernel paging request at ffff8800a4244000 IP: [<ffffffff81275b5b>] memcpy+0xb/0x120 PGD 1816063 PUD 1fe7d067 PMD 1ff9f067 PTE 80000000a4244160 Oops: 0000 [#1] SMP DEBUG_PAGEALLOC CPU 0 Modules linked in: btusb bluetooth brcmsmac brcmutil crc8 cordic b43 radeon(+) mac80211 cfg80211 ttm ohci_hcd drm_kms_helper rfkill drm ssb agpgart mmc_core sp5100_tco video battery ac thermal processor rtc_cmos thermal_sys snd_hda_codec_hdmi joydev snd_hda_codec_conexant button bcma pcmcia snd_hda_intel snd_hda_codec snd_hwdep snd_pcm shpchp pcmcia_core k8temp snd_timer atl1c snd psmouse hwmon i2c_piix4 i2c_algo_bit soundcore evdev i2c_core ehci_hcd sg serio_raw snd_page_alloc loop btrfs Pid: 1008, comm: modprobe Not tainted 3.3.0-rc1 torvalds#21 LENOVO 20046 /AMD CRB RIP: 0010:[<ffffffff81275b5b>] [<ffffffff81275b5b>] memcpy+0xb/0x120 RSP: 0018:ffff8800aa72db00 EFLAGS: 00010246 RAX: ffff8800a4150000 RBX: 0000000000001000 RCX: 0000000000000087 RDX: 0000000000000000 RSI: ffff8800a4244000 RDI: ffff8800a4150bc8 RBP: ffff8800aa72db78 R08: 0000000000000010 R09: ffffffff8174bbec R10: ffffffff812ee010 R11: 0000000000000001 R12: 0000000000001000 R13: 0000000000010000 R14: ffff8800a4140000 R15: ffff8800aaba1800 FS: 00007ff9a3bd4720(0000) GS:ffff8800afa00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffff8800a4244000 CR3: 00000000a9c18000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process modprobe (pid: 1008, threadinfo ffff8800aa72c000, task ffff8800aa0e4000) Stack: ffffffffa04e7c7b 0000000000000001 0000000000010000 ffff8800aa72db28 ffffffff00000001 0000000000001000 ffffffff8113cbef 0000000000000020 ffff8800a4243420 ffff880000000002 ffff8800aa72db08 ffff8800a9d42000 Call Trace: [<ffffffffa04e7c7b>] ? radeon_atrm_get_bios_chunk+0x8b/0xd0 [radeon] [<ffffffff8113cbef>] ? kmalloc_order_trace+0x3f/0xb0 [<ffffffffa04a9298>] radeon_get_bios+0x68/0x2f0 [radeon] [<ffffffffa04c7a30>] rv770_init+0x40/0x280 [radeon] [<ffffffffa047d740>] radeon_device_init+0x560/0x600 [radeon] [<ffffffffa047ef4f>] radeon_driver_load_kms+0xaf/0x170 [radeon] [<ffffffffa043cdde>] drm_get_pci_dev+0x18e/0x2c0 [drm] [<ffffffffa04e7e95>] radeon_pci_probe+0xad/0xb5 [radeon] [<ffffffff81296c5f>] local_pci_probe+0x5f/0xd0 [<ffffffff81297418>] pci_device_probe+0x88/0xb0 [<ffffffff813417aa>] ? driver_sysfs_add+0x7a/0xb0 [<ffffffff813418d8>] really_probe+0x68/0x180 [<ffffffff81341be5>] driver_probe_device+0x45/0x70 [<ffffffff81341cb3>] __driver_attach+0xa3/0xb0 [<ffffffff81341c10>] ? driver_probe_device+0x70/0x70 [<ffffffff813400ce>] bus_for_each_dev+0x5e/0x90 [<ffffffff8134172e>] driver_attach+0x1e/0x20 [<ffffffff81341298>] bus_add_driver+0xc8/0x280 [<ffffffff813422c6>] driver_register+0x76/0x140 [<ffffffff812976d6>] __pci_register_driver+0x66/0xe0 [<ffffffffa043d021>] drm_pci_init+0x111/0x120 [drm] [<ffffffff8133c67a>] ? vga_switcheroo_register_handler+0x3a/0x60 [<ffffffffa0229000>] ? 0xffffffffa0228fff [<ffffffffa02290ec>] radeon_init+0xec/0xee [radeon] [<ffffffff810002f2>] do_one_initcall+0x42/0x180 [<ffffffff8109d8d2>] sys_init_module+0x92/0x1e0 [<ffffffff815407a9>] system_call_fastpath+0x16/0x1b Code: 58 2a 43 50 88 43 4e 48 83 c4 08 5b c9 c3 66 90 e8 cb fd ff ff eb e6 90 90 90 90 90 90 90 90 90 48 89 f8 89 d1 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 c3 20 48 83 ea 20 4c 8b 06 4c 8b 4e 08 4c RIP [<ffffffff81275b5b>] memcpy+0xb/0x120 RSP <ffff8800aa72db00> CR2: ffff8800a4244000 ---[ end trace fcffa1599cf56382 ]--- Call to acpi_evaluate_object() not always returns 4096 bytes chunks, on my system it can return 2048 bytes chunk, so pass the length of retrieved chunk to memcpy(), not the length of the recieving buffer. Signed-off-by: Igor Murzov <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Dave Airlie <[email protected]> Signed-off-by: Ben Hutchings <[email protected]>

Printing the "start_ip" for every secondary cpu is very noisy on a large system - and doesn't add any value. Drop this message. Console log before: Booting Node 0, Processors #1 smpboot cpu 1: start_ip = 96000 #2 smpboot cpu 2: start_ip = 96000 #3 smpboot cpu 3: start_ip = 96000 #4 smpboot cpu 4: start_ip = 96000 ... torvalds#31 smpboot cpu 31: start_ip = 96000 Brought up 32 CPUs Console log after: Booting Node 0, Processors #1 #2 #3 #4 #5 torvalds#6 torvalds#7 Ok. Booting Node 1, Processors torvalds#8 torvalds#9 torvalds#10 torvalds#11 torvalds#12 torvalds#13 torvalds#14 torvalds#15 Ok. Booting Node 0, Processors torvalds#16 torvalds#17 torvalds#18 torvalds#19 torvalds#20 torvalds#21 torvalds#22 torvalds#23 Ok. Booting Node 1, Processors torvalds#24 torvalds#25 torvalds#26 torvalds#27 torvalds#28 torvalds#29 torvalds#30 torvalds#31 Brought up 32 CPUs Acked-by: Borislav Petkov <[email protected]> Signed-off-by: Tony Luck <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: H. Peter Anvin <[email protected]>

…d reasons commit 5cf02d0 upstream. We've had some reports of a deadlock where rpciod ends up with a stack trace like this: PID: 2507 TASK: ffff88103691ab40 CPU: 14 COMMAND: "rpciod/14" #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9 #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs] #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8 #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs] #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs] #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670 #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271 #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638 #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f torvalds#10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e torvalds#11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f torvalds#12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad torvalds#13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942 torvalds#14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a torvalds#15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9 torvalds#16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b torvalds#17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808 torvalds#18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c torvalds#19 [ffff8810343bfce8] inet_create at ffffffff81483ba6 torvalds#20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7 torvalds#21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc] torvalds#22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc] torvalds#23 [ffff8810343bfe38] worker_thread at ffffffff810887d0 torvalds#24 [ffff8810343bfee8] kthread at ffffffff8108dd96 torvalds#25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca rpciod is trying to allocate memory for a new socket to talk to the server. The VM ends up calling ->releasepage to get more memory, and it tries to do a blocking commit. That commit can't succeed however without a connected socket, so we deadlock. Fix this by setting PF_FSTRANS on the workqueue task prior to doing the socket allocation, and having nfs_release_page check for that flag when deciding whether to do a commit call. Also, set PF_FSTRANS unconditionally in rpc_async_schedule since that function can also do allocations sometimes. Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Trond Myklebust <[email protected]> Signed-off-by: Ben Hutchings <[email protected]>

commit a3f83ab upstream. At a boot time I observed following bug: BUG: unable to handle kernel paging request at ffff8800a4244000 IP: [<ffffffff81275b5b>] memcpy+0xb/0x120 PGD 1816063 PUD 1fe7d067 PMD 1ff9f067 PTE 80000000a4244160 Oops: 0000 [#1] SMP DEBUG_PAGEALLOC CPU 0 Modules linked in: btusb bluetooth brcmsmac brcmutil crc8 cordic b43 radeon(+) mac80211 cfg80211 ttm ohci_hcd drm_kms_helper rfkill drm ssb agpgart mmc_core sp5100_tco video battery ac thermal processor rtc_cmos thermal_sys snd_hda_codec_hdmi joydev snd_hda_codec_conexant button bcma pcmcia snd_hda_intel snd_hda_codec snd_hwdep snd_pcm shpchp pcmcia_core k8temp snd_timer atl1c snd psmouse hwmon i2c_piix4 i2c_algo_bit soundcore evdev i2c_core ehci_hcd sg serio_raw snd_page_alloc loop btrfs Pid: 1008, comm: modprobe Not tainted 3.3.0-rc1 torvalds#21 LENOVO 20046 /AMD CRB RIP: 0010:[<ffffffff81275b5b>] [<ffffffff81275b5b>] memcpy+0xb/0x120 RSP: 0018:ffff8800aa72db00 EFLAGS: 00010246 RAX: ffff8800a4150000 RBX: 0000000000001000 RCX: 0000000000000087 RDX: 0000000000000000 RSI: ffff8800a4244000 RDI: ffff8800a4150bc8 RBP: ffff8800aa72db78 R08: 0000000000000010 R09: ffffffff8174bbec R10: ffffffff812ee010 R11: 0000000000000001 R12: 0000000000001000 R13: 0000000000010000 R14: ffff8800a4140000 R15: ffff8800aaba1800 FS: 00007ff9a3bd4720(0000) GS:ffff8800afa00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffff8800a4244000 CR3: 00000000a9c18000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process modprobe (pid: 1008, threadinfo ffff8800aa72c000, task ffff8800aa0e4000) Stack: ffffffffa04e7c7b 0000000000000001 0000000000010000 ffff8800aa72db28 ffffffff00000001 0000000000001000 ffffffff8113cbef 0000000000000020 ffff8800a4243420 ffff880000000002 ffff8800aa72db08 ffff8800a9d42000 Call Trace: [<ffffffffa04e7c7b>] ? radeon_atrm_get_bios_chunk+0x8b/0xd0 [radeon] [<ffffffff8113cbef>] ? kmalloc_order_trace+0x3f/0xb0 [<ffffffffa04a9298>] radeon_get_bios+0x68/0x2f0 [radeon] [<ffffffffa04c7a30>] rv770_init+0x40/0x280 [radeon] [<ffffffffa047d740>] radeon_device_init+0x560/0x600 [radeon] [<ffffffffa047ef4f>] radeon_driver_load_kms+0xaf/0x170 [radeon] [<ffffffffa043cdde>] drm_get_pci_dev+0x18e/0x2c0 [drm] [<ffffffffa04e7e95>] radeon_pci_probe+0xad/0xb5 [radeon] [<ffffffff81296c5f>] local_pci_probe+0x5f/0xd0 [<ffffffff81297418>] pci_device_probe+0x88/0xb0 [<ffffffff813417aa>] ? driver_sysfs_add+0x7a/0xb0 [<ffffffff813418d8>] really_probe+0x68/0x180 [<ffffffff81341be5>] driver_probe_device+0x45/0x70 [<ffffffff81341cb3>] __driver_attach+0xa3/0xb0 [<ffffffff81341c10>] ? driver_probe_device+0x70/0x70 [<ffffffff813400ce>] bus_for_each_dev+0x5e/0x90 [<ffffffff8134172e>] driver_attach+0x1e/0x20 [<ffffffff81341298>] bus_add_driver+0xc8/0x280 [<ffffffff813422c6>] driver_register+0x76/0x140 [<ffffffff812976d6>] __pci_register_driver+0x66/0xe0 [<ffffffffa043d021>] drm_pci_init+0x111/0x120 [drm] [<ffffffff8133c67a>] ? vga_switcheroo_register_handler+0x3a/0x60 [<ffffffffa0229000>] ? 0xffffffffa0228fff [<ffffffffa02290ec>] radeon_init+0xec/0xee [radeon] [<ffffffff810002f2>] do_one_initcall+0x42/0x180 [<ffffffff8109d8d2>] sys_init_module+0x92/0x1e0 [<ffffffff815407a9>] system_call_fastpath+0x16/0x1b Code: 58 2a 43 50 88 43 4e 48 83 c4 08 5b c9 c3 66 90 e8 cb fd ff ff eb e6 90 90 90 90 90 90 90 90 90 48 89 f8 89 d1 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 c3 20 48 83 ea 20 4c 8b 06 4c 8b 4e 08 4c RIP [<ffffffff81275b5b>] memcpy+0xb/0x120 RSP <ffff8800aa72db00> CR2: ffff8800a4244000 ---[ end trace fcffa1599cf56382 ]--- Call to acpi_evaluate_object() not always returns 4096 bytes chunks, on my system it can return 2048 bytes chunk, so pass the length of retrieved chunk to memcpy(), not the length of the recieving buffer. Signed-off-by: Igor Murzov <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Dave Airlie <[email protected]> Signed-off-by: Ben Hutchings <[email protected]>

request_irq() for TIMER0 failing on CPU1 Ideally we want to use the request_percpu_irq( ) / enable_percpu_irq() calls from GENERIC_IRQ framework, however that seems to be faltering even on the boot cpu at the time of first interrupt. Until that is resolved (with Thomas G), we need to pretend that TIMER0 is IRQF_SHARED. This also requires yet another hack of explicitly unmasking the IRQ on that CPU. Query sent to Thomas Gleixner ======================>8==================================== In a SMP setup, each ARC700 CPU has a in-core TIMER, hooked up to private IRQ 3 of respective CPU and would serve as the local clock_event_device. request_irq( ) for my first CPU which succeeds, looks roughly as follows: void __cpuinit arc_clockevent_init(void) { int rc; unsigned int cpu = smp_processor_id(); struct clock_event_device *evt = &per_cpu(arc_clockevent_device, cpu); .... rc = request_irq(TIMER0_INT, timer_irq_handler, IRQF_TIMER | IRQF_DISABLED | IRQF_PERCPU, "Timer0 (clock-evt-dev)", evt); .... The exact same call, when done from 2nd CPU fails, as it wants to see IRQF_SHARED which is semantically not correct, since IRQ is not really shared, it is a private instance (albeit same value), per cpu. I figured that the right APIs for our case is the pair: (request|enable)_percpu_irq to be called for both CPUs, with a prior one time call to irq_set_percpu_devid(). Is that correct? Assuming it is, the trouble now is that, even on the first CPU, handle_level_irq( ) is bailing out w/o calling handle_irq_event() because irqd_irq_disabled( ) is true. This in turn happens because, irq_set_percpu_devid(), our much needed init routine, sets IRQ_NOAUTOEN causing __setup_irq( ) to skip calling irq_startup() => irq_enable() which would have cleared IRQD_IRQ_DISABLED. While enable_percpu_irq( ), could have fixed this, it only seems to be unmasking IRQ at device level, it is not clearing the above flag. I tried calling enable_irq( ) right after, but that doesn't seem to help either. What API am I missing here, to enable the irqd machinery, or am I seeing a bug where enable_percpu_irq( ) call-chain should somehow be doing it. ======================>8==================================== This needs to be reverted and replaced with right calls once ThomasG responds to my query. Signed-off-by: Vineet Gupta <[email protected]>

…nt Timer" This reverts commit 2985184. Next commit uses the correct APIs, so we no longer need this hack Signed-off-by: Vineet Gupta <[email protected]>

At a boot time I observed following bug: BUG: unable to handle kernel paging request at ffff8800a4244000 IP: [<ffffffff81275b5b>] memcpy+0xb/0x120 PGD 1816063 PUD 1fe7d067 PMD 1ff9f067 PTE 80000000a4244160 Oops: 0000 [#1] SMP DEBUG_PAGEALLOC CPU 0 Modules linked in: btusb bluetooth brcmsmac brcmutil crc8 cordic b43 radeon(+) mac80211 cfg80211 ttm ohci_hcd drm_kms_helper rfkill drm ssb agpgart mmc_core sp5100_tco video battery ac thermal processor rtc_cmos thermal_sys snd_hda_codec_hdmi joydev snd_hda_codec_conexant button bcma pcmcia snd_hda_intel snd_hda_codec snd_hwdep snd_pcm shpchp pcmcia_core k8temp snd_timer atl1c snd psmouse hwmon i2c_piix4 i2c_algo_bit soundcore evdev i2c_core ehci_hcd sg serio_raw snd_page_alloc loop btrfs Pid: 1008, comm: modprobe Not tainted 3.3.0-rc1 torvalds#21 LENOVO 20046 /AMD CRB RIP: 0010:[<ffffffff81275b5b>] [<ffffffff81275b5b>] memcpy+0xb/0x120 RSP: 0018:ffff8800aa72db00 EFLAGS: 00010246 RAX: ffff8800a4150000 RBX: 0000000000001000 RCX: 0000000000000087 RDX: 0000000000000000 RSI: ffff8800a4244000 RDI: ffff8800a4150bc8 RBP: ffff8800aa72db78 R08: 0000000000000010 R09: ffffffff8174bbec R10: ffffffff812ee010 R11: 0000000000000001 R12: 0000000000001000 R13: 0000000000010000 R14: ffff8800a4140000 R15: ffff8800aaba1800 FS: 00007ff9a3bd4720(0000) GS:ffff8800afa00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffff8800a4244000 CR3: 00000000a9c18000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process modprobe (pid: 1008, threadinfo ffff8800aa72c000, task ffff8800aa0e4000) Stack: ffffffffa04e7c7b 0000000000000001 0000000000010000 ffff8800aa72db28 ffffffff00000001 0000000000001000 ffffffff8113cbef 0000000000000020 ffff8800a4243420 ffff880000000002 ffff8800aa72db08 ffff8800a9d42000 Call Trace: [<ffffffffa04e7c7b>] ? radeon_atrm_get_bios_chunk+0x8b/0xd0 [radeon] [<ffffffff8113cbef>] ? kmalloc_order_trace+0x3f/0xb0 [<ffffffffa04a9298>] radeon_get_bios+0x68/0x2f0 [radeon] [<ffffffffa04c7a30>] rv770_init+0x40/0x280 [radeon] [<ffffffffa047d740>] radeon_device_init+0x560/0x600 [radeon] [<ffffffffa047ef4f>] radeon_driver_load_kms+0xaf/0x170 [radeon] [<ffffffffa043cdde>] drm_get_pci_dev+0x18e/0x2c0 [drm] [<ffffffffa04e7e95>] radeon_pci_probe+0xad/0xb5 [radeon] [<ffffffff81296c5f>] local_pci_probe+0x5f/0xd0 [<ffffffff81297418>] pci_device_probe+0x88/0xb0 [<ffffffff813417aa>] ? driver_sysfs_add+0x7a/0xb0 [<ffffffff813418d8>] really_probe+0x68/0x180 [<ffffffff81341be5>] driver_probe_device+0x45/0x70 [<ffffffff81341cb3>] __driver_attach+0xa3/0xb0 [<ffffffff81341c10>] ? driver_probe_device+0x70/0x70 [<ffffffff813400ce>] bus_for_each_dev+0x5e/0x90 [<ffffffff8134172e>] driver_attach+0x1e/0x20 [<ffffffff81341298>] bus_add_driver+0xc8/0x280 [<ffffffff813422c6>] driver_register+0x76/0x140 [<ffffffff812976d6>] __pci_register_driver+0x66/0xe0 [<ffffffffa043d021>] drm_pci_init+0x111/0x120 [drm] [<ffffffff8133c67a>] ? vga_switcheroo_register_handler+0x3a/0x60 [<ffffffffa0229000>] ? 0xffffffffa0228fff [<ffffffffa02290ec>] radeon_init+0xec/0xee [radeon] [<ffffffff810002f2>] do_one_initcall+0x42/0x180 [<ffffffff8109d8d2>] sys_init_module+0x92/0x1e0 [<ffffffff815407a9>] system_call_fastpath+0x16/0x1b Code: 58 2a 43 50 88 43 4e 48 83 c4 08 5b c9 c3 66 90 e8 cb fd ff ff eb e6 90 90 90 90 90 90 90 90 90 48 89 f8 89 d1 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 c3 20 48 83 ea 20 4c 8b 06 4c 8b 4e 08 4c RIP [<ffffffff81275b5b>] memcpy+0xb/0x120 RSP <ffff8800aa72db00> CR2: ffff8800a4244000 ---[ end trace fcffa1599cf56382 ]--- Call to acpi_evaluate_object() not always returns 4096 bytes chunks, on my system it can return 2048 bytes chunk, so pass the length of retrieved chunk to memcpy(), not the length of the recieving buffer. Signed-off-by: Igor Murzov <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Dave Airlie <[email protected]>

If the netdev is already in NETREG_UNREGISTERING/_UNREGISTERED state, do not update the real num tx queues. netdev_queue_update_kobjects() is already called via remove_queue_kobjects() at NETREG_UNREGISTERING time. So, when upper layer driver, e.g., FCoE protocol stack is monitoring the netdev event of NETDEV_UNREGISTER and calls back to LLD ndo_fcoe_disable() to remove extra queues allocated for FCoE, the associated txq sysfs kobjects are already removed, and trying to update the real num queues would cause something like below: ... PID: 25138 TASK: ffff88021e64c440 CPU: 3 COMMAND: "kworker/3:3" #0 [ffff88021f007760] machine_kexec at ffffffff810226d9 #1 [ffff88021f0077d0] crash_kexec at ffffffff81089d2d #2 [ffff88021f0078a0] oops_end at ffffffff813bca78 #3 [ffff88021f0078d0] no_context at ffffffff81029e72 #4 [ffff88021f007920] __bad_area_nosemaphore at ffffffff8102a155 #5 [ffff88021f0079f0] bad_area_nosemaphore at ffffffff8102a23e torvalds#6 [ffff88021f007a00] do_page_fault at ffffffff813bf32e torvalds#7 [ffff88021f007b10] page_fault at ffffffff813bc045 [exception RIP: sysfs_find_dirent+17] RIP: ffffffff81178611 RSP: ffff88021f007bc0 RFLAGS: 00010246 RAX: ffff88021e64c440 RBX: ffffffff8156cc63 RCX: 0000000000000004 RDX: ffffffff8156cc63 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff88021f007be0 R8: 0000000000000004 R9: 0000000000000008 R10: ffffffff816fed00 R11: 0000000000000004 R12: 0000000000000000 R13: ffffffff8156cc63 R14: 0000000000000000 R15: ffff8802222a0000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 torvalds#8 [ffff88021f007be8] sysfs_get_dirent at ffffffff81178c07 torvalds#9 [ffff88021f007c18] sysfs_remove_group at ffffffff8117ac27 torvalds#10 [ffff88021f007c48] netdev_queue_update_kobjects at ffffffff813178f9 torvalds#11 [ffff88021f007c88] netif_set_real_num_tx_queues at ffffffff81303e38 torvalds#12 [ffff88021f007cc8] ixgbe_set_num_queues at ffffffffa0249763 [ixgbe] torvalds#13 [ffff88021f007cf8] ixgbe_init_interrupt_scheme at ffffffffa024ea89 [ixgbe] torvalds#14 [ffff88021f007d48] ixgbe_fcoe_disable at ffffffffa0267113 [ixgbe] torvalds#15 [ffff88021f007d68] vlan_dev_fcoe_disable at ffffffffa014fef5 [8021q] torvalds#16 [ffff88021f007d78] fcoe_interface_cleanup at ffffffffa02b7dfd [fcoe] torvalds#17 [ffff88021f007df8] fcoe_destroy_work at ffffffffa02b7f08 [fcoe] torvalds#18 [ffff88021f007e18] process_one_work at ffffffff8105d7ca torvalds#19 [ffff88021f007e68] worker_thread at ffffffff81060513 torvalds#20 [ffff88021f007ee8] kthread at ffffffff810648b6 torvalds#21 [ffff88021f007f48] kernel_thread_helper at ffffffff813c40f4 Signed-off-by: Yi Zou <[email protected]> Tested-by: Ross Brattain <[email protected]> Tested-by: Stephen Ko <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>

commit be346c1 upstream. The code in ocfs2_dio_end_io_write() estimates number of necessary transaction credits using ocfs2_calc_extend_credits(). This however does not take into account that the IO could be arbitrarily large and can contain arbitrary number of extents. Extent tree manipulations do often extend the current transaction but not in all of the cases. For example if we have only single block extents in the tree, ocfs2_mark_extent_written() will end up calling ocfs2_replace_extent_rec() all the time and we will never extend the current transaction and eventually exhaust all the transaction credits if the IO contains many single block extents. Once that happens a WARN_ON(jbd2_handle_buffer_credits(handle) <= 0) is triggered in jbd2_journal_dirty_metadata() and subsequently OCFS2 aborts in response to this error. This was actually triggered by one of our customers on a heavily fragmented OCFS2 filesystem. To fix the issue make sure the transaction always has enough credits for one extent insert before each call of ocfs2_mark_extent_written(). Heming Zhao said: ------ PANIC: "Kernel panic - not syncing: OCFS2: (device dm-1): panic forced after error" PID: xxx TASK: xxxx CPU: 5 COMMAND: "SubmitThread-CA" #0 machine_kexec at ffffffff8c069932 #1 __crash_kexec at ffffffff8c1338fa #2 panic at ffffffff8c1d69b9 #3 ocfs2_handle_error at ffffffffc0c86c0c [ocfs2] #4 __ocfs2_abort at ffffffffc0c88387 [ocfs2] #5 ocfs2_journal_dirty at ffffffffc0c51e98 [ocfs2] torvalds#6 ocfs2_split_extent at ffffffffc0c27ea3 [ocfs2] torvalds#7 ocfs2_change_extent_flag at ffffffffc0c28053 [ocfs2] torvalds#8 ocfs2_mark_extent_written at ffffffffc0c28347 [ocfs2] torvalds#9 ocfs2_dio_end_io_write at ffffffffc0c2bef9 [ocfs2] torvalds#10 ocfs2_dio_end_io at ffffffffc0c2c0f5 [ocfs2] torvalds#11 dio_complete at ffffffff8c2b9fa7 torvalds#12 do_blockdev_direct_IO at ffffffff8c2bc09f torvalds#13 ocfs2_direct_IO at ffffffffc0c2b653 [ocfs2] torvalds#14 generic_file_direct_write at ffffffff8c1dcf14 torvalds#15 __generic_file_write_iter at ffffffff8c1dd07b torvalds#16 ocfs2_file_write_iter at ffffffffc0c49f1f [ocfs2] torvalds#17 aio_write at ffffffff8c2cc72e torvalds#18 kmem_cache_alloc at ffffffff8c248dde torvalds#19 do_io_submit at ffffffff8c2ccada torvalds#20 do_syscall_64 at ffffffff8c004984 torvalds#21 entry_SYSCALL_64_after_hwframe at ffffffff8c8000ba Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: c15471f ("ocfs2: fix sparse file & data ordering issue in direct io") Signed-off-by: Jan Kara <[email protected]> Reviewed-by: Joseph Qi <[email protected]> Reviewed-by: Heming Zhao <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Changwei Ge <[email protected]> Cc: Gang He <[email protected]> Cc: Jun Piao <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

In the buffered write path, the dirty page owns the qgroup rsv until it creates an ordered_extent. Therefore, any errors that occur before the ordered_extent is created must free that reservation, or else the space is leaked. The fstest generic/475 exercises various IO error paths, and is able to trigger errors in cow_file_range where we fail to get to allocating the ordered extent. Note that because we *do* clear delalloc, we are likely to remove the inode from the delalloc list, so the inodes/pages to not have invalidate/launder called on them in the commit abort path. This results in failures at the unmount stage of the test that look like: [ 1903.401193] BTRFS: error (device dm-8 state EA) in cleanup_transaction:2018: errno=-5 IO failure [ 1903.402686] BTRFS: error (device dm-8 state EA) in btrfs_replace_file_extents:2416: errno=-5 IO failure [ 1903.446415] BTRFS warning (device dm-8 state EA): qgroup 0/5 has unreleased space, type 0 rsv 28672 [ 1903.447887] ------------[ cut here ]------------ [ 1903.448645] WARNING: CPU: 3 PID: 22588 at fs/btrfs/disk-io.c:4333 close_ctree+0x222/0x4d0 [btrfs] [ 1903.450130] Modules linked in: btrfs blake2b_generic libcrc32c xor zstd_compress raid6_pq [ 1903.451408] CPU: 3 PID: 22588 Comm: umount Kdump: loaded Tainted: G W 6.10.0-rc7-gab56fde445b8 torvalds#21 [ 1903.453058] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 [ 1903.454542] RIP: 0010:close_ctree+0x222/0x4d0 [btrfs] [ 1903.455417] Code: 4d c0 48 c7 c6 a0 92 4d c0 48 c7 c7 78 82 4d c0 e8 63 22 36 d7 90 0f 0b f0 80 4b 10 02 48 89 df e8 33 dc fb ff 84 c0 74 13 90 <0f> 0b 90 48 c7 c6 c8 92 4d c0 48 89 df e8 0c 22 01 00 48 89 df e8 [ 1903.458317] RSP: 0018:ffffb4465283be00 EFLAGS: 00010202 [ 1903.459159] RAX: 0000000000000001 RBX: ffffa1a1818e1000 RCX: 0000000000000001 [ 1903.460286] RDX: 0000000000000000 RSI: ffffb4465283bbe0 RDI: ffffa1a19374fcb8 [ 1903.461408] RBP: ffffa1a1818e13c0 R08: 0000000100028b16 R09: 0000000000000000 [ 1903.462555] R10: 0000000000000003 R11: 0000000000000003 R12: ffffa1a18ad7972c [ 1903.463679] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 1903.464803] FS: 00007f9168312b80(0000) GS:ffffa1a4afcc0000(0000) knlGS:0000000000000000 [ 1903.466082] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1903.467004] CR2: 00007f91683c9140 CR3: 000000010acaa000 CR4: 00000000000006f0 [ 1903.468124] Call Trace: [ 1903.468548] <TASK> [ 1903.468890] ? close_ctree+0x222/0x4d0 [btrfs] [ 1903.469689] ? __warn.cold+0x8e/0xea [ 1903.470260] ? close_ctree+0x222/0x4d0 [btrfs] [ 1903.471052] ? report_bug+0xff/0x140 [ 1903.471646] ? handle_bug+0x3b/0x70 [ 1903.472212] ? exc_invalid_op+0x17/0x70 [ 1903.472838] ? asm_exc_invalid_op+0x1a/0x20 [ 1903.473518] ? close_ctree+0x222/0x4d0 [btrfs] [ 1903.474283] generic_shutdown_super+0x70/0x160 [ 1903.475005] kill_anon_super+0x11/0x40 [ 1903.475630] btrfs_kill_super+0x11/0x20 [btrfs] [ 1903.476405] deactivate_locked_super+0x2e/0xa0 [ 1903.477125] cleanup_mnt+0xb5/0x150 [ 1903.477699] task_work_run+0x57/0x80 [ 1903.478267] syscall_exit_to_user_mode+0x121/0x130 [ 1903.479056] do_syscall_64+0xab/0x1a0 [ 1903.479658] entry_SYSCALL_64_after_hwframe+0x77/0x7f [ 1903.480467] RIP: 0033:0x7f916847a887 [ 1903.481034] Code: 0d 00 f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 31 f6 e9 09 00 00 00 66 0f 1f 84 00 00 00 00 00 b8 a6 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 71 25 0d 00 f7 d8 64 89 02 b8 [ 1903.483951] RSP: 002b:00007ffe035d1648 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 [ 1903.485153] RAX: 0000000000000000 RBX: 000056074eba0508 RCX: 00007f916847a887 [ 1903.486244] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000056074eba0810 [ 1903.487128] RBP: 0000000000000000 R08: 00007ffe035d03f0 R09: 0000000000000001 [ 1903.488010] R10: 0000000000000103 R11: 0000000000000246 R12: 00007f91685cc22c [ 1903.488905] R13: 000056074eba0810 R14: 0000000000000000 R15: 000056074eba0400 [ 1903.489792] </TASK> [ 1903.490071] ---[ end trace 0000000000000000 ]--- [ 1903.490657] BTRFS error (device dm-8 state EA): qgroup reserved space leaked Cases 2 and 3 in the out_reserve path both pertain to this type of leak and must free the reserved qgroup data. Because it is already an error path, I opted not to handle the possible errors in btrfs_free_qgroup_data. Signed-off-by: Boris Burkov <[email protected]>

In the buffered write path, the dirty page owns the qgroup rsv until it creates an ordered_extent. Therefore, any errors that occur before the ordered_extent is created must free that reservation, or else the space is leaked. The fstest generic/475 exercises various IO error paths, and is able to trigger errors in cow_file_range where we fail to get to allocating the ordered extent. Note that because we *do* clear delalloc, we are likely to remove the inode from the delalloc list, so the inodes/pages to not have invalidate/launder called on them in the commit abort path. This results in failures at the unmount stage of the test that look like: [ 1903.401193] BTRFS: error (device dm-8 state EA) in cleanup_transaction:2018: errno=-5 IO failure [ 1903.402686] BTRFS: error (device dm-8 state EA) in btrfs_replace_file_extents:2416: errno=-5 IO failure [ 1903.446415] BTRFS warning (device dm-8 state EA): qgroup 0/5 has unreleased space, type 0 rsv 28672 [ 1903.447887] ------------[ cut here ]------------ [ 1903.448645] WARNING: CPU: 3 PID: 22588 at fs/btrfs/disk-io.c:4333 close_ctree+0x222/0x4d0 [btrfs] [ 1903.450130] Modules linked in: btrfs blake2b_generic libcrc32c xor zstd_compress raid6_pq [ 1903.451408] CPU: 3 PID: 22588 Comm: umount Kdump: loaded Tainted: G W 6.10.0-rc7-gab56fde445b8 torvalds#21 [ 1903.453058] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 [ 1903.454542] RIP: 0010:close_ctree+0x222/0x4d0 [btrfs] [ 1903.455417] Code: 4d c0 48 c7 c6 a0 92 4d c0 48 c7 c7 78 82 4d c0 e8 63 22 36 d7 90 0f 0b f0 80 4b 10 02 48 89 df e8 33 dc fb ff 84 c0 74 13 90 <0f> 0b 90 48 c7 c6 c8 92 4d c0 48 89 df e8 0c 22 01 00 48 89 df e8 [ 1903.458317] RSP: 0018:ffffb4465283be00 EFLAGS: 00010202 [ 1903.459159] RAX: 0000000000000001 RBX: ffffa1a1818e1000 RCX: 0000000000000001 [ 1903.460286] RDX: 0000000000000000 RSI: ffffb4465283bbe0 RDI: ffffa1a19374fcb8 [ 1903.461408] RBP: ffffa1a1818e13c0 R08: 0000000100028b16 R09: 0000000000000000 [ 1903.462555] R10: 0000000000000003 R11: 0000000000000003 R12: ffffa1a18ad7972c [ 1903.463679] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 1903.464803] FS: 00007f9168312b80(0000) GS:ffffa1a4afcc0000(0000) knlGS:0000000000000000 [ 1903.466082] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1903.467004] CR2: 00007f91683c9140 CR3: 000000010acaa000 CR4: 00000000000006f0 [ 1903.468124] Call Trace: [ 1903.468548] <TASK> [ 1903.468890] ? close_ctree+0x222/0x4d0 [btrfs] [ 1903.469689] ? __warn.cold+0x8e/0xea [ 1903.470260] ? close_ctree+0x222/0x4d0 [btrfs] [ 1903.471052] ? report_bug+0xff/0x140 [ 1903.471646] ? handle_bug+0x3b/0x70 [ 1903.472212] ? exc_invalid_op+0x17/0x70 [ 1903.472838] ? asm_exc_invalid_op+0x1a/0x20 [ 1903.473518] ? close_ctree+0x222/0x4d0 [btrfs] [ 1903.474283] generic_shutdown_super+0x70/0x160 [ 1903.475005] kill_anon_super+0x11/0x40 [ 1903.475630] btrfs_kill_super+0x11/0x20 [btrfs] [ 1903.476405] deactivate_locked_super+0x2e/0xa0 [ 1903.477125] cleanup_mnt+0xb5/0x150 [ 1903.477699] task_work_run+0x57/0x80 [ 1903.478267] syscall_exit_to_user_mode+0x121/0x130 [ 1903.479056] do_syscall_64+0xab/0x1a0 [ 1903.479658] entry_SYSCALL_64_after_hwframe+0x77/0x7f [ 1903.480467] RIP: 0033:0x7f916847a887 [ 1903.481034] Code: 0d 00 f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 31 f6 e9 09 00 00 00 66 0f 1f 84 00 00 00 00 00 b8 a6 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 71 25 0d 00 f7 d8 64 89 02 b8 [ 1903.483951] RSP: 002b:00007ffe035d1648 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 [ 1903.485153] RAX: 0000000000000000 RBX: 000056074eba0508 RCX: 00007f916847a887 [ 1903.486244] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000056074eba0810 [ 1903.487128] RBP: 0000000000000000 R08: 00007ffe035d03f0 R09: 0000000000000001 [ 1903.488010] R10: 0000000000000103 R11: 0000000000000246 R12: 00007f91685cc22c [ 1903.488905] R13: 000056074eba0810 R14: 0000000000000000 R15: 000056074eba0400 [ 1903.489792] </TASK> [ 1903.490071] ---[ end trace 0000000000000000 ]--- [ 1903.490657] BTRFS error (device dm-8 state EA): qgroup reserved space leaked Cases 2 and 3 in the out_reserve path both pertain to this type of leak and must free the reserved qgroup data. Because it is already an error path, I opted not to handle the possible errors in btrfs_free_qgroup_data. Signed-off-by: Boris Burkov <[email protected]> Signed-off-by: David Sterba <[email protected]>

In the buffered write path, the dirty page owns the qgroup rsv until it creates an ordered_extent. Therefore, any errors that occur before the ordered_extent is created must free that reservation, or else the space is leaked. The fstest generic/475 exercises various IO error paths, and is able to trigger errors in cow_file_range where we fail to get to allocating the ordered extent. Note that because we *do* clear delalloc, we are likely to remove the inode from the delalloc list, so the inodes/pages to not have invalidate/launder called on them in the commit abort path. This results in failures at the unmount stage of the test that look like: [ 1903.401193] BTRFS: error (device dm-8 state EA) in cleanup_transaction:2018: errno=-5 IO failure [ 1903.402686] BTRFS: error (device dm-8 state EA) in btrfs_replace_file_extents:2416: errno=-5 IO failure [ 1903.446415] BTRFS warning (device dm-8 state EA): qgroup 0/5 has unreleased space, type 0 rsv 28672 [ 1903.447887] ------------[ cut here ]------------ [ 1903.448645] WARNING: CPU: 3 PID: 22588 at fs/btrfs/disk-io.c:4333 close_ctree+0x222/0x4d0 [btrfs] [ 1903.450130] Modules linked in: btrfs blake2b_generic libcrc32c xor zstd_compress raid6_pq [ 1903.451408] CPU: 3 PID: 22588 Comm: umount Kdump: loaded Tainted: G W 6.10.0-rc7-gab56fde445b8 torvalds#21 [ 1903.453058] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 [ 1903.454542] RIP: 0010:close_ctree+0x222/0x4d0 [btrfs] [ 1903.455417] Code: 4d c0 48 c7 c6 a0 92 4d c0 48 c7 c7 78 82 4d c0 e8 63 22 36 d7 90 0f 0b f0 80 4b 10 02 48 89 df e8 33 dc fb ff 84 c0 74 13 90 <0f> 0b 90 48 c7 c6 c8 92 4d c0 48 89 df e8 0c 22 01 00 48 89 df e8 [ 1903.458317] RSP: 0018:ffffb4465283be00 EFLAGS: 00010202 [ 1903.459159] RAX: 0000000000000001 RBX: ffffa1a1818e1000 RCX: 0000000000000001 [ 1903.460286] RDX: 0000000000000000 RSI: ffffb4465283bbe0 RDI: ffffa1a19374fcb8 [ 1903.461408] RBP: ffffa1a1818e13c0 R08: 0000000100028b16 R09: 0000000000000000 [ 1903.462555] R10: 0000000000000003 R11: 0000000000000003 R12: ffffa1a18ad7972c [ 1903.463679] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 1903.464803] FS: 00007f9168312b80(0000) GS:ffffa1a4afcc0000(0000) knlGS:0000000000000000 [ 1903.466082] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1903.467004] CR2: 00007f91683c9140 CR3: 000000010acaa000 CR4: 00000000000006f0 [ 1903.468124] Call Trace: [ 1903.468548] <TASK> [ 1903.468890] ? close_ctree+0x222/0x4d0 [btrfs] [ 1903.469689] ? __warn.cold+0x8e/0xea [ 1903.470260] ? close_ctree+0x222/0x4d0 [btrfs] [ 1903.471052] ? report_bug+0xff/0x140 [ 1903.471646] ? handle_bug+0x3b/0x70 [ 1903.472212] ? exc_invalid_op+0x17/0x70 [ 1903.472838] ? asm_exc_invalid_op+0x1a/0x20 [ 1903.473518] ? close_ctree+0x222/0x4d0 [btrfs] [ 1903.474283] generic_shutdown_super+0x70/0x160 [ 1903.475005] kill_anon_super+0x11/0x40 [ 1903.475630] btrfs_kill_super+0x11/0x20 [btrfs] [ 1903.476405] deactivate_locked_super+0x2e/0xa0 [ 1903.477125] cleanup_mnt+0xb5/0x150 [ 1903.477699] task_work_run+0x57/0x80 [ 1903.478267] syscall_exit_to_user_mode+0x121/0x130 [ 1903.479056] do_syscall_64+0xab/0x1a0 [ 1903.479658] entry_SYSCALL_64_after_hwframe+0x77/0x7f [ 1903.480467] RIP: 0033:0x7f916847a887 [ 1903.481034] Code: 0d 00 f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 31 f6 e9 09 00 00 00 66 0f 1f 84 00 00 00 00 00 b8 a6 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 71 25 0d 00 f7 d8 64 89 02 b8 [ 1903.483951] RSP: 002b:00007ffe035d1648 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 [ 1903.485153] RAX: 0000000000000000 RBX: 000056074eba0508 RCX: 00007f916847a887 [ 1903.486244] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000056074eba0810 [ 1903.487128] RBP: 0000000000000000 R08: 00007ffe035d03f0 R09: 0000000000000001 [ 1903.488010] R10: 0000000000000103 R11: 0000000000000246 R12: 00007f91685cc22c [ 1903.488905] R13: 000056074eba0810 R14: 0000000000000000 R15: 000056074eba0400 [ 1903.489792] </TASK> [ 1903.490071] ---[ end trace 0000000000000000 ]--- [ 1903.490657] BTRFS error (device dm-8 state EA): qgroup reserved space leaked Cases 2 and 3 in the out_reserve path both pertain to this type of leak and must free the reserved qgroup data. Because it is already an error path, I opted not to handle the possible errors in btrfs_free_qgroup_data. Reviewed-by: Qu Wenruo <[email protected]> Signed-off-by: Boris Burkov <[email protected]>

In the buffered write path, the dirty page owns the qgroup reserve until it creates an ordered_extent. Therefore, any errors that occur before the ordered_extent is created must free that reservation, or else the space is leaked. The fstest generic/475 exercises various IO error paths, and is able to trigger errors in cow_file_range where we fail to get to allocating the ordered extent. Note that because we *do* clear delalloc, we are likely to remove the inode from the delalloc list, so the inodes/pages to not have invalidate/launder called on them in the commit abort path. This results in failures at the unmount stage of the test that look like: BTRFS: error (device dm-8 state EA) in cleanup_transaction:2018: errno=-5 IO failure BTRFS: error (device dm-8 state EA) in btrfs_replace_file_extents:2416: errno=-5 IO failure BTRFS warning (device dm-8 state EA): qgroup 0/5 has unreleased space, type 0 rsv 28672 ------------[ cut here ]------------ WARNING: CPU: 3 PID: 22588 at fs/btrfs/disk-io.c:4333 close_ctree+0x222/0x4d0 [btrfs] Modules linked in: btrfs blake2b_generic libcrc32c xor zstd_compress raid6_pq CPU: 3 PID: 22588 Comm: umount Kdump: loaded Tainted: G W 6.10.0-rc7-gab56fde445b8 torvalds#21 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 RIP: 0010:close_ctree+0x222/0x4d0 [btrfs] RSP: 0018:ffffb4465283be00 EFLAGS: 00010202 RAX: 0000000000000001 RBX: ffffa1a1818e1000 RCX: 0000000000000001 RDX: 0000000000000000 RSI: ffffb4465283bbe0 RDI: ffffa1a19374fcb8 RBP: ffffa1a1818e13c0 R08: 0000000100028b16 R09: 0000000000000000 R10: 0000000000000003 R11: 0000000000000003 R12: ffffa1a18ad7972c R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 FS: 00007f9168312b80(0000) GS:ffffa1a4afcc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f91683c9140 CR3: 000000010acaa000 CR4: 00000000000006f0 Call Trace: <TASK> ? close_ctree+0x222/0x4d0 [btrfs] ? __warn.cold+0x8e/0xea ? close_ctree+0x222/0x4d0 [btrfs] ? report_bug+0xff/0x140 ? handle_bug+0x3b/0x70 ? exc_invalid_op+0x17/0x70 ? asm_exc_invalid_op+0x1a/0x20 ? close_ctree+0x222/0x4d0 [btrfs] generic_shutdown_super+0x70/0x160 kill_anon_super+0x11/0x40 btrfs_kill_super+0x11/0x20 [btrfs] deactivate_locked_super+0x2e/0xa0 cleanup_mnt+0xb5/0x150 task_work_run+0x57/0x80 syscall_exit_to_user_mode+0x121/0x130 do_syscall_64+0xab/0x1a0 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f916847a887 ---[ end trace 0000000000000000 ]--- BTRFS error (device dm-8 state EA): qgroup reserved space leaked Cases 2 and 3 in the out_reserve path both pertain to this type of leak and must free the reserved qgroup data. Because it is already an error path, I opted not to handle the possible errors in btrfs_free_qgroup_data. Reviewed-by: Qu Wenruo <[email protected]> Signed-off-by: Boris Burkov <[email protected]> Signed-off-by: David Sterba <[email protected]>

The dead lock can happen if we try to use printk(), such as a call of SCHED_WARN_ON(), during the rq->__lock is held. The printk() will try to print the message to the console, and the console driver can call queue_work_on(), which will try to obtain rq->__lock again. This means that any WARN during the kernel function that hold the rq->__lock, such as schedule(), sched_ttwu_pending(), etc, can cause dead lock. Following is the call trace of the deadlock case that I encounter: PID: 0 TASK: ff36bfda010c8000 CPU: 156 COMMAND: "swapper/156" #0 crash_nmi_callback+30 #1 nmi_handle+85 #2 default_do_nmi+66 #3 exc_nmi+291 #4 end_repeat_nmi+22 [exception RIP: native_queued_spin_lock_slowpath+96] #5 native_queued_spin_lock_slowpath+96 torvalds#6 _raw_spin_lock+30 torvalds#7 ttwu_queue+111 torvalds#8 try_to_wake_up+375 torvalds#9 __queue_work+462 torvalds#10 queue_work_on+32 torvalds#11 soft_cursor+420 torvalds#12 bit_cursor+898 torvalds#13 hide_cursor+39 torvalds#14 vt_console_print+995 torvalds#15 call_console_drivers.constprop.0+204 torvalds#16 console_unlock+374 torvalds#17 vprintk_emit+280 torvalds#18 printk+88 torvalds#19 __warn_printk+71 torvalds#20 enqueue_task_fair+1779 torvalds#21 activate_task+102 torvalds#22 ttwu_do_activate+155 torvalds#23 sched_ttwu_pending+177 torvalds#24 flush_smp_call_function_from_idle+42 torvalds#25 do_idle+161 torvalds#26 cpu_startup_entry+25 torvalds#27 secondary_startup_64_no_verify+194 Fix this by using __printk_safe_enter()/__printk_safe_exit() in rq_pin_lock()/rq_unpin_lock(). Then, printk will defer to print out the buffers to the console. Signed-off-by: Menglong Dong <[email protected]> Signed-off-by: Bin Lai <[email protected]>

iter_finish_branch_entry doesn't put the branch_info from/to map elements creating memory leaks. This can be seen with: ``` $ perf record -e cycles -b perf test -w noploop $ perf report -D ... Direct leak of 984344 byte(s) in 123043 object(s) allocated from: #0 0x7fb2654f3bd7 in malloc libsanitizer/asan/asan_malloc_linux.cpp:69 #1 0x564d3400d10b in map__get util/map.h:186 #2 0x564d3400d10b in ip__resolve_ams util/machine.c:1981 #3 0x564d34014d81 in sample__resolve_bstack util/machine.c:2151 #4 0x564d34094790 in iter_prepare_branch_entry util/hist.c:898 #5 0x564d34098fa4 in hist_entry_iter__add util/hist.c:1238 torvalds#6 0x564d33d1f0c7 in process_sample_event tools/perf/builtin-report.c:334 torvalds#7 0x564d34031eb7 in perf_session__deliver_event util/session.c:1655 torvalds#8 0x564d3403ba52 in do_flush util/ordered-events.c:245 torvalds#9 0x564d3403ba52 in __ordered_events__flush util/ordered-events.c:324 torvalds#10 0x564d3402d32e in perf_session__process_user_event util/session.c:1708 torvalds#11 0x564d34032480 in perf_session__process_event util/session.c:1877 torvalds#12 0x564d340336ad in reader__read_event util/session.c:2399 torvalds#13 0x564d34033fdc in reader__process_events util/session.c:2448 torvalds#14 0x564d34033fdc in __perf_session__process_events util/session.c:2495 torvalds#15 0x564d34033fdc in perf_session__process_events util/session.c:2661 torvalds#16 0x564d33d27113 in __cmd_report tools/perf/builtin-report.c:1065 torvalds#17 0x564d33d27113 in cmd_report tools/perf/builtin-report.c:1805 torvalds#18 0x564d33e0ccb7 in run_builtin tools/perf/perf.c:350 torvalds#19 0x564d33e0d45e in handle_internal_command tools/perf/perf.c:403 torvalds#20 0x564d33cdd827 in run_argv tools/perf/perf.c:447 torvalds#21 0x564d33cdd827 in main tools/perf/perf.c:561 ... ``` Clearing up the map_symbols properly creates maps reference count issues so resolve those. Resolving this issue doesn't improve peak heap consumption for the test above. Signed-off-by: Ian Rogers <[email protected]>

iter_finish_branch_entry() doesn't put the branch_info from/to map elements creating memory leaks. This can be seen with: ``` $ perf record -e cycles -b perf test -w noploop $ perf report -D ... Direct leak of 984344 byte(s) in 123043 object(s) allocated from: #0 0x7fb2654f3bd7 in malloc libsanitizer/asan/asan_malloc_linux.cpp:69 #1 0x564d3400d10b in map__get util/map.h:186 #2 0x564d3400d10b in ip__resolve_ams util/machine.c:1981 #3 0x564d34014d81 in sample__resolve_bstack util/machine.c:2151 #4 0x564d34094790 in iter_prepare_branch_entry util/hist.c:898 #5 0x564d34098fa4 in hist_entry_iter__add util/hist.c:1238 torvalds#6 0x564d33d1f0c7 in process_sample_event tools/perf/builtin-report.c:334 torvalds#7 0x564d34031eb7 in perf_session__deliver_event util/session.c:1655 torvalds#8 0x564d3403ba52 in do_flush util/ordered-events.c:245 torvalds#9 0x564d3403ba52 in __ordered_events__flush util/ordered-events.c:324 torvalds#10 0x564d3402d32e in perf_session__process_user_event util/session.c:1708 torvalds#11 0x564d34032480 in perf_session__process_event util/session.c:1877 torvalds#12 0x564d340336ad in reader__read_event util/session.c:2399 torvalds#13 0x564d34033fdc in reader__process_events util/session.c:2448 torvalds#14 0x564d34033fdc in __perf_session__process_events util/session.c:2495 torvalds#15 0x564d34033fdc in perf_session__process_events util/session.c:2661 torvalds#16 0x564d33d27113 in __cmd_report tools/perf/builtin-report.c:1065 torvalds#17 0x564d33d27113 in cmd_report tools/perf/builtin-report.c:1805 torvalds#18 0x564d33e0ccb7 in run_builtin tools/perf/perf.c:350 torvalds#19 0x564d33e0d45e in handle_internal_command tools/perf/perf.c:403 torvalds#20 0x564d33cdd827 in run_argv tools/perf/perf.c:447 torvalds#21 0x564d33cdd827 in main tools/perf/perf.c:561 ... ``` Clearing up the map_symbols properly creates maps reference count issues so resolve those. Resolving this issue doesn't improve peak heap consumption for the test above. Reviewed-by: Kan Liang <[email protected]> Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sun Haiyong <[email protected]> Cc: Yanteng Si <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

iter_finish_branch_entry() doesn't put the branch_info from/to map elements creating memory leaks. This can be seen with: ``` $ perf record -e cycles -b perf test -w noploop $ perf report -D ... Direct leak of 984344 byte(s) in 123043 object(s) allocated from: #0 0x7fb2654f3bd7 in malloc libsanitizer/asan/asan_malloc_linux.cpp:69 #1 0x564d3400d10b in map__get util/map.h:186 #2 0x564d3400d10b in ip__resolve_ams util/machine.c:1981 #3 0x564d34014d81 in sample__resolve_bstack util/machine.c:2151 #4 0x564d34094790 in iter_prepare_branch_entry util/hist.c:898 #5 0x564d34098fa4 in hist_entry_iter__add util/hist.c:1238 torvalds#6 0x564d33d1f0c7 in process_sample_event tools/perf/builtin-report.c:334 torvalds#7 0x564d34031eb7 in perf_session__deliver_event util/session.c:1655 torvalds#8 0x564d3403ba52 in do_flush util/ordered-events.c:245 torvalds#9 0x564d3403ba52 in __ordered_events__flush util/ordered-events.c:324 torvalds#10 0x564d3402d32e in perf_session__process_user_event util/session.c:1708 torvalds#11 0x564d34032480 in perf_session__process_event util/session.c:1877 torvalds#12 0x564d340336ad in reader__read_event util/session.c:2399 torvalds#13 0x564d34033fdc in reader__process_events util/session.c:2448 torvalds#14 0x564d34033fdc in __perf_session__process_events util/session.c:2495 torvalds#15 0x564d34033fdc in perf_session__process_events util/session.c:2661 torvalds#16 0x564d33d27113 in __cmd_report tools/perf/builtin-report.c:1065 torvalds#17 0x564d33d27113 in cmd_report tools/perf/builtin-report.c:1805 torvalds#18 0x564d33e0ccb7 in run_builtin tools/perf/perf.c:350 torvalds#19 0x564d33e0d45e in handle_internal_command tools/perf/perf.c:403 torvalds#20 0x564d33cdd827 in run_argv tools/perf/perf.c:447 torvalds#21 0x564d33cdd827 in main tools/perf/perf.c:561 ... ``` Clearing up the map_symbols properly creates maps reference count issues so resolve those. Resolving this issue doesn't improve peak heap consumption for the test above. Committer testing: $ sudo dnf install libasan $ make -k CORESIGHT=1 EXTRA_CFLAGS="-fsanitize=address" CC=clang O=/tmp/build/$(basename $PWD)/ -C tools/perf install-bin Reviewed-by: Kan Liang <[email protected]> Signed-off-by: Ian Rogers <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sun Haiyong <[email protected]> Cc: Yanteng Si <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

The code in ocfs2_dio_end_io_write() estimates number of necessary transaction credits using ocfs2_calc_extend_credits(). This however does not take into account that the IO could be arbitrarily large and can contain arbitrary number of extents. Extent tree manipulations do often extend the current transaction but not in all of the cases. For example if we have only single block extents in the tree, ocfs2_mark_extent_written() will end up calling ocfs2_replace_extent_rec() all the time and we will never extend the current transaction and eventually exhaust all the transaction credits if the IO contains many single block extents. Once that happens a WARN_ON(jbd2_handle_buffer_credits(handle) <= 0) is triggered in jbd2_journal_dirty_metadata() and subsequently OCFS2 aborts in response to this error. This was actually triggered by one of our customers on a heavily fragmented OCFS2 filesystem. To fix the issue make sure the transaction always has enough credits for one extent insert before each call of ocfs2_mark_extent_written(). Heming Zhao said: ------ PANIC: "Kernel panic - not syncing: OCFS2: (device dm-1): panic forced after error" PID: xxx TASK: xxxx CPU: 5 COMMAND: "SubmitThread-CA" #0 machine_kexec at ffffffff8c069932 #1 __crash_kexec at ffffffff8c1338fa #2 panic at ffffffff8c1d69b9 #3 ocfs2_handle_error at ffffffffc0c86c0c [ocfs2] #4 __ocfs2_abort at ffffffffc0c88387 [ocfs2] #5 ocfs2_journal_dirty at ffffffffc0c51e98 [ocfs2] torvalds#6 ocfs2_split_extent at ffffffffc0c27ea3 [ocfs2] torvalds#7 ocfs2_change_extent_flag at ffffffffc0c28053 [ocfs2] torvalds#8 ocfs2_mark_extent_written at ffffffffc0c28347 [ocfs2] torvalds#9 ocfs2_dio_end_io_write at ffffffffc0c2bef9 [ocfs2] torvalds#10 ocfs2_dio_end_io at ffffffffc0c2c0f5 [ocfs2] torvalds#11 dio_complete at ffffffff8c2b9fa7 torvalds#12 do_blockdev_direct_IO at ffffffff8c2bc09f torvalds#13 ocfs2_direct_IO at ffffffffc0c2b653 [ocfs2] torvalds#14 generic_file_direct_write at ffffffff8c1dcf14 torvalds#15 __generic_file_write_iter at ffffffff8c1dd07b torvalds#16 ocfs2_file_write_iter at ffffffffc0c49f1f [ocfs2] torvalds#17 aio_write at ffffffff8c2cc72e torvalds#18 kmem_cache_alloc at ffffffff8c248dde torvalds#19 do_io_submit at ffffffff8c2ccada torvalds#20 do_syscall_64 at ffffffff8c004984 torvalds#21 entry_SYSCALL_64_after_hwframe at ffffffff8c8000ba Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: c15471f ("ocfs2: fix sparse file & data ordering issue in direct io") Signed-off-by: Jan Kara <[email protected]> Reviewed-by: Joseph Qi <[email protected]> Reviewed-by: Heming Zhao <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Changwei Ge <[email protected]> Cc: Gang He <[email protected]> Cc: Jun Piao <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>

commit be346c1 upstream. The code in ocfs2_dio_end_io_write() estimates number of necessary transaction credits using ocfs2_calc_extend_credits(). This however does not take into account that the IO could be arbitrarily large and can contain arbitrary number of extents. Extent tree manipulations do often extend the current transaction but not in all of the cases. For example if we have only single block extents in the tree, ocfs2_mark_extent_written() will end up calling ocfs2_replace_extent_rec() all the time and we will never extend the current transaction and eventually exhaust all the transaction credits if the IO contains many single block extents. Once that happens a WARN_ON(jbd2_handle_buffer_credits(handle) <= 0) is triggered in jbd2_journal_dirty_metadata() and subsequently OCFS2 aborts in response to this error. This was actually triggered by one of our customers on a heavily fragmented OCFS2 filesystem. To fix the issue make sure the transaction always has enough credits for one extent insert before each call of ocfs2_mark_extent_written(). Heming Zhao said: ------ PANIC: "Kernel panic - not syncing: OCFS2: (device dm-1): panic forced after error" PID: xxx TASK: xxxx CPU: 5 COMMAND: "SubmitThread-CA" #0 machine_kexec at ffffffff8c069932 #1 __crash_kexec at ffffffff8c1338fa #2 panic at ffffffff8c1d69b9 #3 ocfs2_handle_error at ffffffffc0c86c0c [ocfs2] #4 __ocfs2_abort at ffffffffc0c88387 [ocfs2] #5 ocfs2_journal_dirty at ffffffffc0c51e98 [ocfs2] torvalds#6 ocfs2_split_extent at ffffffffc0c27ea3 [ocfs2] torvalds#7 ocfs2_change_extent_flag at ffffffffc0c28053 [ocfs2] torvalds#8 ocfs2_mark_extent_written at ffffffffc0c28347 [ocfs2] torvalds#9 ocfs2_dio_end_io_write at ffffffffc0c2bef9 [ocfs2] torvalds#10 ocfs2_dio_end_io at ffffffffc0c2c0f5 [ocfs2] torvalds#11 dio_complete at ffffffff8c2b9fa7 torvalds#12 do_blockdev_direct_IO at ffffffff8c2bc09f torvalds#13 ocfs2_direct_IO at ffffffffc0c2b653 [ocfs2] torvalds#14 generic_file_direct_write at ffffffff8c1dcf14 torvalds#15 __generic_file_write_iter at ffffffff8c1dd07b torvalds#16 ocfs2_file_write_iter at ffffffffc0c49f1f [ocfs2] torvalds#17 aio_write at ffffffff8c2cc72e torvalds#18 kmem_cache_alloc at ffffffff8c248dde torvalds#19 do_io_submit at ffffffff8c2ccada torvalds#20 do_syscall_64 at ffffffff8c004984 torvalds#21 entry_SYSCALL_64_after_hwframe at ffffffff8c8000ba Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: c15471f ("ocfs2: fix sparse file & data ordering issue in direct io") Signed-off-by: Jan Kara <[email protected]> Reviewed-by: Joseph Qi <[email protected]> Reviewed-by: Heming Zhao <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Changwei Ge <[email protected]> Cc: Gang He <[email protected]> Cc: Jun Piao <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

[ Upstream commit 30479f3 ] In the buffered write path, the dirty page owns the qgroup reserve until it creates an ordered_extent. Therefore, any errors that occur before the ordered_extent is created must free that reservation, or else the space is leaked. The fstest generic/475 exercises various IO error paths, and is able to trigger errors in cow_file_range where we fail to get to allocating the ordered extent. Note that because we *do* clear delalloc, we are likely to remove the inode from the delalloc list, so the inodes/pages to not have invalidate/launder called on them in the commit abort path. This results in failures at the unmount stage of the test that look like: BTRFS: error (device dm-8 state EA) in cleanup_transaction:2018: errno=-5 IO failure BTRFS: error (device dm-8 state EA) in btrfs_replace_file_extents:2416: errno=-5 IO failure BTRFS warning (device dm-8 state EA): qgroup 0/5 has unreleased space, type 0 rsv 28672 ------------[ cut here ]------------ WARNING: CPU: 3 PID: 22588 at fs/btrfs/disk-io.c:4333 close_ctree+0x222/0x4d0 [btrfs] Modules linked in: btrfs blake2b_generic libcrc32c xor zstd_compress raid6_pq CPU: 3 PID: 22588 Comm: umount Kdump: loaded Tainted: G W 6.10.0-rc7-gab56fde445b8 torvalds#21 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 RIP: 0010:close_ctree+0x222/0x4d0 [btrfs] RSP: 0018:ffffb4465283be00 EFLAGS: 00010202 RAX: 0000000000000001 RBX: ffffa1a1818e1000 RCX: 0000000000000001 RDX: 0000000000000000 RSI: ffffb4465283bbe0 RDI: ffffa1a19374fcb8 RBP: ffffa1a1818e13c0 R08: 0000000100028b16 R09: 0000000000000000 R10: 0000000000000003 R11: 0000000000000003 R12: ffffa1a18ad7972c R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 FS: 00007f9168312b80(0000) GS:ffffa1a4afcc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f91683c9140 CR3: 000000010acaa000 CR4: 00000000000006f0 Call Trace: <TASK> ? close_ctree+0x222/0x4d0 [btrfs] ? __warn.cold+0x8e/0xea ? close_ctree+0x222/0x4d0 [btrfs] ? report_bug+0xff/0x140 ? handle_bug+0x3b/0x70 ? exc_invalid_op+0x17/0x70 ? asm_exc_invalid_op+0x1a/0x20 ? close_ctree+0x222/0x4d0 [btrfs] generic_shutdown_super+0x70/0x160 kill_anon_super+0x11/0x40 btrfs_kill_super+0x11/0x20 [btrfs] deactivate_locked_super+0x2e/0xa0 cleanup_mnt+0xb5/0x150 task_work_run+0x57/0x80 syscall_exit_to_user_mode+0x121/0x130 do_syscall_64+0xab/0x1a0 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f916847a887 ---[ end trace 0000000000000000 ]--- BTRFS error (device dm-8 state EA): qgroup reserved space leaked Cases 2 and 3 in the out_reserve path both pertain to this type of leak and must free the reserved qgroup data. Because it is already an error path, I opted not to handle the possible errors in btrfs_free_qgroup_data. Reviewed-by: Qu Wenruo <[email protected]> Signed-off-by: Boris Burkov <[email protected]> Signed-off-by: David Sterba <[email protected]> Signed-off-by: Sasha Levin <[email protected]>

For storing a value to a queue attribute, the queue_attr_store function first freezes the queue (->q_usage_counter(io)) and then acquire ->sysfs_lock. This seems not correct as the usual ordering should be to acquire ->sysfs_lock before freezing the queue. This incorrect ordering causes the following lockdep splat which we are able to reproduce always simply by accessing /sys/kernel/debug file using ls command: [ 57.597146] WARNING: possible circular locking dependency detected [ 57.597154] 6.12.0-10553-gb86545e02e8c torvalds#20 Tainted: G W [ 57.597162] ------------------------------------------------------ [ 57.597168] ls/4605 is trying to acquire lock: [ 57.597176] c00000003eb56710 (&mm->mmap_lock){++++}-{4:4}, at: __might_fault+0x58/0xc0 [ 57.597200] but task is already holding lock: [ 57.597207] c0000018e27c6810 (&sb->s_type->i_mutex_key#3){++++}-{4:4}, at: iterate_dir+0x94/0x1d4 [ 57.597226] which lock already depends on the new lock. [ 57.597233] the existing dependency chain (in reverse order) is: [ 57.597241] -> #5 (&sb->s_type->i_mutex_key#3){++++}-{4:4}: [ 57.597255] down_write+0x6c/0x18c [ 57.597264] start_creating+0xb4/0x24c [ 57.597274] debugfs_create_dir+0x2c/0x1e8 [ 57.597283] blk_register_queue+0xec/0x294 [ 57.597292] add_disk_fwnode+0x2e4/0x548 [ 57.597302] brd_alloc+0x2c8/0x338 [ 57.597309] brd_init+0x100/0x178 [ 57.597317] do_one_initcall+0x88/0x3e4 [ 57.597326] kernel_init_freeable+0x3cc/0x6e0 [ 57.597334] kernel_init+0x34/0x1cc [ 57.597342] ret_from_kernel_user_thread+0x14/0x1c [ 57.597350] -> #4 (&q->debugfs_mutex){+.+.}-{4:4}: [ 57.597362] __mutex_lock+0xfc/0x12a0 [ 57.597370] blk_register_queue+0xd4/0x294 [ 57.597379] add_disk_fwnode+0x2e4/0x548 [ 57.597388] brd_alloc+0x2c8/0x338 [ 57.597395] brd_init+0x100/0x178 [ 57.597402] do_one_initcall+0x88/0x3e4 [ 57.597410] kernel_init_freeable+0x3cc/0x6e0 [ 57.597418] kernel_init+0x34/0x1cc [ 57.597426] ret_from_kernel_user_thread+0x14/0x1c [ 57.597434] -> #3 (&q->sysfs_lock){+.+.}-{4:4}: [ 57.597446] __mutex_lock+0xfc/0x12a0 [ 57.597454] queue_attr_store+0x9c/0x110 [ 57.597462] sysfs_kf_write+0x70/0xb0 [ 57.597471] kernfs_fop_write_iter+0x1b0/0x2ac [ 57.597480] vfs_write+0x3dc/0x6e8 [ 57.597488] ksys_write+0x84/0x140 [ 57.597495] system_call_exception+0x130/0x360 [ 57.597504] system_call_common+0x160/0x2c4 [ 57.597516] -> #2 (&q->q_usage_counter(io)torvalds#21){++++}-{0:0}: [ 57.597530] __submit_bio+0x5ec/0x828 [ 57.597538] submit_bio_noacct_nocheck+0x1e4/0x4f0 [ 57.597547] iomap_readahead+0x2a0/0x448 [ 57.597556] xfs_vm_readahead+0x28/0x3c [ 57.597564] read_pages+0x88/0x41c [ 57.597571] page_cache_ra_unbounded+0x1ac/0x2d8 [ 57.597580] filemap_get_pages+0x188/0x984 [ 57.597588] filemap_read+0x13c/0x4bc [ 57.597596] xfs_file_buffered_read+0x88/0x17c [ 57.597605] xfs_file_read_iter+0xac/0x158 [ 57.597614] vfs_read+0x2d4/0x3b4 [ 57.597622] ksys_read+0x84/0x144 [ 57.597629] system_call_exception+0x130/0x360 [ 57.597637] system_call_common+0x160/0x2c4 [ 57.597647] -> #1 (mapping.invalidate_lock#2){++++}-{4:4}: [ 57.597661] down_read+0x6c/0x220 [ 57.597669] filemap_fault+0x870/0x100c [ 57.597677] xfs_filemap_fault+0xc4/0x18c [ 57.597684] __do_fault+0x64/0x164 [ 57.597693] __handle_mm_fault+0x1274/0x1dac [ 57.597702] handle_mm_fault+0x248/0x484 [ 57.597711] ___do_page_fault+0x428/0xc0c [ 57.597719] hash__do_page_fault+0x30/0x68 [ 57.597727] do_hash_fault+0x90/0x35c [ 57.597736] data_access_common_virt+0x210/0x220 [ 57.597745] _copy_from_user+0xf8/0x19c [ 57.597754] sel_write_load+0x178/0xd54 [ 57.597762] vfs_write+0x108/0x6e8 [ 57.597769] ksys_write+0x84/0x140 [ 57.597777] system_call_exception+0x130/0x360 [ 57.597785] system_call_common+0x160/0x2c4 [ 57.597794] -> #0 (&mm->mmap_lock){++++}-{4:4}: [ 57.597806] __lock_acquire+0x17cc/0x2330 [ 57.597814] lock_acquire+0x138/0x400 [ 57.597822] __might_fault+0x7c/0xc0 [ 57.597830] filldir64+0xe8/0x390 [ 57.597839] dcache_readdir+0x80/0x2d4 [ 57.597846] iterate_dir+0xd8/0x1d4 [ 57.597855] sys_getdents64+0x88/0x2d4 [ 57.597864] system_call_exception+0x130/0x360 [ 57.597872] system_call_common+0x160/0x2c4 [ 57.597881] other info that might help us debug this: [ 57.597888] Chain exists of: &mm->mmap_lock --> &q->debugfs_mutex --> &sb->s_type->i_mutex_key#3 [ 57.597905] Possible unsafe locking scenario: [ 57.597911] CPU0 CPU1 [ 57.597917] ---- ---- [ 57.597922] rlock(&sb->s_type->i_mutex_key#3); [ 57.597932] lock(&q->debugfs_mutex); [ 57.597940] lock(&sb->s_type->i_mutex_key#3); [ 57.597950] rlock(&mm->mmap_lock); [ 57.597958] *** DEADLOCK *** [ 57.597965] 2 locks held by ls/4605: [ 57.597971] #0: c0000000137c12f8 (&f->f_pos_lock){+.+.}-{4:4}, at: fdget_pos+0xcc/0x154 [ 57.597989] #1: c0000018e27c6810 (&sb->s_type->i_mutex_key#3){++++}-{4:4}, at: iterate_dir+0x94/0x1d4 Prevent the above lockdep warning by acquiring ->sysfs_lock before freezing the queue while storing a queue attribute in queue_attr_store function. Reported-by: [email protected] Fixes: af28141 ("block: freeze the queue in queue_attr_store") Tested-by: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: Nilay Shroff <[email protected]>

there is a global spinlock between reset and clk, if locked in reset, then print some debug information, maybe dead-lock when uart driver try to disable clk. Backtrace stopped: frame did not save the PC (gdb) thread 4 [Switching to thread 4 (Thread 4)] #0 cpu_relax () at ./arch/riscv/include/asm/vdso/processor.h:22 22 ./arch/riscv/include/asm/vdso/processor.h: No such file or directory. (gdb) bt #0 cpu_relax () at ./arch/riscv/include/asm/vdso/processor.h:22 #1 arch_spin_lock (lock=lock@entry=0xffffffff81a57cd0 <enable_lock>) at ./include/asm-generic/spinlock.h:49 #2 do_raw_spin_lock (lock=lock@entry=0xffffffff81a57cd0 <enable_lock>) at ./include/linux/spinlock.h:186 #3 0xffffffff80aa21ce in __raw_spin_lock_irqsave (lock=0xffffffff81a57cd0 <enable_lock>) at ./include/linux/spinlock_api_smp.h:111 #4 _raw_spin_lock_irqsave (lock=lock@entry=0xffffffff81a57cd0 <enable_lock>) at kernel/locking/spinlock.c:162 #5 0xffffffff80563416 in clk_enable_lock () at ./include/linux/spinlock.h:325 torvalds#6 0xffffffff805648de in clk_core_disable_lock (core=0xffffffd900512500) at drivers/clk/clk.c:1062 torvalds#7 0xffffffff8056527e in clk_disable (clk=<optimized out>) at drivers/clk/clk.c:1084 torvalds#8 clk_disable (clk=0xffffffd9048b5100) at drivers/clk/clk.c:1079 torvalds#9 0xffffffff8059e5d4 in serial_pxa_console_write (co=<optimized out>, s=0xffffffff81a68250 <text> "[ 14.708612] [RESET][spacemit_reset_set][373]:assert = 1, id = 59 \n", count=<optimized out>) at drivers/tty/serial/pxa_k1x.c:1724 torvalds#10 0xffffffff8004a34c in call_console_driver (dropped_text=0xffffffff81a68650 <dropped_text> "", len=69, text=0xffffffff81a68250 <text> "[ 14.708612] [RESET][spacemit_reset_set][373]:assert = 1, id = 59 \n", con=0xffffffff81964c10 <serial_pxa_console>) at kernel/printk/printk.c:1942 torvalds#11 console_emit_next_record (con=con@entry=0xffffffff81964c10 <serial_pxa_console>, ext_text=<optimized out>, dropped_text=0xffffffff81a68650 <dropped_text> "", handover=0xffffffc80578baa7, text=0xffffffff81a68250 <text> "[ 14.708612] [RESET][spacemit_reset_set][373]:assert = 1, id = 59 \n") at kernel/printk/printk.c:2731 torvalds#12 0xffffffff8004a49a in console_flush_all (handover=0xffffffc80578baa7, next_seq=<synthetic pointer>, do_cond_resched=false) at kernel/printk/printk.c:2793 torvalds#13 console_unlock () at kernel/printk/printk.c:2860 torvalds#14 0xffffffff8004b388 in vprintk_emit (facility=facility@entry=0, level=<optimized out>, level@entry=-1, dev_info=dev_info@entry=0x0, fmt=<optimized out>, args=<optimized out>) at kernel/printk/printk.c:2268 torvalds#15 0xffffffff8004b3ae in vprintk_default (fmt=<optimized out>, args=<optimized out>) at kernel/printk/printk.c:2279 torvalds#16 0xffffffff8004b646 in vprintk (fmt=fmt@entry=0xffffffff813be470 "\001\066[RESET][%s][%d]:assert = %d, id = %d \n", args=args@entry=0xffffffc80578bbd8) at kernel/printk/printk_safe.c:50 torvalds#17 0xffffffff80a880d6 in _printk (fmt=fmt@entry=0xffffffff813be470 "\001\066[RESET][%s][%d]:assert = %d, id = %d \n") at kernel/printk/printk.c:2289 torvalds#18 0xffffffff80a90bb6 in spacemit_reset_set (rcdev=rcdev@entry=0xffffffff81f563a8 <k1x_reset_controller+8>, id=id@entry=59, assert=assert@entry=true) at drivers/reset/reset-spacemit-k1x.c:373 torvalds#19 0xffffffff805823b6 in spacemit_reset_update (assert=true, id=59, rcdev=0xffffffff81f563a8 <k1x_reset_controller+8>) at drivers/reset/reset-spacemit-k1x.c:401 torvalds#20 spacemit_reset_update (assert=true, id=59, rcdev=0xffffffff81f563a8 <k1x_reset_controller+8>) at drivers/reset/reset-spacemit-k1x.c:387 torvalds#21 spacemit_reset_assert (rcdev=0xffffffff81f563a8 <k1x_reset_controller+8>, id=59) at drivers/reset/reset-spacemit-k1x.c:413 torvalds#22 0xffffffff8058158e in reset_control_assert (rstc=0xffffffd902b2f280) at drivers/reset/core.c:485 torvalds#23 0xffffffff807ccf96 in cpp_disable_clocks (cpp_dev=cpp_dev@entry=0xffffffd904cc9040) at drivers/media/platform/spacemit/camera/cam_cpp/k1x_cpp.c:960 torvalds#24 0xffffffff807cd0b2 in cpp_release_hardware (cpp_dev=cpp_dev@entry=0xffffffd904cc9040) at drivers/media/platform/spacemit/camera/cam_cpp/k1x_cpp.c:1038 torvalds#25 0xffffffff807cd990 in cpp_close_node (sd=<optimized out>, fh=<optimized out>) at drivers/media/platform/spacemit/camera/cam_cpp/k1x_cpp.c:1135 torvalds#26 0xffffffff8079525e in subdev_close (file=0xffffffd906645d00) at drivers/media/v4l2-core/v4l2-subdev.c:105 torvalds#27 0xffffffff8078e49e in v4l2_release (inode=<optimized out>, filp=0xffffffd906645d00) at drivers/media/v4l2-core/v4l2-dev.c:459 torvalds#28 0xffffffff80154974 in __fput (file=0xffffffd906645d00) at fs/file_table.c:320 torvalds#29 0xffffffff80154aa2 in ____fput (work=<optimized out>) at fs/file_table.c:348 torvalds#30 0xffffffff8002677e in task_work_run () at kernel/task_work.c:179 torvalds#31 0xffffffff800053b4 in resume_user_mode_work (regs=0xffffffc80578bee0) at ./include/linux/resume_user_mode.h:49 torvalds#32 do_work_pending (regs=0xffffffc80578bee0, thread_info_flags=<optimized out>) at arch/riscv/kernel/signal.c:478 torvalds#33 0xffffffff800039c6 in handle_exception () at arch/riscv/kernel/entry.S:374 Backtrace stopped: frame did not save the PC (gdb) thread 1 [Switching to thread 1 (Thread 1)] #0 0xffffffff80047e9c in arch_spin_lock (lock=lock@entry=0xffffffff81a57cd8 <g_cru_lock>) at ./include/asm-generic/spinlock.h:49 49 ./include/asm-generic/spinlock.h: No such file or directory. (gdb) bt #0 0xffffffff80047e9c in arch_spin_lock (lock=lock@entry=0xffffffff81a57cd8 <g_cru_lock>) at ./include/asm-generic/spinlock.h:49 #1 do_raw_spin_lock (lock=lock@entry=0xffffffff81a57cd8 <g_cru_lock>) at ./include/linux/spinlock.h:186 #2 0xffffffff80aa21ce in __raw_spin_lock_irqsave (lock=0xffffffff81a57cd8 <g_cru_lock>) at ./include/linux/spinlock_api_smp.h:111 #3 _raw_spin_lock_irqsave (lock=0xffffffff81a57cd8 <g_cru_lock>) at kernel/locking/spinlock.c:162 #4 0xffffffff8056c4cc in ccu_mix_disable (hw=0xffffffff81956858 <sdh2_clk+120>) at ./include/linux/spinlock.h:325 #5 0xffffffff80564832 in clk_core_disable (core=0xffffffd900529900) at drivers/clk/clk.c:1051 torvalds#6 clk_core_disable (core=0xffffffd900529900) at drivers/clk/clk.c:1031 torvalds#7 0xffffffff805648e6 in clk_core_disable_lock (core=0xffffffd900529900) at drivers/clk/clk.c:1063 torvalds#8 0xffffffff8056527e in clk_disable (clk=<optimized out>) at drivers/clk/clk.c:1084 torvalds#9 clk_disable (clk=clk@entry=0xffffffd904fafa80) at drivers/clk/clk.c:1079 torvalds#10 0xffffffff808bb898 in clk_disable_unprepare (clk=0xffffffd904fafa80) at ./include/linux/clk.h:1085 torvalds#11 0xffffffff808bb916 in spacemit_sdhci_runtime_suspend (dev=<optimized out>) at drivers/mmc/host/sdhci-of-k1x.c:1469 torvalds#12 0xffffffff8066e8e2 in pm_generic_runtime_suspend (dev=<optimized out>) at drivers/base/power/generic_ops.c:25 torvalds#13 0xffffffff80670398 in __rpm_callback (cb=cb@entry=0xffffffff8066e8ca <pm_generic_runtime_suspend>, dev=dev@entry=0xffffffd9018a2810) at drivers/base/power/runtime.c:395 torvalds#14 0xffffffff806704b8 in rpm_callback (cb=cb@entry=0xffffffff8066e8ca <pm_generic_runtime_suspend>, dev=dev@entry=0xffffffd9018a2810) at drivers/base/power/runtime.c:529 torvalds#15 0xffffffff80670bdc in rpm_suspend (dev=0xffffffd9018a2810, rpmflags=<optimized out>) at drivers/base/power/runtime.c:672 torvalds#16 0xffffffff806716de in pm_runtime_work (work=0xffffffd9018a2948) at drivers/base/power/runtime.c:974 torvalds#17 0xffffffff800236f4 in process_one_work (worker=worker@entry=0xffffffd9013ee9c0, work=0xffffffd9018a2948) at kernel/workqueue.c:2289 torvalds#18 0xffffffff80023ba6 in worker_thread (__worker=0xffffffd9013ee9c0) at kernel/workqueue.c:2436 torvalds#19 0xffffffff80028bb2 in kthread (_create=0xffffffd9017de840) at kernel/kthread.c:376 torvalds#20 0xffffffff80003934 in handle_exception () at arch/riscv/kernel/entry.S:249 Backtrace stopped: frame did not save the PC (gdb) Change-Id: Ia95b41ffd6c1893c9c5e9c1c9fc0c155ea902d2c

fix the warning that excute poweroff command in userspace, the log like blow: [ 17.576305] Voluntary context switch within RCU read-side critical section! [ 17.576319] WARNING: CPU: 0 PID: 158 at kernel/rcu/tree_plugin.h:320 rcu_note_context_switch+0x414/0x446 [ 17.576342] Modules linked in: [ 17.576349] CPU: 0 PID: 158 Comm: poweroff Not tainted 6.6.36+ torvalds#21 [ 17.576356] Hardware name: spacemit k1-x deb1 board (DT) [ 17.576360] epc : rcu_note_context_switch+0x414/0x446 [ 17.576367] ra : rcu_note_context_switch+0x414/0x446 [ 17.576372] epc : ffffffff8008e7ee ra : ffffffff8008e7ee sp : ffffffc801cb3680 [ 17.576378] gp : ffffffff852f68e0 tp : ffffffd90d400000 t0 : ffffffff8503da78 [ 17.576382] t1 : 000000000000002d t2 : 2d2d2d2d2d2d2d2d s0 : ffffffc801cb36c0 [ 17.576386] s1 : ffffffd979eb6980 a0 : 000000000000003f a1 : ffffffff8508bcb8 [ 17.576390] a2 : 0000000000000000 a3 : 0000000000000001 a4 : 0000000000000000 [ 17.576394] a5 : 0000000000000000 a6 : ffffffff8501bc60 a7 : 0000000000000038 [ 17.576397] s2 : ffffffd90d400000 s3 : 0000000000000000 s4 : 0000000000000001 [ 17.576401] s5 : ffffffff852d5de0 s6 : ffffffff85336bc8 s7 : ffffffff810c72de [ 17.576405] s8 : ffffffff85336bc8 s9 : 0000000000000000 s10: 0000000000000000 [ 17.576409] s11: ffffffc801cb3808 t3 : ffffffff8530fcdf t4 : ffffffff8530fcdf [ 17.576412] t5 : ffffffff8530fce0 t6 : ffffffc801cb3478 [ 17.576415] status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003 [ 17.576420] [<ffffffff8008e7ee>] rcu_note_context_switch+0x414/0x446 [ 17.576430] [<ffffffff810c62c8>] __schedule+0x7e/0x848 [ 17.576442] [<ffffffff810c6ada>] schedule+0x48/0xd2 [ 17.576449] [<ffffffff810cbc10>] schedule_timeout+0x72/0x16c [ 17.576458] [<ffffffff810c72f4>] __wait_for_common+0xd4/0x1c4 [ 17.576465] [<ffffffff810c741e>] wait_for_completion_timeout+0x18/0x20 [ 17.576472] [<ffffffff80b5932e>] spacemit_i2c_xfer+0x1c8/0x14cc [ 17.576484] [<ffffffff80b5251c>] __i2c_transfer+0x19a/0x652 [ 17.576493] [<ffffffff80b52a2e>] i2c_transfer+0x5a/0x98 [ 17.576500] [<ffffffff809ab710>] regmap_i2c_read+0x4e/0x7e [ 17.576509] [<ffffffff809a68fa>] _regmap_raw_read+0xc2/0x240 [ 17.576519] [<ffffffff809a6aa6>] _regmap_bus_read+0x2e/0x64 [ 17.576526] [<ffffffff809a577a>] _regmap_read+0x40/0x122 [ 17.576533] [<ffffffff809a5d76>] _regmap_update_bits+0x98/0xd6 [ 17.576540] [<ffffffff809a70a0>] regmap_update_bits_base+0x40/0x66 [ 17.576547] [<ffffffff809b930c>] spacemit_pm_power_off+0x28/0x4a [ 17.576555] [<ffffffff8003a882>] legacy_pm_power_off+0x14/0x22 [ 17.576566] [<ffffffff8003a83a>] sys_off_notify+0x2c/0x4a [ 17.576573] [<ffffffff80039460>] notifier_call_chain+0x3c/0x100 [ 17.576579] [<ffffffff8003960a>] atomic_notifier_call_chain+0x28/0x3e [ 17.576586] [<ffffffff8003b63a>] do_kernel_power_off+0x3a/0x4a [ 17.576594] [<ffffffff80004a58>] machine_halt+0xc/0x16 [ 17.576602] [<ffffffff80004a6e>] save_return_addr+0x0/0x20 [ 17.576608] [<ffffffff8003b3be>] kernel_power_off+0x66/0x6e [ 17.576615] [<ffffffff8003b54a>] __do_sys_reboot+0x184/0x1b8 [ 17.576622] [<ffffffff8003b660>] __riscv_sys_reboot+0x16/0x1e [ 17.576629] [<ffffffff810c3fc0>] do_trap_ecall_u+0xba/0x12e [ 17.576638] [<ffffffff810cd4be>] ret_from_exception+0x0/0x6e [ 17.576646] ---[ end trace 0000000000000000 ]--- Change-Id: Ie1b089f32a0e34b328fd7a835151e77b14333b51

there is a global spinlock between reset and clk, if locked in reset, then print some debug information, maybe dead-lock when uart driver try to disable clk. Backtrace stopped: frame did not save the PC (gdb) thread 4 [Switching to thread 4 (Thread 4)] #0 cpu_relax () at ./arch/riscv/include/asm/vdso/processor.h:22 22 ./arch/riscv/include/asm/vdso/processor.h: No such file or directory. (gdb) bt #0 cpu_relax () at ./arch/riscv/include/asm/vdso/processor.h:22 #1 arch_spin_lock (lock=lock@entry=0xffffffff81a57cd0 <enable_lock>) at ./include/asm-generic/spinlock.h:49 #2 do_raw_spin_lock (lock=lock@entry=0xffffffff81a57cd0 <enable_lock>) at ./include/linux/spinlock.h:186 #3 0xffffffff80aa21ce in __raw_spin_lock_irqsave (lock=0xffffffff81a57cd0 <enable_lock>) at ./include/linux/spinlock_api_smp.h:111 #4 _raw_spin_lock_irqsave (lock=lock@entry=0xffffffff81a57cd0 <enable_lock>) at kernel/locking/spinlock.c:162 #5 0xffffffff80563416 in clk_enable_lock () at ./include/linux/spinlock.h:325 torvalds#6 0xffffffff805648de in clk_core_disable_lock (core=0xffffffd900512500) at drivers/clk/clk.c:1062 torvalds#7 0xffffffff8056527e in clk_disable (clk=<optimized out>) at drivers/clk/clk.c:1084 torvalds#8 clk_disable (clk=0xffffffd9048b5100) at drivers/clk/clk.c:1079 torvalds#9 0xffffffff8059e5d4 in serial_pxa_console_write (co=<optimized out>, s=0xffffffff81a68250 <text> "[ 14.708612] [RESET][spacemit_reset_set][373]:assert = 1, id = 59 \n", count=<optimized out>) at drivers/tty/serial/pxa_k1x.c:1724 torvalds#10 0xffffffff8004a34c in call_console_driver (dropped_text=0xffffffff81a68650 <dropped_text> "", len=69, text=0xffffffff81a68250 <text> "[ 14.708612] [RESET][spacemit_reset_set][373]:assert = 1, id = 59 \n", con=0xffffffff81964c10 <serial_pxa_console>) at kernel/printk/printk.c:1942 torvalds#11 console_emit_next_record (con=con@entry=0xffffffff81964c10 <serial_pxa_console>, ext_text=<optimized out>, dropped_text=0xffffffff81a68650 <dropped_text> "", handover=0xffffffc80578baa7, text=0xffffffff81a68250 <text> "[ 14.708612] [RESET][spacemit_reset_set][373]:assert = 1, id = 59 \n") at kernel/printk/printk.c:2731 torvalds#12 0xffffffff8004a49a in console_flush_all (handover=0xffffffc80578baa7, next_seq=<synthetic pointer>, do_cond_resched=false) at kernel/printk/printk.c:2793 torvalds#13 console_unlock () at kernel/printk/printk.c:2860 torvalds#14 0xffffffff8004b388 in vprintk_emit (facility=facility@entry=0, level=<optimized out>, level@entry=-1, dev_info=dev_info@entry=0x0, fmt=<optimized out>, args=<optimized out>) at kernel/printk/printk.c:2268 torvalds#15 0xffffffff8004b3ae in vprintk_default (fmt=<optimized out>, args=<optimized out>) at kernel/printk/printk.c:2279 torvalds#16 0xffffffff8004b646 in vprintk (fmt=fmt@entry=0xffffffff813be470 "\001\066[RESET][%s][%d]:assert = %d, id = %d \n", args=args@entry=0xffffffc80578bbd8) at kernel/printk/printk_safe.c:50 torvalds#17 0xffffffff80a880d6 in _printk (fmt=fmt@entry=0xffffffff813be470 "\001\066[RESET][%s][%d]:assert = %d, id = %d \n") at kernel/printk/printk.c:2289 torvalds#18 0xffffffff80a90bb6 in spacemit_reset_set (rcdev=rcdev@entry=0xffffffff81f563a8 <k1x_reset_controller+8>, id=id@entry=59, assert=assert@entry=true) at drivers/reset/reset-spacemit-k1x.c:373 torvalds#19 0xffffffff805823b6 in spacemit_reset_update (assert=true, id=59, rcdev=0xffffffff81f563a8 <k1x_reset_controller+8>) at drivers/reset/reset-spacemit-k1x.c:401 torvalds#20 spacemit_reset_update (assert=true, id=59, rcdev=0xffffffff81f563a8 <k1x_reset_controller+8>) at drivers/reset/reset-spacemit-k1x.c:387 torvalds#21 spacemit_reset_assert (rcdev=0xffffffff81f563a8 <k1x_reset_controller+8>, id=59) at drivers/reset/reset-spacemit-k1x.c:413 torvalds#22 0xffffffff8058158e in reset_control_assert (rstc=0xffffffd902b2f280) at drivers/reset/core.c:485 torvalds#23 0xffffffff807ccf96 in cpp_disable_clocks (cpp_dev=cpp_dev@entry=0xffffffd904cc9040) at drivers/media/platform/spacemit/camera/cam_cpp/k1x_cpp.c:960 torvalds#24 0xffffffff807cd0b2 in cpp_release_hardware (cpp_dev=cpp_dev@entry=0xffffffd904cc9040) at drivers/media/platform/spacemit/camera/cam_cpp/k1x_cpp.c:1038 torvalds#25 0xffffffff807cd990 in cpp_close_node (sd=<optimized out>, fh=<optimized out>) at drivers/media/platform/spacemit/camera/cam_cpp/k1x_cpp.c:1135 torvalds#26 0xffffffff8079525e in subdev_close (file=0xffffffd906645d00) at drivers/media/v4l2-core/v4l2-subdev.c:105 torvalds#27 0xffffffff8078e49e in v4l2_release (inode=<optimized out>, filp=0xffffffd906645d00) at drivers/media/v4l2-core/v4l2-dev.c:459 torvalds#28 0xffffffff80154974 in __fput (file=0xffffffd906645d00) at fs/file_table.c:320 torvalds#29 0xffffffff80154aa2 in ____fput (work=<optimized out>) at fs/file_table.c:348 torvalds#30 0xffffffff8002677e in task_work_run () at kernel/task_work.c:179 torvalds#31 0xffffffff800053b4 in resume_user_mode_work (regs=0xffffffc80578bee0) at ./include/linux/resume_user_mode.h:49 torvalds#32 do_work_pending (regs=0xffffffc80578bee0, thread_info_flags=<optimized out>) at arch/riscv/kernel/signal.c:478 torvalds#33 0xffffffff800039c6 in handle_exception () at arch/riscv/kernel/entry.S:374 Backtrace stopped: frame did not save the PC (gdb) thread 1 [Switching to thread 1 (Thread 1)] #0 0xffffffff80047e9c in arch_spin_lock (lock=lock@entry=0xffffffff81a57cd8 <g_cru_lock>) at ./include/asm-generic/spinlock.h:49 49 ./include/asm-generic/spinlock.h: No such file or directory. (gdb) bt #0 0xffffffff80047e9c in arch_spin_lock (lock=lock@entry=0xffffffff81a57cd8 <g_cru_lock>) at ./include/asm-generic/spinlock.h:49 #1 do_raw_spin_lock (lock=lock@entry=0xffffffff81a57cd8 <g_cru_lock>) at ./include/linux/spinlock.h:186 #2 0xffffffff80aa21ce in __raw_spin_lock_irqsave (lock=0xffffffff81a57cd8 <g_cru_lock>) at ./include/linux/spinlock_api_smp.h:111 #3 _raw_spin_lock_irqsave (lock=0xffffffff81a57cd8 <g_cru_lock>) at kernel/locking/spinlock.c:162 #4 0xffffffff8056c4cc in ccu_mix_disable (hw=0xffffffff81956858 <sdh2_clk+120>) at ./include/linux/spinlock.h:325 #5 0xffffffff80564832 in clk_core_disable (core=0xffffffd900529900) at drivers/clk/clk.c:1051 torvalds#6 clk_core_disable (core=0xffffffd900529900) at drivers/clk/clk.c:1031 torvalds#7 0xffffffff805648e6 in clk_core_disable_lock (core=0xffffffd900529900) at drivers/clk/clk.c:1063 torvalds#8 0xffffffff8056527e in clk_disable (clk=<optimized out>) at drivers/clk/clk.c:1084 torvalds#9 clk_disable (clk=clk@entry=0xffffffd904fafa80) at drivers/clk/clk.c:1079 torvalds#10 0xffffffff808bb898 in clk_disable_unprepare (clk=0xffffffd904fafa80) at ./include/linux/clk.h:1085 torvalds#11 0xffffffff808bb916 in spacemit_sdhci_runtime_suspend (dev=<optimized out>) at drivers/mmc/host/sdhci-of-k1x.c:1469 torvalds#12 0xffffffff8066e8e2 in pm_generic_runtime_suspend (dev=<optimized out>) at drivers/base/power/generic_ops.c:25 torvalds#13 0xffffffff80670398 in __rpm_callback (cb=cb@entry=0xffffffff8066e8ca <pm_generic_runtime_suspend>, dev=dev@entry=0xffffffd9018a2810) at drivers/base/power/runtime.c:395 torvalds#14 0xffffffff806704b8 in rpm_callback (cb=cb@entry=0xffffffff8066e8ca <pm_generic_runtime_suspend>, dev=dev@entry=0xffffffd9018a2810) at drivers/base/power/runtime.c:529 torvalds#15 0xffffffff80670bdc in rpm_suspend (dev=0xffffffd9018a2810, rpmflags=<optimized out>) at drivers/base/power/runtime.c:672 torvalds#16 0xffffffff806716de in pm_runtime_work (work=0xffffffd9018a2948) at drivers/base/power/runtime.c:974 torvalds#17 0xffffffff800236f4 in process_one_work (worker=worker@entry=0xffffffd9013ee9c0, work=0xffffffd9018a2948) at kernel/workqueue.c:2289 torvalds#18 0xffffffff80023ba6 in worker_thread (__worker=0xffffffd9013ee9c0) at kernel/workqueue.c:2436 torvalds#19 0xffffffff80028bb2 in kthread (_create=0xffffffd9017de840) at kernel/kthread.c:376 torvalds#20 0xffffffff80003934 in handle_exception () at arch/riscv/kernel/entry.S:249 Backtrace stopped: frame did not save the PC (gdb) Change-Id: Ia95b41ffd6c1893c9c5e9c1c9fc0c155ea902d2c

fix the warning that excute poweroff command in userspace, the log like blow: [ 17.576305] Voluntary context switch within RCU read-side critical section! [ 17.576319] WARNING: CPU: 0 PID: 158 at kernel/rcu/tree_plugin.h:320 rcu_note_context_switch+0x414/0x446 [ 17.576342] Modules linked in: [ 17.576349] CPU: 0 PID: 158 Comm: poweroff Not tainted 6.6.36+ torvalds#21 [ 17.576356] Hardware name: spacemit k1-x deb1 board (DT) [ 17.576360] epc : rcu_note_context_switch+0x414/0x446 [ 17.576367] ra : rcu_note_context_switch+0x414/0x446 [ 17.576372] epc : ffffffff8008e7ee ra : ffffffff8008e7ee sp : ffffffc801cb3680 [ 17.576378] gp : ffffffff852f68e0 tp : ffffffd90d400000 t0 : ffffffff8503da78 [ 17.576382] t1 : 000000000000002d t2 : 2d2d2d2d2d2d2d2d s0 : ffffffc801cb36c0 [ 17.576386] s1 : ffffffd979eb6980 a0 : 000000000000003f a1 : ffffffff8508bcb8 [ 17.576390] a2 : 0000000000000000 a3 : 0000000000000001 a4 : 0000000000000000 [ 17.576394] a5 : 0000000000000000 a6 : ffffffff8501bc60 a7 : 0000000000000038 [ 17.576397] s2 : ffffffd90d400000 s3 : 0000000000000000 s4 : 0000000000000001 [ 17.576401] s5 : ffffffff852d5de0 s6 : ffffffff85336bc8 s7 : ffffffff810c72de [ 17.576405] s8 : ffffffff85336bc8 s9 : 0000000000000000 s10: 0000000000000000 [ 17.576409] s11: ffffffc801cb3808 t3 : ffffffff8530fcdf t4 : ffffffff8530fcdf [ 17.576412] t5 : ffffffff8530fce0 t6 : ffffffc801cb3478 [ 17.576415] status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003 [ 17.576420] [<ffffffff8008e7ee>] rcu_note_context_switch+0x414/0x446 [ 17.576430] [<ffffffff810c62c8>] __schedule+0x7e/0x848 [ 17.576442] [<ffffffff810c6ada>] schedule+0x48/0xd2 [ 17.576449] [<ffffffff810cbc10>] schedule_timeout+0x72/0x16c [ 17.576458] [<ffffffff810c72f4>] __wait_for_common+0xd4/0x1c4 [ 17.576465] [<ffffffff810c741e>] wait_for_completion_timeout+0x18/0x20 [ 17.576472] [<ffffffff80b5932e>] spacemit_i2c_xfer+0x1c8/0x14cc [ 17.576484] [<ffffffff80b5251c>] __i2c_transfer+0x19a/0x652 [ 17.576493] [<ffffffff80b52a2e>] i2c_transfer+0x5a/0x98 [ 17.576500] [<ffffffff809ab710>] regmap_i2c_read+0x4e/0x7e [ 17.576509] [<ffffffff809a68fa>] _regmap_raw_read+0xc2/0x240 [ 17.576519] [<ffffffff809a6aa6>] _regmap_bus_read+0x2e/0x64 [ 17.576526] [<ffffffff809a577a>] _regmap_read+0x40/0x122 [ 17.576533] [<ffffffff809a5d76>] _regmap_update_bits+0x98/0xd6 [ 17.576540] [<ffffffff809a70a0>] regmap_update_bits_base+0x40/0x66 [ 17.576547] [<ffffffff809b930c>] spacemit_pm_power_off+0x28/0x4a [ 17.576555] [<ffffffff8003a882>] legacy_pm_power_off+0x14/0x22 [ 17.576566] [<ffffffff8003a83a>] sys_off_notify+0x2c/0x4a [ 17.576573] [<ffffffff80039460>] notifier_call_chain+0x3c/0x100 [ 17.576579] [<ffffffff8003960a>] atomic_notifier_call_chain+0x28/0x3e [ 17.576586] [<ffffffff8003b63a>] do_kernel_power_off+0x3a/0x4a [ 17.576594] [<ffffffff80004a58>] machine_halt+0xc/0x16 [ 17.576602] [<ffffffff80004a6e>] save_return_addr+0x0/0x20 [ 17.576608] [<ffffffff8003b3be>] kernel_power_off+0x66/0x6e [ 17.576615] [<ffffffff8003b54a>] __do_sys_reboot+0x184/0x1b8 [ 17.576622] [<ffffffff8003b660>] __riscv_sys_reboot+0x16/0x1e [ 17.576629] [<ffffffff810c3fc0>] do_trap_ecall_u+0xba/0x12e [ 17.576638] [<ffffffff810cd4be>] ret_from_exception+0x0/0x6e [ 17.576646] ---[ end trace 0000000000000000 ]--- Change-Id: Ie1b089f32a0e34b328fd7a835151e77b14333b51

…s_lock For storing a value to a queue attribute, the queue_attr_store function first freezes the queue (->q_usage_counter(io)) and then acquire ->sysfs_lock. This seems not correct as the usual ordering should be to acquire ->sysfs_lock before freezing the queue. This incorrect ordering causes the following lockdep splat which we are able to reproduce always simply by accessing /sys/kernel/debug file using ls command: [ 57.597146] WARNING: possible circular locking dependency detected [ 57.597154] 6.12.0-10553-gb86545e02e8c torvalds#20 Tainted: G W [ 57.597162] ------------------------------------------------------ [ 57.597168] ls/4605 is trying to acquire lock: [ 57.597176] c00000003eb56710 (&mm->mmap_lock){++++}-{4:4}, at: __might_fault+0x58/0xc0 [ 57.597200] but task is already holding lock: [ 57.597207] c0000018e27c6810 (&sb->s_type->i_mutex_key#3){++++}-{4:4}, at: iterate_dir+0x94/0x1d4 [ 57.597226] which lock already depends on the new lock. [ 57.597233] the existing dependency chain (in reverse order) is: [ 57.597241] -> #5 (&sb->s_type->i_mutex_key#3){++++}-{4:4}: [ 57.597255] down_write+0x6c/0x18c [ 57.597264] start_creating+0xb4/0x24c [ 57.597274] debugfs_create_dir+0x2c/0x1e8 [ 57.597283] blk_register_queue+0xec/0x294 [ 57.597292] add_disk_fwnode+0x2e4/0x548 [ 57.597302] brd_alloc+0x2c8/0x338 [ 57.597309] brd_init+0x100/0x178 [ 57.597317] do_one_initcall+0x88/0x3e4 [ 57.597326] kernel_init_freeable+0x3cc/0x6e0 [ 57.597334] kernel_init+0x34/0x1cc [ 57.597342] ret_from_kernel_user_thread+0x14/0x1c [ 57.597350] -> #4 (&q->debugfs_mutex){+.+.}-{4:4}: [ 57.597362] __mutex_lock+0xfc/0x12a0 [ 57.597370] blk_register_queue+0xd4/0x294 [ 57.597379] add_disk_fwnode+0x2e4/0x548 [ 57.597388] brd_alloc+0x2c8/0x338 [ 57.597395] brd_init+0x100/0x178 [ 57.597402] do_one_initcall+0x88/0x3e4 [ 57.597410] kernel_init_freeable+0x3cc/0x6e0 [ 57.597418] kernel_init+0x34/0x1cc [ 57.597426] ret_from_kernel_user_thread+0x14/0x1c [ 57.597434] -> #3 (&q->sysfs_lock){+.+.}-{4:4}: [ 57.597446] __mutex_lock+0xfc/0x12a0 [ 57.597454] queue_attr_store+0x9c/0x110 [ 57.597462] sysfs_kf_write+0x70/0xb0 [ 57.597471] kernfs_fop_write_iter+0x1b0/0x2ac [ 57.597480] vfs_write+0x3dc/0x6e8 [ 57.597488] ksys_write+0x84/0x140 [ 57.597495] system_call_exception+0x130/0x360 [ 57.597504] system_call_common+0x160/0x2c4 [ 57.597516] -> #2 (&q->q_usage_counter(io)torvalds#21){++++}-{0:0}: [ 57.597530] __submit_bio+0x5ec/0x828 [ 57.597538] submit_bio_noacct_nocheck+0x1e4/0x4f0 [ 57.597547] iomap_readahead+0x2a0/0x448 [ 57.597556] xfs_vm_readahead+0x28/0x3c [ 57.597564] read_pages+0x88/0x41c [ 57.597571] page_cache_ra_unbounded+0x1ac/0x2d8 [ 57.597580] filemap_get_pages+0x188/0x984 [ 57.597588] filemap_read+0x13c/0x4bc [ 57.597596] xfs_file_buffered_read+0x88/0x17c [ 57.597605] xfs_file_read_iter+0xac/0x158 [ 57.597614] vfs_read+0x2d4/0x3b4 [ 57.597622] ksys_read+0x84/0x144 [ 57.597629] system_call_exception+0x130/0x360 [ 57.597637] system_call_common+0x160/0x2c4 [ 57.597647] -> #1 (mapping.invalidate_lock#2){++++}-{4:4}: [ 57.597661] down_read+0x6c/0x220 [ 57.597669] filemap_fault+0x870/0x100c [ 57.597677] xfs_filemap_fault+0xc4/0x18c [ 57.597684] __do_fault+0x64/0x164 [ 57.597693] __handle_mm_fault+0x1274/0x1dac [ 57.597702] handle_mm_fault+0x248/0x484 [ 57.597711] ___do_page_fault+0x428/0xc0c [ 57.597719] hash__do_page_fault+0x30/0x68 [ 57.597727] do_hash_fault+0x90/0x35c [ 57.597736] data_access_common_virt+0x210/0x220 [ 57.597745] _copy_from_user+0xf8/0x19c [ 57.597754] sel_write_load+0x178/0xd54 [ 57.597762] vfs_write+0x108/0x6e8 [ 57.597769] ksys_write+0x84/0x140 [ 57.597777] system_call_exception+0x130/0x360 [ 57.597785] system_call_common+0x160/0x2c4 [ 57.597794] -> #0 (&mm->mmap_lock){++++}-{4:4}: [ 57.597806] __lock_acquire+0x17cc/0x2330 [ 57.597814] lock_acquire+0x138/0x400 [ 57.597822] __might_fault+0x7c/0xc0 [ 57.597830] filldir64+0xe8/0x390 [ 57.597839] dcache_readdir+0x80/0x2d4 [ 57.597846] iterate_dir+0xd8/0x1d4 [ 57.597855] sys_getdents64+0x88/0x2d4 [ 57.597864] system_call_exception+0x130/0x360 [ 57.597872] system_call_common+0x160/0x2c4 [ 57.597881] other info that might help us debug this: [ 57.597888] Chain exists of: &mm->mmap_lock --> &q->debugfs_mutex --> &sb->s_type->i_mutex_key#3 [ 57.597905] Possible unsafe locking scenario: [ 57.597911] CPU0 CPU1 [ 57.597917] ---- ---- [ 57.597922] rlock(&sb->s_type->i_mutex_key#3); [ 57.597932] lock(&q->debugfs_mutex); [ 57.597940] lock(&sb->s_type->i_mutex_key#3); [ 57.597950] rlock(&mm->mmap_lock); [ 57.597958] *** DEADLOCK *** [ 57.597965] 2 locks held by ls/4605: [ 57.597971] #0: c0000000137c12f8 (&f->f_pos_lock){+.+.}-{4:4}, at: fdget_pos+0xcc/0x154 [ 57.597989] #1: c0000018e27c6810 (&sb->s_type->i_mutex_key#3){++++}-{4:4}, at: iterate_dir+0x94/0x1d4 Prevent the above lockdep warning by acquiring ->sysfs_lock before freezing the queue while storing a queue attribute in queue_attr_store function. Later, we also found[1] another function __blk_mq_update_nr_ hw_queues where we first freeze queue and then acquire the ->sysfs_lock. So we've also updated lock ordering in __blk_mq_update_nr_hw_queues function and ensured that in all code paths we follow the correct lock ordering i.e. acquire ->sysfs_lock before freezing the queue. [1] https://lore.kernel.org/all/CAFj5m9Ke8+EHKQBs_Nk6hqd=LGXtk4mUxZUN5==ZcCjnZSBwHw@mail.gmail.com/ Reported-by: [email protected] Fixes: af28141 ("block: freeze the queue in queue_attr_store") Tested-by: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: Nilay Shroff <[email protected]>

…s_lock For storing a value to a queue attribute, the queue_attr_store function first freezes the queue (->q_usage_counter(io)) and then acquire ->sysfs_lock. This seems not correct as the usual ordering should be to acquire ->sysfs_lock before freezing the queue. This incorrect ordering causes the following lockdep splat which we are able to reproduce always simply by accessing /sys/kernel/debug file using ls command: [ 57.597146] WARNING: possible circular locking dependency detected [ 57.597154] 6.12.0-10553-gb86545e02e8c #20 Tainted: G W [ 57.597162] ------------------------------------------------------ [ 57.597168] ls/4605 is trying to acquire lock: [ 57.597176] c00000003eb56710 (&mm->mmap_lock){++++}-{4:4}, at: __might_fault+0x58/0xc0 [ 57.597200] but task is already holding lock: [ 57.597207] c0000018e27c6810 (&sb->s_type->i_mutex_key#3){++++}-{4:4}, at: iterate_dir+0x94/0x1d4 [ 57.597226] which lock already depends on the new lock. [ 57.597233] the existing dependency chain (in reverse order) is: [ 57.597241] -> #5 (&sb->s_type->i_mutex_key#3){++++}-{4:4}: [ 57.597255] down_write+0x6c/0x18c [ 57.597264] start_creating+0xb4/0x24c [ 57.597274] debugfs_create_dir+0x2c/0x1e8 [ 57.597283] blk_register_queue+0xec/0x294 [ 57.597292] add_disk_fwnode+0x2e4/0x548 [ 57.597302] brd_alloc+0x2c8/0x338 [ 57.597309] brd_init+0x100/0x178 [ 57.597317] do_one_initcall+0x88/0x3e4 [ 57.597326] kernel_init_freeable+0x3cc/0x6e0 [ 57.597334] kernel_init+0x34/0x1cc [ 57.597342] ret_from_kernel_user_thread+0x14/0x1c [ 57.597350] -> #4 (&q->debugfs_mutex){+.+.}-{4:4}: [ 57.597362] __mutex_lock+0xfc/0x12a0 [ 57.597370] blk_register_queue+0xd4/0x294 [ 57.597379] add_disk_fwnode+0x2e4/0x548 [ 57.597388] brd_alloc+0x2c8/0x338 [ 57.597395] brd_init+0x100/0x178 [ 57.597402] do_one_initcall+0x88/0x3e4 [ 57.597410] kernel_init_freeable+0x3cc/0x6e0 [ 57.597418] kernel_init+0x34/0x1cc [ 57.597426] ret_from_kernel_user_thread+0x14/0x1c [ 57.597434] -> #3 (&q->sysfs_lock){+.+.}-{4:4}: [ 57.597446] __mutex_lock+0xfc/0x12a0 [ 57.597454] queue_attr_store+0x9c/0x110 [ 57.597462] sysfs_kf_write+0x70/0xb0 [ 57.597471] kernfs_fop_write_iter+0x1b0/0x2ac [ 57.597480] vfs_write+0x3dc/0x6e8 [ 57.597488] ksys_write+0x84/0x140 [ 57.597495] system_call_exception+0x130/0x360 [ 57.597504] system_call_common+0x160/0x2c4 [ 57.597516] -> #2 (&q->q_usage_counter(io)#21){++++}-{0:0}: [ 57.597530] __submit_bio+0x5ec/0x828 [ 57.597538] submit_bio_noacct_nocheck+0x1e4/0x4f0 [ 57.597547] iomap_readahead+0x2a0/0x448 [ 57.597556] xfs_vm_readahead+0x28/0x3c [ 57.597564] read_pages+0x88/0x41c [ 57.597571] page_cache_ra_unbounded+0x1ac/0x2d8 [ 57.597580] filemap_get_pages+0x188/0x984 [ 57.597588] filemap_read+0x13c/0x4bc [ 57.597596] xfs_file_buffered_read+0x88/0x17c [ 57.597605] xfs_file_read_iter+0xac/0x158 [ 57.597614] vfs_read+0x2d4/0x3b4 [ 57.597622] ksys_read+0x84/0x144 [ 57.597629] system_call_exception+0x130/0x360 [ 57.597637] system_call_common+0x160/0x2c4 [ 57.597647] -> #1 (mapping.invalidate_lock#2){++++}-{4:4}: [ 57.597661] down_read+0x6c/0x220 [ 57.597669] filemap_fault+0x870/0x100c [ 57.597677] xfs_filemap_fault+0xc4/0x18c [ 57.597684] __do_fault+0x64/0x164 [ 57.597693] __handle_mm_fault+0x1274/0x1dac [ 57.597702] handle_mm_fault+0x248/0x484 [ 57.597711] ___do_page_fault+0x428/0xc0c [ 57.597719] hash__do_page_fault+0x30/0x68 [ 57.597727] do_hash_fault+0x90/0x35c [ 57.597736] data_access_common_virt+0x210/0x220 [ 57.597745] _copy_from_user+0xf8/0x19c [ 57.597754] sel_write_load+0x178/0xd54 [ 57.597762] vfs_write+0x108/0x6e8 [ 57.597769] ksys_write+0x84/0x140 [ 57.597777] system_call_exception+0x130/0x360 [ 57.597785] system_call_common+0x160/0x2c4 [ 57.597794] -> #0 (&mm->mmap_lock){++++}-{4:4}: [ 57.597806] __lock_acquire+0x17cc/0x2330 [ 57.597814] lock_acquire+0x138/0x400 [ 57.597822] __might_fault+0x7c/0xc0 [ 57.597830] filldir64+0xe8/0x390 [ 57.597839] dcache_readdir+0x80/0x2d4 [ 57.597846] iterate_dir+0xd8/0x1d4 [ 57.597855] sys_getdents64+0x88/0x2d4 [ 57.597864] system_call_exception+0x130/0x360 [ 57.597872] system_call_common+0x160/0x2c4 [ 57.597881] other info that might help us debug this: [ 57.597888] Chain exists of: &mm->mmap_lock --> &q->debugfs_mutex --> &sb->s_type->i_mutex_key#3 [ 57.597905] Possible unsafe locking scenario: [ 57.597911] CPU0 CPU1 [ 57.597917] ---- ---- [ 57.597922] rlock(&sb->s_type->i_mutex_key#3); [ 57.597932] lock(&q->debugfs_mutex); [ 57.597940] lock(&sb->s_type->i_mutex_key#3); [ 57.597950] rlock(&mm->mmap_lock); [ 57.597958] *** DEADLOCK *** [ 57.597965] 2 locks held by ls/4605: [ 57.597971] #0: c0000000137c12f8 (&f->f_pos_lock){+.+.}-{4:4}, at: fdget_pos+0xcc/0x154 [ 57.597989] #1: c0000018e27c6810 (&sb->s_type->i_mutex_key#3){++++}-{4:4}, at: iterate_dir+0x94/0x1d4 Prevent the above lockdep warning by acquiring ->sysfs_lock before freezing the queue while storing a queue attribute in queue_attr_store function. Later, we also found[1] another function __blk_mq_update_nr_ hw_queues where we first freeze queue and then acquire the ->sysfs_lock. So we've also updated lock ordering in __blk_mq_update_nr_hw_queues function and ensured that in all code paths we follow the correct lock ordering i.e. acquire ->sysfs_lock before freezing the queue. [1] https://lore.kernel.org/all/CAFj5m9Ke8+EHKQBs_Nk6hqd=LGXtk4mUxZUN5==ZcCjnZSBwHw@mail.gmail.com/ Reported-by: [email protected] Fixes: af28141 ("block: freeze the queue in queue_attr_store") Tested-by: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: Nilay Shroff <[email protected]> Reviewed-by: Ming Lei <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>

Fix missing includes in UML

7c7854e

asamy closed this Aug 25, 2012

hno pushed a commit to hno/linux that referenced this pull request Sep 12, 2012

Fix build when using O=

4a92fcd

Fixes issue torvalds#21 on amery/linux-allwinner

hno pushed a commit to hno/linux that referenced this pull request Sep 12, 2012

Merge pull request torvalds#22 from allwinner-dev-team/allwinner-3.0

f8959c5

Fix build using O= (issue torvalds#21) and inline build on CM9

noamc referenced this pull request in Mellanox/linux Oct 16, 2012

Revert "ARC: SMP resurrect foss-for-synopsys-dwc-arc-processors#21: r…

6dc770a

…equest_irq() workaround for clockevent Timer" This reverts commit 2985184. Next commit uses the correct APIs, so we no longer need this hack

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

UML #21

UML #21

asamy commented Aug 24, 2012

UML #21

UML #21

Conversation

asamy commented Aug 24, 2012