-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mac80211: increment the seqnum of PREP whenever PREQ is processed #28
Conversation
Mesh STA A [mesh0] <----> [mesh0] Mesh STA B [mesh1] <---> [mesh0] Mesh STA C In the case, when RANN is generated by Mesh STA A, to Mesh STA B and Mesh STA C, PREQs are generated by both Mesh STA B and Mesh STA C respectively. Mesh STA A will receive both PREQ from Mesh STA A and Mesh STA C almost at the same time. In current implementation, the sequence number is not increment due to purpose of providing path stability in short term using dot11MeshHWMPnetDiameterTraversalTime and this has caused Mesh STA C to resend two PREQs whenver receving RANN from Mesh STA A. This patch is intended to increment the sequence number so that Mesh STA B not dropping the PREP from Mesh STA A destined to Mesh STA C at first try. Signed-off-by: Chun-Yeow Yeoh <[email protected]>
"Mesh STA A |
Yes, it should be PREQs from both Mesh STA B and Mesh STA C.
Target Sequence Number in the PREQ element is the latest HWMP sequence number stored by the originator mesh STA for the target mesh STA. So once the target mesh STA received it, as according to section 11C.9.8.3 (HWMP sequence numbering), it shall update its own HWMP sequence number to maximum (current HWMP sequence number, target HWMP sequence number in the PREQ) + 1 immediately before it generates a PREP in response to a PREQ.
So, if ifmsh->sn is less than target_sn, we maximize it and plus 1 for our PREP generation as target mesh STA. Then, the originator mesh STA can decide which is the better path after receiving more than one PREPs. I think that the previous implementation is to ensure path stability in short term. So that the originator mesh STA will stick to the path from the first received PREP from its target mesh STA within the net_traversal_interval. This is because the originator mesh STA will drop the subsequent PREPs if the target HWMP sequence number is not updated. The paper entitled "A joint experimental and simulation study of the IEEE 802.11s HWMP protocol and airtime link metric", section "The non-increasing DSN option", explains this better.
I have one problem when Mesh STA B with dual radio support and RANN is turned on at Mesh STA A. Whenever a RANN is received by both Mesh STA B and C, both of them generates the PREQ to mesh STA A. Mesh STA A replies with PREP to mesh STA B and PREP to mesh STA C via mesh STA B. It happens that the 1st PREP destined to mesh STA B with an updated target_sn but 2nd PREP to mesh STA C without an updated target_sn, the hwmp_route_info_get happens to drop the frame due to this. Thus, mesh STA C will again generate another PREQ. So for each RANN, mesh STA C will generate two PREQs. This will cause excessive number of PREQs if the number of mesh STAs is increased.
Need authentication on this? Is a openwrt platform. |
How do I respond inline? :P
Yes, but ifmsh->sn is the data frame sequence number. The HWMP SN is stored in mpath->SN?
OK. Thanks.
OK makes sense.
Right, sorry. We'll need to authorize your public key. I'll ask about this, but probably better to have an open o11s channel in general. |
If we are the final recipient of the PREQ element, the target HWMP SN of the generated PREP element is based on our ifmsh->sn. |
ifmsh->sn is our own HWMP SN and mpath->sn is the target HWMP SN. If we are the PREQ is destined for us, the PREP element's target HWMP SN is our own HWMP SN and originator HWMP SN is remained the same as PREQ element. |
I guess it's also worth pointing out the XXX comment above the change -- we use ifmsh->sn from the wrong interface. I think it should at least be something like: iface = mesh_bss_find_if(mbss, target_addr); ...but I haven't really digested the above conversation yet. |
With git commit c705c78 "acpi: Export the acpi_processor_get_performance_info" we are now using a different mechanism to access the P-states. The acpi_processor per-cpu structure is set and filtered by the core ACPI code which shrinks the per_cpu contents to only online CPUs. In the past we would call acpi_processor_register_performance() which would have not tried to dereference offline cpus. With the new patch and the fact that the loop we take is for for_all_possible_cpus we end up crashing on some machines. We could modify the loop to be for online_cpus - but all the other loops in the code use possible_cpus (for a good reason) - so lets leave it as so and just check if per_cpu(processor) is NULL. With this patch we will bypass the !online but possible CPUs. This fixes: IP: [<ffffffffa00d13b5>] xen_acpi_processor_init+0x1b6/0xe01 [xen_acpi_processor] PGD 4126e6067 PUD 4126e3067 PMD 0 Oops: 0002 [#1] SMP Pid: 432, comm: modprobe Not tainted 3.9.0-rc3+ #28 To be filled by O.E.M. To be filled by O.E.M./M5A97 RIP: e030:[<ffffffffa00d13b5>] [<ffffffffa00d13b5>] xen_acpi_processor_init+0x1b6/0xe01 [xen_acpi_processor] RSP: e02b:ffff88040c8a3ce8 EFLAGS: 00010282 .. snip.. Call Trace: [<ffffffffa00d11ff>] ? read_acpi_id+0x12b/0x12b [xen_acpi_processor] [<ffffffff8100215a>] do_one_initcall+0x12a/0x180 [<ffffffff810c42c3>] load_module+0x1cd3/0x2870 [<ffffffff81319b70>] ? ddebug_proc_open+0xc0/0xc0 [<ffffffff810c4f37>] sys_init_module+0xd7/0x120 [<ffffffff8166ce19>] system_call_fastpath+0x16/0x1b on some machines. Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
Chun-Yeow, You're right, ifmsh->sn is the HWMP SN and ifmsh->mesh_seqnum is the mesh sequence number. |
Hi, Bob I am not too sure about this. But this patch is actually applicable on mesh STA A which only have one mesh interface. So ifmsh->sn should be correct. By the way, are we able to assign a specific IP address for mesh node with multiple interfaces. How? |
Hi Chun-Yeow, I read the paper you referenced (thanks! :)), and this change makes sense, but don't you really want to remove the netdiametertraversal thing entirely? It seems with this patch you're disabling it anyway. Also this patch is not strictly related to multi-if, so a patch submission to o11s-devel would be fine. Thomas |
Hi, Thomas I think that during the PREQ generation by mesh_path_start_discovery, the Sequence Number is increased after the HWMP net diameter traversal time which is as defined by 802.11s-2011 section 11C.9.8.6 In this case, the originator HWMP SN shall be incremented only after at least dot11MeshHWMPnetDiameterTraversalTime has elapsed since the previous increment. So it should remain. But for PREP generation, the netdiametertraversal seems to be implementation specific by o11s. Previously, this has not caused problem of duplicate PREQ frames, only happens when multi-if nodes are deployed. Chun-Yeow |
Makes sense, but the improper reverse path (PREP) metric detection due to one having a higher sequence number (as described in the paper) would still take place in a single-channel mesh. |
Oleksii reported that he had seen an oops similar to this: BUG: unable to handle kernel NULL pointer dereference at 0000000000000088 IP: [<ffffffff814dcc13>] sock_sendmsg+0x93/0xd0 PGD 0 Oops: 0000 [#1] PREEMPT SMP Modules linked in: ipt_MASQUERADE xt_REDIRECT xt_tcpudp iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables carl9170 ath usb_storage f2fs nfnetlink_log nfnetlink md4 cifs dns_resolver hid_generic usbhid hid af_packet uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core videodev rfcomm btusb bnep bluetooth qmi_wwan qcserial cdc_wdm usb_wwan usbnet usbserial mii snd_hda_codec_hdmi snd_hda_codec_realtek iwldvm mac80211 coretemp intel_powerclamp kvm_intel kvm iwlwifi snd_hda_intel cfg80211 snd_hda_codec xhci_hcd e1000e ehci_pci snd_hwdep sdhci_pci snd_pcm ehci_hcd microcode psmouse sdhci thinkpad_acpi mmc_core i2c_i801 pcspkr usbcore hwmon snd_timer snd_page_alloc snd ptp rfkill pps_core soundcore evdev usb_common vboxnetflt(O) vboxdrv(O)Oops#2 Part8 loop tun binfmt_misc fuse msr acpi_call(O) ipv6 autofs4 CPU: 0 PID: 21612 Comm: kworker/0:1 Tainted: G W O 3.10.1SIGN #28 Hardware name: LENOVO 2306CTO/2306CTO, BIOS G2ET92WW (2.52 ) 02/22/2013 Workqueue: cifsiod cifs_echo_request [cifs] task: ffff8801e1f416f0 ti: ffff880148744000 task.ti: ffff880148744000 RIP: 0010:[<ffffffff814dcc13>] [<ffffffff814dcc13>] sock_sendmsg+0x93/0xd0 RSP: 0000:ffff880148745b00 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff880148745b78 RCX: 0000000000000048 RDX: ffff880148745c90 RSI: ffff880181864a00 RDI: ffff880148745b78 RBP: ffff880148745c48 R08: 0000000000000048 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff880181864a00 R13: ffff880148745c90 R14: 0000000000000048 R15: 0000000000000048 FS: 0000000000000000(0000) GS:ffff88021e200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000088 CR3: 000000020c42c000 CR4: 00000000001407b0 Oops#2 Part7 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Stack: ffff880148745b30 ffffffff810c4af9 0000004848745b30 ffff880181864a00 ffffffff81ffbc40 0000000000000000 ffff880148745c90 ffffffff810a5aab ffff880148745bc0 ffffffff81ffbc40 ffff880148745b60 ffffffff815a9fb8 Call Trace: [<ffffffff810c4af9>] ? finish_task_switch+0x49/0xe0 [<ffffffff810a5aab>] ? lock_timer_base.isra.36+0x2b/0x50 [<ffffffff815a9fb8>] ? _raw_spin_unlock_irqrestore+0x18/0x40 [<ffffffff810a673f>] ? try_to_del_timer_sync+0x4f/0x70 [<ffffffff815aa38f>] ? _raw_spin_unlock_bh+0x1f/0x30 [<ffffffff814dcc87>] kernel_sendmsg+0x37/0x50 [<ffffffffa081a0e0>] smb_send_kvec+0xd0/0x1d0 [cifs] [<ffffffffa081a263>] smb_send_rqst+0x83/0x1f0 [cifs] [<ffffffffa081ab6c>] cifs_call_async+0xec/0x1b0 [cifs] [<ffffffffa08245e0>] ? free_rsp_buf+0x40/0x40 [cifs] Oops#2 Part6 [<ffffffffa082606e>] SMB2_echo+0x8e/0xb0 [cifs] [<ffffffffa0808789>] cifs_echo_request+0x79/0xa0 [cifs] [<ffffffff810b45b3>] process_one_work+0x173/0x4a0 [<ffffffff810b52a1>] worker_thread+0x121/0x3a0 [<ffffffff810b5180>] ? manage_workers.isra.27+0x2b0/0x2b0 [<ffffffff810bae00>] kthread+0xc0/0xd0 [<ffffffff810bad40>] ? kthread_create_on_node+0x120/0x120 [<ffffffff815b199c>] ret_from_fork+0x7c/0xb0 [<ffffffff810bad40>] ? kthread_create_on_node+0x120/0x120 Code: 84 24 b8 00 00 00 4c 89 f1 4c 89 ea 4c 89 e6 48 89 df 4c 89 60 18 48 c7 40 28 00 00 00 00 4c 89 68 30 44 89 70 14 49 8b 44 24 28 <ff> 90 88 00 00 00 3d ef fd ff ff 74 10 48 8d 65 e0 5b 41 5c 41 RIP [<ffffffff814dcc13>] sock_sendmsg+0x93/0xd0 RSP <ffff880148745b00> CR2: 0000000000000088 The client was in the middle of trying to send a frame when the server->ssocket pointer got zeroed out. In most places, that we access that pointer, the srv_mutex is held. There's only one spot that I see that the server->ssocket pointer gets set and the srv_mutex isn't held. This patch corrects that. The upstream bug report was here: https://bugzilla.kernel.org/show_bug.cgi?id=60557 Cc: <[email protected]> Reported-by: Oleksii Shevchuk <[email protected]> Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Steve French <[email protected]>
As the new x86 CPU bootup printout format code maintainer, I am taking immediate action to improve and clean (and thus indulge my OCD) the reporting of the cores when coming up online. Fix padding to a right-hand alignment, cleanup code and bind reporting width to the max number of supported CPUs on the system, like this: [ 0.074509] smpboot: Booting Node 0, Processors: #1 #2 #3 #4 #5 #6 #7 OK [ 0.644008] smpboot: Booting Node 1, Processors: #8 #9 #10 #11 #12 #13 #14 #15 OK [ 1.245006] smpboot: Booting Node 2, Processors: #16 #17 #18 #19 #20 #21 #22 #23 OK [ 1.864005] smpboot: Booting Node 3, Processors: #24 #25 #26 #27 #28 #29 #30 #31 OK [ 2.489005] smpboot: Booting Node 4, Processors: #32 #33 #34 #35 #36 #37 #38 #39 OK [ 3.093005] smpboot: Booting Node 5, Processors: #40 #41 #42 #43 #44 #45 #46 #47 OK [ 3.698005] smpboot: Booting Node 6, Processors: #48 #49 #50 #51 #52 #53 #54 #55 OK [ 4.304005] smpboot: Booting Node 7, Processors: #56 #57 #58 #59 #60 #61 #62 #63 OK [ 4.961413] Brought up 64 CPUs and this: [ 0.072367] smpboot: Booting Node 0, Processors: #1 #2 #3 #4 #5 #6 #7 OK [ 0.686329] Brought up 8 CPUs Signed-off-by: Borislav Petkov <[email protected]> Cc: Libin <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
Turn it into (for example): [ 0.073380] x86: Booting SMP configuration: [ 0.074005] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 [ 0.603005] .... node #1, CPUs: #8 #9 #10 #11 #12 #13 #14 #15 [ 1.200005] .... node #2, CPUs: #16 #17 #18 #19 #20 #21 #22 #23 [ 1.796005] .... node #3, CPUs: #24 #25 #26 #27 #28 #29 #30 #31 [ 2.393005] .... node #4, CPUs: #32 #33 #34 #35 #36 #37 #38 #39 [ 2.996005] .... node #5, CPUs: #40 #41 #42 #43 #44 #45 #46 #47 [ 3.600005] .... node #6, CPUs: #48 #49 #50 #51 #52 #53 #54 #55 [ 4.202005] .... node #7, CPUs: #56 #57 #58 #59 #60 #61 #62 #63 [ 4.811005] .... node #8, CPUs: #64 #65 #66 #67 #68 #69 #70 #71 [ 5.421006] .... node #9, CPUs: #72 #73 #74 #75 #76 #77 #78 #79 [ 6.032005] .... node #10, CPUs: #80 #81 #82 #83 #84 #85 #86 #87 [ 6.648006] .... node #11, CPUs: #88 #89 #90 #91 #92 #93 #94 #95 [ 7.262005] .... node #12, CPUs: #96 #97 #98 #99 #100 #101 #102 #103 [ 7.865005] .... node #13, CPUs: #104 #105 #106 #107 #108 #109 #110 #111 [ 8.466005] .... node #14, CPUs: #112 #113 #114 #115 #116 #117 #118 #119 [ 9.073006] .... node #15, CPUs: #120 #121 #122 #123 #124 #125 #126 #127 [ 9.679901] x86: Booted up 16 nodes, 128 CPUs and drop useless elements. Change num_digits() to hpa's division-avoiding, cell-phone-typed version which he went at great lengths and pains to submit on a Saturday evening. Signed-off-by: Borislav Petkov <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: Linus Torvalds <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
The 'driver' field of the i2c_client struct is redundant. The same data can be accessed through to_i2c_driver(client->dev.driver). The generated code for both approaches in more or less the same. E.g. on ARM the expression client->driver->command(...) generates ... ldr r3, [r0, #28] ldr r3, [r3, #32] blx r3 ... and the expression to_i2c_driver(client->dev.driver)->command(...) generates ... ldr r3, [r0, #160] ldr r3, [r3, #-4] blx r3 ... Other architectures will generate similar code. All users of the 'driver' field outside of the I2C core have already been converted. So this only leaves the core itself. This patch converts the remaining few users in the I2C core and then removes the 'driver' field from the i2c_client struct. Signed-off-by: Lars-Peter Clausen <[email protected]> Signed-off-by: Wolfram Sang <[email protected]>
We need to defer the free request until the object/vma is capable of being freed - or else we have a problem when we try to destroy the context. The exact same issue is described and fixed here: commit e20780439b26ba95aeb29d3e27cd8cc32bc82a4c Author: Ben Widawsky <[email protected]> Date: Fri Dec 6 14:11:22 2013 -0800 drm/i915: Defer request freeing I had this fix previously, but decided not to keep it for some reason I can no longer remember. gem_reset_stats is a really good test at hitting the problem. For the inquisitive: [ 170.516392] ------------[ cut here ]------------ [ 170.517227] WARNING: CPU: 1 PID: 105 at drivers/gpu/drm/drm_mm.c:578 drm_mm_takedown+0x2e/0x30 [drm]() [ 170.518064] Memory manager not clean during takedown. [ 170.518941] CPU: 1 PID: 105 Comm: kworker/1:1 Not tainted 3.13.0-rc4-BEN+ #28 [ 170.519787] Hardware name: Hewlett-Packard HP EliteBook 8470p/179B, BIOS 68ICF Ver. F.02 04/27/2012 [ 170.520662] Call Trace: [ 170.521517] [<ffffffff814f0589>] dump_stack+0x4e/0x7a [ 170.522373] [<ffffffff81049e6d>] warn_slowpath_common+0x7d/0xa0 [ 170.523227] [<ffffffff81049edc>] warn_slowpath_fmt+0x4c/0x50 [ 170.524079] [<ffffffffa06c414e>] drm_mm_takedown+0x2e/0x30 [drm] [ 170.524934] [<ffffffffa07213f3>] gen6_ppgtt_cleanup+0x23/0x110 [i915] [ 170.525777] [<ffffffffa07837ed>] ppgtt_release.part.5+0x24/0x29 [i915] [ 170.526603] [<ffffffffa071aaa5>] i915_gem_context_free+0x195/0x1a0 [i915] [ 170.527423] [<ffffffffa071189d>] i915_gem_free_request+0x9d/0xb0 [i915] [ 170.528247] [<ffffffffa0718af9>] i915_gem_reset+0x1f9/0x3f0 [i915] [ 170.529065] [<ffffffffa0700cce>] i915_reset+0x4e/0x180 [i915] [ 170.529870] [<ffffffffa070829d>] i915_error_work_func+0xcd/0x120 [i915] [ 170.530666] [<ffffffff8106c13a>] process_one_work+0x1fa/0x6d0 [ 170.531453] [<ffffffff8106c0d8>] ? process_one_work+0x198/0x6d0 [ 170.532230] [<ffffffff8106c72b>] worker_thread+0x11b/0x3a0 [ 170.532996] [<ffffffff8106c610>] ? process_one_work+0x6d0/0x6d0 [ 170.533771] [<ffffffff810743ef>] kthread+0xff/0x120 [ 170.534548] [<ffffffff810742f0>] ? insert_kthread_work+0x80/0x80 [ 170.535322] [<ffffffff814f97ac>] ret_from_fork+0x7c/0xb0 [ 170.536089] [<ffffffff810742f0>] ? insert_kthread_work+0x80/0x80 [ 170.536847] ---[ end trace 3d4c12892e42d58f ]--- v2: Whitespace fix. (Chris) Note: This is a bug that only hits the ppgtt topic branch but I've figured that doing the request cleanup in this order is generally the right thing to do. Signed-off-by: Ben Widawsky <[email protected]> [danvet: Add a code comment to clarify what's actually going on since the lifetime rules aroung ppgtt cleanup are ... fuzzy a best atm. Also add a note about why we need this.] Signed-off-by: Daniel Vetter <[email protected]>
Mesh STA A [mesh0] <----> [mesh0] Mesh STA B [mesh1] <---> [mesh0] Mesh STA C
In the case, when RANN is generated by Mesh STA A, to Mesh STA B and Mesh STA C,
PREQs are generated by both Mesh STA B and Mesh STA C respectively. Mesh STA A
will receive both PREQ from Mesh STA A and Mesh STA C almost at the same time.
In current implementation, the sequence number is not increment due to purpose
of providing path stability in short term using dot11MeshHWMPnetDiameterTraversalTime
and this has caused Mesh STA C to resend two PREQs whenver receving RANN from Mesh
STA A. This patch is intended to increment the sequence number so that Mesh STA B
not dropping the PREP from Mesh STA A destined to Mesh STA C at first try.
Signed-off-by: Chun-Yeow Yeoh [email protected]