mpath loop and air metric calculation #74
For 1), you'll have to add your own test frame mechanism to keep rate control fresh.
@twpedersen Thanks for the reply.
@zhejunli can you explain the loop in terms of HWMP?
I have 4 nodes, n[1..4], and the topology is like PC1--->n1--->n2--->n3--->n4--->PC2. https://arxiv.org/pdf/1512.08891.pdf mentions the same thing.
If the originator address is the same as the interface's own address, the PREQ
frame is silently discarded and no routing info is updated.
As discussed in the paper regarding the self entry:
"Node S also receives the forwarded RREQS->X message from node D, and
before silently discarding the message (since it is the originator of the
RREQ message), updates its routing table to create an entry to node D."
HWMP does not allow the above to happen, so based on the results in Table 2, it
should be loop free.
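A minimal Python sketch of the rule described above: a PREQ whose originator address equals the node's own address is dropped before any routing state is touched, so no "self entry" like the one in the paper is created. `PathEntry` and `process_preq()` are hypothetical names, not mac80211's data structures; the sketch ignores SN wraparound and any other bookkeeping a real implementation may do. For other originators it assumes the usual HWMP freshness rule: accept a newer SN, or an equal SN with a better metric.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class PathEntry:
    """Hypothetical, simplified routing entry for one originator."""
    next_hop: str
    sn: int      # originator's HWMP sequence number
    metric: int  # cumulative path metric (lower is better)


def process_preq(own_addr: str, table: Dict[str, PathEntry],
                 orig_addr: str, orig_sn: int, metric: int,
                 from_peer: str) -> bool:
    """Return True if the PREQ updated the routing table."""
    # Our own PREQ echoed back by a neighbour: drop it silently and
    # learn nothing from it.
    if orig_addr == own_addr:
        return False
    entry = table.get(orig_addr)
    # Accept newer information, or equally fresh information with a better
    # metric (plain integer comparison; SN wraparound is ignored here).
    if (entry is None or orig_sn > entry.sn
            or (orig_sn == entry.sn and metric < entry.metric)):
        table[orig_addr] = PathEntry(from_peer, orig_sn, metric)
        return True
    return False


if __name__ == "__main__":
    # Node S hears its own PREQ forwarded back by D: nothing is learned.
    s_table: Dict[str, PathEntry] = {}
    print(process_preq("S", s_table, "S", 11, 100, "D"), s_table)
```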
When you see the data bouncing back and forth between n3 and n4, are you
sure that path links were only established between n1 and n2, n2 and n3, and n3
and n4?
---
Chun-Yeow
On Sat, Feb 17, 2018 at 5:25 AM, zhejunli wrote:
@twpedersen:
I have 4 nodes, n[1..4], and the topology is like
PC1--->n1--->n2--->n3--->n4--->PC2.
While pushing UDP data from PC1 to PC2 and adjusting the signal strength
level between nodes, sometimes the mpath table of n3 shows the next_hop is
n4 while n4's table shows the next_hop is n3. The data packets just
bounce back and forth between n3 and n4.
https://arxiv.org/pdf/1512.08891.pdf mentions the same thing.
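Given a snapshot of each node's next hop toward the same destination (for example, read by hand from `iw dev <iface> mpath dump` on every node; parsing is not shown), a loop like the n3/n4 one can be spotted by walking the next hops. A minimal Python sketch; the `find_forwarding_loop()` helper and the dictionary layout are hypothetical.

```python
from typing import Dict, List, Optional


def find_forwarding_loop(next_hops: Dict[str, str], src: str,
                         dst: str) -> Optional[List[str]]:
    """Follow per-node next hops from src toward dst.

    Returns the repeating part of the walk (e.g. ["n3", "n4", "n3"]) if
    packets would circulate forever, or None if dst is reached or some
    node on the way has no route.
    """
    visited: List[str] = []
    node: Optional[str] = src
    while node != dst:
        if node is None:
            return None  # the previous node had no next hop for dst
        if node in visited:
            # Return the cycle, closing it with the repeated node.
            return visited[visited.index(node):] + [node]
        visited.append(node)
        node = next_hops.get(node)
    return None


if __name__ == "__main__":
    # Hypothetical snapshot matching the report: n4 points back to n3.
    snapshot = {"n1": "n2", "n2": "n3", "n3": "n4", "n4": "n3"}
    print(find_forwarding_loop(snapshot, "n1", "PC2"))
```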
On Wed, Feb 21, 2018 at 01:09:56AM -0800, Chun-Yeow wrote:
When you see the data bouncing back and forth between n3 and n4, are you
sure that path links were only established between n1 and n2, n2 and n3, and n3
and n4?
It would be interesting to see a pcap file of the HWMP and data frames
when this happens, too.
--
Bob Copeland %% https://bobcopeland.com/
@chunyeow Thanks for the reply. I can't guarantee that path links were only established between n1-n2, n2-n3 and n3-n4 like a chain. Actually, I was changing the signal levels between nodes to simulate a random real-world situation. When signal levels are stable, the connection is good and stable. But when I change the "environment" to trigger an mpath change, the loop problem seems to happen, though not always.
To be more specific, I set up a chain link like PC1-->n1-->n2-->n3-->n4-->PC2 first and ran iperf from PC1 to PC2. Meanwhile, I increased the attenuation between n1-n2 and n2-n3 but reduced the attenuation between n1-n3, hoping the mpath would change to PC1-->n1-->n3-->n4-->PC2. The loop happens randomly.
I have seen this happen in a 3-node system too. From the debug information, it looked like the new path information was trying to update the outdated mpath data structure, which had a bigger SN, and failed. The new path information was the latest and freshest, while the old mpath entry held outdated information but a higher SN, so the new SN couldn't update the old mpath.
It is hard to reproduce.
Jeff
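The symptom Jeff describes is consistent with how an SN-based freshness check behaves: path information is accepted only if its SN is newer (or equal with a better metric), so an entry holding a higher SN rejects fresher information that carries a lower one. A minimal Python sketch, assuming 32-bit SNs compared with wraparound (serial-number arithmetic); `sn_newer()` and `accept_update()` are hypothetical names, not mac80211 functions.

```python
SN_MASK = 0xFFFFFFFF  # HWMP sequence numbers are 32-bit unsigned values


def sn_newer(a: int, b: int) -> bool:
    """True if 32-bit sequence number a is newer than b, allowing wraparound."""
    # (b - a) modulo 2**32, read as a signed 32-bit value, is negative
    # exactly when a is ahead of b.
    return ((b - a) & SN_MASK) >= 0x80000000


def accept_update(stored_sn: int, stored_metric: int,
                  new_sn: int, new_metric: int) -> bool:
    """Freshness rule: a newer SN wins; an equal SN wins only with a better metric."""
    if sn_newer(new_sn, stored_sn):
        return True
    return new_sn == stored_sn and new_metric < stored_metric


if __name__ == "__main__":
    # Stale entry with SN 120 vs. fresh information with SN 100 and a much
    # better metric: the update is rejected.
    print(accept_update(120, 5000, 100, 50))
```

Note that the metric never gets a say when the SNs differ, which is why a stuck, higher SN can pin an outdated next hop in place.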
On Wed, Feb 21, 2018 at 10:26:46AM -0800, zhejunli wrote:
To be more specific, I set up a chain link like PC1-->n1-->n2-->n3-->n4-->PC2 first and ran iperf from PC1 to PC2. Meanwhile, I increased the attenuation between n1-n2 and n2-n3 but reduced the attenuation between n1-n3, hoping the mpath would change to PC1-->n1-->n3-->n4-->PC2. The loop happens randomly.
I have seen this happen in a 3-node system too. From the debug information, it looked like the new path information was trying to update the outdated mpath data structure, which had a bigger SN, and failed. The new path information was the latest and freshest, while the old mpath entry held outdated information but a higher SN, so the new SN couldn't update the old mpath.
It is hard to reproduce.
This was with actual hardware, right?
This seems like something we could reproduce in wmediumd; I can give it a try when
I get some time.
@bcopeland Yes, this is with an ath9k chip. The code base is from OpenWrt CC, kernel v3.10.36. I don't know how OpenWrt synchronizes with the latest open80211s code base. Maybe the code in OpenWrt is too old? Thanks.
On Fri, Feb 23, 2018 at 04:35:58PM +0000, zhejunli wrote:
@bcopeland Yes, this is with an ath9k chip. The code base is from OpenWrt CC, kernel v3.10.36.
I don't know how OpenWrt synchronizes with the latest open80211s code base. Maybe the code in OpenWrt is too old?
I think (but don't quote me, haven't looked recently) they use backports
built from wireless-testing for mac80211 and wireless drivers. So those
parts should be fairly up-to-date.
--
Bob Copeland %% https://bobcopeland.com/
As Bob pointed out, testing with wmediumd may be the way forward, since it
is very hard to reproduce in your environment.
By the way, can you check whether the following patch is applied in your tree:
https://www.mail-archive.com/[email protected]/msg03106.html
@chunyeow I confirm that patch is already in place.
The mpath loop hasn't happened again, so I'm leaving it alone for now. Another question, about the air metric: is it possible to get a fresh metric for a "potential" mpath that is not currently in use?
For example, 3 mesh nodes n1, n2 and n3 are all in range of each other. There is iperf traffic from n1 directly to n3, with n2 as a bystander. In this case, the TX rate and PER of n1--->n3 are always fresh and updated. However, because there's no traffic from n1 to n2, the air metric of n1--->n2 is never updated.
I added a test frame mechanism so that n1 sends test frames to n2 and n3 periodically in order to maintain a fresh air metric for n1--->n2. But I found that rate control cannot give a correct/reasonable TX rate for n1--->n2 from this small amount of test frame traffic alone. During the iperf test, I increased the attenuation between n1 and n3, hoping to switch the mpath from n1--->n3 to n1--->n2--->n3. But this caused a wrong mpath switch decision, because the air metrics of n1--->n2 and n2--->n3 were not fresh (no traffic, or too little test frame traffic).
My question is:
Does the 802.11s protocol do "better mpath selection" dynamically? Per my understanding it does: an mpath becomes inactive periodically and a new mpath is formed, which makes sure the mpath is fresh and optimal. But if some potential mpaths' TX rate and PER were not updated properly before that point, the newly formed mpath will be wrong.
Thanks,
Jeff
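To see how stale link statistics can flip the decision, here is the per-link airtime cost from the 802.11s airtime metric, ca = (O + Bt / r) / (1 - ef), where r is the data rate, ef the frame error rate, Bt the test frame size (8192 bits) and O the channel-access plus protocol overhead; a path's metric is the sum of its link costs. A minimal Python sketch: the 802.11a overhead values (75 µs + 110 µs) are assumptions, a real implementation uses its own units and fixed-point arithmetic, and all rate/PER numbers in the example are hypothetical.

```python
from typing import Iterable, Tuple

# Assumed constants (802.11a values); other PHYs use different overheads.
CHANNEL_ACCESS_OVERHEAD_US = 75.0
PROTOCOL_OVERHEAD_US = 110.0
TEST_FRAME_BITS = 8192  # Bt


def link_airtime_us(rate_mbps: float, frame_error_rate: float) -> float:
    """Airtime cost of one link, ca = (O + Bt / r) / (1 - ef), in microseconds."""
    if rate_mbps <= 0 or not 0.0 <= frame_error_rate < 1.0:
        return float("inf")  # treat the link as unusable
    overhead = CHANNEL_ACCESS_OVERHEAD_US + PROTOCOL_OVERHEAD_US
    # bits divided by Mbit/s gives microseconds
    return (overhead + TEST_FRAME_BITS / rate_mbps) / (1.0 - frame_error_rate)


def path_airtime_us(links: Iterable[Tuple[float, float]]) -> float:
    """Path metric: sum of link costs, each link given as (rate_mbps, error_rate)."""
    return sum(link_airtime_us(rate, err) for rate, err in links)


# Hypothetical numbers for the n1/n2/n3 experiment after attenuating n1-n3.
direct = path_airtime_us([(6.0, 0.5)])                    # fresh stats for n1->n3
relay_true = path_airtime_us([(54.0, 0.1), (54.0, 0.1)])  # what n1->n2->n3 could do
relay_stale = path_airtime_us([(1.0, 0.0), (1.0, 0.0)])   # stale, underestimated rates

if __name__ == "__main__":
    print(f"direct {direct:.0f} us, relay (true) {relay_true:.0f} us, "
          f"relay (stale) {relay_stale:.0f} us")
```

With fresh statistics the relay path wins (about 748 µs against about 3101 µs for the direct link); with stale, underestimated rates the relay looks far worse and the direct link is kept, which is the kind of wrong decision described above.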
On Mon, Mar 26, 2018 at 10:39:32AM -0700, zhejunli wrote:
My question is:
Does the 802.11s protocol do "better mpath selection" dynamically? Per my understanding it does: an mpath becomes inactive periodically and a new mpath is formed, which makes sure the mpath is fresh and optimal. But if some potential mpaths' TX rate and PER were not updated properly before that point, the newly formed mpath will be wrong.
It does, but the airtime metric won't update significantly on the basis of just the
few management frames that HWMP uses.
AFAIK this (estimating PER) is left to the implementation. You'll have
to send a fair amount of data through the other nodes periodically to
update the statistics tracked by the rate controller. Or come up with
another estimator that doesn't rely on frame loss.
@bcopeland Thanks for the reply. The rate control algorithm doesn't care about the HWMP path selection packets; I think it only cares about data packets. So I added a mechanism to send some test DATA frames as background traffic. This is supposed to train the rate control algorithm to keep correct rate information for a specific peer, but it doesn't look like enough. I still get a wrong rate to a "POTENTIAL" peer. It may be because of the rate control behavior.
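A minimal Python sketch of the kind of background-traffic scheduler described here; Jeff's actual mechanism is not shown in the thread. `plan_probes()`, the idle threshold and the burst size are hypothetical. Following Bob's advice above, the idea is to send a burst of data frames large enough to move the rate controller's statistics, and only to peers that carried no real traffic recently.

```python
from typing import Dict, Iterable, Optional


def plan_probes(last_data_tx: Dict[str, float], peers: Iterable[str],
                now: float, idle_threshold_s: float = 1.0,
                burst_frames: int = 50) -> Dict[str, int]:
    """Map each peer that carried no data recently to a burst of test data frames.

    last_data_tx maps a peer address to the time (seconds) of its last data frame.
    """
    plan: Dict[str, int] = {}
    for peer in peers:
        last: Optional[float] = last_data_tx.get(peer)
        # Peers never used, or idle for longer than the threshold, get probed.
        if last is None or now - last > idle_threshold_s:
            plan[peer] = burst_frames
    return plan


if __name__ == "__main__":
    # n1 is sending iperf traffic to n3; n2 is an idle bystander.
    print(plan_probes({"n3": 9.8}, ["n2", "n3"], now=10.0))
```

Skipping peers that already carry traffic keeps the probing overhead off busy links; whether a burst of this size is enough for Minstrel is exactly the open question in this thread.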
It is the Minstrel rate control that underestimates the TX rate to a POTENTIAL mpath peer. The test frames cannot keep Minstrel adjusted to a proper, real rate. It looks like the test frames are too little traffic; only heavier traffic lets Minstrel make the right decision.
On Tue, Apr 03, 2018 at 07:55:01AM -0700, zhejunli wrote:
It is the Minstrel rate control that underestimates the TX rate to a POTENTIAL mpath peer. The test frames cannot keep Minstrel adjusted to a proper, real rate. It looks like the test frames are too little traffic; only heavier traffic lets Minstrel make the right decision.
Indeed. Also, IIRC, minstrel assumes every unsampled rate/link has 0%
probability to begin with. So it would be most accurate to say most of
the time, we don't know how good the potential link is. Perhaps
integrating a confidence into the sampling selection algorithm could help
this use case.
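A toy Python model of what Bob describes: a success-probability estimate that starts at 0 % for an unsampled link and is smoothed with an EWMA, plus a count of sampled intervals that could serve as the confidence he suggests. This is not Minstrel's code; the EWMA weight, the `LinkEstimate` class and the `min_intervals` threshold are assumptions.

```python
from dataclasses import dataclass

EWMA_WEIGHT_NEW = 0.25  # assumed weight of the newest interval; history keeps 75 %


@dataclass
class LinkEstimate:
    prob: float = 0.0   # unsampled links start at 0 % success
    intervals: int = 0  # update intervals that actually had samples

    def update(self, attempts: int, successes: int) -> None:
        """Fold one update interval's delivery ratio into the EWMA."""
        if attempts == 0:
            return  # nothing sampled: estimate and confidence unchanged
        ratio = successes / attempts
        self.prob = (1 - EWMA_WEIGHT_NEW) * self.prob + EWMA_WEIGHT_NEW * ratio
        self.intervals += 1

    def confident(self, min_intervals: int = 8) -> bool:
        """Crude confidence: enough sampled intervals to trust prob."""
        return self.intervals >= min_intervals


if __name__ == "__main__":
    est = LinkEstimate()
    for _ in range(3):
        est.update(attempts=2, successes=2)  # a few test frames, all delivered
    print(est.prob, est.confident())
```

After three intervals with 100 % delivery the estimate is only 1 - 0.75³ ≈ 0.58 and `confident()` still reports False, which shows how a lightly probed link keeps looking worse than it is.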
Yes, that would help, but it involves the Minstrel part, which I don't want to touch for the time being. Per my understanding, 802.11s claims that it can DYNAMICALLY choose a better mpath by periodically deactivating an existing mpath and generating a PREQ to search for a lower-metric mpath. But if a "POTENTIAL" mpath peer's rate and PER status are outdated, the new path search will produce a fake "optimal" mpath. Is this a defect of the 802.11s protocol?
I made some tweaks and it looks good so far. Still, 2 major issues (I believe) are left:
Hello,
I am using open80211s on an ath9k chip. I have noticed 2 things:
1) The spec mentions test frames, but I didn't find anywhere that a test frame is sent. This causes a problem: the TX rate is not fresh when there's no traffic on a path, and when this outdated TX rate takes part in the ALM calculation, the result is wrong.
2) The SN counter does not seem to prevent mpath loops. It happened with 4 nodes on my test bench. This paper explains this loop situation: https://arxiv.org/pdf/1512.08891.pdf.
Any ideas about these?
Thanks,
Jeff