mpath loop and air metric calculation #74

Closed
zhejunli opened this issue Feb 16, 2018 · 19 comments

Comments

@zhejunli

Hello,

I am using open802.11s on an ath9k chip. I have noticed two things:

  1. The spec mentions test frames, but I couldn't find anywhere that a test frame is actually sent. This causes a problem: the TX rate goes stale when no traffic is using the path, and when this outdated TX rate goes into the ALM calculation, the result is wrong.

  2. The SN counter does not seem to prevent mpath loops. A loop occurred with 4 nodes on my test bench. This paper explains the loop situation: https://arxiv.org/pdf/1512.08891.pdf
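
To make point 1 concrete, here is a minimal sketch of the airtime link metric in the form the 802.11s spec gives it, c_a = (O + B_t / r) / (1 - e_f). The test frame size of 8192 bits follows the spec; the overhead value, the rates, and the error rates below are illustrative assumptions, and the result is kept in microseconds for readability rather than in the spec's units.

```python
def airtime_link_metric(rate_mbps, frame_error_rate,
                        overhead_us=100.0, test_frame_bits=8192):
    """Airtime cost c_a = (O + B_t / r) / (1 - e_f), in microseconds.

    overhead_us is an assumed placeholder for the PHY-dependent overhead O.
    """
    if rate_mbps <= 0 or not 0 <= frame_error_rate < 1:
        return float("inf")  # link is unusable
    # bits divided by Mbit/s gives microseconds
    return (overhead_us + test_frame_bits / rate_mbps) / (1.0 - frame_error_rate)


# Hypothetical numbers: rate control last saw 130 Mbit/s on an idle link,
# but the link has since degraded to 13 Mbit/s with 40% frame errors.
stale = airtime_link_metric(rate_mbps=130.0, frame_error_rate=0.05)
actual = airtime_link_metric(rate_mbps=13.0, frame_error_rate=0.40)
print(f"metric from stale rate:   {stale:.1f} us")
print(f"metric from current rate: {actual:.1f} us")
```

With a stale rate the link looks far cheaper than it is, so any path selection built on that metric inherits the error.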

Any ideas about these?

Thanks,

Jeff

@twpedersen
Contributor

For 1), you'll have to add your own test frame mechanism to keep rate control fresh.

For 2): in g), shouldn't D have incremented its SN prior to the RERR? HWMP demands something similar. However, HWMP on-demand routing is also problematic because the PREQ forms a symmetric path back to the originator, while the optimal RF path may be highly asymmetrical. The only HWMP mode that is reliable is the proactive PREQ (hwmp_rootmode = 2), which works similarly to batman-adv's OGM.

@zhejunli
Author

@twpedersen: Thanks for the reply.
For 1), I have added a mechanism to refresh the TX rate. For 2), we still prefer the on-demand approach over the proactive one, so I am still struggling with how to prevent the loop.

@twpedersen
Contributor

@zhejunli can you explain the loop in terms of HWMP?

@zhejunli
Author

zhejunli commented Feb 16, 2018

@twpedersen :

I have 4 nodes, n[1..4], and the topology is PC1--->n1--->n2--->n3--->n4--->PC2.
While pushing UDP data from PC1 to PC2 and adjusting the signal strength between nodes, the mpath table of n3 sometimes shows n4 as the next_hop while n4's table shows n3 as the next_hop. The data packets just bounce back and forth between n3 and n4.

https://arxiv.org/pdf/1512.08891.pdf describes the same thing.
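
The symptom described here can be checked offline from dumped mpath tables. The following is a rough sketch that walks next-hop entries and reports a cycle. The table layout and the destination name `dst` are hypothetical, since the thread does not say which destination entry was looping.

```python
def find_forwarding_loop(next_hop, source, destination):
    """Follow next hops from source; return the nodes in a loop, or None."""
    visited = []
    node = source
    while node != destination:
        if node in visited:
            # We came back to a node already on the walk: that is the loop.
            return visited[visited.index(node):]
        visited.append(node)
        node = next_hop.get(node, {}).get(destination)
        if node is None:
            return None  # no entry: the frame is dropped, not looped
    return None


# Hypothetical table dump matching the report: n3 points at n4, n4 at n3.
looping_tables = {
    "n1": {"dst": "n2"},
    "n2": {"dst": "n3"},
    "n3": {"dst": "n4"},
    "n4": {"dst": "n3"},
}
print(find_forwarding_loop(looping_tables, "n1", "dst"))
```

Running a check like this against periodic table snapshots may help catch the loop when it is hard to reproduce on demand.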

@chunyeow
Contributor

chunyeow commented Feb 21, 2018 via email

@bcopeland
Contributor

bcopeland commented Feb 21, 2018 via email

@zhejunli
Author

@chunyeow Thanks for the reply. I can't guarantee that paths are only established between n1-n2, n2-n3 and n3-n4 as a chain. I was actually changing the signal levels between nodes to simulate a random real-world situation. When the signal levels are stable, the connection is good and stable. But when I change the "environment" to trigger an mpath change, the loop problem appears, though not always.

To be more specific, I first set up a chain PC1-->n1-->n2-->n3-->n4-->PC2 and run iperf from PC1 to PC2. Meanwhile, I increase the attenuation between n1-n2 and n2-n3 and reduce the attenuation between n1-n3, hoping the mpath changes to PC1-->n1-->n3-->n4-->PC2. The loop then happens at random.

I have seen this happen in a 3-node system too. From the debug information, it looked like path information with a new SN tried to update an outdated mpath entry that had a larger SN, and failed. The new information is the fresh one, while the stored mpath is outdated but carries a higher SN, so the new SN could not update the old mpath.
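
The rejected update described above can be illustrated with a simplified sketch of the usual HWMP freshness rule: a higher SN wins, and an equal SN wins only with a better metric. This is not the mac80211 code, and the SN and metric values are invented for the example.

```python
def sn_newer(a, b):
    """True if 32-bit sequence number a is newer than b, allowing wraparound."""
    diff = (a - b) & 0xFFFFFFFF
    return 0 < diff < 0x80000000


def accept_path_update(stored_sn, stored_metric, new_sn, new_metric):
    """Accept if the SN is newer, or the SN is equal and the metric is better."""
    if sn_newer(new_sn, stored_sn):
        return True
    return new_sn == stored_sn and new_metric < stored_metric


# Hypothetical case from the debug output: the stored entry is outdated
# but carries SN 120, while the fresh path information arrives with SN 118.
stale_entry = {"sn": 120, "metric": 900}
fresh_info = {"sn": 118, "metric": 300}
accepted = accept_path_update(stale_entry["sn"], stale_entry["metric"],
                              fresh_info["sn"], fresh_info["metric"])
print("update accepted:", accepted)
```

Under this rule the better path is discarded purely because of its lower SN, which matches the behavior seen in the debug information.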

It is hard to reproduce.

Jeff


@bcopeland
Contributor

bcopeland commented Feb 22, 2018 via email

@zhejunli
Author

@bcopeland Yes, this is with an ath9k chip. The code base is from OpenWrt CC, kernel v3.10.36.

I don't know how OpenWrt synchronizes with the latest open80211s code base. Maybe the code in OpenWrt is too old?

Thanks.

@bcopeland
Contributor

bcopeland commented Feb 23, 2018 via email

@chunyeow
Contributor

chunyeow commented Feb 23, 2018 via email

@zhejunli
Author

@chunyeow I confirm that patch is in place already.

@zhejunli
Author

The mpath loop hasn't happened again, so I'll leave it alone for now.

Another question about the air metric. Is it possible to get a fresh metric of a "potential" mpath that is not being used currently?

For example, 3 mesh nodes n1, n2 and n3 are all in range of each other. There is iperf traffic from n1 directly to n3, with n2 as a bystander. In this case, the TX rate and PER of n1--->n3 are always fresh. However, because there is no traffic from n1 to n2, the air metric of n1--->n2 is never updated. I added a test frame mechanism so that n1 periodically sends test frames to n2 and n3 in order to keep the air metric of n1--->n2 fresh. But I found that rate control cannot produce a correct or reasonable TX rate for n1--->n2 from this small amount of test frame traffic alone.

During the iperf test, I increased the attenuation between n1 and n3, hoping to switch the mpath from n1--->n3 to n1--->n2--->n3. But this caused a wrong mpath switch decision, because the air metrics of n1--->n2 and n2--->n3 are not fresh: there is either no traffic or too little test frame traffic.

My question is:
Does the 802.11s protocol do "better mpath selection" dynamically? As I understand it, it does: an mpath periodically becomes inactive and a new mpath is formed, which keeps the mpath fresh and optimal. But if the TX rate and PER of some potential mpaths were not updated properly before that point, the newly formed mpath will be wrong.
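
The wrong decision in this 3-node case can be sketched as a comparison of summed link metrics, which is how HWMP accumulates a path metric hop by hop. All metric values below are made up for illustration; only the structure of the comparison is the point.

```python
def path_metric(path, link_metric):
    """Sum the per-link metrics along a path given as a tuple of links."""
    return sum(link_metric[link] for link in path)


def best_path(candidates, link_metric):
    """Pick the candidate path with the lowest accumulated metric."""
    return min(candidates, key=lambda path: path_metric(path, link_metric))


direct = (("n1", "n3"),)
relayed = (("n1", "n2"), ("n2", "n3"))

# Hypothetical metrics: n1->n3 is fresh (and degraded by attenuation),
# while the idle links via n2 still carry pessimistic, outdated values.
believed = {("n1", "n3"): 1200, ("n1", "n2"): 2000, ("n2", "n3"): 2000}
# What the idle links would report if they were measured under real traffic.
measured = {("n1", "n3"): 1200, ("n1", "n2"): 200, ("n2", "n3"): 200}

print("chosen with stale metrics:", best_path([direct, relayed], believed))
print("chosen with fresh metrics:", best_path([direct, relayed], measured))
```

The selection logic is identical in both runs; only the freshness of the inputs changes the outcome.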

Thanks,

Jeff

@bcopeland
Contributor

bcopeland commented Mar 26, 2018 via email

@zhejunli
Author

@bcopeland Thanks for the reply. I think the rate control algorithm ignores the HWMP path selection packets and only considers data packets.

So I added a mechanism to send some test DATA frames as background traffic. This is supposed to train the rate control algorithm to keep correct rate information for a specific peer, but it does not seem to be enough. I still get a wrong rate for a "POTENTIAL" peer. It may be because of the rate control behavior.

@zhejunli
Author

zhejunli commented Apr 3, 2018

It is the Minstrel rate control that underestimates the TX rate to a POTENTIAL mpath peer. The test frames cannot keep Minstrel adjusted to a proper, real rate. It looks like the test frames are too little traffic, and only heavier traffic lets Minstrel make the right decision.
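
One plausible way to picture this effect: Minstrel smooths per-rate success probabilities with a moving average, and a rate's estimate only moves when frames were actually sent at that rate. The sketch below is a toy model of that idea, not Minstrel's code. The smoothing weight, the interval counts, and the probabilities are all assumptions.

```python
def estimate_after(intervals, sample_every, start_prob, true_prob, weight=0.75):
    """Toy EWMA of a rate's success probability.

    The estimate is updated only in intervals where the rate was sampled,
    modelled here as every `sample_every`-th interval.
    """
    prob = start_prob
    for i in range(1, intervals + 1):
        if i % sample_every == 0:
            prob = prob * weight + true_prob * (1.0 - weight)
    return prob


# Hypothetical: a fast rate really succeeds 90% of the time, but the stored
# estimate is stuck at 10% from an earlier, worse period.
busy_link = estimate_after(20, sample_every=1, start_prob=0.1, true_prob=0.9)
idle_link = estimate_after(20, sample_every=20, start_prob=0.1, true_prob=0.9)
print(f"estimate with heavy traffic:     {busy_link:.3f}")
print(f"estimate with sparse test frames: {idle_link:.3f}")
```

In this model the sparse-traffic estimate stays well below the true value over the same time span, which is consistent with an underestimated TX rate.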

@bcopeland
Contributor

bcopeland commented Apr 3, 2018 via email

@zhejunli
Author

zhejunli commented Apr 3, 2018

Yes, that will help. But that involves the Minstrel part, which I don't want to touch for the time being.

As I understand it, 802.11s claims that it can DYNAMICALLY choose a better mpath by periodically deactivating an existing mpath and generating a PREQ to search for a lower-metric mpath. But if the rate and PER status of a "POTENTIAL" mpath peer are outdated, the new path search will produce a fake "optimal" mpath.

Is this a defect of the 802.11s protocol?

@zhejunli
Author

I made some tweaks and things look good so far. I believe 2 major issues are still left:

  1. authsae: key exchange may fail because of an ath9k key cache bug. This is a topic for another post. I don't know whether using hostapd/supplicant for secure mesh would be better than authsae.
  2. Because of the broadcast nature of HWMP frames, they are easily lost in multi-hop scenarios, so a path is hard to establish.
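
Point 2 can be put into rough numbers. Broadcast frames are not acknowledged or retried at the MAC layer, so a PREQ that must cross several hops has to survive each one. The sketch below assumes independent loss per hop and a single relay chain. Real flooding offers alternative relays, so treat it as a pessimistic back-of-the-envelope model; the loss rate and attempt count are invented.

```python
def preq_delivery_probability(per_hop_loss, hops, attempts=1):
    """Chance that at least one of `attempts` PREQs crosses `hops` hops."""
    single = (1.0 - per_hop_loss) ** hops
    return 1.0 - (1.0 - single) ** attempts


# Hypothetical 20% broadcast loss per hop.
for hops in (1, 2, 3, 4):
    once = preq_delivery_probability(0.2, hops)
    retried = preq_delivery_probability(0.2, hops, attempts=3)
    print(f"{hops} hop(s): single PREQ {once:.3f}, three attempts {retried:.3f}")
```

The same product applies again to the reply on its way back, so discovery success drops off quickly as the chain grows.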

4 participants