Mesh Synchronization Implementation Notes
- Specification
- Files
- Framework
- Offset Tracking
- Locking
- Timing Reference
- Drift Adjustment
- Workqueue
- iw
- Testing
- Known Issues
- Credits
Mesh Synchronization is specified in the IEEE 802.11 Standard. For our implementation, we've used IEEE Draft P802.11-REVmb™/D12, November 2011.
Changes were required in nl80211, cfg80211 and mac80211, but the bulk of the mesh synchronization takes place in net/mac80211/mesh_sync.c. This should be your starting point if you intend to review or extend this code.
The 802.11s specification defines an extensible mesh synchronization framework. To achieve this extensibility, the framework has been implemented using the common operations table pattern frequently used in the kernel.
The table is an array of struct ieee80211_mesh_sync_ops which provides definitions for the following functions (operations):
struct ieee80211_mesh_sync_ops {
void (*rx_bcn_presp)(...);
void (*adjust_tbtt)(...);
};
These functions are called by the open80211s protocol at different times, as required to track timing offset and correct drift:
- rx_bcn_presp() - this is called every time a mesh beacon is received.
- adjust_tbtt() - this is called immediately before a beacon is about to be transmitted.
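The operations table pattern described above can be sketched in plain C as follows. This is only an illustration: struct sync_ops, enum sync_method, get_sync_ops() and the counter arguments are hypothetical stand-ins, not the mac80211 definitions, whose real callback arguments are elided in the listing above.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct ieee80211_mesh_sync_ops. */
struct sync_ops {
	void (*rx_bcn_presp)(int *beacons_seen);
	void (*adjust_tbtt)(int *adjustments);
};

/* Hypothetical method identifiers, used only to index the table. */
enum sync_method { SYNC_NEIGHBOR_OFFSET, SYNC_VENDOR, SYNC_METHOD_COUNT };

/* Default method: count the events so the calls are observable. */
static void offset_rx_bcn_presp(int *beacons_seen) { (*beacons_seen)++; }
static void offset_adjust_tbtt(int *adjustments) { (*adjustments)++; }

/* Vendor method: placeholder handlers that do nothing. */
static void vendor_rx_bcn_presp(int *beacons_seen) { (void)beacons_seen; }
static void vendor_adjust_tbtt(int *adjustments) { (void)adjustments; }

/* One entry per synchronization method. */
static const struct sync_ops sync_methods[SYNC_METHOD_COUNT] = {
	[SYNC_NEIGHBOR_OFFSET] = { offset_rx_bcn_presp, offset_adjust_tbtt },
	[SYNC_VENDOR] = { vendor_rx_bcn_presp, vendor_adjust_tbtt },
};

/* Look up the handlers for a method; NULL if the method is unknown. */
static const struct sync_ops *get_sync_ops(enum sync_method method)
{
	if ((unsigned int)method >= SYNC_METHOD_COUNT)
		return NULL;
	return &sync_methods[method];
}
```

The caller never needs to know which method is active: it looks up the table entry once and then invokes ops->rx_bcn_presp() or ops->adjust_tbtt() at the appropriate time.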
Per-peer offset tracking takes place inside mesh_sync_offset_rx_bcn_presp(). The function is passed a struct ieee80211_rx_status whose mactime field is set to the local TSF value recorded at the reception of the first symbol of the beacon. If the beacon comes from a known station (either an established peer or a peer candidate), the t_offset for that station (sta) is recorded in sta->t_offset.
Once the t_offset for a particular peer is known, the flag WLAN_STA_TOFFSET_KNOWN is set on the station so that the offset can be reported to userspace (e.g. iw).
When you review the code you will surely be tempted to reorder some statements to optimize the offset calculations: think twice before doing so. In particular, any access to the sta pointer must take place within the rcu_read_lock()/rcu_read_unlock() section, and no function that might sleep may be called within that section. That includes the evident spin_lock() calls but also the calls to the driver to get or set the TSF counter (drv_get_tsf() and drv_set_tsf()).
The specification states that T_offset should be computed by comparing the timestamps of received beacons with the "frame reception time" (see the definition of T_r in 13.13.2.2.2). But "frame reception" is an ambiguous term.
In mac80211, the driver provides the mactime for each received frame, which is defined as the time the first data symbol of the frame hits the PHY. The timestamp and the mactime do not correspond to the same instant in time. The timestamp of the beacon is defined as "the time that the data symbol containing the first bit of the timestamp is transmitted to the PHY plus the transmitting STA's delays through its local PHY from the MAC-PHY interface to its interface with the WM" (802.11 11.1.2).
In other words, if two nodes had perfectly synchronized clocks, the difference between mactime and the beacon timestamp would not be zero. This difference would depend on the beacon transmission rate, which should be constant for any given MBSS. This constant error would have no effect on clock drift calculations, as it would cancel out, but it would affect other uses of mesh synchronization such as MCCA or power save.
Due to that, we have considered "frame reception time" to mean the time at which the first bit of the timestamp is received. With this definition, T_r becomes:
/* 24 bytes of header * 8 bits/byte * 10 (100 Kbps units per Mbps) / rate (in units of 100 Kbps) = header airtime in us */
t_r = rx_status->mactime + (24 * 8 * 10 / rate);
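As a worked example of the correction above, the following sketch isolates the arithmetic. It assumes the rate is expressed in units of 100 kbps, as in the mac80211 bitrate tables; header_airtime_us() and frame_reception_time() are hypothetical helper names used only for illustration.

```c
#include <stdint.h>

/* Airtime, in microseconds, of the 24-byte MAC header that precedes the
 * timestamp field. rate_100kbps is assumed to be in units of 100 kbps
 * (for example 10 for 1 Mbit/s). Integer division truncates. */
static uint32_t header_airtime_us(uint32_t rate_100kbps)
{
	return 24 * 8 * 10 / rate_100kbps;
}

/* T_r: the local time at which the first bit of the timestamp arrived. */
static uint64_t frame_reception_time(uint64_t mactime, uint32_t rate_100kbps)
{
	return mactime + header_airtime_us(rate_100kbps);
}
```

For a beacon sent at 1 Mbit/s (rate value 10) the header takes 24 * 8 * 10 / 10 = 192 microseconds, so T_r is mactime + 192. At higher rates the correction shrinks accordingly.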
The function mesh_sync_offset_rx_bcn_presp() records the maximum drift across all stations in the mesh interface variable ifmsh->sync_offset_clockdrift_max. Note that there is only one such variable per mesh interface, so at any given time it holds the largest clock drift observed among all stations.
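The bookkeeping described above could look roughly like the following user-space sketch. Several things here are assumptions made for illustration: struct peer, struct mesh_if and on_beacon() are hypothetical; the offset is taken as beacon timestamp minus reception time; and the drift is measured against the first offset seen for each peer, with the sign chosen arbitrarily. The kernel code differs in detail and also needs the locking discussed earlier.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-peer state. */
struct peer {
	int64_t t_offset;	/* most recent offset, microseconds */
	int64_t setpoint;	/* reference offset for drift measurement */
	bool offset_known;	/* plays the role of WLAN_STA_TOFFSET_KNOWN */
};

/* Hypothetical per-interface state: a single maximum for all peers. */
struct mesh_if {
	int64_t clockdrift_max;
};

/* Process one beacon: t_t is the timestamp carried in the beacon and
 * t_r the local reception time, both in microseconds. */
static void on_beacon(struct mesh_if *ifmsh, struct peer *p,
		      uint64_t t_t, uint64_t t_r)
{
	int64_t t_offset = (int64_t)(t_t - t_r);

	if (p->offset_known) {
		/* Assumed sign convention: drift is positive when the
		 * offset has decreased since the setpoint was taken. */
		int64_t drift = p->setpoint - t_offset;

		if (drift > ifmsh->clockdrift_max)
			ifmsh->clockdrift_max = drift;
	} else {
		/* First beacon from this peer: only record a reference. */
		p->setpoint = t_offset;
		p->offset_known = true;
	}
	p->t_offset = t_offset;
}
```

Because the maximum lives in the interface rather than in each peer, a later, smaller drift from another peer does not overwrite it; it is only cleared when the adjustment logic consumes it.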
In a different execution thread, every time a beacon is about to be sent, the function mesh_sync_offset_adjust_tbtt() checks whether sync_offset_clockdrift_max exceeds a certain threshold. The specification does not mention this minimum adjustment threshold (TBTT_MINIMUM_ADJUSTMENT), but it is necessary to prevent TSF adjustments that are smaller than the minimum TSF adjustment latency. We are currently using a value of 10 µs for this threshold, which may need to be revisited in the future. This function is also responsible for determining whether the adjustment is small enough to be applied all at once or must be applied gradually over several beacons. When the adjustment is applied over several beacons, each correction is 0.04% of the beacon interval. See the code here.
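The threshold check and the gradual correction can be sketched as follows. This is a simplified model rather than the kernel code: next_tsf_delta() is a hypothetical helper, the beacon interval is assumed to be given in time units (TU) of 1024 microseconds, and the correction is assumed to be applied as a negative TSF delta.

```c
#include <stdint.h>

#define TBTT_MINIMUM_ADJUSTMENT 10	/* microseconds, as stated above */
#define USEC_PER_TU 1024		/* one 802.11 time unit */

/* Returns the TSF delta (microseconds, zero or negative) to apply before
 * the next beacon and updates *drift_max with what is left to correct. */
static int32_t next_tsf_delta(uint32_t *drift_max, uint32_t beacon_int_tu)
{
	/* 0.04% of the beacon interval is 1/2500 of its length. */
	uint32_t max_step = beacon_int_tu * USEC_PER_TU / 2500;
	uint32_t step;

	if (*drift_max <= TBTT_MINIMUM_ADJUSTMENT) {
		*drift_max = 0;		/* too small to be worth adjusting */
		return 0;
	}

	/* Small drifts are corrected at once, large ones in capped steps. */
	step = *drift_max < max_step ? *drift_max : max_step;
	*drift_max -= step;
	return -(int32_t)step;
}
```

With a beacon interval of 1000 TU the cap is 1000 * 1024 / 2500 = 409 microseconds per beacon, so a drift of 1000 microseconds would be corrected over three beacons in steps of 409, 409 and 182.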
The function mesh_sync_offset_adjust_tbtt() is invoked from within an rcu_read section, and therefore it is not possible to call the driver TSF functions from within that function. To resolve this we use the mesh workqueue, which is only scheduled if the function determines that a TSF adjustment is needed.
The call that schedules the workqueue is set_bit(MESH_WORK_DRIFT_ADJUST, &ifmsh->wrkq_flags).
The handler that is invoked when the workqueue is scheduled is in that same file, mesh_sync_offset_adjust_tbtt. The workqueue handler runs in process context and is therefore allowed to make any driver calls it requires.
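The deferral described above follows a common pattern: the atomic-context path only sets a flag bit, and the handler running in process context tests and clears it before doing the real work. Below is a user-space analogue using C11 atomics. The names (struct mesh_work, WORK_DRIFT_ADJUST, request_drift_adjust(), mesh_work_handler()) are hypothetical, and the counter merely stands in for the driver TSF call.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define WORK_DRIFT_ADJUST (1u << 0)	/* hypothetical flag bit */

struct mesh_work {
	atomic_uint flags;
	int tsf_adjustments;	/* counts simulated driver calls */
};

/* Atomic context: only mark the work as pending, never call the driver. */
static void request_drift_adjust(struct mesh_work *w)
{
	atomic_fetch_or(&w->flags, WORK_DRIFT_ADJUST);
}

/* Process context: consume the flag and perform the (possibly sleeping)
 * adjustment. Returns true if an adjustment was carried out. */
static bool mesh_work_handler(struct mesh_work *w)
{
	unsigned int old = atomic_fetch_and(&w->flags, ~WORK_DRIFT_ADJUST);

	if (!(old & WORK_DRIFT_ADJUST))
		return false;
	w->tsf_adjustments++;	/* stands in for the driver TSF call */
	return true;
}
```

Setting the flag twice before the handler runs still results in a single adjustment, which matches the idea that the handler acts on the accumulated drift rather than on individual requests.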
You can view the timing offset (Toffset) of each peer with a station dump.
$ iw $MESH_IFACE station dump
Station 00:03:7f:10:4e:0c (on mesh0)
inactive time: 356 ms
rx bytes: 28190450
rx packets: 696028
tx bytes: 497
tx packets: 7
tx retries: 1
tx failed: 0
signal: -38 dBm
signal avg: -36 dBm
------> Toffset: -4451259 us <------
tx bitrate: 1.0 MBit/s
mesh llid: 20314
mesh plid: 38652
mesh plink: ESTAB
authorized: yes
authenticated: yes
preamble: long
WMM/WME: yes
MFP: no
TDLS peer: no
The Toffset: ... line is only visible on stations whose offset is tracked. On other stations, or if synchronization is disabled, the line is not shown.
Neighbor offset synchronization is defined as the default mandatory synchronization method. A vendor may implement a different synchronization method using this framework to meet special application needs. To switch to vendor-specific modes, use the vendor_sync option when joining the mesh.
iw $MESH_IFACE mesh join $MESH_ID vendor_sync on
Currently this option just disables the default neighbor offset synchronization and prints messages to the kernel log when the different handlers are installed. It is useful only for testing and as example code.
Testing synchronization can get tricky on a real mesh network. To be able to test synchronization in a controlled environment, we modified the wireless driver simulator (mac80211_hwsim) to report TSF time based on the kernel high-resolution clock. We also introduced the ability to independently change the offset of each radio instantiated by mac80211_hwsim so we could introduce drift and observe the mesh synchronization mechanism in action.
We also introduced a new kernel configuration option, CONFIG_MAC80211_VERBOSE_MESH_SYNC_DEBUG, that dumps a lot of real-time information about mesh synchronization.
Once you have convinced yourself that mesh synchronization works, you can try it on real hardware. To manually introduce drift, you can use the new tsf control variable via debugfs.
To introduce small TSF jumps, you can use the following syntax:
echo +=000002000 > /sys/kernel/debug/ieee80211/phy0/netdev\:mesh0/tsf
or
echo -=000002000 > /sys/kernel/debug/ieee80211/phy0/netdev\:mesh0/tsf
- When testing on ath9k hardware we have observed sporadic TSF resets. Apparently this is caused by a known ath9k issue that is not fully resolved. When that happens, you might see the following error in your kernel logs: "Failed to stop TX DMA, queues=0x001!". Our implementation of the mesh synchronization algorithm detects that situation and adjusts the Toffset setpoint accordingly.
- The TSF adjustment latency has been measured to be around 2-3 microseconds, and this provokes an overcorrection in TSF. To counter this effect, we introduced a constant margin correction to the setpoint (see this commit). This latency may vary across platforms, so the correction may need to be tuned.
- Currently only neighbors in the current MBSS are tracked. See this issue.
This work was a collaboration between Marco Porsch, Pavel Zubarev and the friendly folks at cozybit Inc.