Mesh Synchronization Implementation Notes

jcard0na edited this page Apr 6, 2012 · 42 revisions

Table of Contents

  1. Specification
  2. Files
  3. Framework
  4. Offset Tracking
  5. Locking
  6. Timing Reference
  7. Drift Adjustment
  8. Workqueue
  9. iw
  10. Testing
  11. Known Issues
  12. Credits

Where is it specified

Mesh Synchronization is specified in the IEEE 802.11 Standard. For our implementation, we used IEEE Draft P802.11-REVmb/D12, November 2011.

Where is it implemented

Changes were required in nl80211, cfg80211 and mac80211, but the bulk of the mesh synchronization code lives in net/mac80211/mesh_sync.c. This should be your starting point if you intend to review or extend this code.

Framework

The 802.11s specification defines an extensible mesh synchronization framework. To achieve this extensibility, the framework has been implemented using the common operations table pattern frequently used in the kernel.

The table is an array of struct ieee80211_mesh_sync_ops which provides definitions for the following functions (operations):

struct ieee80211_mesh_sync_ops {
        void (*rx_bcn_presp)(...);
        void (*adjust_tbtt)(...);
};

The functions will be called by the open80211s protocol at different times as required to track timing offset and correct drift.

  • rx_bcn_presp() - called every time a mesh beacon is received.
  • adjust_tbtt() - called immediately before a beacon is transmitted.
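
The operations table pattern can be sketched in user space as follows. This is a simplified illustration, not the kernel code: struct sync_ctx, struct mesh_sync_ops, the enum values and get_sync_ops() are hypothetical stand-ins, and the real callbacks take mac80211 arguments that are elided above.

```c
#include <stddef.h>

/* Hypothetical context standing in for the real mac80211 arguments. */
struct sync_ctx {
	int beacons_seen;
	int tbtt_checks;
};

/* Simplified analogue of struct ieee80211_mesh_sync_ops. */
struct mesh_sync_ops {
	void (*rx_bcn_presp)(struct sync_ctx *ctx);
	void (*adjust_tbtt)(struct sync_ctx *ctx);
};

/* Default method: here it only counts how often each hook is invoked. */
static void offset_rx_bcn_presp(struct sync_ctx *ctx) { ctx->beacons_seen++; }
static void offset_adjust_tbtt(struct sync_ctx *ctx) { ctx->tbtt_checks++; }

/* Vendor method: intentionally does nothing in this sketch. */
static void vendor_rx_bcn_presp(struct sync_ctx *ctx) { (void)ctx; }
static void vendor_adjust_tbtt(struct sync_ctx *ctx) { (void)ctx; }

/* Hypothetical method identifiers used to index the table. */
enum sync_method { SYNC_NEIGHBOR_OFFSET, SYNC_VENDOR, SYNC_METHOD_COUNT };

static const struct mesh_sync_ops sync_methods[SYNC_METHOD_COUNT] = {
	[SYNC_NEIGHBOR_OFFSET] = {
		.rx_bcn_presp = offset_rx_bcn_presp,
		.adjust_tbtt = offset_adjust_tbtt,
	},
	[SYNC_VENDOR] = {
		.rx_bcn_presp = vendor_rx_bcn_presp,
		.adjust_tbtt = vendor_adjust_tbtt,
	},
};

/* Look up the operations for a method; NULL if the method is unknown. */
static const struct mesh_sync_ops *get_sync_ops(enum sync_method method)
{
	if ((unsigned int)method >= SYNC_METHOD_COUNT)
		return NULL;
	return &sync_methods[method];
}
```

The caller never needs to know which method is active: it fetches the table entry once and invokes ops->rx_bcn_presp() or ops->adjust_tbtt() at the appropriate time.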

Offset tracking

Per-peer offset tracking takes place inside mesh_sync_offset_rx_bcn_presp(). The function is passed a struct ieee80211_rx_status whose .mactime field is set to the local TSF value recorded at the reception of the first symbol of the beacon. If the beacon comes from a known station (either an established peer or a peer candidate), the offset for that station (sta) is recorded in sta->t_offset.

Once the t_offset for a particular peer is known, the flag WLAN_STA_TOFFSET_KNOWN is set on the station so that the offset can be reported to userspace (e.g. iw).
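
A minimal sketch of the per-peer bookkeeping, assuming the offset is the beacon timestamp minus the local reception time (both in microseconds). struct peer_sync and record_offset() are hypothetical names for illustration; the boolean stands in for the WLAN_STA_TOFFSET_KNOWN flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-peer state; the kernel keeps this in the sta entry. */
struct peer_sync {
	int64_t t_offset;     /* signed offset in microseconds */
	bool t_offset_known;  /* stands in for WLAN_STA_TOFFSET_KNOWN */
};

/*
 * Record the offset of a peer from one received beacon.
 * t_t: timestamp carried in the beacon (peer's TSF).
 * t_r: local reception time (local TSF).
 * Assumption: offset = t_t - t_r, so a negative value means the
 * peer's clock is behind the local one.
 */
static void record_offset(struct peer_sync *peer, uint64_t t_t, uint64_t t_r)
{
	peer->t_offset = (int64_t)(t_t - t_r);
	peer->t_offset_known = true;
}
```

For example, a beacon timestamp of 1000000 received at local time 5451259 yields an offset of -4451259 us.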

Locking

When you review the code you will surely be tempted to re-order some statements to optimize the offset calculations: think twice before doing so. In particular, you have to consider that any access to the sta pointer must take place within the rcu_read_lock/unlock section, and also one cannot call any functions that might sleep within that section. That includes the evident spin_lock() calls but also the calls to the driver to get or set the TSF counter (drv_get/set_tsf()).

Timing Reference

The specification states that \( T_{offset} \) should be computed by comparing the timestamps of received beacons with the "frame reception time" (see the definition of \( T_r \) in 13.13.2.2.2). But "frame reception" is an ambiguous term.

In mac80211, the driver provides the mactime for each received frame, which is defined as the time the first data symbol of the frame hits the PHY. The timestamp and the mactime do not correspond to the same instant in time. The timestamp of the beacon is defined as "the time that the data symbol containing the first bit of the timestamp is transmitted to the PHY plus the transmitting STA's delays through its local PHY from the MAC-PHY interface to its interface with the WM" (802.11 11.1.2).

In other words, even if two nodes had perfectly synchronized clocks, the difference between mactime and the beacon timestamp would not be zero. This difference depends on the beacon transmission rate, which should be constant for any given MBSS. Such a constant error has no effect on clock drift calculations, because it cancels out, but it would affect other uses of mesh synchronization such as MCCA or power save.

For that reason, we take "frame reception time" to mean the time at which the first bit of the timestamp is received. With this definition, \( T_r \) becomes:

/* 24 bytes of header * 8 bits/byte = 192 bits; rate is in units of 100 kbps,
 * so 192 * 10 / rate is the header airtime in microseconds. */
t_r = rx_status->mactime + (24 * 8 * 10 / rate);
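
As a worked example of the arithmetic, here is a small user-space sketch. It assumes the rate is expressed in units of 100 kbps (so 1 Mbps is 10 and 6 Mbps is 60); timestamp_rx_time() is a hypothetical helper, not a kernel function, and the zero-rate guard is an addition for the sake of a safe standalone example.

```c
#include <stdint.h>

/*
 * Estimate when the first bit of the beacon timestamp was received.
 * mactime_us:   time the first data symbol hit the PHY, in microseconds.
 * rate_100kbps: beacon bitrate in units of 100 kbps (assumption).
 * The 24-byte MAC header precedes the timestamp, so its airtime
 * (24 * 8 bits at the given rate) is added to the mactime.
 */
static uint64_t timestamp_rx_time(uint64_t mactime_us, unsigned int rate_100kbps)
{
	if (rate_100kbps == 0)
		return mactime_us; /* unknown rate: fall back to mactime */
	return mactime_us + (24u * 8u * 10u / rate_100kbps);
}
```

At 1 Mbps the header takes 192 us, so a mactime of 1000 gives a reception time of 1192. At 6 Mbps the header takes 32 us, giving 1032.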

Drift Adjustment

The function mesh_sync_offset_rx_bcn_presp() will record the maximum drift for all the stations and store it in the mesh interface variable ifmsh->sync_offset_clockdrift_max. Note there is only one such variable per mesh interface, and it will store the maximum clock drift for all the stations at any given time.

In a different execution thread, every time a beacon is about to be sent, the function mesh_sync_offset_adjust_tbtt() checks whether sync_offset_clockdrift_max exceeds a certain threshold. The specification does not mention this minimum adjustment threshold (TBTT_MINIMUM_ADJUSTMENT), but it is necessary to prevent TSF adjustments that are smaller than the minimum TSF adjustment latency. We are currently using a value of \( 10\,\mu s \) for this threshold, which may have to be revisited in the future. This function is also responsible for determining whether the adjustment is small enough to be applied all at once or must be applied gradually over several beacons. When the adjustment is applied over several beacons, each correction is 0.04% of the beacon interval. See the code here.
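
The threshold and the gradual correction can be sketched as follows. This is an illustration under stated assumptions, not the kernel implementation: next_correction_step() and the _US macro names are hypothetical, the beacon interval is given in TU (1 TU = 1024 us), only positive drift is handled, and the function returns the magnitude of the correction without taking a position on the sign applied to the TSF.

```c
#include <stdint.h>

#define TBTT_MINIMUM_ADJUSTMENT_US 10  /* threshold quoted in the text */
#define TU_US 1024                     /* one 802.11 time unit in microseconds */

/*
 * Decide how much drift to correct before the next beacon.
 * pending_drift_us: maximum drift still to be corrected; updated in place.
 * beacon_int_tu:    beacon interval in TU.
 * Returns the size of this correction in microseconds, or 0 when the
 * pending drift does not exceed the threshold.
 */
static int64_t next_correction_step(int64_t *pending_drift_us,
				    unsigned int beacon_int_tu)
{
	/* Per-beacon cap: 0.04% of the beacon interval (1/2500). */
	int64_t max_step = (int64_t)beacon_int_tu * TU_US / 2500;
	int64_t step;

	if (*pending_drift_us <= TBTT_MINIMUM_ADJUSTMENT_US)
		return 0;

	/* Small drifts are corrected at once, large ones in capped steps. */
	step = *pending_drift_us > max_step ? max_step : *pending_drift_us;
	*pending_drift_us -= step;
	return step;
}
```

With a 100 TU beacon interval the cap is 40 us, so a drift of 100 us would be corrected over three beacons in steps of 40, 40 and 20 us.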

TSF Adjustment Workqueue

The function mesh_sync_offset_adjust_tbtt() is invoked from within an RCU read-side critical section, and therefore it cannot call the driver TSF functions directly. To work around this, we use the mesh workqueue, which is scheduled only if the function determines that a TSF adjustment is needed.

The call that schedules the work is set_bit(MESH_WORK_DRIFT_ADJUST, &ifmsh->wrkq_flags);. The handler that is invoked when the work runs is in that same file, mesh_sync_offset_adjust_tbtt. The workqueue handler runs in process context and is therefore allowed to make any driver calls it requires.
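
The deferral pattern can be illustrated in user space with C11 atomics. This is a sketch, not the kernel code: struct mesh_if, WORK_DRIFT_ADJUST, request_drift_adjust() and mesh_work() are hypothetical stand-ins, and a counter takes the place of the driver TSF calls.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the MESH_WORK_DRIFT_ADJUST bit. */
#define WORK_DRIFT_ADJUST (1u << 0)

/* Hypothetical, reduced mesh interface state. */
struct mesh_if {
	atomic_uint wrkq_flags;
	int tsf_adjustments;
};

/*
 * Atomic-context side: only records that an adjustment is needed.
 * It must not sleep, so it never touches the driver.
 */
static void request_drift_adjust(struct mesh_if *ifmsh)
{
	atomic_fetch_or(&ifmsh->wrkq_flags, WORK_DRIFT_ADJUST);
}

/*
 * Process-context side: tests and clears the flag, then performs the
 * work that may sleep. Returns true if an adjustment was carried out.
 */
static bool mesh_work(struct mesh_if *ifmsh)
{
	unsigned int old = atomic_fetch_and(&ifmsh->wrkq_flags,
					    ~WORK_DRIFT_ADJUST);

	if (!(old & WORK_DRIFT_ADJUST))
		return false;

	ifmsh->tsf_adjustments++; /* placeholder for the driver TSF get/set */
	return true;
}
```

Because the flag is a single bit, several requests made before the worker runs collapse into one adjustment.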

iw

You can view the timing offset (Toffset) of each peer by doing a station dump.

$ iw $MESH_IFACE station dump 
Station 00:03:7f:10:4e:0c (on mesh0)
	inactive time:	356 ms
	rx bytes:	28190450
	rx packets:	696028
	tx bytes:	497
	tx packets:	7
	tx retries:	1
	tx failed:	0
	signal:  	-38 dBm
	signal avg:	-36 dBm
------>	Toffset:	-4451259 us  <------
	tx bitrate:	1.0 MBit/s
	mesh llid:	20314
	mesh plid:	38652
	mesh plink:	ESTAB
	authorized:	yes
	authenticated:	yes
	preamble:	long
	WMM/WME:	yes
	MFP:		no
	TDLS peer:	no

The Toffset: ... line is only visible on stations whose offset is tracked. On other stations, or if synchronization is disabled, the line is not shown.

Neighbor offset synchronization is defined as the default mandatory synchronization method. A vendor may implement a different synchronization method using this framework to meet special application needs. To switch to a vendor-specific mode, use the vendor_sync option when joining the mesh.

iw $MESH_IFACE mesh join $MESH_ID vendor_sync on

Currently this option just disables the default neighbor offset synchronization and prints messages to the kernel log when the different handlers are installed. It is useful only for testing and as example code.

Testing

Testing synchronization can get tricky on a real mesh network. To be able to test synchronization in a controlled environment, we modified the wireless driver simulator (mac80211_hwsim) to report TSF time based on the kernel high-resolution clock. We also introduced the ability to independently change the offset of each radio instantiated by mac80211_hwsim so we could introduce drift and observe the mesh synchronization mechanism in action.

We also introduced a new kernel configuration option, CONFIG_MAC80211_VERBOSE_MESH_SYNC_DEBUG, that dumps a lot of real-time information about mesh synchronization.

Once you have convinced yourself that mesh synchronization works you can try it on real hardware. To manually introduce drift you can use the new tsf control variable via debugfs.

To introduce small TSF jumps, you can use the following syntax:

echo +=000002000 > /sys/kernel/debug/ieee80211/phy0/netdev\:mesh0/tsf

or

echo -=000002000 > /sys/kernel/debug/ieee80211/phy0/netdev\:mesh0/tsf

Known Issues

  1. When testing on ath9k hardware we have observed sporadic TSF resets. Apparently this is caused by a known ath9k issue that is not fully resolved. When that happens, you might see the following error in your kernel logs: "Failed to stop TX DMA, queues=0x001!". Our implementation of the mesh synchronization algorithm detects that situation and adjusts the Toffset setpoint accordingly.

  2. The TSF adjustment latency has been measured to be around 2-3 microseconds, which causes the TSF to be overcorrected. To counter this effect, we introduced a constant margin correction to the setpoint (see this commit). This latency may vary across platforms, so the correction may need to be tuned.

  3. Currently only neighbors in the current MBSS are tracked. See this issue.

Credits

This work was a collaboration between Marco Porsch, Pavel Zubarev and the friendly folks at cozybit Inc.
