[PATCH 7/8] GAS: End remain-on-channel due to delayed GAS comeback request

Sun Dec 20 09:55:48 PST 2015

On Sun, Dec 20, 2015 at 12:57:36PM +0000, Peer, Ilan wrote:
> We did not think that this would be an issue as we assumed that after the GAS
> initial request/response exchange all the comeback requests are negotiated directly
> with the AP, without involving the advertisement server, so the exchange should be
> fast enough to complete in 200 msec.

Well, if things were perfect, sure, but.. There are bad AP
implementations and there are environments where it can be difficult to
get a long, fragmented GAS exchange through. On a busy 2.4 GHz channel,
it can take a while to get a chance to transmit a frame and the
likelihood of getting multiple 1500 byte frames through at 1 Mbps drops
quite a bit with interference. I've been to lab environments where it
was very difficult to complete fragmented GAS exchanges reliably; never
mind trying to do this in a way that each frame has a maximum of 200 ms
to make it through..
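
For a rough feel of the numbers, here is a back-of-the-envelope sketch
(helper names are made up for illustration). It counts only the time to
send the frame bits at the given rate and ignores preamble, ACKs,
backoff, and time lost to other stations, so real numbers are worse:

```python
def payload_airtime_ms(frame_bytes, rate_mbps):
    """Time in ms to send only the frame bits at the given PHY rate.

    Assumption: preamble, ACK, and contention/backoff time are ignored.
    """
    return frame_bytes * 8 / (rate_mbps * 1000.0)


def max_attempts_in_window(window_ms, frame_bytes, rate_mbps):
    """Upper bound on back-to-back attempts that fit in the wait window."""
    return int(window_ms // payload_airtime_ms(frame_bytes, rate_mbps))


airtime = payload_airtime_ms(1500, 1)            # one 1500 byte frame at 1 Mbps
attempts = max_attempts_in_window(200, 1500, 1)  # within a 200 ms wait
print(airtime, attempts)
```

That bound assumes an otherwise idle channel; on a busy channel with
retries, far fewer attempts fit into the same wait.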

> FWIW, in our testing setups we also used 100 msec which was also ok, however,
> these are only testing setups, so we might still see issues in real deployments :)

I don't know what to expect in practical deployments, but I picked a
value between these semi-randomly: 150 ms. Or well, it was not really
that random, since it happened to be the value that made the existing
hwsim test case pass with MCC enabled.. :)

> In case that all the wait times are equal, the first wait would never be extended, so 
> eventually we will always need to pay the wait time between ROCs. As an alternative we
> also considered to always cancel the previous running ROC before starting a new one, but this
> has the disadvantage that scheduling a new ROC can once again incur additional delays,
> so we decided to go with the approach in patch 8/8. We can revert to this approach
> if you think that it is safer in terms of inter-op.

This patch 8/8 has a bug caused by patch 7/8, i.e., it does not really
do what you describe here.. Because 7/8 terminates the first offchannel
wait (the only one with the longer wait time), the first comeback
request would start a new ROC with the shorter wait time. Every
following comeback request would use that same wait time, and without
ROC extension, that would result in the exact same issue.. Only the
wait time is shorter (200 vs. 1000 ms in these patches).

I fixed that by keeping the query->offchannel_tx_started tracking
up-to-date with patch 7/8 behavior and using the longer wait time for
the first comeback request if the initial wait time had been canceled
(which it really is in every single case now, but that could be modified
to consider the fragmentation-without-wait case with very short
comeback delay to skip stopping the initial ROC). This provides
significant further speedup when both patches 7 and 8 are applied.
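
To illustrate the idea, here is a minimal sketch of the wait time
selection. This is not the actual gas_query.c change; the class, the
function names, and the exact flag semantics are assumptions made for
illustration, and the constants just follow the values mentioned in
this thread:

```python
LONG_WAIT_MS = 1000   # longer wait, as used for the initial request
SHORT_WAIT_MS = 150   # shorter wait for later comeback requests


class Query:
    """Hypothetical stand-in for the per-query state."""

    def __init__(self):
        # Mirrors the role of query->offchannel_tx_started (assumed meaning:
        # an offchannel wait is currently tracked as started).
        self.offchannel_tx_started = False


def end_initial_wait(query):
    """Patch 7/8 behavior: the initial ROC is ended, so clear the flag."""
    query.offchannel_tx_started = False


def comeback_wait_ms(query):
    """Pick the wait time for the next comeback request."""
    if not query.offchannel_tx_started:
        # No wait is running (the initial one was canceled): start a new
        # ROC with the longer wait so later requests fit inside it.
        wait = LONG_WAIT_MS
    else:
        wait = SHORT_WAIT_MS
    query.offchannel_tx_started = True
    return wait
```

With the flag kept in sync, only the first comeback request after the
canceled initial wait gets the longer wait time; the following ones use
the shorter value.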

To make it acceptable to test with a shorter wait time first, I added a
mechanism to retry the full GAS sequence if any wait for a comeback
response fails. This second attempt will use the old timeout of 1000 ms.
With this, the end result is actually more robust than the previous
design and significantly faster for the fragmented case with drivers
that cannot extend a pending ROC. I haven't yet pushed this into the
master branch, but if nothing unexpected shows up, I'll probably do so.
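
Roughly, the retry logic looks like the following sketch (again not the
real code; run_sequence is a hypothetical callback that runs one full
GAS sequence with the given comeback wait and reports whether every
comeback response arrived in time):

```python
SHORT_WAIT_MS = 150   # first attempt: shorter wait time
OLD_WAIT_MS = 1000    # second attempt: the old timeout


def run_gas_with_retry(run_sequence):
    """Try the full GAS sequence with the short wait, then the old one.

    run_sequence(wait_ms) is assumed to return True on success and False
    if any wait for a comeback response failed.
    Returns (success, list of wait times that were tried).
    """
    tried = []
    for wait_ms in (SHORT_WAIT_MS, OLD_WAIT_MS):
        tried.append(wait_ms)
        if run_sequence(wait_ms):
            return True, tried
    return False, tried
```

In the common case, only the first attempt runs, so the cost of the
retry is paid only when the shorter wait turns out to be too short.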

-- 
Jouni Malinen                                            PGP id EFC895FA