[PATCH 7/8] GAS: End remain-on-channel due to delayed GAS comeback request

Mon Dec 21 00:12:08 PST 2015

> -----Original Message-----
> From: Jouni Malinen [mailto:j at w1.fi]
> Sent: Sunday, December 20, 2015 19:56
> To: Peer, Ilan
> Cc: hostap at lists.infradead.org; Gottlieb, Matti
> Subject: Re: [PATCH 7/8] GAS: End remain-on-channel due to delayed GAS
> comeback request
> 
> On Sun, Dec 20, 2015 at 12:57:36PM +0000, Peer, Ilan wrote:
> > We did not think that this would be an issue as we assumed that after
> > the GAS initial request/response exchange all the comeback requests
> > are negotiated directly with the AP, without involving the
> > advertisement server, so the exchange should be fast enough to complete
> > in 200 msec.
> 
> Well, if things were perfect, sure, but.. There are bad AP implementations and
> there are environments where it can be difficult to get a long, fragmented
> GAS exchange through. On a busy 2.4 GHz channel, it can take a while to get a
> chance to transmit a frame, and the likelihood of getting multiple 1500-byte
> frames through at 1 Mbps drops quite a bit with some interference. I've been
> to lab environments where it was very difficult to complete fragmented GAS
> exchanges reliably; never mind trying to do this in a way that each frame has
> a maximum of 200 ms to make it through..
> 
> > FWIW, in our testing setups we also used 100 msec, which was also ok.
> > However, these are only testing setups, so we might still see issues
> > in real deployments :)
> 
> I don't know what to expect in practical deployments, but I picked a value
> between these semi-randomly: 150 ms. Or well, it was not really that random,
> since it happened to be the value that made the existing hwsim test case pass
> with MCC enabled.. :)
> 

hehe ... 150 does not really sound random to begin with :)

> > If all the wait times are equal, the first wait would never
> > be extended, so eventually we will always need to pay the wait time
> > between ROCs. As an alternative we also considered always canceling
> > the previous running ROC before starting a new one, but this has the
> > disadvantage that scheduling a new ROC can once again incur additional
> > delays, so we decided to go with the approach in patch 8/8. We can fall
> > back to the alternative if you think that it is safer in terms of inter-op.
> 
> This patch 8/8 has a bug caused by patch 7/8, i.e., it does not really do what
> you describe here.. Because of 7/8 terminating the first offchannel wait (the
> only one with the longer wait time), the first comeback request would start a
> new ROC with the shorter wait time and every following comeback request
> would use that same wait time. Without ROC extension, that would result in
> the exact same issue.. Just the wait time is shorter (200 vs. 1000 ms in these
> patches).
> 

Missed the fact that the comeback delay is also set for the initial response.

> I fixed that by keeping the query->offchannel_tx_started tracking up-to-date
> with patch 7/8 behavior and using the longer wait time for the first comeback
> request if the initial wait time had been canceled (which it really is in every
> single case now, but that could be modified to consider the fragmentation-
> without-wait case with very short comeback delay to skip stopping the initial
> ROC). This provides significant further speedup when both patches 7 and 8
> are applied.
> 
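
For reference, the wait time selection described above could be sketched
roughly as follows. This is only an illustration and not the actual
gas_query.c code: the struct, its fields, and the function name are
hypothetical, and the 1000 ms and 150 ms values are simply the ones
mentioned in this thread.

```c
#include <stdbool.h>

/* Values mentioned in this thread; the macro names are hypothetical. */
#define GAS_WAIT_LONG_MS 1000  /* wait time used for the initial request */
#define GAS_WAIT_SHORT_MS 150  /* shorter wait time for comeback requests */

/* Hypothetical per-query state, reduced to what the selection needs. */
struct gas_query_sketch {
	bool initial_wait_canceled; /* first ROC ended early (patch 7/8) */
	bool comeback_sent;         /* at least one comeback request sent */
};

/*
 * Pick the offchannel wait time for the next comeback request: the first
 * comeback request after a canceled initial wait gets the longer value,
 * every other comeback request gets the shorter one.
 */
static unsigned int comeback_wait_time(struct gas_query_sketch *q)
{
	unsigned int wait_ms = GAS_WAIT_SHORT_MS;

	if (q->initial_wait_canceled && !q->comeback_sent)
		wait_ms = GAS_WAIT_LONG_MS;

	q->comeback_sent = true;
	return wait_ms;
}
```

With the initial wait canceled, the first call returns the longer value and
all following calls return the shorter one, so a driver that cannot extend a
pending ROC still gets one long ROC to work with.
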
> To make it acceptable to test with the shorter wait time first, I added a
> mechanism to retry the full GAS sequence if any wait for a comeback response
> fails. This second attempt will use the old timeout of 1000 ms.
> With this, the end result is actually more robust than the previous design and
> significantly faster for the fragmented case with drivers that cannot extend
> pending ROC. I haven't yet pushed this into the master branch, but if nothing
> unexpected shows up, I'll probably do so.
> 
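
Just to check my understanding of the retry mechanism, a minimal sketch
could look like the code below. All names here are hypothetical (this is not
the code from the pending commit), and the callback only simulates a link
that needs a certain minimum wait time for the comeback response to arrive.

```c
#include <stdbool.h>

/* Values mentioned in this thread; the macro names are hypothetical. */
#define GAS_WAIT_SHORT_MS 150  /* first attempt: shorter comeback wait */
#define GAS_WAIT_OLD_MS 1000   /* second attempt: old timeout */

/*
 * Hypothetical callback that runs one full GAS sequence with the given
 * comeback wait time and reports whether all responses were received.
 */
typedef bool (*gas_run_fn)(unsigned int comeback_wait_ms, void *ctx);

/*
 * Try the full GAS sequence with the short wait first and retry once with
 * the old timeout. Returns the attempt number that succeeded (1 or 2), or
 * 0 if both attempts failed.
 */
static int gas_query_with_retry(gas_run_fn run, void *ctx)
{
	if (run(GAS_WAIT_SHORT_MS, ctx))
		return 1;
	if (run(GAS_WAIT_OLD_MS, ctx))
		return 2;
	return 0;
}

/*
 * Simulated link for illustration only: the sequence succeeds when the
 * wait time is at least the value that ctx points to.
 */
static bool needs_at_least(unsigned int comeback_wait_ms, void *ctx)
{
	const unsigned int *needed_ms = ctx;

	return comeback_wait_ms >= *needed_ms;
}
```

In this model, a link that needs 500 ms fails the first attempt and
completes on the second one, so the worst case is no less robust than
always using 1000 ms.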

Thanks,

Ilan.