[RFC 0/1] libertas: new scan logic

Wed Nov 28 16:32:41 EST 2007

On Wed, 2007-11-28 at 17:07 +0100, Holger Schurig wrote:
> > This is a problem with the firmware, in that its buffers for
> > scan results apparently aren't big enough to hold everything.
> > So you have to break it into groups, stopping on the
> > boundaries for 1, 6, 11 because it's expected that more APs
> > will be on the default channels than the intervening ones.
> 
> The other problem that I had was that I'm not able to send null
> packets to the AP. So in order to not lose the connection, I
> can only be away from the AP for this amount of time. By
> returning briefly, I won't lose the association.

Right; and we can't even tell if the firmware has null packet enabled
unless we whitelist firmware versions, because Marvell firmwares don't
do capability bits with that much granularity.

> At least a "ping a.b.c.d" to or from the box while "iwlist eth1 
> scanning" won't disassociate me with the worker logic. When I 
> disable the worker logic, e.g. by 
> 
> 
> #if 0
>                 /* somehow schedule the next part of the scan */
>                 if (chan_count &&
>                     !full_scan &&
>                     !priv->adapter->surpriseremoved) {
>                         adapter->last_scanned_channel += to_scan;
>                         cancel_delayed_work(&priv->scan_work);
>                         queue_delayed_work(priv->work_thread,
>                                 &priv->scan_work,
>                                 msecs_to_jiffies(300));
>                         goto out;
>                 }
> #endif
> 
> then I will be disassociated from the AP (again, while data flow 
> is happening, e.g. because of a ping).

Yeah, that's the other reason we put in the between-scan-wait times, to
ensure enough time for the firmware to return to the associated channel
so you don't get kicked off.
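
To put rough numbers on that, here is a small sketch of the chunking
arithmetic. It assumes the 14 channels and 4 channels per sub-scan
mentioned elsewhere in this thread and the 300 ms delay from your
snippet; the function names are made up for illustration and are not
driver code.

```c
/* Assumed values: 14 channels and 4 per sub-scan come from this thread,
 * 300 ms matches the quoted queue_delayed_work() call. */
#define TOTAL_CHANNELS     14
#define CHANS_PER_SUBSCAN   4
#define SUBSCAN_GAP_MS    300

/* Number of sub-scans needed to cover all channels (rounds up). */
static int subscan_count(int total, int per_scan)
{
    return (total + per_scan - 1) / per_scan;
}

/* First channel of sub-scan `idx` (channels are numbered from 1). */
static int subscan_first(int idx, int per_scan)
{
    return idx * per_scan + 1;
}

/* Last channel of sub-scan `idx`, clamped to the final channel. */
static int subscan_last(int idx, int total, int per_scan)
{
    int last = (idx + 1) * per_scan;
    return last > total ? total : last;
}

/* Total time spent back on the associated channel between sub-scans. */
static int total_gap_ms(int total, int per_scan, int gap_ms)
{
    return (subscan_count(total, per_scan) - 1) * gap_ms;
}
```

With those assumed values you get four sub-scans and three gaps, so the
gaps alone add a bit under a second to a full scan.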

> > Then, the BSS list gets updated _and_ the WEXT GIWSCAN event
> > gets sent out after the _entire_ scan is complete, not after
> > each sub-scan is complete.
> 
> This is what I'm currently doing in the new scan.c. Just 
> let "iwevent" run while you're "iwlist eth1 scan" and while you 
> monitor syslog with "lbsdebug +enter +scan".
> 
> "iwlist eth1 scan" returns almost immediately, sometimes with
> "eth1      No scan results". It obviously doesn't care about the
> GIWSCAN event that I'm sending. I think it just calls GIWSCAN
> followed by SIWSCAN. And that's OK; it's the driver that
> behaves weirdly.

If you want to make get_scan wait until you actually have scan results,
then you make get_scan return EAGAIN while a scan is in-flight until you
have the new results.  However, this isn't easy with the current
implementation, and it wasn't a high priority to make it work like this.

The whole point of switching to a cached-BSS list was to make sure that
get_scan() _didn't_ have to return EAGAIN and make callers wait.
Because if you have a cached BSS list, you have some results, and those
results are at _most_ 15 seconds old and therefore pretty
representative.  The only reason to have get_scan() return EAGAIN is to
make iwlist work on the first try.
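
Roughly, the two behaviours look like this. This is only a userspace
model to show the control flow: struct scan_state, model_get_scan() and
the wait_for_fresh flag are hypothetical names, not the real libertas
structures or handlers.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver's scan state. */
struct scan_state {
    bool scan_in_flight;   /* a sub-scan sequence is still running */
    int  cached_bss_count; /* entries currently in the cached BSS list */
};

/*
 * Model of a non-blocking get_scan handler.
 * With wait_for_fresh == false it always serves the cached list.
 * With wait_for_fresh == true it returns -EAGAIN while a scan is
 * running, so the caller retries instead of the driver sleeping.
 */
static int model_get_scan(const struct scan_state *st, bool wait_for_fresh)
{
    if (wait_for_fresh && st->scan_in_flight)
        return -EAGAIN;
    return st->cached_bss_count;
}
```

Either way the call returns right away; the only difference is whether
the caller sees the cached list or is told to come back later.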

> Note that once the scanning is finished (about 2 seconds later), 
> I can access the result via "cat $debugfs/getscantable". So the 
> scanner per-se is working, just not the transfer of the result.
> 
> To fix the behavior of "iwlist eth1 scan", I need to wait either 
> in lbs_get_scan() or lbs_set_scan() until the scan-worker is 
> finished. I'm just unsure about the best way this can be done,
> my kernel-foo isn't good enough. Poll until the worker thread
> is finished, wait until last_scanned_channel is 0 again, 
> something like this.

No waiting.  Use the EAGAIN trick, and iwlist should be smart enough to
keep polling for at least a bit until some timeout occurs.  Please don't
block in the driver, and _certainly_ don't block in set_scan().
Blocking there is really, really evil.
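
From the caller's side, the retry loop could look something like the
sketch below. The get_scan_fn callback stands in for the SIOCGIWSCAN
ioctl and fake_get_scan() is a made-up driver for illustration; I'm not
claiming this is iwlist's actual code.

```c
#include <errno.h>

/* Hypothetical callback type standing in for the SIOCGIWSCAN ioctl. */
typedef int (*get_scan_fn)(void *ctx);

/*
 * Retry get_scan while it reports -EAGAIN, up to max_tries attempts.
 * A real tool would sleep between attempts (e.g. with nanosleep()).
 * Returns the first non-EAGAIN result, or -ETIMEDOUT if tries run out.
 */
static int poll_scan_results(get_scan_fn get_scan, void *ctx, int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        int ret = get_scan(ctx);
        if (ret != -EAGAIN)
            return ret;
    }
    return -ETIMEDOUT;
}

/* Fake driver for illustration: busy for *ctx more calls, then 5 results. */
static int fake_get_scan(void *ctx)
{
    int *busy_calls = ctx;

    if (*busy_calls > 0) {
        (*busy_calls)--;
        return -EAGAIN;
    }
    return 5;
}
```

The waiting and the timeout policy live in the tool, where they belong,
and the driver never sleeps on behalf of the caller.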

> > 1) the BSS list gets updated whenever scan results are
> > available and a GIWSCAN event gets sent, which is better for
> > longer-lived tools like NetworkManager and wpa_supplicant, or
> 
> I think this sucks because some scans delete the list. If I send 

No scans should actually delete the scan list.  That's cruft left over
from the original driver bits.  If the card wants to delete results from
certain BSSs or SSIDs, it should do that, but it should never blow away
the _entire_ scan list.  Driver bug.

> several GIWSCAN events before finished, then the longer-lived
> tool won't get the full picture of what's "in the air" for the
> first three GIWSCAN events (assuming 14 channels and doing 4
> channels at a time).

Well, tools like NM cache the results themselves and age the results
differently than the driver, because the driver shouldn't have that sort
of aging policy in it.  From the NM point of view, the user wants a
stable network menu, and most of the time a device doesn't move so
quickly that networks change hugely in a matter of minutes.  So NM
smooths over this, but it's still wrong in the driver.

> Also, I don't know any other driver that sends several GIWSCAN 
> events for one scan request. That's a second reason why I don't 
> want to do something that is unusual.

Doesn't matter; decouple the idea of GIWSCAN from SIWSCAN.  You should
send a GIWSCAN event whenever you get new BSS information.  For example,
if the card does background scanning, even if just on the associated
channel (like ipw cards do), then you need to send a GIWSCAN event for
each time the BSS list changes.  They are not explicitly tied together.
Tools like iwlist don't even care about events, so they don't matter
here.  Moreover, your app could get a GIWSCAN because some other app
triggered a scan.  Two apps can trigger successive scans too, and you'll
only get one GIWSCAN event at the end of it anyway.  There is not a 1:1
relationship between a SIWSCAN request and GIWSCAN event.
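
As a sketch of "send an event when the list changes", independent of who
asked for the scan: the table below is a deliberately simplified,
hypothetical structure keyed only by BSSID, and events_sent is just a
counter standing in for sending the GIWSCAN wireless event.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_BSS 32

/* Hypothetical, simplified BSS entry: BSSID only. */
struct bssid {
    unsigned char addr[6];
};

struct bss_table {
    struct bssid entries[MAX_BSS];
    int count;
    int events_sent; /* stands in for sending a GIWSCAN wireless event */
};

/* Add a BSSID if it is new; returns true when the table changed. */
static bool bss_table_add(struct bss_table *t, const struct bssid *b)
{
    for (int i = 0; i < t->count; i++)
        if (memcmp(t->entries[i].addr, b->addr, 6) == 0)
            return false;
    if (t->count == MAX_BSS)
        return false;
    t->entries[t->count++] = *b;
    return true;
}

/* Merge a batch of results; emit one event only if something changed. */
static void bss_table_merge(struct bss_table *t,
                            const struct bssid *batch, int n)
{
    bool changed = false;

    for (int i = 0; i < n; i++)
        if (bss_table_add(t, &batch[i]))
            changed = true;
    if (changed)
        t->events_sent++;
}
```

Nothing in there knows or cares whether a SIWSCAN triggered the batch. A
real driver would also treat updated signal levels and the like as a
change; this sketch only looks at new BSSIDs.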

> > 2) The scan results are cached until the entire scan
> > completes, then the BSS list gets updated and the GIWSCAN
> > event is sent, which works for short-lived tools like iwlist
> > but sucks for programs that certainly _can_ use the scan
> > results as they come in, not in a batch after (potentially) a
> > 10 second wait.
> 
> But if those programs can use the scan results as they come in, 
> then they can also work with scenario 1. In fact, they must work 
> with them, because this is how many other drivers work.

Yes; apps still have to deal with (1) because the app is certainly not
ever the only app touching the wireless driver.  So if your app handles
(1), it will also handle (2).

> > I'm inclined to say that what iwlist should really do is to
> > keep gathering scan results for up to 10 seconds or until it
> > sees an AP on the last channel in the regulatory domain for
> > the driver, whichever happens first.
> 
> Changing "iwlist" is not an option for me. I don't see it as broken.
> I have used iwlist with 7 different drivers and have not observed a
> problem. It simply worked for me for several years. It's the
> driver that can/should fix the current problem, not iwlist.
> 
> Also it would suck to wait, say, 10 seconds, when the scan took
> only, say, 2 seconds. Arbitrary timeouts are seldom good.

Right, but _blocking_ in the driver is just _wrong_ _wrong_ _wrong_.
The driver must never block in SIWSCAN or GIWSCAN.  SIWSCAN returns
immediately (with an error of course if scan cannot be triggered).
GIWSCAN needs to return EAGAIN if it decides that scan results aren't
available during an in-flight scan.  It cannot block until the scan is
done.

Dan