SDIO Performance once again

Dan Williams dcbw at redhat.com
Mon Feb 9 12:10:55 EST 2009


On Mon, 2009-02-09 at 11:30 -0500, Jeff Sutherland wrote:
> On Monday 09 February 2009, Dan Williams wrote:
> > On Mon, 2009-02-09 at 08:21 -0500, Jeff Sutherland wrote:
> > > On Monday 09 February 2009, Sven Neumann wrote:
> > > > Hi,
> > > >
> > > > On Sun, 2009-02-08 at 17:56 +0100, Dominik S. Herwald wrote:
> > > > > right now I am testing Marvell 8686 based Modules connected to the
> > > > > SDIO Controller of a Blackfin BF548.
> > > > >
> > > > > Basically the libertas driver works just fine and stable.
> > > > > But the Performance... :-/
> > > >
> > > > Interesting. We are having the contrary experience here. Running a
> > > > CM-X300 board, which is a PXA 300 featuring a Marvell 8686 module
> > > > connected to the SDIO controller, we are seeing a performance of about
> > > > 13 Mbits/sec.
> > > >
> > > > Unfortunately this is not stable. Sometimes (rarely on one of the two
> > > > boards we have, frequently on the other), there are errors:
> > > >
> > > > libertas: tx watch dog timeout
> > > >
> > > > When this happens it usually takes about six seconds for the module to
> > > > recover. During this period nothing is transmitted. At some point,
> > > > sooner or later, the following errors show up:
> > > >
> > > > libertas: command 0x001f timed out
> > > > libertas: requeueing command 0x001f due to timeout (#1)
> > > > libertas: command 0x001f timed out
> > > > libertas: requeueing command 0x001f due to timeout (#2)
> > > > libertas: command 0x001f timed out
> > > > libertas: requeueing command 0x001f due to timeout (#3)
> > > > libertas: tx watch dog timeout
> > > > libertas: command 0x001f timed out
> > > > libertas: Excessive timeouts submitting command 0x001f
> > > >
> > > > At this point the module stops working completely and I haven't found a
> > > > way to recover from this. The system then needs to be restarted.
> > > >
> > > > We are using linux-2.6.29-rc4 currently, but observed the same problems
> > > > with linux-2.6.26. The firmware version is 8.73.7p3. I have also tried
> > > > the 9.70.3 firmware, that didn't help either.
> > > >
> > > > Has anyone experienced similar problems? Any ideas what I could try to
> > > > get some more debug output from the driver?
> > > >
> > > >
> > > > Sven
> > >
> > > This looks to be caused by the same situation that affects my pxa270
> > > design with an SDIO connected 8686 chipset.  Because of the
> > > multi-threading in the driver, depending on cpu load, commands to the
> > > chip are sent in an indeterminate sequence.  If you go turn on some
> > > debugging in the driver, such
> >
> > The firmware can only execute one command at a time, so all commands are
> > stuffed into the internal command queue.  Since some commands cannot be
> > sent during PS periods, there needs to be some authority to handle that.
> > Much of this general structure is left over from the 2006 Marvell vendor
> > driver dump that Libertas was derived from.
> >
> > > as putting a printk someplace, that is usually enough of a load to pace
> > > thread execution so the commands usually execute in the same sequence and
> > > you'll see this error go away (along with the network performance :-(
> > > Basically the 8686 firmware gets its knickers in a twist if it sees
> > > certain commands executed out of sequence.  And rightly so.  There's some
> > > fundamental redesign of the driver that needs doing that's beyond my
> > > programming interests and abilities.
> >
> > I'm not aware of any specific situations (besides WPA setup) where
> > commands need to be executed in a specific sequence; if there were, we
> > can certainly handle that in the driver.
> >
> > Other than that, any commands that are submitted are done so at the
> > request of the user via 'iwconfig' or wpa_supplicant; these are queued
> > atomically into the main command queue.
> >
> > Much of the association process (including the worker and the back-off
> > timer stuff in assoc.c) was necessitated by the broken design of WEXT;
> > with cfg80211 I expect we can go back to a much simpler model whereby
> > the entire association request is simply sent to the firmware in one
> > shot.  There could be some cleanup opportunities here; ideally the
> > *only* communication with the firmware happens when finally setting the
> > parameters instead of the current behavior of some WEXT calls asking for
> > values before queuing up the attributes to set later.
> >
> > What sort of suggestions would you have for redesign?
> >
> > Dan
> 
> Well, like I said earlier, that's really beyond my expertise as I'm a HW 
> person, but when I was doing the debugging of my problems, I couldn't help 
> but notice that when I was 'printk'ing' various bits and pieces of the driver 
> that the command sequence was always the same, regardless of CPU loading, and 
> that the module always worked.  Without the printk's but with driver 
> debugging still turned on, I noticed that the command sequence would be 
> different for the times when the firmware wedged.  So my only suggestions are
> to ensure that commands are not sent to the module until the previous command
> has finished executing, and maybe to ensure that the sequence of commands is
> always preserved somehow.

The driver should always guarantee only one in-flight command at a time.
If the command timeouts are too short, then yes, it may be the case that
a command is sent to the firmware before a previous one completes, and
then we need to increase the command timeout.
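
As a rough illustration of the one-in-flight rule (a toy model in Python,
not the actual libertas code), the sketch below keeps a FIFO of pending
commands and sends the next one only after the current one completes or
times out.  The class name, method names, and the retry limit of 3 are
assumptions; the limit was picked only to mirror the three "requeueing"
messages in the log quoted above.  Note that if timeout_ms is shorter than
the time the firmware really needs, the model re-sends while the firmware
would still be busy, which is the failure mode described here.

```python
from collections import deque

MAX_RETRIES = 3  # assumption: mirrors the three requeue messages in the quoted log


class CommandQueue:
    """Toy model: at most one command in flight; the rest wait in FIFO order."""

    def __init__(self, timeout_ms):
        self.timeout_ms = timeout_ms
        self.pending = deque()   # entries are (command, retries_so_far)
        self.in_flight = None    # the single command currently sent, or None
        self.sent = []           # every send, in the order it reached the "firmware"
        self.failed = []         # commands given up on after too many timeouts

    def submit(self, cmd):
        """Queue a command; it is sent immediately only if nothing is in flight."""
        self.pending.append((cmd, 0))
        self._kick()

    def _kick(self):
        # Send the next pending command only when the in-flight slot is free.
        if self.in_flight is None and self.pending:
            self.in_flight = self.pending.popleft()
            self.sent.append(self.in_flight[0])

    def complete(self, elapsed_ms):
        """Report how long the in-flight command took (None = no response).

        Returns True if it finished within the timeout, False otherwise.
        """
        if self.in_flight is None:
            raise RuntimeError("no command in flight")
        cmd, retries = self.in_flight
        self.in_flight = None
        timed_out = elapsed_ms is None or elapsed_ms > self.timeout_ms
        if timed_out:
            if retries < MAX_RETRIES:
                # Requeue at the head so the original ordering is preserved.
                self.pending.appendleft((cmd, retries + 1))
            else:
                self.failed.append(cmd)
        self._kick()
        return not timed_out
```

With a timeout of 100 ms in this model, a command that never responds is
sent four times in total (the first attempt plus three requeues) before it
is recorded as failed, and no other command is sent in between.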

However, after the initial firmware load, command sequences will vary
depending on the requests made from userspace via 'iwconfig' and
whatever; and the firmware *should* be able to handle this.  Since most
of the commands don't have interdependencies (with the exception of WPA
setup) AFAIK everything should be OK at the moment.

Most of the command timeouts were determined experimentally from the USB
interface driver, and they might well be too short for the SDIO and SPI
drivers.  Some experimentation is needed there if it turns out command
timeouts are really the problem.
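
One simple way to run that kind of experiment, sketched here in Python as
an assumption rather than anything the driver does today: log how long
each command takes to complete on a given interface, then pick a timeout
that covers the slowest observed completion with some headroom.  The
function name and the default margin of 2.0 are hypothetical, and the
latency values in the usage line are made-up placeholders, not
measurements.

```python
import math


def suggest_timeout_ms(latencies_ms, margin=2.0):
    """Return a timeout covering the slowest observed completion times a margin."""
    if not latencies_ms:
        raise ValueError("need at least one measured latency")
    # Round up so the suggested timeout is never below max * margin.
    return math.ceil(max(latencies_ms) * margin)


# Placeholder numbers for illustration only.
example = suggest_timeout_ms([10, 40, 25])
```

Comparing the value suggested for SDIO or SPI against the USB-derived
timeout would show whether the existing timeouts are plausibly too short.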

Dan




More information about the libertas-dev mailing list