b43 error under heavy load
Chris Vine
chris at cvine.freeserve.co.uk
Wed Jun 1 13:49:31 EDT 2011
On Wed, 1 Jun 2011 17:42:55 +0200
Rafał Miłecki <zajec5 at gmail.com> wrote:
> 2011/6/1 Chris Vine <chris at cvine.freeserve.co.uk>:
> > On Wed, 1 Jun 2011 14:25:23 +0200
> > Rafał Miłecki <zajec5 at gmail.com> wrote:
> >> I think you should easily get this error by transmitting. Streaming
> >> some video is mostly receiving. Just putting some random
> >> (ftp/sftp/iperf) server in the network would do the trick.
> >
> > OK, rather than recompile the kernel with debugfs enabled, as you
> > suggested I took the debugging call to b43dbg() out of the
> > B43_DBG_DMAVERBOSE if block (so it is entered whether or not verbose
> > debugging is set).
> >
> > I transferred a 268MB file across the LAN in a little over 5 minutes
> > (so the transfer speed was a little under 10Mb/s). During the
> > course of the transfer I got about 500 "b43-phy0 debug: Stopped TX
> > ring 1" statements logged.
> >
> > However the interesting thing is that with this debugging statement
> > included, I got no messages about any out of order TX status;
> > instead, apart from the overrun messages to the debug log, the link
> > performed normally. The file transfer did not fail (I have checked
> > that the file received is identical to the file sent) nor was the
> > link to the router lost. Probably writing to the debug log has
> > changed some timing race somewhere to the benefit of link integrity.
> >
> > However, as I said, I am not going to be in a position to do much
> > testing by way of transferring further files over the LAN for a
> > period of time, for unrelated reasons.
>
> Well, it just seems that hitting a full TX ring does not cause firmware
> problems or the out-of-order issue. However, I've no idea what else can
> cause it. We were also seeing this issue with free firmware on G-PHY
> cards.
>
> Maybe this is some hardware issue the firmware has to work around? Maybe
> updating the firmware could help? My next idea is to try 508.1107
> firmware.

My earlier report didn't test in both directions (it was for a
netbook-to-desktop transfer). I have now made a number of transfers in
both directions of a 268MB file using sftp, and the results are below.

Both ends have sshd installed and running, and it is the sending machine
whose ssh daemon is active for the transfer in question: in other
words, all the transfers are get rather than put operations.

My netbook is the computer with the Broadcom wireless device. My
desktop doesn't use wireless: it is connected to the router via 100Mb/s
Ethernet. Therefore, the transfer speeds are limited by the wireless
link to the netbook rather than by the Ethernet link to the desktop.

The transfers from the netbook to the desktop computer took an average
of 4 mins 28 secs, and on each occasion the file transfer completed
successfully and the wireless link stayed up, although I got the
repeated reports of "b43-phy0 debug: Stopped TX ring 1" to which I
referred earlier.

The transfers from the desktop to the netbook, when they succeeded, were
faster, taking an average of 2 mins 35 secs (this is surprisingly
quick for a 268MB file). Of the three transfer attempts I made, two
succeeded, with no error messages of any kind reported to the debug
log, and one failed. The one which failed caused the cessation of
wireless traffic, and was accompanied by the debug log reports of
out-of-order TX status referred to earlier, and by only a single
"Stopped TX ring 1" entry in the debug log. In the case of the failed
transfer, I brought the wireless link back up by disassociating and
then reassociating with the AP/router. It was not necessary to unload
and reload the b43 module, so there was no hard error involved.
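
To put the two average transfer times side by side as link rates, here
is a quick conversion sketch. It assumes 1 MB = 8 Mb (decimal
megabytes) and ignores ssh/sftp protocol overhead; the function name
throughput_mbps is purely illustrative.

```python
def throughput_mbps(size_mb: float, seconds: float) -> float:
    """Average throughput in megabits per second.

    Assumption: 1 MB = 8 Mb (decimal units), protocol overhead ignored.
    """
    return size_mb * 8 / seconds


FILE_MB = 268  # file size from the report

# Average transfer times from the report, converted to seconds.
netbook_to_desktop = 4 * 60 + 28  # netbook transmitting: 268 s
desktop_to_netbook = 2 * 60 + 35  # netbook receiving: 155 s

for label, secs in [("netbook -> desktop (TX)", netbook_to_desktop),
                    ("desktop -> netbook (RX)", desktop_to_netbook)]:
    print(f"{label}: {throughput_mbps(FILE_MB, secs):.1f} Mb/s")
```

Under those assumptions this works out at roughly 8 Mb/s when the
netbook transmits and roughly 14 Mb/s when it receives.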

Summary: Traffic sent from the Broadcom wireless device generates
copious reports of "Stopped TX ring 1", but the link always carries on
and stays up, although transmission is slower than reception. Received
traffic, on the other hand, reports no errors until the spate of
"Out of order TX status report on DMA ring 1" errors occurs, which
seems to happen at random (albeit accompanied on my failed transfer by
a single "Stopped TX ring 1" log entry), and when it does happen it
brings the wireless link to a halt. Wireless traffic can be restarted
simply by reassociating with the AP.
With that, I am afraid that really is it for a few days.
Chris
More information about the b43-dev mailing list