[PATCH 0/1] RFC: memory coherency and data races on TX path

Daniel Mack daniel at zonque.org
Tue Apr 10 10:42:44 PDT 2018


I found something that I believe might be an issue, and I have an
idea on how to correct this, but unfortunately, this doesn't solve
the issues I occasionally see with this driver. I'd still like to 
share it, because I might be totally mistaken in my understanding.

With no firmware code or documentation at hand, it's not exactly clear
which assumption the firmware makes, but obviously, the driver and the
firmware share memory to exchange descriptors that either contain
control information or payload. The driver puts control descriptors
and payload descriptors in a ring buffer in an interleaved fashion.

When the driver wants to send an skb, it looks for a currently unused
control descriptor, and then fills it, together with its directly
chained payload descriptor. The descriptors are both marked valid and
then the firmware is instructed to start processing the ringbuffer.
In case the firmware is idle when wcn36xx_dxe_tx_frame() is entered,
this is all fine. However, when sending many packets in a short time
frame, there are cases when the firmware is still processing the ring
buffer (iow, ch->reg_ctrl & WCN36xx_DXE_CH_CTRL_EN != 0), and in this
case, writes to the shared memory area depict a data race. The local
spinlock of course doesn't help to prevent that. OTOH, it should be
completely fine to modify the descriptors while firmware is still
reading them, as the firmware should only pay attention to such that
are marked valid.

There's a problem with the latter presumption however which looks like
this in the driver code:

	desc->fr_len = ctl->skb->len;
	/* set dxe descriptor to VALID */
	desc->ctrl = ch->ctrl_skb;

The CPU may very well reorder the two writes, even though the memory is
allocated as coherent DMA. When that happens, the firmware may see a
wrong length for a valid descriptor. A simple memory write barrier would
suffice to solve this, but then again, there are two descriptors
involved.

My attempt to fix that restructures the code a bit and makes the
payload descriptor valid first and then the control descriptor, both
strongly ordered through memory fences. This however assumes that the
firmware will ignore valid payload descriptors unless they have a
valid control descriptor preceding them, but that's really just
guessing.

Does that make sense? As I said, I can't really say this improves
anything, sadly, so I might be mistaken entirely. But I'll leave this
here for further discussion. Ideally, somebody with access to the
firmware sources could give an assessment whether this is an issue at
all or not.


Thanks,
Daniel


Daniel Mack (1):
  wcn36xx: fix buffer commit logic on TX path

 drivers/net/wireless/ath/wcn36xx/dxe.c | 75 +++++++++++++++++-----------------
 1 file changed, 38 insertions(+), 37 deletions(-)

-- 
2.14.3




More information about the wcn36xx mailing list