[PATCH 0/1] RFC: memory coherency and data races on TX path

Ramon Fried ramon.fried at linaro.org
Tue Apr 10 11:11:08 PDT 2018


On 10 April 2018 at 20:42, Daniel Mack <daniel at zonque.org> wrote:
> I found something that I believe might be an issue, and I have an
> idea on how to correct this, but unfortunately, this doesn't solve
> the issues I occasionally see with this driver. I'd still like to
> share it, because I might be totally mistaken in my understanding.
>
Thanks for sharing you idea. Are you aware of the recent patch I sent
that Loic Poulain
wrote that fixes a race issue in access to wcn36xx_dxe_tx_frame()  ?
Kalle just recently applied it to ath-next, I don't think it's
available yet upstream.
The patch was fixing something similar, perhaps it will solve the
issue you're experiencing.

> With no firmware code or documentation at hand, it's not exactly clear
> which assumption the firmware makes, but obviously, the driver and the
> firmware share memory to exchange descriptors that either contain
> control information or payload. The driver puts control descriptors
> and payload descriptors in a ring buffer in an interleaved fashion.
>
> When the driver wants to send an skb, it looks for a currently unused
> control descriptor, and then fills it, together with its directly
> chained payload descriptor. The descriptors are both marked valid and
> then the firmware is instructed to start processing the ringbuffer.
> In case the firmware is idle when wcn36xx_dxe_tx_frame() is entered,
> this is all fine. However, when sending many packets in a short time
> frame, there are cases when the firmware is still processing the ring
> buffer (iow, ch->reg_ctrl & WCN36xx_DXE_CH_CTRL_EN != 0), and in this
> case, writes to the shared memory area depict a data race. The local
> spinlock of course doesn't help to prevent that. OTOH, it should be
> completely fine to modify the descriptors while firmware is still
> reading them, as the firmware should only pay attention to such that
> are marked valid.
>
> There's a problem with the latter presumption however which looks like
> this in the driver code:
>
>         desc->fr_len = ctl->skb->len;
>         /* set dxe descriptor to VALID */
>         desc->ctrl = ch->ctrl_skb;
>
The firmware won't start reading the descriptors until it receives an
interrupt from driver.
which in turn happen later in the function using: wcn36xx_dxe_write_register()
so I don't think reordering in this case make any problem.

> The CPU may very well reorder the two writes, even though the memory is
> allocated as coherent DMA. When that happens, the firmware may see a
> wrong length for a valid descriptor. A simple memory write barrier would
> suffice to solve this, but then again, there are two descriptors
> involved.
>
> My attempt to fix that restructures the code a bit and makes the
> payload descriptor valid first and then the control descriptor, both
> strongly ordered through memory fences. This however assumes that the
> firmware will ignore valid payload descriptors unless they have a
> valid control descriptor preceding them, but that's really just
> guessing.
>
> Does that make sense? As I said, I can't really say this improves
> anything, sadly, so I might be mistaken entirely. But I'll leave this
> here for further discussion. Ideally, somebody with access to the
> firmware sources could give an assessment whether this is an issue at
> all or not.
When I'm not sure regarding some implementation I consult with the
downstream pronto driver.
https://github.com/sultanqasim/qcom_wlan_prima

Did you look there if they actually placed wmb() ?

>
>
> Thanks,
> Daniel
>
>
> Daniel Mack (1):
>   wcn36xx: fix buffer commit logic on TX path
>
>  drivers/net/wireless/ath/wcn36xx/dxe.c | 75 +++++++++++++++++-----------------
>  1 file changed, 38 insertions(+), 37 deletions(-)
>
> --
> 2.14.3
>



More information about the wcn36xx mailing list