[PATCH] ath10k: Replace ioread with wmb for data sync

Peter Oh poh at codeaurora.org
Mon Feb 2 15:49:30 PST 2015


Hi Florian,

I really appreciate your detailed explanation.

Regards,
Peter
On 02/02/2015 03:25 PM, Florian Fainelli wrote:
> On 02/02/15 14:06, Peter Oh wrote:
>> On 02/02/2015 11:47 AM, Johannes Berg wrote:
>>> On Mon, 2015-02-02 at 11:36 -0800, Peter Oh wrote:
>>>> On 02/02/2015 11:22 AM, Johannes Berg wrote:
>>>>>>> You basically have the following sequence:
>>>>>>>
>>>>>>> iowrite()
>>>>>>> ioread()
>>>>>>>
>>>>>>> If you look, you'll see that iowrite() is actually done (or should be,
>>>>>>> or perhaps with appropriate syncs) on an uncached mapping.
>>>>>> since it's mmio, iowrite will be map to write, not out which is cached
>>>>>> mapping.
>>>>>> That's why we address "posted write" here.
>>>>>> If it's un-cached mapping which is volatile, we don't even need ioread.
>>>>> No, this isn't true - "posted write" in the context of this discussion
>>>>> is about the PCIe bus. Memory writes that go through cache aren't
>>>>> referred to as "posted writes", those are just (cached) memory writes.
>>>>>
>>>>>>> As a result, the only thing you care about here is the PCIe bus, not
>>>>>>> the CPU cache flush. And from there on that's just a question of PCIe
>>>>>>> bus semantics.
>>>>>> So how does ioread guarantee PCIe bus transaction done?
>>>>> That's how PCIe works, operations are serialized, and read() has to wait
>>>>> for a response from the device
>>>> do you know which mechanism or which instruction set makes read() wait
>>>> for a response from the device?
>>> I have no idea. I assume it's just like a DRAM read, the CPU stalls
>>> while there's no response.
>> My explanation in this thread is all about how read() guarantees the
>> wait for a response from the device, and therefore why mb() (which
>> replaced wmb() in patch set 2) is compatible with read().
>> Briefly speaking,
>> read() -> dsb 'st' -> cpu (actually axi master in cpu) holds the axi bus
>> -> cpu posts the write buffer on the axi bus -> axi bus (axi slave, which
>> is the PCIe device) signals write completion on the write response
>> channel once the write transactions have completed -> cpu releases the
>> axi bus -> cpu program counter (pc) proceeds to the instruction after
>> the read.
>>
>> The exact same sequence happens with mb().
>> mb() -> dsb 'st' -> cpu (actually axi master in cpu) holds the axi bus ->
>> cpu posts the write buffer on the axi bus -> axi bus (axi slave, which is
>> the PCIe device) signals write completion on the write response channel
>> once the write transactions have completed -> cpu releases the axi bus ->
>> cpu program counter (pc) proceeds to the next instruction.
>>
>> Since the axi bus master is waiting (blocking) for the write completion
>> signal from the axi slave (PCIe device), this is how read() and mb()
>> guarantee that the write command reaches the device.
> PCIe writes are posted, so the only guarantee you can have by inserting
> such barriers is that writes from the CPU to the PCIe RC (targeting the
> PCIe device) are non-posted (as far as the busing between the CPU and the
> PCIe RC is concerned). Past the PCIe RC there is no such guarantee, because
> the PCIe specification allows for that, and there are flow control, PCIe
> switches, or other things that can alter the way your PCIe device ends up
> being written to.
>
> The only way to make a "portable" synchronization barrier is to do a
> PCIe read from the same register you just wrote to, because then, the
> PCIe RC needs to guarantee the transaction ordering on the PCIe bus itself.
>
> You might just be lucky and/or have very good HW which ensures that the
> ARM synchronization barriers are propagated to the memory region where
> your PCIe device BARs are mapped from the CPU perspective, but you
> definitely cannot rely on such assumptions, as there will be bogus HW
> out there, for which only a subsequent ioread32() will work.
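
For illustration only, here is a toy userspace model in C of the point made
above. It is not kernel code and not what ath10k does; every toy_* name is
made up for this sketch. It assumes the simplest possible fabric: a write is
queued ("posted") and returns immediately, a CPU-side barrier does nothing
beyond the CPU, and a read delivers all earlier queued writes before it
returns, which is the ordering property the read-back relies on.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NUM_REGS    4
#define TOY_QUEUE_DEPTH 8

/* Hypothetical device: registers change only when a queued write lands. */
struct toy_dev {
    uint32_t regs[TOY_NUM_REGS];
    struct {
        unsigned idx;
        uint32_t val;
    } posted[TOY_QUEUE_DEPTH];   /* writes still in flight in the fabric */
    unsigned n_posted;
};

/* Posted write: returns once the write is queued, before the device sees it. */
static void toy_iowrite32(struct toy_dev *d, unsigned idx, uint32_t val)
{
    assert(idx < TOY_NUM_REGS && d->n_posted < TOY_QUEUE_DEPTH);
    d->posted[d->n_posted].idx = idx;
    d->posted[d->n_posted].val = val;
    d->n_posted++;
}

/*
 * CPU-side barrier. Assumption of this model: it orders the CPU's own
 * accesses but does not force queued writes out to the device.
 */
static void toy_cpu_barrier(struct toy_dev *d)
{
    (void)d;
}

/* Non-posted read: earlier posted writes are delivered first, in order. */
static uint32_t toy_ioread32(struct toy_dev *d, unsigned idx)
{
    assert(idx < TOY_NUM_REGS);
    for (unsigned i = 0; i < d->n_posted; i++)
        d->regs[d->posted[i].idx] = d->posted[i].val;
    d->n_posted = 0;
    return d->regs[idx];
}

/* Inspect what the device has actually observed (bypasses the bus). */
static uint32_t toy_dev_peek(const struct toy_dev *d, unsigned idx)
{
    assert(idx < TOY_NUM_REGS);
    return d->regs[idx];
}

/* Write, then read the same register back to flush the posted write. */
static void toy_write_flush(struct toy_dev *d, unsigned idx, uint32_t val)
{
    toy_iowrite32(d, idx, val);
    (void)toy_ioread32(d, idx);
}
```

In this model, toy_iowrite32() followed by toy_cpu_barrier() leaves the
device register unchanged, while toy_write_flush() guarantees the device has
seen the value by the time it returns. In a real driver the corresponding
pair would be iowrite32() followed by ioread32() on the same register.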



