[PATCH] wifi: mt76: mmio_(read|write)_copy byte swap when on Big Endian

Caleb James DeLisle cjd at cjdns.fr
Wed Oct 29 13:40:39 PDT 2025


On 29/10/2025 21:12, Jonas Gorski wrote:
> On Wed, Oct 29, 2025 at 4:24 PM Caleb James DeLisle <cjd at cjdns.fr> wrote:
>>
>> On 29/10/2025 10:15, Jonas Gorski wrote:
>>> On Tue, Oct 28, 2025 at 10:42 PM Caleb James DeLisle <cjd at cjdns.fr> wrote:
>>>> On 28/10/2025 21:19, Jonas Gorski wrote:
>>>>> Hi,
>>>>>
>>>>> On Mon, Oct 27, 2025 at 6:19 PM Caleb James DeLisle <cjd at cjdns.fr> wrote:
>>>>>> When on a Big Endian machine, PCI swaps words to/from LE when
>>>>>> reading/writing them. This presents a problem when we're trying
>>>>>> to copy an opaque byte array such as firmware or encryption key.
>>>>>>
>>>>>> Byte-swapping during copy results in two swaps, but solves the
>>>>>> problem.
>>>>>>
>>>>>> Fixes:
>>>>>> mt76x2e 0000:02:00.0: ROM patch build: 20141115060606a
>>>>>> mt76x2e 0000:02:00.0: Firmware Version: 0.0.00
>>>>>> mt76x2e 0000:02:00.0: Build: 1
>>>>>> mt76x2e 0000:02:00.0: Build Time: 201607111443____
>>>>>> mt76x2e 0000:02:00.0: Firmware failed to start
>>>>>> mt76x2e 0000:02:00.0: probe with driver mt76x2e failed with error -145
>>>>>>
>>>>>> Signed-off-by: Caleb James DeLisle <cjd at cjdns.fr>
>>>>>> ---
>>>>>>     drivers/net/wireless/mediatek/mt76/mmio.c | 34 +++++++++++++++++++++++
>>>>>>     1 file changed, 34 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/net/wireless/mediatek/mt76/mmio.c b/drivers/net/wireless/mediatek/mt76/mmio.c
>>>>>> index cd2e9737c3bf..776dbaacc8a3 100644
>>>>>> --- a/drivers/net/wireless/mediatek/mt76/mmio.c
>>>>>> +++ b/drivers/net/wireless/mediatek/mt76/mmio.c
>>>>>> @@ -30,15 +30,49 @@ static u32 mt76_mmio_rmw(struct mt76_dev *dev, u32 offset, u32 mask, u32 val)
>>>>>>            return val;
>>>>>>     }
>>>>>>
>>>>>> +static void mt76_mmio_write_copy_portable(void __iomem *dst,
>>>>>> +                                         const u8 *src, int len)
>>>>>> +{
>>>>>> +       __le32 val;
>>>>>> +       int i = 0;
>>>>>> +
>>>>>> +       for (i = 0; i < ALIGN(len, 4); i += 4) {
>>>>>> +               memcpy(&val, src + i, sizeof(val));
>>>>>> +               writel(cpu_to_le32(val), dst + i);
>>>>>> +       }
>>>>>> +}
>>>>>> +
>>>>>>     static void mt76_mmio_write_copy(struct mt76_dev *dev, u32 offset,
>>>>>>                                     const void *data, int len)
>>>>>>     {
>>>>>> +       if (IS_ENABLED(CONFIG_CPU_BIG_ENDIAN)) {
>>>>>> +               mt76_mmio_write_copy_portable(dev->mmio.regs + offset, data,
>>>>>> +                                             len);
>>>>>> +               return;
>>>>>> +       }
>>>>>>            __iowrite32_copy(dev->mmio.regs + offset, data, DIV_ROUND_UP(len, 4));
>>>>> Maybe just replace this with memcpy_toio() which does no swapping at
>>>>> all instead of double swapping on BE?
>>>> I'm not that informed about how PCI works so I had to test to confirm
>>>> my understanding, but I can confirm that memcpy_toio() does not solve
>>>> the problem.
>>> Ah, right, I misread _iowrite32_copy() to do conversion to LE, but it doesn't.
>>>
>>> What architecture is this you have? PowerPC? ARM? MIPS? 32 bit? 64 bit?
>>
>> MIPS32 (EcoNet EN751221 34Kc)
>>
>>
>>> So the differences I see are:
>>>
>>> 1. __iowrite32_copy() uses __raw_writel(), which has different memory
>>> semantics than writel()
>>> 2. __iowrite32_copy() assumed src is aligned to 32 bit, while you
>>> explicitly align it
>>> 3. memcpy_toio() will handle unaligned src properly, but does 64 bit
>>> accesses on 64 bit systems (and uses __raw* again).
>>>
>>> Is src aligned? If not, then the issue might be 2. And if your system
>>> is 64 bit, it would explain why 3 didn't help.
>>
>> I'm not a regular developer of mt76 so I wasn't sure if that was
>> guaranteed and I just wanted to code for safety.
>>
>> After reviewing the code, I see that there are a few places where
>> mt76_mmio_write_copy is being called with stack-allocated u8 arrays
>> so it's pretty clear to me that this is being treated as a memcpy-like
>> function and we should be handling unaligned inputs.
>>
>>
>>> As a first step you could try to replace the writel(cpu_to_le32(val)
>>> with a iowrite32be(val, ...) which should do the same except avoiding
>>> the doubled byte swapping. If that works, you can try to replace it
>> This works.
>>
>> These symbols are a bit of a nightmare to trace, so I ended up making
>> an .i file so I could confirm what's happening.
>>
>> iowrite32be() uses the version in iomap.c so I understand that's using
>> writel(swab32(val),port), so a writel with an unconditional byte swap.
>> writel() is more complicated, it's an inline function that is generated
>> in a rat's nest of preprocessor macros in mips/include/asm/io.h
>>
>> The preprocessed is this:
>>
>> __mem = (void *)((unsigned long)(mem)); __val = (val); if (sizeof(u32)
>> != sizeof(u64) || sizeof(u64) == sizeof(long)) { *__mem = __val;
>>
>> The source is this:
>>
>>       __mem = (void *)__swizzle_addr_##bwlq((unsigned long)(mem));    \
>>       __val = pfx##ioswab##bwlq(__mem, val);                \
>>       if (sizeof(type) != sizeof(u64) || sizeof(u64) == sizeof(long)) \
>>           *__mem = __val;                        \
>>
>> The line "pfx##ioswab##bwlq(__mem, val);" is ioswabl() and the source
>> of that explains the issue:
>>
>>    * Sane hardware offers swapping of PCI/ISA I/O space accesses in hardware;
>>    * less sane hardware forces software to fiddle with this...
>>
>> So this confirms my initial understanding, the PCI hardware is doing the
>> byte swapping unconditionally.
>>
>>
>>> with __raw_writel(), which then would make this the same as
>>> __iowrite32_copy, except making sure that src is aligned.
>>
>> This fails.
>>
>> Since I'm the maintainer of this SoC and it's still fairly new, I wrote
>> a trivial kmod to verify that unaligned access is not just silently
>> returning trash, it works as though it were aligned so alignment is
>> not the issue.
>>
>>
>>> Also you could replace your memcpy() with get_unaligned((u32 *)(src +
>>> i)); Should do the same but inline.
>> Good idea, I will do this.
>>>> The issue as I understand it is that rather than making every driver
>>>> carefully call cpu_to_le*() every MMIO write, someone decided to make
>>>> the PCI host bridge itself transparently byte-swap all MMIO on the
>>>> wire. Since most MMIO is hitting registers and most buffers are
>>>> transferred by DMA, for the most part everything works and nobody
>>>> notices.
>>>>
>>>> But in the rare case that we need to write a blob to MMIO, it gets
>>>> transparently swapped in hardware so you need to use cpu_to_le in that
>>>> case. Doing a search of ./drivers for write.*cpu_to_le I can see this
>>>> issue comes up a bit.
>>> Every (PCI) driver does conversion to LE implicitly by using
>>> writel/readl (or iowrite32/ioread32) etc do the conversion to/from LE.
>>> So writel(foo, dst )is a __raw_writel(cpu_to_le32(foo), dst) etc. PCI
>>> memory is assumed to be in LE. If you are on a little endian system,
>>> then no byte swapping happens, and on BE it will do byte swapping
>>> before writing the value.
>> Okay so it seems that in the case of MIPS, that's not always how it
>> works.
>>
>> https://github.com/torvalds/linux/blob/e53642b87a4f4b03a8d7e5f8507fc3cd0c595ea6/arch/mips/include/asm/mach-generic/mangle-port.h#L17
>>
>> Since we don't know if the swap will happen in hardware or software
>> and (AFAIK) this is not a hot enough path that double-swapping will
>> have a notable performance penalty, I think the most sane thing to
>> do is use writel(cpu_to_le32()) and not care if it's swapped back
>> in the kernel or hardware.
> Oh, I think I see what it happening here. ECONET is a Big Endian MIPS
> platform, but does not select SWAP_IO_SPACE (most other BE platforms
> do).
>
> Does that mean the PCI space is swapped in hardware there?
>
> I guess that means that anything that uses __raw accessors to PCI
> space directly or indirectly is broken, as the raw data is now
> actually in the wrong order and needs to be swab'd.
>
> I don't know if it is a good idea to change this in __iowrite32_copy()
> / __ioread32_copy() (and other helpers), or if there are drivers that
> use it on non-PCI spaces and would be broken by that.
>
> If there is a way, I would suggest disabling hardware conversion and
> selecting SWAP_IO_SPACE, but that will affect a lot of your code that
> assumes that writel() etc don't convert to/from little endian.


I can look around in the hardware registers and see if I can shut it
off for EcoNet, but if you're saying MT76 should not support BE unless
they disable hardware swapping and use SWAP_IO_SPACE, that means the
majority of BE hardware on OpenWrt is not going to be supported. If
that's the decision then it at least warrants clear documentation.

Thanks,
Caleb


user at cjd-dev:~/en7526/openwrt$ find ./ -name 'config-6.12' | while read 
x; do grep -q 'CPU_BIG_ENDIAN=y' "$x" && ( grep -q 'SWAP_IO_SPACE=y' 
"$x" || echo "$x
  does not use SWAP_IO_SPACE" ) ; done
./target/linux/apm821xx/config-6.12 does not use SWAP_IO_SPACE
./target/linux/realtek/rtl931x/config-6.12 does not use SWAP_IO_SPACE
./target/linux/realtek/rtl930x_nand/config-6.12 does not use SWAP_IO_SPACE
./target/linux/realtek/rtl839x/config-6.12 does not use SWAP_IO_SPACE
./target/linux/realtek/rtl931x_nand/config-6.12 does not use SWAP_IO_SPACE
./target/linux/realtek/rtl930x/config-6.12 does not use SWAP_IO_SPACE
./target/linux/realtek/rtl838x/config-6.12 does not use SWAP_IO_SPACE
./target/linux/ath79/config-6.12 does not use SWAP_IO_SPACE
./target/linux/octeon/config-6.12 does not use SWAP_IO_SPACE
./target/linux/ixp4xx/config-6.12 does not use SWAP_IO_SPACE
./target/linux/econet/en751221/config-6.12 does not use SWAP_IO_SPACE
user at cjd-dev:~/en7526/openwrt$ find ./ -name 'config-6.12' | while read 
x; do grep -q 'CPU_BIG_ENDIAN=y' "$x" && ( grep -q 'SWAP_IO_SPACE=y' 
"$x" && echo "$x
  does use SWAP_IO_SPACE" ) ; done
./target/linux/bmips/bcm6358/config-6.12 does use SWAP_IO_SPACE
./target/linux/bmips/bcm6328/config-6.12 does use SWAP_IO_SPACE
./target/linux/bmips/bcm6318/config-6.12 does use SWAP_IO_SPACE
./target/linux/bmips/bcm6368/config-6.12 does use SWAP_IO_SPACE
./target/linux/bmips/bcm6362/config-6.12 does use SWAP_IO_SPACE
./target/linux/bmips/bcm63268/config-6.12 does use SWAP_IO_SPACE
./target/linux/lantiq/config-6.12 does use SWAP_IO_SPACE
user at cjd-dev:~/en7526/openwrt$


>
> Best regards,
> Jonas
>



More information about the Linux-mediatek mailing list