[PATCH v1] iommu/riscv: Support 32-bit register accesses

Guo Ren guoren at kernel.org
Thu Jun 18 09:40:47 PDT 2026


On Thu, Jun 18, 2026 at 9:36 PM David Laight
<david.laight.linux at gmail.com> wrote:
>
> On Thu, 18 Jun 2026 17:51:34 +0800
> Guo Ren <guoren at kernel.org> wrote:
>
> > Hi Vivian,
> >
> > As noted in the RISC-V IOMMU Specification, Chapter 6:
> > > Whether an 8-byte access to an IOMMU register is single-copy atomic is UNSPECIFIED, and such an access may appear, internally to the IOMMU, as if two separate 4-byte accesses — first to the high half and second to the low half — were performed.
> >
> > Therefore, the current ratified RISC-V IOMMU specification leaves the
> > atomicity of 64-bit MMIO accesses UNSPECIFIED. To handle this
> > correctly, the Linux RISC-V IOMMU driver should fall back to 32-bit
> > MMIO accesses when reading 64-bit registers (e.g., performance
> > counters). The behavior of 32-bit MMIO accesses is defined more
> > precisely in the RISC-V IOMMU specification.
> >
> > Thus, many hardware vendors implement 32-bit MMIO (rather than 64-bit
> > MMIO) based on the current ratified RISC-V IOMMU specification, and
> > this driver does not appear to benefit from 64-bit MMIO access either.
> > Performance is fundamentally constrained by bus latency; assuming that
> > simply reducing the number of accesses will improve performance is an
> > oversimplification that ignores the underlying hardware
> > characteristics.
>
> If the bus latency is significant, it is almost certainly worth caching
> the previous values in memory to avoid re-reading the hi register.
>
> Something like this might work:
>
> static volatile u32 hi_prev, lo_prev;
>
>         u32 hi = read_reg_hi();
>         u32 lo = read_reg_lo();
>
>         if (lo <= lo_prev || hi != hi_prev) {
>                 u32 hi_tmp = read_reg_hi();
>                 if (hi_tmp != hi) {
>                         hi = hi_tmp;
>                         lo = 0;
>                 }
>                 lo_prev = ~0u;
>                 hi_prev = hi;
>         }
>         lo_prev = lo;
>         return (u64)hi << 32 | lo;
>
> It shouldn't need any locking, but the accesses do need to be ordered.
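The cached hi/lo idea quoted above can be exercised as a small single-threaded C simulation. Everything here is an assumption for illustration, not driver code: `read_reg_hi()`, `read_reg_lo()` and `sim_counter` are hypothetical stand-ins for the two 32-bit halves of a 64-bit register, backed by a counter that advances on every access. The counter is assumed to be monotonic and to advance by far less than 2^32 between calls. `lo_prev` starts at `UINT32_MAX` so that the first call always takes the slow path instead of trusting a zero-initialized cache. Real code would use `readl()` on the hardware and still needs the MMIO ordering mentioned above.

```c
#include <stdint.h>

/* Hypothetical simulated device: a 64-bit counter that advances by
 * SIM_STEP on every 32-bit access. The start value is chosen so that
 * the low half wraps between the hi and lo reads of the second call. */
#define SIM_STEP 7u
static uint64_t sim_counter = 0xffffffe7ull;

static uint32_t read_reg_hi(void)
{
	uint32_t v = (uint32_t)(sim_counter >> 32);

	sim_counter += SIM_STEP;
	return v;
}

static uint32_t read_reg_lo(void)
{
	uint32_t v = (uint32_t)sim_counter;

	sim_counter += SIM_STEP;
	return v;
}

/* Halves seen by the previous call. lo_prev = UINT32_MAX makes
 * "lo <= lo_prev" true on the first call, forcing the slow path. */
static uint32_t hi_prev, lo_prev = UINT32_MAX;

/* Assumes a monotonic counter advancing far less than 2^32 between
 * calls; single-threaded, no locking. */
static uint64_t read_counter_cached(void)
{
	uint32_t hi = read_reg_hi();
	uint32_t lo = read_reg_lo();

	/* The fast path (no extra hi read) is taken only when hi is
	 * unchanged and lo moved forward since the last call; under the
	 * assumption above, lo cannot then have wrapped after hi was read. */
	if (lo <= lo_prev || hi != hi_prev) {
		/* Slow path: re-read hi to detect a wrap between the reads. */
		uint32_t hi_tmp = read_reg_hi();

		if (hi_tmp != hi) {
			/* lo wrapped: hi_tmp:0 lies between the two hi reads. */
			hi = hi_tmp;
			lo = 0;
		}
		hi_prev = hi;
	}
	lo_prev = lo;
	return (uint64_t)hi << 32 | lo;
}
```

The sketch omits the `lo_prev = ~0u` store from the quoted fragment, because `lo_prev = lo` immediately overwrites it there; the same "force the slow path" effect is obtained here through the initial value instead. In the common case each read costs two 32-bit MMIO accesses rather than three.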

Thank you for the suggestion. However, I believe this feedback is more
relevant to the RISC-V IOMMU HPM patchset [1], as no counter registers
are involved in the current patchset. That said, the idea of improving
the hi-lo-hi slow-path mechanism to better handle high-latency
hardware scenarios is well taken and worth discussing in the
appropriate thread.
[1]: https://lore.kernel.org/linux-riscv/20260208063848.3547817-2-zong.li@sifive.com/
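For reference, the hi-lo-hi mechanism mentioned here can be sketched as a retry loop: read hi, then lo, then hi again, and retry if hi changed, since lo may then belong to a different hi value. This is a minimal single-threaded C sketch; `read_reg_hi()`, `read_reg_lo()` and `sim_counter` are hypothetical stand-ins for the two 32-bit halves of a 64-bit register, backed by a simulated counter that advances on every access and starts just below the 2^32 boundary.

```c
#include <stdint.h>

/* Hypothetical simulated 64-bit counter; every 32-bit access advances
 * it by SIM_STEP. Starts just below the point where the low half wraps. */
#define SIM_STEP 7u
static uint64_t sim_counter = 0xfffffffcull;

static uint32_t read_reg_hi(void)
{
	uint32_t v = (uint32_t)(sim_counter >> 32);

	sim_counter += SIM_STEP;
	return v;
}

static uint32_t read_reg_lo(void)
{
	uint32_t v = (uint32_t)sim_counter;

	sim_counter += SIM_STEP;
	return v;
}

/* hi-lo-hi: a lo value is accepted only if hi was identical before and
 * after it was read, so the low half cannot have wrapped in between. */
static uint64_t read_counter_hi_lo_hi(void)
{
	uint32_t hi, lo, hi2;

	do {
		hi = read_reg_hi();
		lo = read_reg_lo();
		hi2 = read_reg_hi();
	} while (hi != hi2);

	return (uint64_t)hi << 32 | lo;
}
```

Every read costs at least three 32-bit accesses, which is the extra hi read the cached scheme quoted above tries to avoid on high-latency interconnects.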

P.S. The hardware I have at hand has very low interconnect latency, and
I have never observed the slow path (hi_tmp != hi) being triggered. My
approach was to remove the retry mechanism entirely in 32-bit MMIO mode
and run stress tests to check whether perf stat produced incorrect
results. That said, I may simply have been lucky rather than relying on
a hardware guarantee.
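To make the "lucky" case concrete: without the retry, a hi-then-lo read returns a value that is 2^32 too small whenever the low half wraps between the two accesses. A minimal single-threaded C sketch, assuming a hypothetical simulated counter (`sim_counter`, `read_reg_hi()`, `read_reg_lo()`) positioned just below the wrap point. On real hardware the window is only the gap between two MMIO reads, which may explain why a stress test did not hit it.

```c
#include <stdint.h>

/* Hypothetical simulated 64-bit counter; every 32-bit access advances
 * it by SIM_STEP. Starts so that the low half wraps between the reads. */
#define SIM_STEP 7u
static uint64_t sim_counter = 0xfffffffcull;

static uint32_t read_reg_hi(void)
{
	uint32_t v = (uint32_t)(sim_counter >> 32);

	sim_counter += SIM_STEP;
	return v;
}

static uint32_t read_reg_lo(void)
{
	uint32_t v = (uint32_t)sim_counter;

	sim_counter += SIM_STEP;
	return v;
}

/* Single hi-then-lo read with no retry: correct only if the low half
 * cannot wrap between the two accesses. */
static uint64_t read_counter_no_retry(void)
{
	uint32_t hi = read_reg_hi();
	uint32_t lo = read_reg_lo();

	return (uint64_t)hi << 32 | lo;
}
```

Here hi is sampled as 0 just before the wrap and lo as 3 just after it, so the result is 3 while the counter had already passed 0x100000003.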

-- 
Best Regards
 Guo Ren



More information about the linux-riscv mailing list