[RFC PATCH v2 0/3] riscv: support EIC770X/JH7110 noncoherent devices with XPbmtUC

Bo Gan ganboing at gmail.com
Sun Mar 15 23:03:25 PDT 2026


SoCs with pre-Svpbmt SiFive cores, e.g., StarFive JH7110 and ESWIN
EIC770X, have non-cache-coherent peripherals. On JH7110[1], the video
subsystem (GPU/VOUT/VPU/ISP) is routed to the sys port, making it
non-cache-coherent. On EIC770X, all peripherals are routed to the sys
port, and none is cache-coherent. Instead of Svpbmt, these SoCs map
system memory twice -- the conventional cached region (through the
front port), and the uncached alias (through the sys port) at different
base addresses. The uncached alias implicitly applies the uncacheable
PMA. Drivers working with noncoherent devices can use the uncached
alias to map DMA buffers, without doing explicit cache flushes.

This feature is not an ISA standard, and the cached/uncached bases can
be configured by the SoC vendor. To expose it properly, introduce a
SiFive "errata", namely "XPbmtUC", to model the setup as a customized
version of Svpbmt. It chooses a single, artificial bit in the PTE at
runtime for cache/uncache control, effectively offsetting the PPN by a
power of 2. On JH7110, this aligns perfectly with the HW: it maps the
cached region at 0x40000000, and the uncached alias at 0x4_40000000.
Choosing PTE bit 32 (physical address bit 34) as the UC bit matches the
HW exactly.

StarFive JH7110 (SiFive U74 core) memory map:

          [0x0,   0x40000000) Low MMIO
   [0x40000000, 0x2_40000000) Cached Mem
 [0x4_40000000, 0x6_40000000) Uncached Mem (UC+)
 [0x9_00000000, 0x9_d0000000) High MMIO
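
As a rough illustration of the JH7110 arithmetic, the sketch below derives
the UC bit from the cached and uncached bases in the memory map above. It
assumes the standard Sv39/Sv48 PTE layout (4 KiB pages, PPN field starting
at PTE bit 10). The macro and function names are hypothetical and are not
taken from the patches.

```c
#include <assert.h>
#include <stdint.h>

/* Standard Sv39/Sv48 layout: 4 KiB pages, PPN field starts at PTE bit 10. */
#define PAGE_SHIFT	12
#define PTE_PPN_SHIFT	10

/* JH7110 bases, copied from the memory map above. */
#define JH7110_CACHED_BASE	0x40000000ULL
#define JH7110_UNCACHED_BASE	0x440000000ULL

/*
 * Hypothetical helper: return the PTE bit that selects the uncached alias,
 * or -1 if the alias offset is not a page-aligned power of 2.
 */
static int uc_pte_bit(uint64_t cached_base, uint64_t uncached_base)
{
	uint64_t offset;
	int pa_bit = 0;

	assert(uncached_base > cached_base);
	offset = uncached_base - cached_base;
	if (offset & (offset - 1))
		return -1;	/* more than one bit set: not a power of 2 */
	while (!((offset >> pa_bit) & 1))
		pa_bit++;	/* index of the single set bit */
	if (pa_bit < PAGE_SHIFT)
		return -1;	/* offset smaller than a page */
	return pa_bit - PAGE_SHIFT + PTE_PPN_SHIFT;
}

/* Hypothetical helper: PPN field of a leaf PTE for a physical address. */
static uint64_t pa_to_pte_ppn(uint64_t pa)
{
	return (pa >> PAGE_SHIFT) << PTE_PPN_SHIFT;
}
```

With the JH7110 bases, the alias offset is 2^34, so the helper yields PTE
bit 32, and setting that bit in a PTE for the cached base gives the PTE for
the uncached base.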

On EIC770X, the aliased UC region is placed at an offset that is not a
power of 2. There can also be 2 NUMA nodes (dual-die) with 2 separate
memory regions, and their UC alias counterparts are offset differently.
We detect whether the firmware can re-arrange the memory map using a
G-stage page table, making the offsets a power of 2 again.

            [0x0,     0x20000000) Core Internal
     [0x20000000,     0x40000000) Core Internal (Die 1)
     [0x40000000,     0x60000000) Low MMIO
     [0x60000000,     0x80000000) Low MMIO (Die 1)
     [0x80000000,  0x10_80000000) Cached Mem
  [0x20_00000000,  0x30_00000000) Cached Mem (Die 1)
  [0x80_00000000,  0xa0_00000000) High MMIO
  [0xa0_00000000,  0xc0_00000000) High MMIO (Die 1)
  [0xc0_00000000,  0xd0_00000000) Uncached Mem <----------.
  [0xe0_00000000,  0xf0_00000000) Uncached Mem (Die 1) <--+--.
with firmware/hypervisor re-mapping:                      |  |
------------------------------------                      |  |
 [0x100_80000000, 0x110_80000000) Mem UC+ ----------------'  |
 [0x120_00000000, 0x130_00000000) Mem UC+ (Die 1) -----------'
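
To make the offsets concrete, here is a minimal sketch that checks whether
all cached/uncached pairs share one power-of-2 offset, using the addresses
from the EIC770X map above. The struct, table, and function names are
hypothetical, and this is not how the patches perform the detection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One cached region and the base of its uncached alias. */
struct mem_alias {
	uint64_t cached_base;
	uint64_t uncached_base;
};

/* Native EIC770X layout (die 0, die 1), from the map above. */
static const struct mem_alias eic770x_native[] = {
	{ 0x80000000ULL,   0xc000000000ULL },
	{ 0x2000000000ULL, 0xe000000000ULL },
};

/* Layout as seen by the OS with firmware/hypervisor re-mapping. */
static const struct mem_alias eic770x_remapped[] = {
	{ 0x80000000ULL,   0x10080000000ULL },
	{ 0x2000000000ULL, 0x12000000000ULL },
};

/*
 * Hypothetical check: true if every region uses the same alias offset and
 * that offset is a power of 2, i.e. a single PTE bit can select the alias.
 */
static bool single_pow2_offset(const struct mem_alias *a, size_t n,
			       uint64_t *offset)
{
	uint64_t first;
	size_t i;

	assert(n > 0);
	first = a[0].uncached_base - a[0].cached_base;
	if (first == 0 || (first & (first - 1)))
		return false;
	for (i = 1; i < n; i++)
		if (a[i].uncached_base - a[i].cached_base != first)
			return false;
	*offset = first;
	return true;
}
```

The native table fails the check (offsets 0xbf_80000000 and 0xc0_00000000),
while the remapped table passes with a common offset of 2^40.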

The "XPbmtUC" alternative PTE format is the cleanest solution I can
think of to enable non-coherent devices w/o Svpbmt from the kernel side.
Drivers can do explicit cache flushes to work around the problem, but
a. it pushes the burden of cache flushes onto driver code, and we don't
want to complicate drivers that are already written with the
cache-coherent assumption in mind; b. complex drivers like GPU could
allow userspace to mmap DMA pages, but userspace can't flush caches
due to the lack of Zicbom on these SoCs. I'm aware there's an ongoing
series[2] that Samuel sent for physical memory aliases, which is
essentially a superset of my patch. I don't mean to step ahead of him,
but to find a middle ground if the community still worries about his
change touching too many areas. My change is very minimal and local.
It's fairly easy to remove, too.

----------------------------------------
Notes about PoC firmware implementation on EIC7700X[3]:

OpenSBI is augmented to provide a very thin hypervisor layer, where it
runs the entire host OS in VS-mode and provides the aforementioned
remapping. I remap UC+ memory to 2^40+ to make the 2-stage translation
efficient: I can use the Sv39x4 G-stage scheme to map the entire
physical address space in the bottom half, and the uncached counterparts
of system memory in the top half. I also use the largest page in Sv39
-- the 1GB page -- to map everything, keeping the G-stage page table
minimal, only 16KB in size, while also minimizing TLB misses. A slight,
unavoidable slowdown affects external interrupt delivery. Due to the
lack of AIA in EIC770X, all device IRQs now need to trap to M-mode
first, before being forwarded to VS-mode. The overhead of running KVM in
such a setup is not yet known, and may well be noticeable. All
HS-qualified instructions will trap to M-mode, which is costly. The NACL
extension, if implemented, will alleviate it, but there's also the extra
cost of flushing G/VS-stage TLBs. I'm analyzing it in parallel.
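
The G-stage layout described above can be sketched roughly as follows. This
is not code from the PoC firmware: the names are hypothetical, the bottom
half is assumed to be an identity mapping, and the region bounds are taken
from the EIC770X memory map earlier in this letter.

```c
#include <assert.h>
#include <stdint.h>

#define SV39X4_GPA_BITS	41		/* Sv39x4 guest physical address width */
#define GIGAPAGE_SHIFT	30		/* 1GB leaf pages */
#define PTE_SIZE	8		/* bytes per page table entry */
#define UC_GPA_BIT	(1ULL << 40)	/* top half of the GPA space */

/* Root table size when every entry is a 1GB leaf: 2^11 entries * 8 bytes. */
static uint64_t gstage_root_size(void)
{
	return (1ULL << (SV39X4_GPA_BITS - GIGAPAGE_SHIFT)) * PTE_SIZE;
}

/*
 * Hypothetical model of the mapping: guest physical address in, host
 * physical address out, or UINT64_MAX if the address is not mapped.
 */
static uint64_t gpa_to_hpa(uint64_t gpa)
{
	uint64_t cached;

	assert(gpa < (1ULL << SV39X4_GPA_BITS));
	if (!(gpa & UC_GPA_BIT))
		return gpa;	/* bottom half: assumed identity mapping */

	/* Top half: redirect to the HW uncached alias of system memory. */
	cached = gpa & ~UC_GPA_BIT;
	if (cached >= 0x80000000ULL && cached < 0x1080000000ULL)
		return cached - 0x80000000ULL + 0xc000000000ULL;
	if (cached >= 0x2000000000ULL && cached < 0x3000000000ULL)
		return cached - 0x2000000000ULL + 0xe000000000ULL; /* Die 1 */
	return UINT64_MAX;
}
```

Under these assumptions the root table holds 2048 entries and occupies
16KB, and guest address 0x100_80000000 resolves to the hardware uncached
alias at 0xc0_00000000.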

Use [4] if you have a HiFive Premier P550 to try it out.

[1] https://github.com/starfive-tech/JH7100_Docs/blob/main/JH7100%20Cache%20Coherence%20V1.0.pdf
[2] https://lore.kernel.org/all/20251113014656.2605447-20-samuel.holland@sifive.com/
[3] https://github.com/ganboing/opensbi/tree/eic77x-vspt-physalias-wip
[4] https://github.com/ganboing/linux-eic77/tree/ganboing-xpbmt-uc-v2-eic77-clk-v15

---
v2:
 - Move the core logic to SiFive errata to address Conor's comments

v1: https://lore.kernel.org/linux-riscv/338f0f79-1eed-4c5c-9966-04a2eaeb3d98@gmail.com

Bo Gan (3):
  riscv: alternatives: support auipc+load pair
  riscv: errata: sifive: support auipc/load pair in patched alternatives
  riscv: errata: sifive: Add an "errata" to simulate Svpbmt on cores
    without

 arch/riscv/Kconfig.errata                    | 13 ++++
 arch/riscv/errata/sifive/errata.c            | 80 +++++++++++++++++++-
 arch/riscv/include/asm/errata_list.h         | 19 ++++-
 arch/riscv/include/asm/errata_list_vendors.h |  3 +-
 arch/riscv/include/asm/insn.h                |  8 ++
 arch/riscv/include/asm/pgtable-64.h          |  9 ++-
 arch/riscv/kernel/alternative.c              | 11 +--
 7 files changed, 132 insertions(+), 11 deletions(-)

-- 
2.34.1



