[RFC PATCH v3 0/9] accel: rocket: Add RK3568 NPU support
Midgy Balon
midgy971 at gmail.com
Mon Jun 8 01:05:33 PDT 2026
Hello Chaoyi,
Thanks -- this is exactly what I needed.
- v2/DTE: will do. I'll keep building on Simon's per-device-ops series -- with
that in place the NPU MMU can use the 32-bit-DTE ops (the per-ops GFP_DMA32
that's already in mainline) without the global rk_ops conflict. I'll list the
series as an explicit dependency in the v4 cover letter.
- vdd_npu: I'll switch the RK3568 NPU power domain to need_regulator +
domain-supply = <&vdd_npu> and drop the regulator-always-on workaround. I
suspect that's also the right fix for the power-off/on de-idle issue I
described -- the always-on was really just papering over the domain not being
modelled with a regulator. I'll confirm on the board.
- AUTO_GATING: thanks for the commit references. I'll keep the bit-31
read-modify-write form with your Suggested-by and base the comment on those
commits. For the record: on v7.1-rc6 the NPU MMU also completes translations
with the register at its reset value (I couldn't reproduce a page-walk stall
without the write), so I'll note in the commit message that it matches the
vendor clock-gating handling rather than fixing a failure I can reproduce
here. Happy to drop it if the iommu maintainers would prefer.
- PVTPLL/NoC: I'll follow up with Finley. First I'll check whether the
need_regulator change resolves the NoC re-power de-idle on its own; if it
still fails, I'll bring him the details (the genpd power-on de-idle ack and
the BUS_IDLE_ST state).
I'll send a v4 with these. Thanks again for the quick, detailed answers.
Kind regards,
Midgy
On Mon, 8 Jun 2026 at 03:40, Chaoyi Chen <chaoyi.chen at rock-chips.com> wrote:
>
> Hi Midgy,
>
> On 6/8/2026 5:03 AM, Midgy Balon wrote:
> > Hi Chaoyi,
> >
> > Thanks a lot for looking at this -- input from Rockchip is exactly what this
> > series needs.
> >
> >> Hmmm. If I understand correctly, the NPU IOMMU should be v2 rather than v1,
> >> implying it should support 40-bit PAs. Nevertheless, please note that the
> >> upper limit for DTE is 32 bits.
> >
> > Understood, and that 32-bit-DTE note is the crux of the trouble I had, so let
> > me lay out what I see and ask how you'd prefer to solve it.
> >
> > The mainline node is already v2 (rockchip,rk3568-iommu in rk356x-base.dtsi).
> > The problem on this 8 GiB board: with the v2 ops the page-table allocations
> > (gfp_flags == 0) can land above 4 GiB, so the DTE ends up > 32 bits and the
> > NPU's first translation faults with DMA_READ_ERROR. To work around that I had
> > switched the NPU MMU to the v1 compatible (rockchip,iommu), whose ops set
> > GFP_DMA32 and keep the DTE sub-4 GiB. That works in isolation, but because the
> > driver keeps a single global rk_ops, a v1 NPU MMU then trips
> > WARN_ON(rk_ops != ops) against the SoC's v2 instances (VOP/VDEC), which is why
> > I based the series on Simon's per-device-ops work.
> >
> > So my question: with per-device ops in place, what's the intended way to keep
> > the NPU MMU on v2 *and* cap its DTE at 32 bits on boards with >4 GiB of RAM?
> > A v2 ops variant carrying GFP_DMA32 for this device, or is there a register/
> > config bit that constrains the DTE address? I'd rather follow the Rockchip
> > intent here than carry the v1 workaround. (Simon, cc'd -- this is right next to
> > your per-device-ops series.)
> >
>
> If Simon's method works, please use it :)
>
> >> Can these operations not be completed via the pmdomain driver?
> >> If some operations are controlled by TF-A, are you using open source TF-A?
> >
> > Most of it is in pmdomain already. Power-on and NoC de-idle are done by the
> > RK3568 NPU power domain (genpd) at power-on -- the driver no longer pokes the
> > PMU directly. Two things remain outside it:
> >
> > - vdd_npu: I mark it regulator-always-on in DT rather than wiring it as the
> > domain's domain-supply, because as a domain-supply it created a device-link
> > to the I2C PMIC (rk809) and genpd's power-off QoS-save path then hung
> > reading the NPU QoS registers behind the (gated) NoC. If there's a clean way
> > to let genpd own vdd_npu without that I2C ordering deadlock I'd much prefer
> > that -- pointers welcome.
> >
>
> Please refer to the patch below regarding the RK3588 NPU pmdomain.
> In short, you need to set a "need_regulator" for the RK3568 NPU pmdomain.
>
> https://lore.kernel.org/all/20251216055247.13150-1-rmxpzlb@gmail.com/
>
> > - the NPU compute clock (PVTPLL): set from the driver via SCMI, and only
> > needed for actual compute, not for bring-up.
> >
> > One more pmdomain observation from testing, possibly relevant to how the NPU
> > domain should be modelled: after a power-off/on cycle the domain doesn't
> > reliably de-idle the NoC again. If the NPU is probed after genpd has already powered the
> > (unused) domain off, the power-on de-idle fails ("failed to set idle on domain
> > 'npu'") and the NPU IOMMU then takes an external abort on its first MMIO access.
> > Probing the NPU before the unused-domain power-off, or marking the domain
> > always-on, both avoid it. Is the NoC de-idle expected to work on a genpd
> > re-power here, or should this domain effectively stay on?
> >
>
> Not quite sure what's going on with PVTPLL and NOC.
> Maybe @Finley knows about this?
>
> > On TF-A: yes -- bl31 is built from upstream arm-trusted-firmware
> > (github.com/ARM-software/arm-trusted-firmware, RK3568 platform), providing PSCI
> > and the SCMI clock service. The only closed blob in the boot chain is Rockchip's
> > DDR init (rkbin), which is the standard situation for mainline RK356x.
>
> --
> Best,
> Chaoyi
More information about the linux-arm-kernel mailing list