[PATCH v6 06/10] accel/rocket: Add IOCTL for BO creation
Robin Murphy
robin.murphy at arm.com
Wed Jun 4 10:03:28 PDT 2025
On 2025-06-04 5:18 pm, Daniel Stone wrote:
> Hi Tomeu,
> I have some bad news ...
>
> On Wed, 4 Jun 2025 at 08:57, Tomeu Vizoso <tomeu at tomeuvizoso.net> wrote:
>> +int rocket_ioctl_create_bo(struct drm_device *dev, void *data, struct drm_file *file)
>> +{
>> +	[...]
>> +
>> +	/* This will map the pages to the IOMMU linked to core 0 */
>> +	sgt = drm_gem_shmem_get_pages_sgt(shmem_obj);
>> +	if (IS_ERR(sgt)) {
>> +		ret = PTR_ERR(sgt);
>> +		goto err;
>> +	}
>> +
>> +	/* Map the pages to the IOMMUs linked to the other cores, so all cores can access this BO */
>
> So, uh, this is not great.
>
> We only have a single IOMMU context (well, one per core, but one
> effective VMA) for the whole device. Every BO that gets created, gets
> mapped into the IOMMU until it's been destroyed. Given that there is
> no client isolation and no CS validation, that means that every client
> has RW access to every BO created by any other client, for the
> lifetime of that BO.
>
> I really don't think that this is tractable, given that anyone with
> access to the device can exfiltrate anything that anyone else has
> provided to the device.
>
> I also don't think that CS validation is tractable tbh.
>
> So I guess that leaves us with the third option: enforcing context
> separation within the kernel driver.
>
> The least preferable option I can think of is that rkt sets up and
> tears down MMU mappings for each job, according to the BO list
> provided for it. This seems like way too much overhead - especially
> with RK IOMMU ops having been slow enough within DRM that we expended
> a lot of effort in Weston doing caching of DRM BOs to avoid doing this
> unless absolutely necessary. It also seems risky wrt allocating memory
> in drm_sched paths, which need to guarantee forward progress.
>
> Slightly more preferable than this would be that rkt kept a
> per-context list of BOs and their VA mappings, and when switching
> between different contexts, would tear down all MMU mappings from the
> old context and set up mappings from the new. But this has the same
> issues with drm_sched.
>
> The most preferable option from where I sit is that we could have an
> explicit notion of driver-managed IOMMU contexts, such that rkt could
> prepare the IOMMU for each context, and then switching contexts at
> job-run time would be a matter of changing the root DTE pointer and
> issuing a flush. But I don't see that anywhere in the user-facing
> IOMMU API, and I'm sure Robin (CCed) will be along shortly to explain
> why it's not possible ...
On the contrary, it's called iommu_attach_group() :)
In fact this is precisely the usage model I would suggest for this sort
of thing, and IIRC I had a similar conversation with the Ethos driver
folks a few years back. Running your own IOMMU domain is no biggie, see
several other DRM drivers (including rockchip). As long as you have a
separate struct device per NPU core then indeed it should be perfectly
straightforward to maintain distinct IOMMU domains per job, and
attach/detach them as part of scheduling the jobs on and off the cores.
Looks like rockchip-iommu supports cross-instance attach, so if
necessary you should already be OK to have multiple cores working on the
same job at once, without needing extra work at the IOMMU end.
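To make that concrete, here is a rough sketch of what per-context domains
could look like (all of the rocket_* names below are hypothetical, not from
the patchset, and I'm assuming iommu_paging_domain_alloc() from recent
kernels; older ones would use iommu_domain_alloc() instead):

#include <linux/err.h>
#include <linux/iommu.h>
#include <linux/mutex.h>
#include <drm/drm_mm.h>

/* Hypothetical: one driver-managed IOMMU domain per DRM file/context */
struct rocket_context {
	struct iommu_domain *domain;	/* private NPU page tables for this client */
	struct drm_mm mm;		/* per-context IOVA space manager */
	struct mutex mm_lock;		/* protects mm */
};

static int rocket_context_init(struct rocket_context *ctx, struct device *npu_dev,
			       u64 va_start, u64 va_size)
{
	ctx->domain = iommu_paging_domain_alloc(npu_dev);
	if (IS_ERR(ctx->domain))
		return PTR_ERR(ctx->domain);

	drm_mm_init(&ctx->mm, va_start, va_size);
	mutex_init(&ctx->mm_lock);
	return 0;
}

static void rocket_context_fini(struct rocket_context *ctx)
{
	drm_mm_takedown(&ctx->mm);
	iommu_domain_free(ctx->domain);
}

/* At job-run time, point the core's IOMMU at the job's context... */
static int rocket_job_attach(struct rocket_context *ctx, struct device *core_dev)
{
	struct iommu_group *group = iommu_group_get(core_dev);
	int ret;

	if (!group)
		return -ENODEV;
	ret = iommu_attach_group(ctx->domain, group);
	iommu_group_put(group);
	return ret;
}

/* ...and detach afterwards, dropping the core back to its default domain */
static void rocket_job_detach(struct rocket_context *ctx, struct device *core_dev)
{
	struct iommu_group *group = iommu_group_get(core_dev);

	if (!group)
		return;
	iommu_detach_group(ctx->domain, group);
	iommu_group_put(group);
}

Switching a core between contexts then reduces to detaching the outgoing
domain and attaching the incoming one, with the IOMMU driver doing whatever
flushing the attach implies.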
> Either way, I wonder whether, if we have fully per-context mappings,
> userspace should not manage IOVAs in the VM_BIND style common to newer
> drivers, rather than relying on the kernel to do VA allocation and
> inform userspace of them?
Indeed if you're using the IOMMU API directly then you need to do your
own address space management either way, so matching bits of process VA
space is pretty simple to put on the table.
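As a sketch of that (again purely hypothetical, reusing the rocket_context
above with its per-context drm_mm), accepting a userspace-chosen IOVA in the
VM_BIND style would be roughly:

#include <linux/iommu.h>
#include <drm/drm_mm.h>

/* Hypothetical: map a BO's pages at a userspace-chosen IOVA into the
 * context's private domain. */
static int rocket_context_map_bo(struct rocket_context *ctx,
				 struct drm_mm_node *node,
				 struct sg_table *sgt,
				 u64 iova, u64 size)
{
	ssize_t mapped;
	int ret;

	/* Reserve the requested range in the per-context address space */
	node->start = iova;
	node->size = size;
	mutex_lock(&ctx->mm_lock);
	ret = drm_mm_reserve_node(&ctx->mm, node);
	mutex_unlock(&ctx->mm_lock);
	if (ret)
		return ret;	/* overlaps an existing mapping */

	mapped = iommu_map_sgtable(ctx->domain, iova, sgt,
				   IOMMU_READ | IOMMU_WRITE);
	if (mapped < 0) {
		mutex_lock(&ctx->mm_lock);
		drm_mm_remove_node(node);
		mutex_unlock(&ctx->mm_lock);
		return mapped;
	}
	return 0;
}

Kernel-allocated IOVAs would just swap drm_mm_reserve_node() for
drm_mm_insert_node() and hand the resulting node->start back to userspace.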
Thanks,
Robin.
>
> I'm really sorry this has come so late in the game.
>
> Cheers,
> Daniel