[PATCH v5 08/14] iommufd/viommu: Add iommufd_viommu_report_event helper

Nicolin Chen nicolinc at nvidia.com
Mon Jan 13 11:47:52 PST 2025


On Mon, Jan 13, 2025 at 03:21:44PM -0400, Jason Gunthorpe wrote:
> On Sun, Jan 12, 2025 at 09:37:41PM -0800, Nicolin Chen wrote:
> 
> > > > > I supposed we will need a way to indicate lost events to userspace on
> > > > > top of this?
> > > > 
> > > > Perhaps another u32 flag in the arm_smmuv3_vevent struct to report
> > > > an overflow. That said, what userspace/VMM will need to do with it?
> > > 
> > > Trigger the above code in the VM somehow?
> > 
> > I found two ways of forwarding an overflow flag:
> > 
> > 1. Return -EOVERFLOW to read(). But it cannot return the read bytes
> > any more:
> 
> You could not return any bytes, it would have to be 0 bytes read, ie
> immediately return EOVERFLOW and do nothing else.
> 
> Returning EOVERFLOW from read would have to also clear the overflow
> indicator.

OK. That means user space should read again for actual events in the
queue, after getting the first EOVERFLOW.

One concern is, if the report() keeps producing events to the queue,
it will always set the EOVERFLOW flag, then user space won't have a
chance to read the events out until the last report(). Wondering if
this would make sense, as I see SMMU driver's arm_smmu_evtq_thread()
reporting an OVERFLOW while allowing SW to continue reading the evtq.

> The other approach would be to add a sequence number to each event and
> let userspace detect the non-montonicity. It would require adding a
> header to the native ARM evt.

Yea, I thought about that. The tricky thing is that the header will
be a core-level header pairing with a driver-level vEVENTQ type and
can never change its length, though we can define a 64-bit flag that
can reserve the other 63 bits for future use?

That being asked, this seems to be a better approach as user space
can get the overflow flag while doing the read() that can potentially
clear the overflow flag too, v.s. blocked until the last report().

> > 2. Return EPOLLERR via pollfd.revents. But it cannot use POLLERR
> > for other errors any more:
> > --------------------------------------------------
> > @@ -420,2 +421,4 @@ static __poll_t iommufd_eventq_fops_poll(struct file *filep,
> >         poll_wait(filep, &eventq->wait_queue, wait);
> > +       if (test_bit(IOMMUFD_VEVENTQ_ERROR_OVERFLOW, veventq->errors))
> > +               return EPOLLERR;
> >         mutex_lock(&eventq->mutex);
> 
> But then how do you clear the error? I've only seen POLLERR used for
> fatal conditions so there is no recovery, it is permanent.

Overflow means the queue has tons of events for user space to read(),
so user space should read() to clear the error.

I found this piece in eventfd manual, so was wondering if we can resue:
https://man7.org/linux/man-pages/man2/eventfd.2.html
              ?  If an overflow of the counter value was detected, then
                 select(2) indicates the file descriptor as being both
                 readable and writable, and poll(2) returns a POLLERR
                 event.  As noted above, write(2) can never overflow the
                 counter.  However an overflow can occur if 2^64 eventfd
                 "signal posts" were performed by the KAIO subsystem
                 (theoretically possible, but practically unlikely).  If
                 an overflow has occurred, then read(2) will return that
                 maximum uint64_t value (i.e., 0xffffffffffffffff).

Thanks
Nicolin



More information about the linux-arm-kernel mailing list