[LSF/MM/BPF BOF] Userspace command aborts

Chaitanya Kulkarni chaitanyak at nvidia.com
Tue Feb 21 10:15:00 PST 2023


On 2/18/2023 1:50 AM, Hannes Reinecke wrote:
> On 2/17/23 19:53, Chaitanya Kulkarni wrote:
>> On 2/16/23 08:40, Keith Busch wrote:
>>> On Thu, Feb 16, 2023 at 12:50:03PM +0100, Hannes Reinecke wrote:
>>>> Hi all,
>>>>
>>>> it has come up in other threads, so it might be worthwhile to have
>>>> its own topic:
>>>>
>>>> Userspace command aborts
>>>>
>>>> As it stands we cannot abort I/O commands from userspace.
>>>> This is hitting us when running in a virtual machine:
>>>> The VM sets a timeout when submitting a command, but that
>>>> information can't be transmitted to the VM host. The VM host
>>>> then issues a different command (with another timeout), and
>>>> again that timeout can't be transmitted to the attached devices.
>>>> So when the VM detects a timeout, it will try to issue an abort,
>>>> but that goes nowhere as the VM host has no way to abort commands
>>>> from userspace.
>>>> So in the end the VM has to wait for the command to complete, causing
>>>> stalls in the VM if the host had to undergo error recovery or something.
>>>
>>> Aborts are racy. A lot of hardware implements these as a no-op, too.
>>
>> I'd avoid implementing userspace aborts and fix things in spec first.
>>
> What's there to fix in the spec for aborts? You can't avoid the fact 
> that aborts might be sent just at the time when the completion arrives ...
> 

Given the racy nature of aborts, I am not sure we can do anything in the
spec that would let us deal with the racy scenario(s) well enough to
allow userspace aborts.

Also, we already issue an abort command from the timeout handler for NVMe
PCIe, and I think the different combinations of userspace abort, timeout
handler abort, and completion arrival at the time of userspace abort
submission can lead to an unclear implementation and more confusion for
userspace applications.
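
To make those combinations concrete, here is a toy Python model. It is
not driver code: the event names, the resolve() helper, and the rule that
only the first event to reach an in-flight command takes effect are all
assumptions made for illustration. Real controllers may behave
differently, for example by treating the abort as a no-op or by still
posting a completion for an aborted command.

```python
from itertools import permutations

# Hypothetical event names; nothing here maps to real NVMe driver symbols.
EVENTS = ("completion", "timeout_abort", "userspace_abort")


def resolve(order):
    """Apply events in arrival order; the first one decides the final state.

    Assumption: a command is either in flight, completed, or aborted, and
    any event arriving after the first finds nothing left to act on.
    """
    state = "in_flight"
    results = {}
    for event in order:
        if state == "in_flight":
            state = "completed" if event == "completion" else "aborted"
            results[event] = "took effect"
        else:
            results[event] = "too late (command already " + state + ")"
    return state, results


def outcomes():
    """Map every possible arrival order to its final state and per-event result."""
    return {order: resolve(order) for order in permutations(EVENTS)}


if __name__ == "__main__":
    for order, (state, results) in outcomes().items():
        print(" -> ".join(order), "=>", state)
        for event in order:
            print("    " + event + ": " + results[event])
```

Even in this simplified model there are six orderings, and in four of
them one of the two abort paths races ahead of the completion while the
other abort is left with nothing to act on. Userspace would need a way
to tell those cases apart.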

-ck



