nvme-fabrics: devices are uninterruptable
Belanger, Martin
Martin.Belanger at dell.com
Tue Jan 17 07:28:51 PST 2023
> On Wed, Jan 11, 2023 at 02:37:58PM +0000, Belanger, Martin wrote:
> > POSIX.1 specifies that certain functions such as read() or write() can act as cancellation points.
>
> device special files are mostly out of scope for the normal Posix rules..
>
> > I wanted to ask the community if there is a reason for the nvme driver to not support POSIX cancellation points? I also wanted to know whether it would be possible to add support for it? Is there a downside to doing so?
>
> How do you propose to allow for safe interruption?
Hi Christoph,
Connection requests that are pending because the kernel is busy processing another connection request should be cancellable. I agree that once the kernel starts processing a connection request, that request can no longer be cancelled. It would be too complex to cleanly interrupt a connection request mid-flight.
On the other hand, if the kernel allowed all the connection requests to be processed concurrently such that no connection request gets delayed by another one, then there would be no need for cancellation support.
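The distinction above, where a queued request can be withdrawn but an in-flight one must run to completion, can be modelled as a small user-space sketch. This is not kernel code. `SerializedConnector`, `ConnectRequest`, and their methods are hypothetical names. The model assumes the interface admits exactly one connection request at a time.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class ConnectRequest:
    """Hypothetical request record: pending -> in_flight -> done, or pending -> cancelled."""
    name: str
    state: str = "pending"


class SerializedConnector:
    """Hypothetical model of a single-file connect interface (not the nvme driver)."""

    def __init__(self):
        self.queue = deque()   # requests waiting for the interface
        self.current = None    # the one request being processed, if any

    def submit(self, name):
        req = ConnectRequest(name)
        self.queue.append(req)
        return req

    def start_next(self):
        # Only one request may be in flight at a time.
        if self.current is None and self.queue:
            self.current = self.queue.popleft()
            self.current.state = "in_flight"
        return self.current

    def finish_current(self):
        # Mark the in-flight request as completed (success or failure).
        if self.current is not None:
            self.current.state = "done"
            self.current = None

    def cancel(self, req):
        # A queued request can be withdrawn; an in-flight one cannot.
        if req.state == "pending":
            self.queue.remove(req)
            req.state = "cancelled"
            return True
        return False


conn = SerializedConnector()
a = conn.submit("a")
b = conn.submit("b")
conn.start_next()       # "a" is now in flight
print(conn.cancel(a))   # False: too late, already being processed
print(conn.cancel(b))   # True: still waiting, so it can be withdrawn
```

In this model, cancellation only ever touches the waiting queue, so it needs no cooperation from the code handling the in-flight request.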
This is only a problem when large numbers of connection requests are made at the same time while there are connectivity issues. That's because a failing connection blocks the /dev/nvme-fabrics interface for about 3 seconds. With many failing connections, the interface can stay blocked for long periods (it only takes 20 failing connections to keep the interface busy for a whole minute). Allowing multiple connection requests to proceed in parallel would reduce the blocking, since the failing connection requests would no longer get in the way of the successful ones. It also means that all the failing connection requests would be reported as failed at roughly the same time, after about 3 seconds, instead of one at a time every 3 seconds.
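The blocking figures above follow from simple arithmetic. The sketch below assumes that each failing request holds the interface for exactly 3 seconds, per the estimate above. It also assumes a successful request takes 0.5 s, which is an illustrative figure, not a measurement. It then compares completion times when requests are serialized versus processed concurrently. `completion_times` is a hypothetical helper.

```python
def completion_times(durations, serialized):
    """Return the time at which each request completes.

    serialized=True:  each request waits for all earlier ones (single-file interface).
    serialized=False: every request runs concurrently and finishes after its own duration.
    """
    if not serialized:
        return list(durations)
    times, clock = [], 0.0
    for d in durations:
        clock += d            # this request starts only when the previous one ends
        times.append(clock)
    return times


# 20 failing requests (~3 s each, assumed) followed by one successful request (0.5 s, assumed).
durations = [3.0] * 20 + [0.5]

serial = completion_times(durations, serialized=True)
parallel = completion_times(durations, serialized=False)

print("serialized: last failure at", serial[19], "s; success at", serial[20], "s")
print("concurrent: last failure at", max(parallel[:20]), "s; success at", parallel[20], "s")
```

Under these assumptions, serialization reports the failures at 3, 6, ..., 60 s and delays the successful request to 60.5 s. Concurrent processing reports every failure at 3 s and completes the successful request after 0.5 s.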
Regards,
Martin
More information about the Linux-nvme mailing list