[PATCH RFC 1/5] nvme: Let the blocklayer set timeouts for requests
Heyne, Maximilian
mheyne at amazon.de
Wed Feb 18 04:19:06 PST 2026
On Tue, Feb 17, 2026 at 12:15:12PM -0800, Mohamed Khalfella wrote:
> On Thu 2026-02-12 13:09:47 +0100, Maurizio Lombardi wrote:
> > From: "Heyne, Maximilian" <mheyne at amazon.de>
> >
> > When initializing an nvme request that is about to be sent to the block
> > layer, we do not need to initialize its timeout. If it's left
> > uninitialized at 0, the block layer will use the request queue's timeout
> > in blk_add_timer (via nvme_start_request, which is called from
> > nvme_*_queue_rq). These timeouts are set to either NVME_IO_TIMEOUT or
> > NVME_ADMIN_TIMEOUT when the request queues are created.
> >
> > Because the io_timeout of the IO queues can be modified via sysfs, the
> > following situation can occur:
> >
> > 1) NVME_IO_TIMEOUT = 30 (default module parameter)
> > 2) nvme1n1 is probed. IO queues default timeout is 30 s
> > 3) manually change the IO timeout to 90 s
> > echo 90000 > /sys/class/nvme/nvme1/nvme1n1/queue/io_timeout
> > 4) Any call of __nvme_submit_sync_cmd on nvme1n1 to an IO queue will
> > issue commands with the 30 s timeout instead of the desired 90 s, which
> > might be more suitable for this device.
> >
> > Commit 470e900c8036 ("nvme: refactor nvme_alloc_request") already
> > silently changed the behavior for ioctls because it unconditionally
> > overrides the request's timeout that was set in nvme_init_request. If it
> > was left unset by the user of the ioctl, it will be overridden with 0,
> > meaning the block layer will pick the request queue's IO timeout.
> >
> > Following up on that, this patch further improves the consistency of IO
> > timeout usage. However, there are still uses of NVME_IO_TIMEOUT that
> > could be inconsistent with what the user has set in the device's
> > request_queue.
> >
> > Signed-off-by: Maximilian Heyne <mheyne at amazon.de>
> > ---
> > drivers/nvme/host/core.c | 2 --
> > 1 file changed, 2 deletions(-)
> >
> > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > index 7bf228df6001..b9315f0abf80 100644
> > --- a/drivers/nvme/host/core.c
> > +++ b/drivers/nvme/host/core.c
> > @@ -724,10 +724,8 @@ void nvme_init_request(struct request *req, struct nvme_command *cmd)
> > struct nvme_ns *ns = req->q->disk->private_data;
> >
> > logging_enabled = ns->head->passthru_err_log_enabled;
> > - req->timeout = NVME_IO_TIMEOUT;
> > } else { /* no queuedata implies admin queue */
> > logging_enabled = nr->ctrl->passthru_err_log_enabled;
> > - req->timeout = NVME_ADMIN_TIMEOUT;
> > }
> >
> > if (!logging_enabled)
> > --
> > 2.53.0
> >
> >
>
> I wonder what was the impact of these lines given that req->timeout is
> set to 0 by blk_mq_rq_ctx_init() when the request is allocated?
>
Looking through the nvme code, blk_mq_rq_ctx_init is always called
(indirectly, e.g. via blk_mq_alloc_request) before nvme_init_request, so
the timeout is first zeroed and then overwritten with
NVME_IO_TIMEOUT/NVME_ADMIN_TIMEOUT in nvme_init_request. With these
lines removed, the timeout stays 0, which means the request_queue's
rq_timeout will be used instead; see blk_add_timer.
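
For reference, a minimal sketch of the fallback path in blk_add_timer()
(paraphrasing the logic in block/blk-timeout.c, not the exact upstream
code):

void blk_add_timer(struct request *req)
{
	struct request_queue *q = req->q;

	/*
	 * If the driver left req->timeout at 0, fall back to the
	 * per-queue default (q->rq_timeout), which is what the
	 * queue/io_timeout sysfs attribute updates.
	 */
	if (!req->timeout)
		req->timeout = q->rq_timeout;

	/* ... arm the per-request timer based on req->timeout ... */
}

So with req->timeout left at 0 by nvme_init_request, a timeout changed
via sysfs (e.g. the 90 s above) takes effect, which is the behavior this
patch is after.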