[PATCH v2 1/3] nvme: fix a possible use-after-free in controller reset during load

Max Gurtovoy mgurtovoy at nvidia.com
Thu Feb 3 06:43:09 PST 2022


On 2/1/2022 2:54 PM, Sagi Grimberg wrote:
> Unlike .queue_rq, in .submit_async_event drivers may not check the ctrl
> readiness for AER submission. This may lead to a use-after-free
> condition that was observed with nvme-tcp.
>
> The race condition may happen in the following scenario:
> 1. driver executes its reset_ctrl_work
> 2. -> nvme_stop_ctrl - flushes ctrl async_event_work
> 3. ctrl sends AEN which is received by the host, which in turn
>     schedules AEN handling
> 4. teardown admin queue (which releases the queue socket)
> 5. AEN processed, submits another AER, calling the driver to submit
> 6. driver attempts to send the cmd
> ==> use-after-free
>
> In order to fix that, add ctrl state check to validate the ctrl
> is actually able to accept the AER submission.
>
> This addresses the above race in controller resets because the driver
> during teardown should:
> 1. change ctrl state to RESETTING
> 2. flush async_event_work (as well as other async work elements)
>
> So after 1,2, any other AER command will find the
> ctrl state to be RESETTING and bail out without submitting the AER.
>
> Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
> ---
>   drivers/nvme/host/core.c | 2 ++
>   1 file changed, 2 insertions(+)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index dd18861f77c0..c11cd3a814fd 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -4251,6 +4251,8 @@ static void nvme_async_event_work(struct work_struct *work)
>   		container_of(work, struct nvme_ctrl, async_event_work);
>   
>   	nvme_aen_uevent(ctrl);
> +	if (ctrl->state != NVME_CTRL_LIVE)
> +		return;

Any reason you moved the queue_ready check in the transport drivers?

Is it redundant?


>   	ctrl->ops->submit_async_event(ctrl);
>   }
>   
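
For reference, the race in the quoted commit message can be replayed with a small userspace model. This is only a sketch under stated assumptions: every model_* name below is a hypothetical stand-in, not a kernel definition, and the "use-after-free" is reduced to a counter that records a submission against a torn-down admin queue. It shows that the six steps hit the torn-down queue without the state check and bail out with it.

```c
/*
 * Userspace model of the reset race. All model_* names are hypothetical
 * stand-ins for illustration; none of this is the real nvme host code.
 */
enum model_ctrl_state { MODEL_CTRL_LIVE, MODEL_CTRL_RESETTING };

struct model_ctrl {
	enum model_ctrl_state state;
	int admin_queue_alive;   /* 0 once the admin queue is torn down */
	int aers_submitted;      /* AERs submitted on a live queue */
	int use_after_free_hits; /* submissions attempted on a dead queue */
};

/* Stand-in for the transport's .submit_async_event callback. */
static void model_submit_async_event(struct model_ctrl *ctrl)
{
	if (!ctrl->admin_queue_alive)
		ctrl->use_after_free_hits++;
	else
		ctrl->aers_submitted++;
}

/* Stand-in for the async event work, with the proposed check optional. */
static void model_async_event_work(struct model_ctrl *ctrl, int with_state_check)
{
	if (with_state_check && ctrl->state != MODEL_CTRL_LIVE)
		return;
	model_submit_async_event(ctrl);
}

/*
 * Replays steps 1-6 from the commit message and returns how many
 * submissions reached the torn-down admin queue.
 */
static int model_replay_reset_race(int with_state_check)
{
	struct model_ctrl ctrl = { MODEL_CTRL_LIVE, 1, 0, 0 };

	ctrl.state = MODEL_CTRL_RESETTING;  /* step 1: reset work starts */
	/* step 2: flush of the async event work; nothing is pending yet */
	/* step 3: an AEN arrives and the handling work is scheduled */
	ctrl.admin_queue_alive = 0;         /* step 4: admin queue torn down */
	model_async_event_work(&ctrl, with_state_check); /* steps 5-6 */

	return ctrl.use_after_free_hits;
}
```

In this model, model_replay_reset_race(0) counts one submission against the dead queue, while model_replay_reset_race(1) counts none, because the work function sees the RESETTING state and returns early.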


