[PATCH 04/10] qla2xxx: Fix crash in NVME abort path

Himanshu Madhani himanshu.madhani at oracle.com
Wed Sep 8 06:58:08 PDT 2021



> On Sep 8, 2021, at 2:28 AM, Nilesh Javali <njavali at marvell.com> wrote:
> 
> From: Arun Easi <aeasi at marvell.com>
> 
> System crash was seen when I/O was run against a NVME target and when I/O
> aborts were occurring.
> 
> Crash stack is:
> 
>    -- relevant crash stack --
>    BUG: kernel NULL pointer dereference, address: 0000000000000010
>    :
>    #6 [ffffae1f8666bdd0] page_fault at ffffffffa740122e
>       [exception RIP: qla_nvme_abort_work+339]
>       RIP: ffffffffc0f592e3  RSP: ffffae1f8666be80  RFLAGS: 00010297
>       RAX: 0000000000000000  RBX: ffff9b581fc8af80  RCX: ffffffffc0f83bd0
>       RDX: 0000000000000001  RSI: ffff9b5839c6c7c8  RDI: 0000000008000000
>       RBP: ffff9b6832f85000   R8: ffffffffc0f68160   R9: ffffffffc0f70652
>       R10: ffffae1f862ffdc8  R11: 0000000000000300  R12: 000000000000010d
>       R13: 0000000000000000  R14: ffff9b5839cea000  R15: 0ffff9b583fab170
>       ORIG_RAX: ffffffffffffffff   CS: 0010  SS: 0018
>    #7 [ffffae1f8666be98] process_one_work at ffffffffa6aba184
>    #8 [ffffae1f8666bed8] worker_thread at ffffffffa6aba39d
>    #9 [ffffae1f8666bf10] kthread at ffffffffa6ac06ed
> 
> The crash was due to a stale SRB structure access after it was aborted.
> Fixed the issue by removing stale access.
> 

Add following 

Fixes: 2cabf10dbbe38 (“scsi: qla2xxx: Fix hang on NVMe command timeouts ”)
Cc: stable at vger.kernel.org

> Signed-off-by: Arun Easi <aeasi at marvell.com>
> Signed-off-by: Nilesh Javali <njavali at marvell.com>
> ---
> drivers/scsi/qla2xxx/qla_nvme.c | 14 ++++++++++++--
> 1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/scsi/qla2xxx/qla_nvme.c b/drivers/scsi/qla2xxx/qla_nvme.c
> index 1c5da2dbd6f9..877b2b625020 100644
> --- a/drivers/scsi/qla2xxx/qla_nvme.c
> +++ b/drivers/scsi/qla2xxx/qla_nvme.c
> @@ -228,6 +228,8 @@ static void qla_nvme_abort_work(struct work_struct *work)
> 	fc_port_t *fcport = sp->fcport;
> 	struct qla_hw_data *ha = fcport->vha->hw;
> 	int rval, abts_done_called = 1;
> +	bool io_wait_for_abort_done;
> +	uint32_t handle;
> 
> 	ql_dbg(ql_dbg_io, fcport->vha, 0xffff,
> 	       "%s called for sp=%p, hndl=%x on fcport=%p desc=%p deleted=%d\n",
> @@ -244,12 +246,20 @@ static void qla_nvme_abort_work(struct work_struct *work)
> 		goto out;
> 	}
> 
> +	/*
> +	 * sp may not be valid after abort_command if return code is either
> +	 * SUCCESS or ERR_FROM_FW codes, so cache the value here.
> +	 */
> +	io_wait_for_abort_done = ql2xabts_wait_nvme &&
> +					QLA_ABTS_WAIT_ENABLED(sp);
> +	handle = sp->handle;
> +
> 	rval = ha->isp_ops->abort_command(sp);
> 
> 	ql_dbg(ql_dbg_io, fcport->vha, 0x212b,
> 	    "%s: %s command for sp=%p, handle=%x on fcport=%p rval=%x\n",
> 	    __func__, (rval != QLA_SUCCESS) ? "Failed to abort" : "Aborted",
> -	    sp, sp->handle, fcport, rval);
> +	    sp, handle, fcport, rval);
> 
> 	/*
> 	 * If async tmf is enabled, the abort callback is called only on
> @@ -264,7 +274,7 @@ static void qla_nvme_abort_work(struct work_struct *work)
> 	 * are waited until ABTS complete. This kref is decreased
> 	 * at qla24xx_abort_sp_done function.
> 	 */
> -	if (abts_done_called && ql2xabts_wait_nvme && QLA_ABTS_WAIT_ENABLED(sp))
> +	if (abts_done_called && io_wait_for_abort_done)
> 		return;
> out:
> 	/* kref_get was done before work was schedule. */
> -- 
> 2.19.0.rc0
> 

Otherwise

Reviewed-by: Himanshu Madhani <himanshu.madhani at oracle.com>

--
Himanshu Madhani	 Oracle Linux Engineering



More information about the Linux-nvme mailing list