[PATCH v2 2/2] nvme: handle persistent internal error AER from NVMe controller

Fri Jun 3 12:23:07 PDT 2022

On Fri, Jun 03, 2022 at 10:56:01AM -0700, Michael Kelley wrote:

This series looks good to me. Just one concern below that may amount to
nothing.

> +static void nvme_handle_aer_persistent_error(struct nvme_ctrl *ctrl)
> +{
> +	u32 csts;
> +
> +	trace_nvme_async_event(ctrl, NVME_AER_ERROR);
> +
> +	if (ctrl->ops->reg_read32(ctrl, NVME_REG_CSTS, &csts) != 0 ||

reg_read32() is non-blocking for pcie, so this is safe to call from that
driver's irq handler. The other transports block on register reads, though, so
they can't call this from an atomic context. The TCP context looks safe, but
I'm not sure about RDMA or FC.

> +	    nvme_should_reset(ctrl, csts)) {
> +		dev_warn(ctrl->device, "resetting controller due to AER\n");
> +		nvme_reset_ctrl(ctrl);
> +	}
> +}
> +
>  void nvme_complete_async_event(struct nvme_ctrl *ctrl, __le16 status,
>  		volatile union nvme_result *res)
>  {
>  	u32 result = le32_to_cpu(res->u32);
>  	u32 aer_type = result & 0x07;
> +	u32 aer_subtype = (result & 0xff00) >> 8;

Since the above mask + shift duplicates the one in nvme_handle_aen_notice(), an
inline helper function seems reasonable.
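
Something along these lines is what I have in mind. This is only a rough
sketch: the helper and macro names are made up for illustration, and it uses
uint32_t so it builds standalone (the driver would use u32). The mask and shift
values are the ones from the hunk above.

```c
#include <stdint.h>

/* Hypothetical names; only the 0x07 / 0xff00 / 8 values come from the patch. */
#define AER_TYPE_MASK		0x07u
#define AER_SUBTYPE_MASK	0xff00u
#define AER_SUBTYPE_SHIFT	8

/* Event type: low three bits of the AER completion result. */
static inline uint32_t aer_type(uint32_t result)
{
	return result & AER_TYPE_MASK;
}

/* Event subtype: bits 15:8 of the AER completion result. */
static inline uint32_t aer_subtype(uint32_t result)
{
	return (result & AER_SUBTYPE_MASK) >> AER_SUBTYPE_SHIFT;
}
```

Both nvme_complete_async_event() and nvme_handle_aen_notice() could then call
the helper instead of open-coding the mask and shift.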