[RFC] qla2xxx: Add dev_loss_tmo kernel module options
Benjamin Block
bblock at linux.ibm.com
Tue Apr 20 18:27:00 BST 2021
On Mon, Apr 19, 2021 at 12:00:14PM +0200, Daniel Wagner wrote:
> Allow to set the default dev_loss_tmo value as kernel module option.
>
> Cc: Nilesh Javali <njavali at marvell.com>
> Cc: Arun Easi <aeasi at marvell.com>
> Signed-off-by: Daniel Wagner <dwagner at suse.de>
> ---
> Hi,
>
> During array upgrade tests with NVMe/FC on systems equipped with QLogic
> HBAs we ran into a problem with the default dev_loss_tmo setting.
>
> When the default timeout of 60 seconds expired, the file system went
> into read-only mode. The fix was to set dev_loss_tmo to infinity
> (note: this patch cannot express that).
>
> For lpfc devices we could use the sysfs interface under
> fc_remote_ports, which exposes dev_loss_tmo for both SCSI and NVMe
> rports.
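(For context, the sysfs workaround mentioned above amounts to writing the new timeout into the rport's dev_loss_tmo attribute. A minimal sketch; the helper name and the rport path are illustrative, and the actual rport-H:B-R directory name varies per system:)

```shell
#!/bin/sh
# Write a new dev_loss_tmo (in seconds) to an fc_remote_ports rport.
set_dev_loss_tmo() {
    rport_dir=$1
    tmo=$2
    echo "$tmo" > "$rport_dir/dev_loss_tmo"
}

# Example (hypothetical rport): raise the timeout well beyond 60s.
# set_dev_loss_tmo /sys/class/fc_remote_ports/rport-0:0-1 600
```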
>
> The QLogic driver only exposes the rports via fc_remote_ports when
> SCSI is used. There is a debugfs interface to set dev_loss_tmo, but
> it has two issues. First, it is not watched by udevd, so no udev
> rules work. This could be worked around by setting the value
> statically, but that is really only an option for testing. Second,
> even when the debugfs interface is used, there is a bug in the code:
> qla_nvme_register_remote() assigns the value 0 to dev_loss_tmo, so
> the NVMe core falls back to its default of 60 seconds (this code path
> is exercised when the rport drops twice).
>
> Anyway, this patch is just to get the discussion going. Maybe the
> driver could implement the fc_remote_ports interface? Hannes pointed
> out that it might make sense to think about a controller sysfs API,
> since there is already a host one and the NVMe protocol is all about
> hosts and controllers.
>
> Thanks,
> Daniel
>
> drivers/scsi/qla2xxx/qla_attr.c | 4 ++--
> drivers/scsi/qla2xxx/qla_gbl.h | 1 +
> drivers/scsi/qla2xxx/qla_nvme.c | 2 +-
> drivers/scsi/qla2xxx/qla_os.c | 5 +++++
> 4 files changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/scsi/qla2xxx/qla_attr.c b/drivers/scsi/qla2xxx/qla_attr.c
> index 3aa9869f6fae..0d2386ba65c0 100644
> --- a/drivers/scsi/qla2xxx/qla_attr.c
> +++ b/drivers/scsi/qla2xxx/qla_attr.c
> @@ -3036,7 +3036,7 @@ qla24xx_vport_create(struct fc_vport *fc_vport, bool disable)
> }
>
> /* initialize attributes */
> - fc_host_dev_loss_tmo(vha->host) = ha->port_down_retry_count;
> + fc_host_dev_loss_tmo(vha->host) = ql2xdev_loss_tmo;
> fc_host_node_name(vha->host) = wwn_to_u64(vha->node_name);
> fc_host_port_name(vha->host) = wwn_to_u64(vha->port_name);
> fc_host_supported_classes(vha->host) =
> @@ -3260,7 +3260,7 @@ qla2x00_init_host_attr(scsi_qla_host_t *vha)
> struct qla_hw_data *ha = vha->hw;
> u32 speeds = FC_PORTSPEED_UNKNOWN;
>
> - fc_host_dev_loss_tmo(vha->host) = ha->port_down_retry_count;
> + fc_host_dev_loss_tmo(vha->host) = ql2xdev_loss_tmo;
> fc_host_node_name(vha->host) = wwn_to_u64(vha->node_name);
> fc_host_port_name(vha->host) = wwn_to_u64(vha->port_name);
> fc_host_supported_classes(vha->host) = ha->base_qpair->enable_class_2 ?
> diff --git a/drivers/scsi/qla2xxx/qla_gbl.h b/drivers/scsi/qla2xxx/qla_gbl.h
> index fae5cae6f0a8..0b9c24475711 100644
> --- a/drivers/scsi/qla2xxx/qla_gbl.h
> +++ b/drivers/scsi/qla2xxx/qla_gbl.h
> @@ -178,6 +178,7 @@ extern int ql2xdifbundlinginternalbuffers;
> extern int ql2xfulldump_on_mpifail;
> extern int ql2xenforce_iocb_limit;
> extern int ql2xabts_wait_nvme;
> +extern int ql2xdev_loss_tmo;
>
> extern int qla2x00_loop_reset(scsi_qla_host_t *);
> extern void qla2x00_abort_all_cmds(scsi_qla_host_t *, int);
> diff --git a/drivers/scsi/qla2xxx/qla_nvme.c b/drivers/scsi/qla2xxx/qla_nvme.c
> index 0cacb667a88b..cdc5b5075407 100644
> --- a/drivers/scsi/qla2xxx/qla_nvme.c
> +++ b/drivers/scsi/qla2xxx/qla_nvme.c
> @@ -41,7 +41,7 @@ int qla_nvme_register_remote(struct scsi_qla_host *vha, struct fc_port *fcport)
> req.port_name = wwn_to_u64(fcport->port_name);
> req.node_name = wwn_to_u64(fcport->node_name);
> req.port_role = 0;
> - req.dev_loss_tmo = 0;
> + req.dev_loss_tmo = ql2xdev_loss_tmo;
>
> if (fcport->nvme_prli_service_param & NVME_PRLI_SP_INITIATOR)
> req.port_role = FC_PORT_ROLE_NVME_INITIATOR;
> diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
> index d74c32f84ef5..c686522ff64e 100644
> --- a/drivers/scsi/qla2xxx/qla_os.c
> +++ b/drivers/scsi/qla2xxx/qla_os.c
> @@ -338,6 +338,11 @@ static void qla2x00_free_device(scsi_qla_host_t *);
> static int qla2xxx_map_queues(struct Scsi_Host *shost);
> static void qla2x00_destroy_deferred_work(struct qla_hw_data *);
>
> +int ql2xdev_loss_tmo = 60;
> +module_param(ql2xdev_loss_tmo, int, 0444);
> +MODULE_PARM_DESC(ql2xdev_loss_tmo,
> + "Time to wait for device to recover before reporting\n"
> + "an error. Default is 60 seconds\n");
Wouldn't it be really confusing to set essentially the same thing with
two different knobs for one FC HBA? We already have a `dev_loss_tmo`
kernel parameter; granted, only for scsi_transport_fc, but doesn't qla
implement that as well?
I don't really have a horse in this race, but that sounds strange.
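To make the overlap concrete: with this patch applied, there would be two
modprobe-time knobs for effectively the same setting (sketch; the values
are illustrative):

```
# generic FC transport default (existing)
options scsi_transport_fc dev_loss_tmo=120
# qla2xxx-specific default for NVMe rports (this patch)
options qla2xxx ql2xdev_loss_tmo=120
```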
--
Best Regards, Benjamin Block / Linux on IBM Z Kernel Development / IBM Systems
IBM Deutschland Research & Development GmbH / https://www.ibm.com/privacy
Vorsitz. AufsR.: Gregor Pillen / Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294