[PATCH v7 0/6] nvme-fc: FPIN link integrity handling
Bryan Gurney
bgurney at redhat.com
Tue Jul 1 13:32:26 PDT 2025
On Tue, Jun 24, 2025 at 4:20 PM Bryan Gurney <bgurney at redhat.com> wrote:
>
> FPIN LI (link integrity) messages are received when the attached
> fabric detects hardware errors. In response to these messages, I/O
> should be directed away from the affected ports, and those ports should
> only be used again if no 'optimized' paths are available.
> Upon port reset the paths should be put back in service as the
> affected hardware might have been replaced.
>
> This patch set adds a new controller flag, 'NVME_CTRL_MARGINAL',
> which will be checked during multipath path selection, causing the
> path to be skipped when checking for 'optimized' paths. If no
> optimized paths are available the 'marginal' paths are considered
> for path selection alongside the 'non-optimized' paths.
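>
> To make the selection change concrete, the idea boils down to
> something like the sketch below (helper names such as
> nvme_ctrl_is_marginal() are illustrative here, not necessarily the
> exact code in the patches):
>
> static struct nvme_ns *nvme_pick_path(struct nvme_ns_head *head)
> {
>         struct nvme_ns *ns, *fallback = NULL;
>
>         list_for_each_entry_rcu(ns, &head->list, siblings) {
>                 if (nvme_path_is_disabled(ns))
>                         continue;
>                 /* a marginal path loses its 'optimized' preference */
>                 if (ns->ana_state == NVME_ANA_OPTIMIZED &&
>                     !nvme_ctrl_is_marginal(ns->ctrl))
>                         return ns;
>                 /* ...but stays eligible as a fallback path */
>                 if (!fallback)
>                         fallback = ns;
>         }
>         return fallback;
> }
>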
> It also introduces a new nvme-fc callback 'nvme_fc_fpin_rcv()' to
> evaluate the FPIN LI TLV payload and set the 'marginal' state on
> all affected rports.
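>
> The heart of that callback is a walk over the FPIN TLV descriptors:
> for each link integrity descriptor, look up the rport(s) behind the
> WWPNs carried in the descriptor and set the new flag on their
> controllers. Stripped of locking and bounds checking, the LI handling
> is roughly (nvme_fc_mark_marginal() is an illustrative stand-in for
> the lookup-and-flag helper):
>
> static void nvme_fc_handle_li(struct fc_fn_li_desc *li)
> {
>         u32 i, cnt = be32_to_cpu(li->pname_count);
>
>         /* the attached port and every port it reports are affected */
>         nvme_fc_mark_marginal(be64_to_cpu(li->attached_wwpn));
>         for (i = 0; i < cnt; i++)
>                 nvme_fc_mark_marginal(be64_to_cpu(li->pname_list[i]));
> }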
>
> The testing for this patch set was performed by Bryan Gurney, using the
> process outlined in John Meneghini's presentation at LSFMM 2024: the
> Fibre Channel switch sends an FPIN notification on a specific switch
> port, and the following is checked on the initiator:
>
> 1. The controllers corresponding to the paths on the port that
> received the notification show the NVME_CTRL_MARGINAL flag set.
>
> \
> +- nvme4 fc traddr=c,host_traddr=e live optimized
> +- nvme5 fc traddr=8,host_traddr=e live non-optimized
> +- nvme8 fc traddr=e,host_traddr=f marginal optimized
> +- nvme9 fc traddr=a,host_traddr=f marginal non-optimized
>
> 2. The I/O statistics of the test namespace show no I/O activity on the
> controllers with NVME_CTRL_MARGINAL set.
>
> Device          tps   MB_read/s   MB_wrtn/s   MB_dscd/s
> nvme4c4n1      0.00        0.00        0.00        0.00
> nvme4c5n1  25001.00        0.00       97.66        0.00
> nvme4c9n1  25000.00        0.00       97.66        0.00
> nvme4n1    50011.00        0.00      195.36        0.00
>
>
> Device          tps   MB_read/s   MB_wrtn/s   MB_dscd/s
> nvme4c4n1      0.00        0.00        0.00        0.00
> nvme4c5n1  48360.00        0.00      188.91        0.00
> nvme4c9n1   1642.00        0.00        6.41        0.00
> nvme4n1    49981.00        0.00      195.24        0.00
>
>
> Device          tps   MB_read/s   MB_wrtn/s   MB_dscd/s
> nvme4c4n1      0.00        0.00        0.00        0.00
> nvme4c5n1  50001.00        0.00      195.32        0.00
> nvme4c9n1      0.00        0.00        0.00        0.00
> nvme4n1    50016.00        0.00      195.38        0.00
>
> Link: https://people.redhat.com/jmeneghi/LSFMM_2024/LSFMM_2024_NVMe_Cancel_and_FPIN.pdf
>
> More rigorous testing was also performed to ensure proper path migration
> on each of the eight different FPIN link integrity events, particularly
> during a scenario where there are only non-optimized paths available, in
> a state where all paths are marginal. On a configuration with a
> round-robin iopolicy, when all paths on the host show as marginal, I/O
> continues on the optimized path that was most recently non-marginal.
> From this point, if both of the optimized paths are down, I/O properly
> continues on the remaining paths.
>
> Changes to the original submission:
> - Changed flag name to 'marginal'
> - Do not block marginal path; influence path selection instead
> to de-prioritize marginal paths
>
> Changes to v2:
> - Split off driver-specific modifications
> - Introduce 'union fc_tlv_desc' to avoid casts
>
> Changes to v3:
> - Include reviews from Justin Tee
> - Split marginal path handling patch
>
> Changes to v4:
> - Change 'u8' to '__u8' on fc_tlv_desc to fix a failure to build
> - Print 'marginal' instead of 'live' in the state of controllers
> when they are marginal
>
> Changes to v5:
> - Minor spelling corrections to patch descriptions
>
> Changes to v6:
> - No code changes; added note about additional testing
>
> Hannes Reinecke (5):
>   fc_els: use 'union fc_tlv_desc'
>   nvme-fc: marginal path handling
>   nvme-fc: nvme_fc_fpin_rcv() callback
>   lpfc: enable FPIN notification for NVMe
>   qla2xxx: enable FPIN notification for NVMe
>
> Bryan Gurney (1):
>   nvme: sysfs: emit the marginal path state in show_state()
>
> drivers/nvme/host/core.c         |   1 +
> drivers/nvme/host/fc.c           |  99 +++++++++++++++++++
> drivers/nvme/host/multipath.c    |  17 ++--
> drivers/nvme/host/nvme.h         |   6 ++
> drivers/nvme/host/sysfs.c        |   4 +-
> drivers/scsi/lpfc/lpfc_els.c     |  84 ++++++++--------
> drivers/scsi/qla2xxx/qla_isr.c   |   3 +
> drivers/scsi/scsi_transport_fc.c |  27 +++--
> include/linux/nvme-fc-driver.h   |   3 +
> include/uapi/scsi/fc/fc_els.h    | 165 +++++++++++++++++--------------
> 10 files changed, 269 insertions(+), 140 deletions(-)
>
> --
> 2.49.0
>
We're going to be working on follow-up patches to address some things
that I found in additional testing:

During path-failure testing with the numa iopolicy, I found that I/O
moves off the marginal path after the first link integrity event is
received, but if the non-marginal path the I/O is on is then
disconnected, the I/O is transferred onto a marginal path (in testing
I've sometimes seen it go to a "marginal optimized" path, and sometimes
to a "marginal non-optimized" one).

The queue-depth iopolicy doesn't change its path selection based on the
marginal flag; looking at nvme_queue_depth_path(), there's currently no
logic there to handle marginal paths. We're developing a patch to
address that in queue-depth, but we need to do more testing.
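
To give a sense of the direction (this is an untested sketch of the
idea, not the patch we'll post), one option is to add a third bucket
to nvme_queue_depth_path() so that a marginal path is only chosen once
no non-marginal path is left; nvme_ctrl_is_marginal() below stands in
for whatever the final helper ends up being called:

static struct nvme_ns *nvme_queue_depth_path(struct nvme_ns_head *head)
{
        struct nvme_ns *best_opt = NULL, *best_nonopt = NULL;
        struct nvme_ns *best_marginal = NULL, *ns;
        unsigned int min_opt = UINT_MAX, min_nonopt = UINT_MAX;
        unsigned int min_marginal = UINT_MAX, depth;

        list_for_each_entry_rcu(ns, &head->list, siblings) {
                if (nvme_path_is_disabled(ns))
                        continue;

                depth = atomic_read(&ns->ctrl->nr_active);

                /* park marginal paths in their own bucket */
                if (nvme_ctrl_is_marginal(ns->ctrl)) {
                        if (depth < min_marginal) {
                                min_marginal = depth;
                                best_marginal = ns;
                        }
                        continue;
                }

                switch (ns->ana_state) {
                case NVME_ANA_OPTIMIZED:
                        if (depth < min_opt) {
                                min_opt = depth;
                                best_opt = ns;
                        }
                        break;
                case NVME_ANA_NONOPTIMIZED:
                        if (depth < min_nonopt) {
                                min_nonopt = depth;
                                best_nonopt = ns;
                        }
                        break;
                default:
                        break;
                }
        }

        /* marginal paths are a last resort */
        if (best_opt)
                return best_opt;
        return best_nonopt ? best_nonopt : best_marginal;
}
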
Thanks,
Bryan