[PATCH v7 0/6] nvme-fc: FPIN link integrity handling

Bryan Gurney bgurney at redhat.com
Tue Jul 1 13:32:26 PDT 2025


On Tue, Jun 24, 2025 at 4:20 PM Bryan Gurney <bgurney at redhat.com> wrote:
>
> FPIN LI (link integrity) messages are received when the attached
> fabric detects hardware errors. In response to these messages, I/O
> should be directed away from the affected ports, and those ports
> should only be used if the 'optimized' paths are unavailable.
> Upon port reset the paths should be put back into service, as the
> affected hardware might have been replaced.
> This patch set adds a new controller flag, 'NVME_CTRL_MARGINAL',
> which is checked during multipath path selection, causing the
> path to be skipped when checking for 'optimized' paths. If no
> optimized paths are available, the 'marginal' paths are considered
> for path selection alongside the 'non-optimized' paths.
> It also introduces a new nvme-fc callback 'nvme_fc_fpin_rcv()' to
> evaluate the FPIN LI TLV payload and set the 'marginal' state on
> all affected rports.
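>
> As a rough sketch of that selection order (the function name below is
> made up for illustration, and the loop is simplified; it is not the
> code in this series):
>
>     /*
>      * Sketch only: the first pass prefers healthy optimized paths;
>      * marginal optimized paths fall through and compete with the
>      * non-optimized paths.  Assumes rcu_read_lock() is held, as in
>      * the existing path selectors.
>      */
>     static struct nvme_ns *nvme_pick_path_sketch(struct nvme_ns_head *head)
>     {
>             struct nvme_ns *ns, *best_opt = NULL, *best_nonopt = NULL;
>
>             list_for_each_entry_rcu(ns, &head->list, siblings) {
>                     if (nvme_path_is_disabled(ns))
>                             continue;
>                     switch (ns->ana_state) {
>                     case NVME_ANA_OPTIMIZED:
>                             if (!test_bit(NVME_CTRL_MARGINAL,
>                                           &ns->ctrl->flags)) {
>                                     if (!best_opt)
>                                             best_opt = ns;
>                                     break;
>                             }
>                             fallthrough;    /* marginal: demote */
>                     case NVME_ANA_NONOPTIMIZED:
>                             if (!best_nonopt)
>                                     best_nonopt = ns;
>                             break;
>                     default:
>                             break;
>                     }
>             }
>             return best_opt ? best_opt : best_nonopt;
>     }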
>
> The testing for this patch set was performed by Bryan Gurney, using the
> process outlined in John Meneghini's presentation at LSFMM 2024: the
> Fibre Channel switch sends an FPIN notification on a specific switch
> port, and the following is checked on the initiator:
>
> 1. The controllers corresponding to the paths on the port that
> received the notification show the NVME_CTRL_MARGINAL flag set:
>
>    \
>     +- nvme4 fc traddr=c,host_traddr=e live optimized
>     +- nvme5 fc traddr=8,host_traddr=e live non-optimized
>     +- nvme8 fc traddr=e,host_traddr=f marginal optimized
>     +- nvme9 fc traddr=a,host_traddr=f marginal non-optimized
>
> 2. The I/O statistics of the test namespace show no I/O activity on the
> controllers with NVME_CTRL_MARGINAL set:
>
>    Device             tps    MB_read/s    MB_wrtn/s    MB_dscd/s
>    nvme4c4n1         0.00         0.00         0.00         0.00
>    nvme4c5n1     25001.00         0.00        97.66         0.00
>    nvme4c9n1     25000.00         0.00        97.66         0.00
>    nvme4n1       50011.00         0.00       195.36         0.00
>
>
>    Device             tps    MB_read/s    MB_wrtn/s    MB_dscd/s
>    nvme4c4n1         0.00         0.00         0.00         0.00
>    nvme4c5n1     48360.00         0.00       188.91         0.00
>    nvme4c9n1      1642.00         0.00         6.41         0.00
>    nvme4n1       49981.00         0.00       195.24         0.00
>
>
>    Device             tps    MB_read/s    MB_wrtn/s    MB_dscd/s
>    nvme4c4n1         0.00         0.00         0.00         0.00
>    nvme4c5n1     50001.00         0.00       195.32         0.00
>    nvme4c9n1         0.00         0.00         0.00         0.00
>    nvme4n1       50016.00         0.00       195.38         0.00
>
> Link: https://people.redhat.com/jmeneghi/LSFMM_2024/LSFMM_2024_NVMe_Cancel_and_FPIN.pdf
>
> More rigorous testing was also performed to ensure proper path migration
> on each of the eight different FPIN link integrity event types,
> particularly in the scenario where all paths are marginal and only
> non-optimized paths are available.  On a configuration with the
> round-robin iopolicy, when all paths on the host show as marginal, I/O
> continues on the optimized path that was most recently non-marginal.
> From this point, if both of the optimized paths are down, I/O properly
> continues on the remaining paths.
>
> Changes to the original submission:
> - Changed flag name to 'marginal'
> - Do not block marginal path; influence path selection instead
>   to de-prioritize marginal paths
>
> Changes to v2:
> - Split off driver-specific modifications
> - Introduce 'union fc_tlv_desc' to avoid casts
>
> Changes to v3:
> - Include reviews from Justin Tee
> - Split marginal path handling patch
>
> Changes to v4:
> - Change 'u8' to '__u8' on fc_tlv_desc to fix a failure to build
> - Print 'marginal' instead of 'live' in the state of controllers
>   when they are marginal
>
> Changes to v5:
> - Minor spelling corrections to patch descriptions
>
> Changes to v6:
> - No code changes; added note about additional testing
>
> Hannes Reinecke (5):
>   fc_els: use 'union fc_tlv_desc'
>   nvme-fc: marginal path handling
>   nvme-fc: nvme_fc_fpin_rcv() callback
>   lpfc: enable FPIN notification for NVMe
>   qla2xxx: enable FPIN notification for NVMe
>
> Bryan Gurney (1):
>   nvme: sysfs: emit the marginal path state in show_state()
>
>  drivers/nvme/host/core.c         |   1 +
>  drivers/nvme/host/fc.c           |  99 +++++++++++++++++++
>  drivers/nvme/host/multipath.c    |  17 ++--
>  drivers/nvme/host/nvme.h         |   6 ++
>  drivers/nvme/host/sysfs.c        |   4 +-
>  drivers/scsi/lpfc/lpfc_els.c     |  84 ++++++++--------
>  drivers/scsi/qla2xxx/qla_isr.c   |   3 +
>  drivers/scsi/scsi_transport_fc.c |  27 +++--
>  include/linux/nvme-fc-driver.h   |   3 +
>  include/uapi/scsi/fc/fc_els.h    | 165 +++++++++++++++++--------------
>  10 files changed, 269 insertions(+), 140 deletions(-)
>
> --
> 2.49.0
>


We're going to be working on follow-up patches to address some things
that I found in additional testing:

During path failure testing with the numa iopolicy, I found that I/O
moves off the marginal path after the first link integrity event is
received, but if the non-marginal path carrying the I/O is then
disconnected, the I/O is transferred onto a marginal path (in testing,
I've seen it go sometimes to a "marginal optimized" path and sometimes
to a "marginal non-optimized" one).

The queue-depth iopolicy doesn't change its path selection based on
the marginal flag; looking at nvme_queue_depth_path(), I can see that
there's currently no logic there to handle marginal paths.  We're
developing a patch to address that issue in queue-depth, but we need
to do more testing first.
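
One plausible shape for that change, mirroring the marginal handling
added for the other iopolicies in this series (a sketch only, not the
patch we're testing), inside the per-path loop of
nvme_queue_depth_path():

    /*
     * Sketch: demote marginal paths into the non-optimized bucket so
     * they are only chosen when no healthy optimized path exists.
     */
    depth = atomic_read(&ns->ctrl->nr_active);

    switch (ns->ana_state) {
    case NVME_ANA_OPTIMIZED:
            if (!test_bit(NVME_CTRL_MARGINAL, &ns->ctrl->flags)) {
                    if (depth < min_depth_opt) {
                            min_depth_opt = depth;
                            best_opt = ns;
                    }
                    break;
            }
            fallthrough;    /* marginal: compete with non-optimized */
    case NVME_ANA_NONOPTIMIZED:
            if (depth < min_depth_nonopt) {
                    min_depth_nonopt = depth;
                    best_nonopt = ns;
            }
            break;
    default:
            break;
    }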


Thanks,

Bryan



