[PATCH RFC] nvme-fc: FPIN link integrity handling

Hannes Reinecke hare at suse.de
Thu Mar 7 03:29:39 PST 2024


On 3/7/24 11:10, Sagi Grimberg wrote:
> 
> 
> On 19/02/2024 10:59, hare at kernel.org wrote:
>> From: Hannes Reinecke <hare at suse.de>
>>
>> FPIN LI (link integrity) messages are received when the attached
>> fabric detects hardware errors. In response to these messages the
>> affected ports should not be used for I/O, and only put back into
>> service once the ports had been reset as then the hardware might
>> have been replaced.
> 
> Does this mean it cannot service any type of communication over
> the wire?
> 
It means that the service is impacted and communication cannot be
guaranteed (CRC errors, packet loss, you name it).
So the link should be taken out of service until the faulty hardware
has been (manually) replaced.

>> This patch adds a new controller flag 'NVME_CTRL_TRANSPORT_BLOCKED'
>> which will be checked during multipath path selection, causing the
>> path to be skipped.
> 
> While this looks sensible to me, it also looks like this introduces a
> ctrl state outside of ctrl->state... Wouldn't it make sense to move the
> controller to NVME_CTRL_DEAD ? or is it not a terminal state?
> 
Actually, I was trying to model it on the
'devloss_tmo'/'fast_io_fail' mechanism we have in SCSI.
Technically the controller is still present; it's just that we shouldn't
send I/O to it. And I'd rather not disconnect here, as we're trying to
do an autoconnect on FC, so a manual disconnect would interfere with
that and we would probably end up in a death spiral of
disconnects and reconnects.

We could 'elevate' it to a new controller state, but I wasn't sure how
much appetite there is for that. And we already have flags like
'stopped' which seem to fall into the same category.

So I'd rather not touch the state machine.
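
To illustrate the idea, here is a rough userspace sketch. It is not the
patch itself: every sketch_* name is made up for this mail, and the real
code would check the NVME_CTRL_TRANSPORT_BLOCKED flag inside the existing
multipath path selection. The point is only that the flag is orthogonal
to the controller state, so a blocked controller stays present while
path selection skips it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the NVME_CTRL_TRANSPORT_BLOCKED flag bit. */
enum { SKETCH_TRANSPORT_BLOCKED = 1 << 0 };

/* Hypothetical, heavily reduced controller state machine. */
enum sketch_ctrl_state { SKETCH_LIVE, SKETCH_CONNECTING, SKETCH_DEAD };

struct sketch_ctrl {
	enum sketch_ctrl_state state;	/* untouched by FPIN handling */
	unsigned long flags;		/* carries the 'blocked' bit */
};

/* A path is usable only if the controller is live and not blocked. */
bool sketch_path_usable(const struct sketch_ctrl *ctrl)
{
	if (ctrl->state != SKETCH_LIVE)
		return false;
	if (ctrl->flags & SKETCH_TRANSPORT_BLOCKED)
		return false;
	return true;
}

/* Return the index of the first usable path, or -1 if there is none. */
int sketch_select_path(const struct sketch_ctrl *paths, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (sketch_path_usable(&paths[i]))
			return (int)i;
	}
	return -1;
}

/* FPIN LI received: block the path, but leave ctrl->state alone. */
void sketch_fpin_li(struct sketch_ctrl *ctrl)
{
	ctrl->flags |= SKETCH_TRANSPORT_BLOCKED;
}

/* Port has been reset: put the path back into service. */
void sketch_port_reset(struct sketch_ctrl *ctrl)
{
	ctrl->flags &= ~(unsigned long)SKETCH_TRANSPORT_BLOCKED;
}
```

With two live paths, calling sketch_fpin_li() on the first one makes
sketch_select_path() fall over to the second, while the first controller
still reports SKETCH_LIVE; sketch_port_reset() makes it eligible again.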

Cheers,

Hannes
