[PATCH] nvme: mark ctrl as DEAD if removing from error recovery

Ming Lei ming.lei at redhat.com
Sun Jul 9 20:02:25 PDT 2023


On Sun, Jul 09, 2023 at 10:38:29AM +0300, Sagi Grimberg wrote:
> 
> > > The namespace's request queue is frozen and quiesced during error
> > > recovery, so writeback IO is blocked in bio_queue_enter(). As a result,
> > > fsync_bdev() <- del_gendisk() can't make progress, which causes an IO
> > > hang. Removal could be from sysfs, hard unplug or error handling.
> > > 
> > > Fix this kind of issue by marking controller as DEAD if removal breaks
> > > error recovery.
> > > 
> > > This way is reasonable too, because the controller can't be recovered
> > > any more after being removed.
> > 
> > This looks fine to me Ming,
> > Reviewed-by: Sagi Grimberg <sagi at grimberg.me>
> > 
> > 
> > I still want your patches for tcp/rdma that move the freeze.
> > If you are not planning to send them, I swear I will :)
> 
> Ming, can you please send the tcp/rdma patches that move the
> freeze? As I said before, it addresses an existing issue with
> requests unnecessarily blocked on a frozen queue instead of
> failing over.

Any chance we can fix the current issue in an easy (backportable) way [1] first?

All previous discussions on delaying the freeze [2] are generic and apply to
all nvme drivers, not to mention that this difference in error handling causes
extra maintenance burden. I still suggest converting all drivers in the same
way, and will work along the lines of the approach in [1], aiming for v6.6.


[1] https://lore.kernel.org/linux-nvme/20230629064818.2070586-1-ming.lei@redhat.com/
[2] https://lore.kernel.org/linux-block/5bddeeb5-39d2-7cec-70ac-e3c623a8fca6@grimberg.me/T/#mfc96266b63eec3e4154f6843be72e5186a4055dc

Thanks,
Ming



