[PATCH 1/2] block: fix surprise removal for drivers calling blk_set_queue_dying
Markus Blöchl
Markus.Bloechl at ipetronik.com
Wed Feb 16 10:08:50 PST 2022
On Wed, Feb 16, 2022 at 06:09:19PM +0100, Christoph Hellwig wrote:
> > I might have missed something here, but assuming I am a driver which
> > employs multiple different queues, some with a disk attached to them,
> > some without (Is that possible? The admin queue e.g.?)
> > and I just lost my connection and want to notify everything below me
> > that their connection is dead.
> > Would I really want to kill disk queues differently from non-disk
> > queues?
>
> Yes. Things like the admin queue in nvme are under full control of
> the driver. While the "disk" queues just get I/O from the file system
> and thus need to be cut off.
>
> > How is the admin queue killed? Is it even?
>
> It isn't. We just stop submitting to it.
Ah, it is killed in nvme_dev_remove_admin(), so as long as we don't get
stuck ourselves before we get there, things should be fine, since other
tasks waiting in blk_queue_enter() only wait until nvme_remove() is done.
>
> > > --- a/drivers/block/mtip32xx/mtip32xx.c
> > > +++ b/drivers/block/mtip32xx/mtip32xx.c
> > > @@ -4112,7 +4112,7 @@ static void mtip_pci_remove(struct pci_dev *pdev)
> > > "Completion workers still active!\n");
> > > }
> > >
> > > - blk_set_queue_dying(dd->queue);
> > > + blk_mark_disk_dead(dd->disk);
> >
> > This driver is weird, I did not find a reliable hint that dd->disk always
> > exists here. At least mtip_block_remove() has an extra check for that.
>
> The driver is a bit of a mess indeed, but the disk and queue will be
> non-NULL if ->probe returns successfully so this is fine. It is more
> that some of the checks are not required.
>
> > It also only sets QUEUE_FLAG_DEAD if it detects a surprise removal and
> > not QUEUE_FLAG_DYING.
>
> Yes, this driver will need further work.
Alright, I will more or less ignore this one for now, then.
I noticed that set_capacity() is also called most of the time when
a disk is killed. Should we also move that into blk_mark_disk_dead()?
Any reasons not to?
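
To make the question concrete, here is a rough userspace model of the idea.
It is not kernel code: every mock_* name is made up for illustration, the
structs are simplified stand-ins for the disk and its queue, and the
submission check only approximates how a dead or zero-sized disk rejects
I/O. It only shows the capacity reset folded into the same helper that
marks the disk dead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a request queue; not the kernel struct. */
struct mock_queue {
	bool dying;		/* models a "queue is dying" flag */
};

/* Hypothetical stand-in for a disk; not the kernel struct. */
struct mock_disk {
	struct mock_queue *queue;
	uint64_t capacity;	/* size in sectors */
	bool dead;
};

/*
 * Models the proposal: mark the disk dead and zero its capacity in one
 * place, so individual drivers no longer reset the capacity by hand.
 */
static void mock_mark_disk_dead(struct mock_disk *disk)
{
	assert(disk && disk->queue);

	disk->dead = true;
	disk->queue->dying = true;
	disk->capacity = 0;
}

/*
 * Models a submission check: returns 0 if the I/O would be accepted,
 * -1 if the disk is dead or the sector lies past the end of the disk.
 */
static int mock_submit(const struct mock_disk *disk, uint64_t sector)
{
	assert(disk && disk->queue);

	if (disk->dead || disk->queue->dying)
		return -1;
	if (sector >= disk->capacity)
		return -1;
	return 0;
}
```

In this model a driver's removal path would only call
mock_mark_disk_dead(); anything submitted afterwards fails both the dead
check and the capacity check.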