Resets during user commands leads to hung task and controller stuck in connecting

Keith Busch kbusch at kernel.org
Tue Nov 15 08:34:36 PST 2022


On Tue, Nov 15, 2022 at 09:46:45AM +0200, Sagi Grimberg wrote:
> 
> > > > I'm (again) seeing a hung task when doing resets and formats simultaneously.
> > > > Controller state is left in 'connecting'
> > > > 
> > > > Using nvme.git/nvme-6.2 as of 'nvme: implement the DEAC bit for the Write Zeroes command',
> > > > but I have also repro'd with Christoph's latest reset/probe-split set
> > > > 
> > > > 
> > > > ctrl="nvme0"
> > > > nsid=1
> > > > pci="/sys/block/${ctrl}n${nsid}/device/"
> > > > echo 30 > /proc/sys/kernel/hung_task_timeout_secs
> > > > while true; do
> > > >           nvme format -f /dev/${ctrl}n${nsid} &
> > > 
> > > How long does it take the format to complete?
> > Well it's pretty immediate but I'm under the impression that the
> > nvme_dev_disable path leads to CC_EN disabling, interrupting any formats
> > 
> > > 
> > > >           echo 1 > $pci/reset_controller &
> > > > done
> > > 
> > > What happens if you set io_timeout to 20 instead of 30? (given
> > > that you bound hung tasks at 30 seconds...)
> > It occurs with the standard 120s task timeout too
> > Also there's no I/O occurring at the moment; just admin work
> > 
> > I added a blktests for this:
> > http://lists.infradead.org/pipermail/linux-nvme/2022-November/036475.html
> 
> Keith?
> 
> Is this related to bc8fb906b0ff ("nvme: handle effects after freeing the
> request") ?

Kind of. Jonathan previously reported an error with the same test, and
reported that the mentioned commit fixed it. This is yet another error
with the same test, but that commit doesn't appear to have been a factor
in this new observation.

This test could theoretically consume all admin tags with format
commands, and if the controller breaks on the format and stops
responding, then we don't have a tag available to tear down IO queues.
I'm not sure that's actually happening here, though. Is the sysfs reset
really the only stuck task reported in the kernel messages?
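
One way to check that last question, as a rough sketch: pull every hung-task
report out of the kernel log rather than only the first one. The helper name
hung_tasks is made up for illustration, and it assumes the hung-task
detector's usual "INFO: task <comm>:<pid> blocked for more than <N> seconds."
line format.

```shell
#!/bin/sh
# Sketch only: hung_tasks is a hypothetical helper, not an existing tool.
# Reads kernel log text on stdin and prints one "comm pid seconds" line
# per distinct hung-task report, assuming the usual message format.
hung_tasks() {
    sed -n 's/.*INFO: task \(.*\):\([0-9][0-9]*\) blocked for more than \([0-9][0-9]*\) seconds.*/\1 \2 \3/p' |
        sort -u
}

# Typical use on the affected machine (run as root):
#   dmesg | hung_tasks
#   cat /sys/class/nvme/nvme0/state
```

If more than the sysfs writer shows up (for example the nvme format
processes), that would point toward the admin tag exhaustion theory.
Writing 'w' to /proc/sysrq-trigger should also dump all blocked tasks
with their stacks, if sysrq is enabled.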
