[Regression] Bug 216400 - Firmware activation starting AEN processing prevents further AER commands sent to the NVMe controller.

Sagi Grimberg sagi at grimberg.me
Mon Aug 29 02:14:21 PDT 2022



On 8/26/22 15:19, Thorsten Leemhuis wrote:
> Hi, this is your Linux kernel regression tracker.
> 
> I noticed a regression report in bugzilla.kernel.org that afaics nobody
> acted upon since it was reported. That's why I decided to forward it by
> mail to those that afaics should handle this.
> 
> To quote from https://bugzilla.kernel.org/show_bug.cgi?id=216400 :
> 
>>   lixingyuan 2022-08-23 01:14:50 UTC
>>
>> This bug is related to these two commits:
>>
>> 1. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.0-rc2&id=4c75f877853cfa81b12374a07208e07b077f39b8
>>
>> This code sets the controller state to NVME_CTRL_RESETTING while handling the "firmware activation starting" AEN.
>>
>> 2. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.0-rc2&id=0fa0f99fc84e41057cbdd2efbfe91c6b2f47dd9d
>>
>> Before submitting a new AER command to the controller, this code checks whether the controller state is NVME_CTRL_LIVE, and that check causes the problem: the earlier processing of the firmware activation starting AEN has already set the controller state to NVME_CTRL_RESETTING, so no new AER command is sent to the controller.

I see.

I can move this state check out of the core and into the drivers.

Keith, is pci safe from submitting an async event on a freed admin
queue? If not, I can add a proper check there as well...
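A minimal, self-contained userspace model of the sequence described above may help illustrate the report. It is a sketch, not kernel code: every name in it (struct model_ctrl, async_event_work(), complete_fw_act_starting_aen(), fw_act_work_done(), driver_submit_async_event(), the admin_q_alive flag, and the check_in_core switch) is hypothetical. The model runs the AER work inline. In the kernel that work is queued asynchronously, but the controller state has typically already been switched to RESETTING by the time it runs. With the state check in the core, the consumed AER is never re-armed. With the check moved into the driver and keyed on whether the admin queue is still alive, the AER is re-armed during firmware activation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified controller states. */
enum ctrl_state { CTRL_LIVE, CTRL_RESETTING };

/* Hypothetical model of a controller; not the kernel's struct nvme_ctrl. */
struct model_ctrl {
	enum ctrl_state state;
	bool admin_q_alive;   /* assumption: admin queue has not been torn down */
	int aer_outstanding;  /* AER commands currently armed on the controller */
	bool check_in_core;   /* true: core requires LIVE; false: driver checks */
};

/* Driver-side submission: when the check lives here, it only refuses
 * to submit if the admin queue is gone, not based on the ctrl state. */
static void driver_submit_async_event(struct model_ctrl *c)
{
	if (!c->check_in_core && !c->admin_q_alive)
		return;
	c->aer_outstanding++;
}

/* Models the core AER work that re-arms the AER after a completion. */
static void async_event_work(struct model_ctrl *c)
{
	if (c->check_in_core && c->state != CTRL_LIVE)
		return; /* the AER is silently not re-armed */
	driver_submit_async_event(c);
}

/* Models completion of the "firmware activation starting" AEN. */
static void complete_fw_act_starting_aen(struct model_ctrl *c)
{
	c->aer_outstanding--;       /* the armed AER was consumed */
	c->state = CTRL_RESETTING;  /* blocks recovery during activation */
	async_event_work(c);        /* completion path runs the AER work */
}

/* Models the firmware-activation work finishing; in this model nothing
 * re-runs the AER work after the state returns to LIVE. */
static void fw_act_work_done(struct model_ctrl *c)
{
	c->state = CTRL_LIVE;
}

/* Runs the scenario and returns how many AERs remain armed afterwards. */
static int run_scenario(bool check_in_core)
{
	struct model_ctrl c = {
		.state = CTRL_LIVE,
		.admin_q_alive = true,
		.aer_outstanding = 1,
		.check_in_core = check_in_core,
	};

	complete_fw_act_starting_aen(&c);
	assert(c.state == CTRL_RESETTING);
	fw_act_work_done(&c);
	assert(c.state == CTRL_LIVE);
	return c.aer_outstanding;
}
```

In this model, run_scenario(true) returns 0: the controller ends up LIVE with no AER armed, which matches the reported symptom. run_scenario(false) returns 1, because the driver-level check only cares about whether the admin queue still exists. Keying the driver check on queue lifetime rather than controller state is what lets the AER survive the temporary RESETTING state.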



More information about the Linux-nvme mailing list