[PATCH 1/9] nvme-apple: fix controller shutdown in apple_nvme_disable

Linux kernel regression tracking (#adding) regressions at leemhuis.info
Wed Jan 11 01:30:17 PST 2023


[CCing the regression list, as it should be in the loop for regressions:
https://docs.kernel.org/admin-guide/reporting-regressions.html]

Top-posting one thing, to make this msg easier to consume.

Hector, thx for already working on this. If you respin your patches,
could you please include these tags in your patches:

Reported-by: Janne Grunau <j at jannau.net>
Link: https://lore.kernel.org/all/20230110174745.GA3576@jannau.net/

For details, see Documentation/process/submitting-patches.rst
(http://docs.kernel.org/process/submitting-patches.html) and
Documentation/process/5.Posting.rst
(https://docs.kernel.org/process/5.Posting.html)

Or these posts from Linus:

https://lore.kernel.org/all/CAHk-=wjMmSZzMJ3Xnskdg4+GGz=5p5p+GSYyFBTh0f-DgvdBWg@mail.gmail.com/

https://lore.kernel.org/all/CAHk-=wgs38ZrfPvy=nOwVkVzjpM3VFU1zobP37Fwd_h9iAD5JQ@mail.gmail.com/

https://lore.kernel.org/all/CAHk-=wjxzafG-=J8oT30s7upn4RhBs6TX-uVFZ5rME+L5_DoJA@mail.gmail.com/


For the rest of this mail:

[TLDR: I'm adding this report to the list of tracked Linux kernel
regressions; the text you find below is based on a few template
paragraphs you might already have encountered in similar form.
See link in footer if these mails annoy you.]

On 10.01.23 18:47, Janne Grunau wrote:
> On 2022-11-30 00:41:45 +0900, Hector Martin wrote:
>> On 29/11/2022 22.22, Christoph Hellwig wrote:
>>> nvme_shutdown_ctrl already shuts the controller down, there is no
>>> need to also call nvme_disable_ctrl for the shutdown case.
>>>
>>> Signed-off-by: Christoph Hellwig <hch at lst.de>
>>> ---
>>>  drivers/nvme/host/apple.c | 3 ++-
>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/nvme/host/apple.c b/drivers/nvme/host/apple.c
>>> index 94ef797e8b4a5f..56d9e9be945b76 100644
>>> --- a/drivers/nvme/host/apple.c
>>> +++ b/drivers/nvme/host/apple.c
>>> @@ -831,7 +831,8 @@ static void apple_nvme_disable(struct apple_nvme *anv, bool shutdown)
>>>  
>>>  		if (shutdown)
>>>  			nvme_shutdown_ctrl(&anv->ctrl);
>>> -		nvme_disable_ctrl(&anv->ctrl);
>>> +		else
>>> +			nvme_disable_ctrl(&anv->ctrl);
>>>  	}
>>>  
>>>  	WRITE_ONCE(anv->ioq.enabled, false);
>>
>> Reviewed-by: Hector Martin <marcan at marcan.st>
>>
>> I looked at some of our other implementations and we always seem to do
>> both, but this makes sense. If it breaks something we'll notice and
>> shout loudly when it makes it into an -rc at the latest :)
> 
> This breaks resume for the apple nvme controller.


> The immediate problem 
> is that apple_nvme_reset_work() uses NVME_CC_ENABLE to decide whether to 
> call apple_nvme_disable(). Only nvme_disable_ctrl(ctrl, false) clears 
> NVME_CC_ENABLE.
> 
> Even after extending the check to consider NVME_CC_SHN_NORMAL, resume is
> still broken. The host-specific reset to re-enable the controller works,
> but IO is blocked. The controller logs the following messages into its
> syslog:
> 
> | RTKit: syslog message: cmd.c:728: Oldest tag 12 is 19 seconds VERY OLD with cmdQe:2
> | RTKit: syslog message: cmd.c:732: Oldest tag is about to crash!!
> | RTKit: syslog message: cmd.c:720: Oldest high water mark moved up to 21,tag 12, cmdQ CQ_PRI0W
> 
> This looks like the controller reset after/during shutdown is required
> on this controller.
> 
> The patch below fixes resume for me. I can send it with a comment to
> prevent this regression from recurring or, if preferred, handle this in
> nvme_disable_ctrl(), either via a quirk or an additional parameter.
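
To make the register-level effect concrete, here is a minimal userspace sketch of where the CC register's EN and SHN fields end up after the two shutdown sequences discussed above. It assumes, following Janne's description, that nvme_disable_ctrl(ctrl, true) only requests a normal shutdown while nvme_disable_ctrl(ctrl, false) clears CC.EN. The helper names (model_disable_ctrl, seq_shutdown_only, seq_shutdown_then_disable, check_sequences) are hypothetical, all other CC fields are omitted, and this is not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CC fields from the NVMe base specification (names follow the kernel's
 * NVME_CC_* constants); all other CC fields are omitted in this sketch. */
#define CC_ENABLE     (1u << 0)
#define CC_SHN_SHIFT  14
#define CC_SHN_NORMAL (1u << CC_SHN_SHIFT)
#define CC_SHN_MASK   (3u << CC_SHN_SHIFT)

/* Hypothetical model of the CC update made by nvme_disable_ctrl(ctrl, shutdown):
 * with shutdown set it only requests a normal shutdown, otherwise EN is cleared. */
static uint32_t model_disable_ctrl(uint32_t cc, bool shutdown)
{
	cc &= ~CC_SHN_MASK;
	if (shutdown)
		cc |= CC_SHN_NORMAL; /* EN stays set */
	else
		cc &= ~CC_ENABLE;    /* controller reset: EN cleared */
	return cc;
}

/* Shutdown path after the change discussed in this thread: shutdown only. */
static uint32_t seq_shutdown_only(uint32_t cc)
{
	return model_disable_ctrl(cc, true);
}

/* Shutdown path restored by the proposed fix: shutdown, then disable. */
static uint32_t seq_shutdown_then_disable(uint32_t cc)
{
	return model_disable_ctrl(model_disable_ctrl(cc, true), false);
}

/* Checks the resulting CC state, starting from a live controller (EN set). */
static void check_sequences(void)
{
	uint32_t live = CC_ENABLE;

	/* Shutdown only: EN is still set and SHN reads "normal shutdown". */
	assert(seq_shutdown_only(live) == (CC_ENABLE | CC_SHN_NORMAL));
	/* Shutdown then disable: EN and SHN both end up cleared. */
	assert(seq_shutdown_then_disable(live) == 0);
}
```

With the shutdown-only sequence CC.EN stays set, which is what a check on NVME_CC_ENABLE in apple_nvme_reset_work() observes. The register state alone does not explain the blocked IO, though: Janne reports that resume stays broken even when the check also considers NVME_CC_SHN_NORMAL.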

Thanks for the report. To be sure the issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
tracking bot:

#regzbot ^introduced c76b8308e4c9
#regzbot title nvme: resume broken on apple nvme controller
#regzbot monitor: https://lore.kernel.org/all/20230111043614.27087-1-marcan@marcan.st/
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.

Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.

> ---
> diff --git a/drivers/nvme/host/apple.c b/drivers/nvme/host/apple.c
> index e13a992b6096..82396f9486e1 100644
> --- a/drivers/nvme/host/apple.c
> +++ b/drivers/nvme/host/apple.c
> @@ -867,7 +867,9 @@ static void apple_nvme_disable(struct apple_nvme *anv, bool shutdown)
>  			apple_nvme_remove_cq(anv);
>  		}
>  
> -		nvme_disable_ctrl(&anv->ctrl, shutdown);
> +		if (shutdown)
> +			nvme_disable_ctrl(&anv->ctrl, true);
> +		nvme_disable_ctrl(&anv->ctrl, false);
>  	}
>  
>  	WRITE_ONCE(anv->ioq.enabled, false);
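
The toy simulation below sketches why the proposed fix issues both calls. It assumes a simplified reading of the NVMe base specification: writing CC.SHN = normal shutdown makes CSTS.SHST report completion, and the controller only processes commands again after a controller reset (CC.EN cleared to 0, then set again). The sim_* names and the instant, timing-free controller are hypothetical; the model does not cover RTKit, the queues, or the Apple host-specific reset, so it only approximates the stall Janne observed on real hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CC and CSTS fields from the NVMe base specification (names follow the
 * kernel's NVME_CC_* and NVME_CSTS_* constants). */
#define CC_ENABLE       (1u << 0)
#define CC_SHN_NORMAL   (1u << 14)
#define CC_SHN_MASK     (3u << 14)
#define CSTS_RDY        (1u << 0)
#define CSTS_SHST_CMPLT (2u << 2)
#define CSTS_SHST_MASK  (3u << 2)

/* Hypothetical, heavily simplified controller: reacts instantly to CC writes. */
struct sim_ctrl {
	uint32_t cc;
	uint32_t csts;
};

/* Host writes CC; the simulated controller updates CSTS. */
static void sim_write_cc(struct sim_ctrl *c, uint32_t cc)
{
	bool was_enabled = c->cc & CC_ENABLE;
	bool enabled = cc & CC_ENABLE;

	c->cc = cc;
	if (was_enabled && !enabled)
		c->csts = 0;         /* controller reset: RDY and SHST cleared */
	else if (!was_enabled && enabled)
		c->csts |= CSTS_RDY; /* controller becomes ready */

	/* A normal-shutdown request completes immediately in this model. */
	if (enabled && (cc & CC_SHN_MASK) == CC_SHN_NORMAL)
		c->csts = (c->csts & ~CSTS_SHST_MASK) | CSTS_SHST_CMPLT;
}

/* Mirrors the CC update of nvme_disable_ctrl(ctrl, shutdown) as quoted above. */
static void sim_disable(struct sim_ctrl *c, bool shutdown)
{
	uint32_t cc = c->cc & ~CC_SHN_MASK;

	if (shutdown)
		cc |= CC_SHN_NORMAL;
	else
		cc &= ~CC_ENABLE;
	sim_write_cc(c, cc);
}

/* Re-enable: drop any shutdown request and set EN. */
static void sim_enable(struct sim_ctrl *c)
{
	sim_write_cc(c, (c->cc & ~CC_SHN_MASK) | CC_ENABLE);
}

/* Commands are processed only when ready and no shutdown is reported. */
static bool sim_can_process(const struct sim_ctrl *c)
{
	return (c->csts & CSTS_RDY) && !(c->csts & CSTS_SHST_MASK);
}

/* One suspend/resume cycle; 'fixed' selects the sequence from the patch. */
static bool sim_suspend_resume(bool fixed)
{
	struct sim_ctrl c = { 0, 0 };

	sim_enable(&c);             /* live controller */
	sim_disable(&c, true);      /* suspend: normal shutdown */
	if (fixed)
		sim_disable(&c, false); /* fix: also reset (clear EN) */
	sim_enable(&c);             /* resume */
	return sim_can_process(&c);
}
```

In this model sim_suspend_resume(false) returns false: EN never drops, so the EN 0-to-1 transition that would clear CSTS.SHST never happens. sim_suspend_resume(true) returns true because the extra disable performs the reset first.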


