[PATCH V3 1/1] nvme: Add quirk for LiteON CL1 devices running FW 220TQ,22001
Gloria Tsai
Gloria.Tsai at ssstc.com
Mon Nov 2 21:21:16 EST 2020
Rephrasing the problem description here:
When the host issues shutdown + D3hot during suspend, the NVMe drive may pick a wrong pointer that has already been used by GC (garbage collection), which then causes an over-program.
The sequence is: GC runs before shutdown -> host deletes IO queues -> host issues shutdown -> GC is interrupted -> D3hot -> drive enters PS4 -> a block swap may occur -> firmware uses the wrong pointer in device SRAM -> over-program
The issue only happens in simple suspend (shutdown + D3hot) with specific FW on the Kahoku board.
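To make the scope of the condition concrete, here is a minimal user-space sketch of how such a match could be expressed, assuming the quirk is keyed on both the board name and the firmware revision. Only the strings "Kahoku", "220TQ" and "22001" come from this thread; the struct, the flag and the function names are hypothetical and are not the kernel's actual quirk API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag for illustration; not a kernel definition. */
#define QUIRK_AVOID_SIMPLE_SUSPEND (1u << 0)

/* Hypothetical table entry: a board name plus a firmware revision. */
struct fw_quirk_entry {
	const char *board;   /* board name, e.g. as reported by DMI */
	const char *fw_rev;  /* firmware revision string of the drive */
	unsigned int quirks; /* flags that apply when both fields match */
};

/* Affected combinations named in this thread. */
static const struct fw_quirk_entry quirk_table[] = {
	{ "Kahoku", "220TQ", QUIRK_AVOID_SIMPLE_SUSPEND },
	{ "Kahoku", "22001", QUIRK_AVOID_SIMPLE_SUSPEND },
};

/* Return the quirk flags for a board/firmware pair, or 0 if none match. */
unsigned int lookup_quirks(const char *board, const char *fw_rev)
{
	size_t i;

	assert(board != NULL && fw_rev != NULL);

	for (i = 0; i < sizeof(quirk_table) / sizeof(quirk_table[0]); i++) {
		if (strcmp(quirk_table[i].board, board) == 0 &&
		    strcmp(quirk_table[i].fw_rev, fw_rev) == 0)
			return quirk_table[i].quirks;
	}
	return 0;
}

/* True when simple suspend (shutdown + D3hot) is considered safe. */
bool simple_suspend_allowed(const char *board, const char *fw_rev)
{
	return !(lookup_quirks(board, fw_rev) & QUIRK_AVOID_SIMPLE_SUSPEND);
}
```

With this table, simple_suspend_allowed("Kahoku", "220TQ") returns false, while any other board or firmware revision falls through to true. A real implementation would also need to handle the space padding of the 8-byte firmware revision field in the Identify Controller data, which this sketch ignores.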
Regards,
Gloria Tsai
_____________________________________
Sales PM Division
Solid State Storage Technology Corporation
TEL: +886-3-612-3888 ext. 2201
E-Mail: gloria.tsai at ssstc.com
_____________________________________
-----Original Message-----
From: Christoph Hellwig <hch at lst.de>
Sent: Tuesday, November 3, 2020 2:13 AM
To: Jongpil Jung <jongpuls at gmail.com>
Cc: Keith Busch <kbusch at kernel.org>; Jens Axboe <axboe at fb.com>; Christoph Hellwig <hch at lst.de>; Sagi Grimberg <sagi at grimberg.me>; linux-nvme at lists.infradead.org; linux-kernel at vger.kernel.org; Gloria Tsai <Gloria.Tsai at ssstc.com>; jongpil19.jung at samsung.com; jongheony.kim at samsung.com; dj54.sohn at samsung.com
Subject: Re: [PATCH V3 1/1] nvme: Add quirk for LiteON CL1 devices running FW 220TQ,22001
On Thu, Oct 29, 2020 at 03:55:29PM +0100, Christoph Hellwig wrote:
> I'm still worried about this.
>
> If power state based suspend does always work despite a HMB and is
> preferred for the specific Google board we should have purely a DMI
> based quirk for the board independent of the NVMe controller used with
> it.
>
> But if these LiteON devices can't properly handle nvme_dev_disable
> calls we have much deeper problems, because it can be called in all
> kinds of places, including suspending when not on this specific board.
>
> That being said, I still really do not understand this sentence and
> thus the problem at all:
>
> > When NVMe device receive D3hot from host, NVMe firmware will do
> > garbage collection. While NVMe device do Garbage collection,
> > firmware has chance to going incorrect address.
Any progress in describing the problem a little better?