[BUG,REGRESSION] SATA regression on 3.12-rc4 kernel

Marc (Marc-Angelo) Carino mcarino at broadcom.com
Wed Oct 9 11:22:48 EDT 2013


Hello all,

>> What I do not understand is why the log reports failed FPDMA commands if
>> the feature is supposed to be SSD-related (looking only at commit
>> messages: 87fb6c31b9 seems SSD-related, ed36911c74 does not). Is it
>> possible that the feature detection is what is causing the issue? Or
>> that the hardware reports support without having it? I can test with a
>> different disk if you think it would help.

The drive misreporting its capabilities is the likely cause. Oddly enough,
even if the drive's firmware did misreport support for the new SEND/RECV
commands, it appears that a discard/trim request is being made by the block
layer.

In addition to the hdparm dump, could you also provide a full kernel boot
log? The driver should complain if there were any issues retrieving the NCQ
send/receive log page.
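
Once the hdparm dump is available, the advertised capability can also be
checked by hand. Below is a minimal sketch, not part of the driver. It assumes
the dump is the usual "hdparm --Istdout" text (rows of four-digit hex words),
and that the relevant bits are IDENTIFY DEVICE word 76 bit 8 (NCQ) and word 77
bit 6 (SEND/RECEIVE FPDMA QUEUED), which is my reading of what libata tests.
The function names are made up for illustration.

```python
import re

# Assumed bit positions (IDENTIFY DEVICE data, as libata appears to test them):
ATA_ID_SATA_CAPABILITY = 76     # word 76, bit 8: NCQ supported
ATA_ID_SATA_CAPABILITY_2 = 77   # word 77, bit 6: SEND/RECEIVE FPDMA QUEUED

HEX_WORD = re.compile(r"[0-9a-fA-F]{4}")


def parse_identify_words(dump_text):
    """Collect the 16-bit words from an 'hdparm --Istdout' style dump.

    Lines that are not made up entirely of four-digit hex tokens (for
    example the '/dev/sdX:' header) are skipped.
    """
    words = []
    for line in dump_text.splitlines():
        tokens = line.split()
        if tokens and all(HEX_WORD.fullmatch(t) for t in tokens):
            words.extend(int(t, 16) for t in tokens)
    return words


def ncq_send_recv_advertised(words):
    """Return (has_ncq, has_send_recv) as advertised by the IDENTIFY data."""
    if len(words) < 256:
        raise ValueError("expected 256 IDENTIFY words, got %d" % len(words))
    has_ncq = bool(words[ATA_ID_SATA_CAPABILITY] & (1 << 8))
    has_send_recv = bool(words[ATA_ID_SATA_CAPABILITY_2] & (1 << 6))
    return has_ncq, has_send_recv
```

If the second value comes back True for a plain hard disk, that would point at
the drive advertising SEND/RECV support it does not really have.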

Lastly, could you give another drive brand a try, if possible? I tested the
changes on an Intel SATA AHCI controller with a Micron M500 SSD. I should be
able to scrounge up a Marvell PCIe AHCI controller.

Thanks!
Marc

On 10/8/2013 10:51 PM, Robert Hancock wrote:
> On Tue, Oct 8, 2013 at 12:10 AM, Arnaud Ebalard <arno at natisbad.org> wrote:
>> Hi Robert,
>>
>> Robert Hancock <hancockrwd at gmail.com> writes:
>>
>>> On 10/07/2013 01:12 PM, Arnaud Ebalard wrote:
>>>> Hi guys,
>>>>
>>>> yesterday, I reported on arm kernel mailing list what looked like a sata
>>>> regression on my platform (Marvell Armada 370-based NETGEAR ReadyNAS
>>>> 102). I initially thought this was an ARM-related issue. My initial
>>>> email, provided below, contains various details on the platform and the
>>>> error encountered.
>>>>
>>>> Today, before starting a painful git bisect, I decided to git log
>>>> sata_mv.c code and then more generally drivers/ata to quickly end up on
>>>> commit ed36911c747c (libata: Add support for SEND/RECEIVE FPDMA QUEUED)
>>>> against which I got suspicious after looking again at the errors I had:
>>>>
>>>> [  417.288155] ata1.00: exception Emask 0x0 SAct 0x1fff6001 SErr 0x0 action 0x6 frozen
>>>> [  417.295838] ata1.00: failed command: WRITE FPDMA QUEUED
>>>> [  417.301097] ata1.00: cmd 61/48:00:80:ad:0b/00:00:0c:00:00/40 tag 0 ncq 36864 out
>>>> [  417.315896] ata1.00: status: { DRDY }
>>>> [  417.319570] ata1.00: failed command: WRITE FPDMA QUEUED
>>>> [  417.324814] ata1.00: cmd 61/08:68:70:a1:87/00:00:0d:00:00/40 tag 13 ncq 4096 out
>>>> [  417.339619] ata1.00: status: { DRDY }
>>>> [  417.343288] ata1.00: failed command: WRITE FPDMA QUEUED
>>>> [  417.348536] ata1.00: cmd 61/08:70:28:a2:87/00:00:0d:00:00/40 tag 14 ncq 4096 out
>>>> [  417.363341] ata1.00: status: { DRDY }
>>>> [  417.367010] ata1.00: failed command: WRITE FPDMA QUEUED
>>>> [  417.372257] ata1.00: cmd 61/08:80:80:a3:87/00:00:0d:00:00/40 tag 16 ncq 4096 out
>>>> [  417.387061] ata1.00: status: { DRDY }
>>>> [  417.390733] ata1.00: failed command: WRITE FPDMA QUEUED
>>>> [  417.395977] ata1.00: cmd 61/08:88:58:a1:c7/00:00:0d:00:00/40 tag 17 ncq 4096 out
>>>> [  417.410782] ata1.00: status: { DRDY }
>>>>
>>>> Reverting both 87fb6c31b9 (libata: Add support for queued DSM TRIM) and
>>>> ed36911c74 (libata: Add support for SEND/RECEIVE FPDMA QUEUED) makes the
>>>> problem disappear. Note: reverting 87fb6c31b9 is not enough and I cannot
>>>> compile the kernel with only the latter reverted.
>>>>
>>>> If you need more info on the platform or want me to test a fix, do not
>>>> hesitate to ask.
>>>
>>> I assume that it consistently fails on a non-working kernel and
>>> consistently works with those patches reverted? Given that both of
>>> those patches seem to only be touching SSDs with NCQ trim support, it
>>> seems odd they would be breaking a normal hard drive, but maybe there
>>> is some unexpected side effect..
>>
>> With two different disks (same model though, i.e. 250GB 3.5" WD Blue), it
>> consistently works on 3.11.4 and consistently fails on 3.12-rc3 and
>> 3.12-rc4 (other 3.12-rc releases not tested). The problem is easy to
>> reproduce, i.e. I just need to perform some disk operations. With the two
>> commits reverted from 3.12-rc4, I can consistently do a "find / -exec
>> sha256sum '{}' \;" without any errors.
>>
>> What I do not understand is why the log reports failed FPDMA commands if
>> the feature is supposed to be SSD-related (looking only at commit
>> messages: 87fb6c31b9 seems SSD-related, ed36911c74 does not). Is it
>> possible that the feature detection is what is causing the issue? Or
>> that the hardware reports support without having it? I can test with a
>> different disk if you think it would help.
> 
> The commands that are failing are WRITE FPDMA QUEUED which is a
> regular NCQ write command. The ones that these commits add support for
> are FPDMA_SEND and FPDMA_RECV which are used for NCQ trim commands.
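
The opcode and tag can be read straight off the "cmd" lines in the log quoted
above. Here is a rough sketch of a decoder. It assumes the field order printed
by the libata error handler (command/feature:nsect:lbal:lbam:lbah/hob
bytes/device) and the usual NCQ layout, where the sector count sits in the
FEATURE bytes and the tag in bits 7:3 of the count byte. The helper names
decode_cmd() and active_tags() are made up for illustration.

```python
import re

# NCQ opcodes relevant to this thread.
NCQ_OPCODES = {
    0x60: "READ FPDMA QUEUED",
    0x61: "WRITE FPDMA QUEUED",
    0x64: "SEND FPDMA QUEUED",
    0x65: "RECEIVE FPDMA QUEUED",
}

# Twelve hex bytes separated by '/' or ':', followed by the logged tag.
CMD_RE = re.compile(r"cmd ((?:[0-9a-f]{2}[/:]){11}[0-9a-f]{2}) tag (\d+)")


def decode_cmd(line):
    """Decode one 'ataX.YY: cmd ...' line; return None if it does not match."""
    m = CMD_RE.search(line)
    if m is None:
        return None
    (command, feature, nsect, lbal, lbam, lbah,
     hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah,
     device) = [int(x, 16) for x in re.split(r"[/:]", m.group(1))]
    # For NCQ commands a sector count of zero stands for 65536 sectors.
    sectors = ((hob_feature << 8) | feature) or 65536
    lba = (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24) \
        | (lbah << 16) | (lbam << 8) | lbal
    return {
        "name": NCQ_OPCODES.get(command, "opcode 0x%02x" % command),
        "sectors": sectors,
        "tag": nsect >> 3,            # tag encoded in the taskfile
        "logged_tag": int(m.group(2)),  # tag as printed by the kernel
        "lba": lba,
    }


def active_tags(sact):
    """List the NCQ tags whose bits are set in an SAct value."""
    return [tag for tag in range(32) if sact & (1 << tag)]
```

Run against the log above, every failed command decodes to opcode 0x61 (WRITE
FPDMA QUEUED), none to 0x64 or 0x65, and the SAct value 0x1fff6001 has the
bits for tags 0, 13, 14 and 16 through 28 set, which covers the five tags
reported as failed.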
> 
> It's possible that the feature detection for this is picking up
> support for FPDMA SEND/RECV on this drive when it shouldn't be. Can
> you post the output of "hdparm --Istdout /dev/sdX" for one of these
> drives (where X matches the drive in question)?