5.10.40-1 - Invalid SGL for payload:131072 nents:13

Andy Smith andy at strugglers.net
Sun Jul 25 02:46:20 PDT 2021


Hi Ming Lei,

On Sat, Jul 24, 2021 at 10:46:53AM +0800, Ming Lei wrote:
> On Tue, Jul 20, 2021 at 10:07:33PM +0000, Andy Smith wrote:
> > Hi,
> > 
> > I have a Debian stable machine with a Samsung PM983 NVMe and a
> > Samsung SM883 in an MD RAID-1. It's been running the 4.19.x Debian
> > packaged kernel for almost 2 years now.
> > 
> > About 24 hours ago I upgraded its kernel to the buster-backports
> > kernel which is version 5.10.40-1~bpo10+1 and around four hours
> > after that I got this:
> > 
> > Jul 20 02:17:54 lamb kernel: [21061.388607] sg[0] phys_addr:0x00000015eb803000 offset:0 length:4096 dma_address:0x000000209e7b7000 dma_length:4096
> > Jul 20 02:17:54 lamb kernel: [21061.389775] sg[1] phys_addr:0x00000015eb7bc000 offset:0 length:4096 dma_address:0x000000209e7b8000 dma_length:4096
> > Jul 20 02:17:54 lamb kernel: [21061.390874] sg[2] phys_addr:0x00000015eb809000 offset:0 length:4096 dma_address:0x000000209e7b9000 dma_length:4096
> > Jul 20 02:17:54 lamb kernel: [21061.391974] sg[3] phys_addr:0x00000015eb766000 offset:0 length:4096 dma_address:0x000000209e7ba000 dma_length:4096
> > Jul 20 02:17:54 lamb kernel: [21061.393042] sg[4] phys_addr:0x00000015eb7a3000 offset:0 length:4096 dma_address:0x000000209e7bb000 dma_length:4096
> > Jul 20 02:17:54 lamb kernel: [21061.394086] sg[5] phys_addr:0x00000015eb7c6000 offset:0 length:4096 dma_address:0x000000209e7bc000 dma_length:4096
> > Jul 20 02:17:54 lamb kernel: [21061.395078] sg[6] phys_addr:0x00000015eb7c2000 offset:0 length:4096 dma_address:0x000000209e7bd000 dma_length:4096
> > Jul 20 02:17:54 lamb kernel: [21061.396042] sg[7] phys_addr:0x00000015eb7a9000 offset:0 length:4096 dma_address:0x000000209e7be000 dma_length:4096
> > Jul 20 02:17:54 lamb kernel: [21061.397004] sg[8] phys_addr:0x00000015eb775000 offset:0 length:4096 dma_address:0x000000209e7bf000 dma_length:4096
> > Jul 20 02:17:54 lamb kernel: [21061.397971] sg[9] phys_addr:0x00000015eb7c7000 offset:0 length:4096 dma_address:0x00000020ff520000 dma_length:4096
> > Jul 20 02:17:54 lamb kernel: [21061.398889] sg[10] phys_addr:0x00000015eb7cb000 offset:0 length:4096 dma_address:0x00000020ff521000 dma_length:4096
> > Jul 20 02:17:54 lamb kernel: [21061.399814] sg[11] phys_addr:0x00000015eb7e3000 offset:0 length:61952 dma_address:0x00000020ff522000 dma_length:61952
> > Jul 20 02:17:54 lamb kernel: [21061.400754] sg[12] phys_addr:0x00000015eb7f2200 offset:512 length:24064 dma_address:0x00000020ff531200 dma_length:24064
> 
> The last two segments are physically contiguous, so they should have
> been merged into the same segment; otherwise the virt boundary limit
> may be violated.
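
To make that concrete: NVMe sets a virt boundary of 4095 on this
machine, meaning every SGL element after the first must start on a
page boundary and every element before the last must end on one. Here
is a standalone sketch of that test, modelled on the kernel's
__bvec_gap_to_prev() (my own simplified demo, not the 5.10 source),
fed the sg[11]/sg[12] values from my log:

    #include <stdint.h>
    #include <stdio.h>

    #define VIRT_BOUNDARY 4095UL  /* NVMe: controller page size - 1 */

    /* Modelled on __bvec_gap_to_prev(): a gap exists if the next
     * element starts unaligned or the previous one ends unaligned. */
    static int gap_to_prev(uint64_t prev_end, uint64_t next_offset)
    {
        return (next_offset & VIRT_BOUNDARY) ||
               (prev_end & VIRT_BOUNDARY);
    }

    int main(void)
    {
        uint64_t sg11_phys = 0x15eb7e3000, sg11_len = 61952;
        uint64_t sg12_phys = 0x15eb7f2200, sg12_off = 512;

        /* sg[11] ends exactly where sg[12] begins */
        printf("contiguous: %d\n",
               sg11_phys + sg11_len == sg12_phys);
        /* yet as two separate elements they violate the boundary */
        printf("gap: %d\n",
               gap_to_prev(sg11_phys + sg11_len, sg12_off));
        return 0;
    }

This prints "contiguous: 1" and "gap: 1": merged, the two would form
a single 86016-byte element that both starts and ends page-aligned;
split, sg[12] starts 512 bytes into a page, which is what the
controller is rejecting.
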
> 
> But __blk_bvec_map_sg() doesn't merge the two into one segment. Can
> you collect the queue limits?
> 
> (cd /sys/block/$NVME/queue && find . -type f -exec grep -aH . {} \;)

$ (cd /sys/block/nvme0n1/queue && find . -type f -exec grep -aH . {} \;)
./io_poll_delay:-1
./max_integrity_segments:0
./zoned:none
./scheduler:[none] mq-deadline 
./io_poll:0
./discard_zeroes_data:0
./minimum_io_size:512
./nr_zones:0
./write_same_max_bytes:0
./max_segments:127
./dax:0
./physical_block_size:512
./logical_block_size:512
./zone_append_max_bytes:0
./io_timeout:30000
./nr_requests:1023
./write_cache:write through
./stable_writes:0
./max_segment_size:4294967295
./rotational:0
./discard_max_bytes:2199023255040
./add_random:0
./discard_max_hw_bytes:2199023255040
./optimal_io_size:0
./chunk_sectors:0
./read_ahead_kb:128
./max_discard_segments:256
./write_zeroes_max_bytes:2097152
./nomerges:0
./wbt_lat_usec:2000
./fua:0
./discard_granularity:512
./rq_affinity:1
./max_sectors_kb:1280
./hw_sector_size:512
./max_hw_sectors_kb:2048
./iostats:1
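
Note that the virt boundary itself isn't exposed in sysfs, so it does
not show up in this dump; the NVMe driver imposes it directly when it
sets up the queue, something like the following (paraphrased from the
5.10-era drivers/nvme/host/core.c; the exact constant is my
recollection, not a quote):

    /* Each data element after the first must start on a
     * controller-page boundary (4 KiB here). */
    blk_queue_virt_boundary(q, NVME_CTRL_PAGE_SIZE - 1);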

> Meantime can you try the following patch and see if it can make a
> difference?
> 
> commit c9c9762d4d44dcb1b2ba90cfb4122dc11ceebf31
> Author: Long Li <longli at microsoft.com>
> Date:   Mon Jun 7 12:34:05 2021 -0700
> 
>     block: return the correct bvec when checking for gaps
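
For the archives: as I understand the commit message, once multi-page
bvecs were enabled a single bio_vec can span several pages, but the
helpers feeding bio_will_gap() still returned a single-page view of
it, so the merge-time virt boundary check could pass on the first
page even though the full bvec ended unaligned. The fix makes those
helpers return the whole multi-page bvec; roughly (paraphrasing the
upstream diff from memory, not quoting it):

    static inline void bio_get_first_bvec(struct bio *bio,
                                          struct bio_vec *bv)
    {
        /* was: *bv = bio_iovec(bio); (a single-page view) */
        /* now the whole multi-page bvec, so the gap check sees
         * the true start and length of the data */
        *bv = mp_bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
    }

With the single-page view, two bios could merge cleanly and then fail
the same check again when the SG list was built, leaving an element
with an interior offset like sg[12] above.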

I applied this patch to 5.10.40 and I am no longer able to reproduce
the issue, thanks!

I see this patch made it into 5.10.50, but Debian's buster-backports
kernel is based on 5.10.40 and the forthcoming bullseye is on 5.10.46,
so neither has this change. I will enquire about backporting.

This is the fio command line I am using to reproduce:

fio --name=randread \
    --filename=/srv/fio/test \
    --size=35g \
    --numjobs=1 \
    --rw=randread \
    --direct=1 \
    --ioengine=libaio \
    --blocksize_range=4k-4m \
    --blocksize_unaligned=0 \
    --gtod_reduce=1 \
    --iodepth=64 \
    --time_based \
    --runtime=4h

Here /srv/fio is an ext4 filesystem on the first partition of a block
device, with the partition deliberately misaligned by starting it at
sector 63: that's byte 32256, which is 3584 bytes into a 4096-byte
page, so block-sized I/O to the file is never page-aligned at the
device.

I haven't been able to reproduce without this misalignment, but with
it I can reproduce within a few seconds. With
c9c9762d4d44dcb1b2ba90cfb4122dc11ceebf31 applied I've run it for
several hours (reading 17TB off the device) with no issues.

Thanks,
Andy


