max_hw_sectors error caused by recent NVMe driver commit

Daniel Gomez dagmcr at gmail.com
Sat Feb 11 01:38:44 PST 2023


Hi all,

On Sat, Feb 11, 2023 at 6:38 AM Michael Kelley (LINUX)
<mikelley at microsoft.com> wrote:
>
> Commit 3f30a79c2e2c ("nvme-pci: set constant paramters in nvme_pci_alloc_ctrl")
> appears to have introduced an error in how max_hw_sectors is calculated.  The
> value of max_hw_sectors is based on dma_max_mapping_size(), which indirectly
> uses dma_addressing_limited() to decide if swiotlb_max_mapping_size() should
> be used.
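
For reference, the call chain looks roughly like this (paraphrased
from kernel/dma/direct.c and include/linux/dma-mapping.h around v6.2,
not verbatim):

  /* dma_max_mapping_size() lands here for direct-mapped devices */
  size_t dma_direct_max_mapping_size(struct device *dev)
  {
          /* the swiotlb limit only applies when addressing is limited
           * or bounce buffering is forced */
          if (is_swiotlb_active(dev) &&
              (dma_addressing_limited(dev) || is_swiotlb_force_bounce(dev)))
                  return swiotlb_max_mapping_size(dev);
          return SIZE_MAX;
  }

  static inline bool dma_addressing_limited(struct device *dev)
  {
          /* dma_get_mask() falls back to DMA_BIT_MASK(32) while
           * dev->dma_mask is still unset, so this returns true on a
           * 64-bit system until the driver sets the real mask */
          return min_not_zero(dma_get_mask(dev), dev->bus_dma_limit) <
                 dma_get_required_mask(dev);
  }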
>
> In this commit, setting max_hw_sectors is moved to nvme_pci_alloc_dev().
> But dma_addressing_limited() depends on the dev->dma_mask, which hasn't
> been set.  dma_addressing_limited() returns "true", and the swiotlb max mapping
> size is used, limiting NVMe transfers to 504 sectors (252 Kbytes).
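
If I have the swiotlb math right, the 504 comes out of
swiotlb_max_mapping_size() (kernel/dma/swiotlb.c); roughly:

  IO_TLB_SEGSIZE * IO_TLB_SIZE = 128 * 2048 = 262144 bytes (256 KiB)
  - roundup(min_align_mask, IO_TLB_SIZE)    =   4096 bytes
    (NVMe sets min_align_mask = NVME_CTRL_PAGE_SIZE - 1 = 4095)
  = 258048 bytes / 512                      =    504 sectors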
>
> Prior to this commit, max_hw_sectors isn't set until after the call to
> dma_set_mask_and_coherent() in nvme_pci_enable(), as called from
> nvme_reset_work().   max_hw_sectors is correctly determined based on
> values reported by the NVMe controller.
>
> I haven't provided a fix because I'm not that familiar with the overall structure
> of the code and the intent of the code reorganization.  I'm not sure if setting
> the DMA mask should be moved earlier, or setting max_hw_sectors should
> be moved back to its original location.
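
Just to make the first option concrete, I imagine it would look
something like this untested sketch against nvme_pci_alloc_dev() in
drivers/nvme/host/pci.c (error handling elided):

  static struct nvme_dev *nvme_pci_alloc_dev(struct pci_dev *pdev,
          const struct pci_device_id *id)
  {
          ...
          /* set the DMA mask before deriving any limits from it, so
           * that dma_max_mapping_size() sees the real 64-bit mask
           * instead of the 32-bit fallback */
          if (dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64)))
                  goto out;

          dev->ctrl.max_hw_sectors = min_t(u32,
                  NVME_MAX_KB_SZ << 1, dma_max_mapping_size(&pdev->dev) >> 9);
          dev->ctrl.max_segments = NVME_MAX_SEGS;
          ...
  }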

Yesterday, I ran into the same problem. I'm trying to get familiar
with the code, so any help would be much appreciated.

I could reproduce it with a simple dd test [1] and btrace. In summary,
the multi-page bvec split turns each 1 MiB (2048-sector) read into 4
chunks of 63 segments (63 * 4K / 512 = 504 sectors) plus 1 chunk of 4
segments (4 * 4K / 512 = 32 sectors). But I think it should be just 4
chunks of 64 segments (512 sectors each), as 64 is the max_segments
upper limit in my test case. Below you can find my test case and
numbers:

Check:
root@qemuarm64:/sys/block/nvme0n1/queue# grep -H . max_sectors_kb \
    max_hw_sectors_kb max_segments max_segment_size optimal_io_size \
    logical_block_size chunk_sectors
max_sectors_kb:252
max_hw_sectors_kb:252
max_segments:64
max_segment_size:4294967295
optimal_io_size:512
logical_block_size:512
chunk_sectors:0

dd test:
dd iflag=direct if=/dev/nvme0n1 bs=1M of=/dev/null status=progress

btrace snippet (Q=queued, X=split, G=get request, I=inserted,
D=dispatched, C=completed):
259,0    0    23476     1.111774467   751  Q  RS 1923072 + 2048 [dd]
259,0    0    23477     1.111774842   751  X  RS 1923072 / 1923576 [dd]
259,0    0    23478     1.111775342   751  G  RS 1923072 + 504 [dd]
259,0    0    23479     1.111775425   751  I  RS 1923072 + 504 [dd]
259,0    0    23480     1.111776133   751  X  RS 1923576 / 1924080 [dd]
259,0    0    23481     1.111776258   751  G  RS 1923576 + 504 [dd]
259,0    0    23482     1.111776300   751  I  RS 1923576 + 504 [dd]
259,0    0    23483     1.111776675   751  X  RS 1924080 / 1924584 [dd]
259,0    0    23484     1.111776967   751  G  RS 1924080 + 504 [dd]
259,0    0    23485     1.111777008   751  I  RS 1924080 + 504 [dd]
259,0    0    23486     1.111777383   751  X  RS 1924584 / 1925088 [dd]
259,0    0    23487     1.111777467   751  G  RS 1924584 + 504 [dd]
259,0    0    23488     1.111777550   751  I  RS 1924584 + 504 [dd]
259,0    0    23489     1.111777758   751  G  RS 1925088 + 32 [dd]
259,0    0    23490     1.111777800   751  I  RS 1925088 + 32 [dd]
259,0    0    23491     1.111779383    36  D  RS 1923072 + 504 [kworker/0:1H]
259,0    0    23492     1.111780092    36  D  RS 1923576 + 504 [kworker/0:1H]
259,0    0    23493     1.111780800    36  D  RS 1924080 + 504 [kworker/0:1H]
259,0    0    23494     1.111781425    36  D  RS 1924584 + 504 [kworker/0:1H]
259,0    0    23495     1.111781717    36  D  RS 1925088 + 32 [kworker/0:1H]
259,0    0    23496     1.112201967   749  C  RS 1923072 + 504 [0]
259,0    0    23497     1.112563925   749  C  RS 1923072 + 2048 [0]
259,0    0    23498     1.112564425   749  C  RS 1923072 + 2016 [0]
259,0    0    23499     1.112564800   749  C  RS 1923072 + 1512 [0]
259,0    0    23500     1.112758217   749  C  RS 1923072 + 1008 [0]

In addition, going back to before the commit Michael referenced and
running the test in the same setup, I got the following values:

max_sectors_kb:512
max_hw_sectors_kb:512
max_segments:127
max_segment_size:4294967295
optimal_io_size:512
logical_block_size:512

So, I'm not sure why my max_segments went from 64 -> 127 and
max_sectors_kb and max_hw_sectors_kb from 252 -> 512. Perhaps my
first assumption was wrong?
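
Though, if I read nvme_set_queue_limits() in drivers/nvme/host/core.c
right, max_segments is derived from max_hw_sectors, so both changes
would be the same symptom (simplified paraphrase, not verbatim):

  if (ctrl->max_hw_sectors) {
          u32 max_segments =
                  (ctrl->max_hw_sectors / (NVME_CTRL_PAGE_SIZE >> 9)) + 1;

          /* ctrl->max_segments is NVME_MAX_SEGS (127) for PCIe */
          max_segments = min_not_zero(max_segments, ctrl->max_segments);
          blk_queue_max_hw_sectors(q, ctrl->max_hw_sectors);
          blk_queue_max_segments(q, min_t(u32, max_segments, USHRT_MAX));
  }

With the bug, (504 / 8) + 1 = 64; without it, (1024 / 8) + 1 = 129,
which min_not_zero() caps at 127.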

[1] test:
https://unix.stackexchange.com/questions/529529/why-is-the-size-of-my-io-requests-being-limited-to-about-512k

Daniel

>
> Michael


