max_hw_sectors error caused by recent NVMe driver commit
Daniel Gomez
dagmcr at gmail.com
Sat Feb 11 01:38:44 PST 2023
Hi all,
On Sat, Feb 11, 2023 at 6:38 AM Michael Kelley (LINUX)
<mikelley at microsoft.com> wrote:
>
> Commit 3f30a79c2e2c ("nvme-pci: set constant paramters in nvme_pci_alloc_ctrl")
> appears to have introduced an error in how max_hw_sectors is calculated. The
> value of max_hw_sectors is based on dma_max_mapping_size(), which indirectly
> uses dma_addressing_limited() to decide if swiotlb_max_mapping_size() should
> be used.
>
> In this commit, setting max_hw_sectors is moved to nvme_pci_alloc_dev().
> But dma_addressing_limited() depends on the dev->dma_mask, which hasn't
> been set. dma_addressing_limited() returns "true", and the swiotlb max mapping
> size is used, limiting NVMe transfers to 504 sectors (252 Kbytes).
>
> Prior to this commit, max_hw_sectors isn't set until after the call to
> dma_set_mask_and_coherent() in nvme_pci_enable(), as called from
> nvme_reset_work(). max_hw_sectors is correctly determined based on
> values reported by the NVMe controller.
>
> I haven't provided a fix because I'm not that familiar with the overall structure
> of the code and the intent of the code reorganization. I'm not sure if setting
> the DMA mask should be moved earlier, or setting max_hw_sectors should
> be moved back to its original location.
Yesterday, I ran into the same problem. I'm trying to get familiar
with the code, so any help would be much appreciated.
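If I follow the chain Michael describes (dma_max_mapping_size() ->
dma_addressing_limited() -> swiotlb_max_mapping_size()), the decision
would look roughly like the userspace model below. To be clear, the
helper bodies and the swiotlb constants (128 slots of 2 KiB minus one
page of alignment slack) are my own approximation, not the actual
kernel code; they just happen to reproduce the 252 KB / 504 sectors:

/*
 * Simplified userspace model of how an unset dma_mask ends up capping
 * the DMA mapping size at the swiotlb limit. Helper names mimic the
 * kernel ones, but the bodies are approximations.
 */
#include <stdint.h>
#include <stdio.h>

#define DMA_BIT_MASK(n)  (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* assumed swiotlb limit: 128 slots * 2 KiB - 4 KiB alignment = 252 KiB */
#define SWIOTLB_MAX_MAPPING  (128ULL * 2048 - 4096)

struct device_model {
    uint64_t *dma_mask;       /* NULL until dma_set_mask() has run */
    uint64_t required_mask;   /* what the platform actually needs */
};

static uint64_t dma_get_mask_model(const struct device_model *dev)
{
    /* the kernel falls back to a 32-bit mask when dma_mask is unset */
    if (dev->dma_mask && *dev->dma_mask)
        return *dev->dma_mask;
    return DMA_BIT_MASK(32);
}

static int dma_addressing_limited_model(const struct device_model *dev)
{
    return dma_get_mask_model(dev) < dev->required_mask;
}

static uint64_t dma_max_mapping_size_model(const struct device_model *dev)
{
    if (dma_addressing_limited_model(dev))
        return SWIOTLB_MAX_MAPPING;   /* bounce-buffer limit */
    return UINT64_MAX;                /* effectively unlimited */
}

int main(void)
{
    uint64_t mask64 = DMA_BIT_MASK(64);
    struct device_model dev = { .dma_mask = NULL,
                                .required_mask = DMA_BIT_MASK(64) };

    /* nvme_pci_alloc_dev(): dma_mask not set yet */
    printf("before dma_set_mask: %llu sectors\n",
           (unsigned long long)(dma_max_mapping_size_model(&dev) >> 9));

    /* nvme_pci_enable(): after dma_set_mask_and_coherent(64-bit) */
    dev.dma_mask = &mask64;
    if (dma_max_mapping_size_model(&dev) == UINT64_MAX)
        printf("after  dma_set_mask: no swiotlb cap\n");
    return 0;
}

Which, if correct, would explain why computing max_hw_sectors before
dma_set_mask_and_coherent() has run produces the 504-sector cap.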
I could reproduce it with a simple dd test [1] and btrace. In summary,
the multi-page bvec is split into 4 chunks of 63 segments
(63 * 4K / 512 = 504 sectors) plus 1 chunk of 4 segments
(4 * 4K / 512 = 32 sectors). But I think it should be just 4 chunks of
64 segments, as that is the max_segments upper limit in my test case.
Below you can find my test case and numbers:
Check:
root@qemuarm64:/sys/block/nvme0n1/queue# grep -H . max_sectors_kb \
    max_hw_sectors_kb max_segments max_segment_size optimal_io_size \
    logical_block_size chunk_sectors
max_sectors_kb:252
max_hw_sectors_kb:252
max_segments:64
max_segment_size:4294967295
optimal_io_size:512
logical_block_size:512
chunk_sectors:0
dd test:
dd iflag=direct if=/dev/nvme0n1 bs=1M of=/dev/null status=progress
btrace snippet:
259,0 0 23476 1.111774467 751 Q RS 1923072 + 2048 [dd]
259,0 0 23477 1.111774842 751 X RS 1923072 / 1923576 [dd]
259,0 0 23478 1.111775342 751 G RS 1923072 + 504 [dd]
259,0 0 23479 1.111775425 751 I RS 1923072 + 504 [dd]
259,0 0 23480 1.111776133 751 X RS 1923576 / 1924080 [dd]
259,0 0 23481 1.111776258 751 G RS 1923576 + 504 [dd]
259,0 0 23482 1.111776300 751 I RS 1923576 + 504 [dd]
259,0 0 23483 1.111776675 751 X RS 1924080 / 1924584 [dd]
259,0 0 23484 1.111776967 751 G RS 1924080 + 504 [dd]
259,0 0 23485 1.111777008 751 I RS 1924080 + 504 [dd]
259,0 0 23486 1.111777383 751 X RS 1924584 / 1925088 [dd]
259,0 0 23487 1.111777467 751 G RS 1924584 + 504 [dd]
259,0 0 23488 1.111777550 751 I RS 1924584 + 504 [dd]
259,0 0 23489 1.111777758 751 G RS 1925088 + 32 [dd]
259,0 0 23490 1.111777800 751 I RS 1925088 + 32 [dd]
259,0 0 23491 1.111779383 36 D RS 1923072 + 504 [kworker/0:1H]
259,0 0 23492 1.111780092 36 D RS 1923576 + 504 [kworker/0:1H]
259,0 0 23493 1.111780800 36 D RS 1924080 + 504 [kworker/0:1H]
259,0 0 23494 1.111781425 36 D RS 1924584 + 504 [kworker/0:1H]
259,0 0 23495 1.111781717 36 D RS 1925088 + 32 [kworker/0:1H]
259,0 0 23496 1.112201967 749 C RS 1923072 + 504 [0]
259,0 0 23497 1.112563925 749 C RS 1923072 + 2048 [0]
259,0 0 23498 1.112564425 749 C RS 1923072 + 2016 [0]
259,0 0 23499 1.112564800 749 C RS 1923072 + 1512 [0]
259,0 0 23500 1.112758217 749 C RS 1923072 + 1008 [0]
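Just to sanity-check the arithmetic: splitting the 2048-sector (1M)
read against a 504-sector cap reproduces the pattern above exactly,
while a 512-sector cap (64 segments * 8 sectors) would give four equal
chunks. Quick model (the starting sector is just taken from the trace):

/*
 * Quick model of the request splitting seen in the btrace above:
 * a 2048-sector read split against a given max_sectors cap.
 */
#include <stdio.h>

static void split(unsigned long start, unsigned int nr_sectors,
                  unsigned int max_sectors)
{
    printf("max_sectors=%u:\n", max_sectors);
    while (nr_sectors) {
        unsigned int chunk = nr_sectors < max_sectors ?
                             nr_sectors : max_sectors;
        printf("  %lu + %u\n", start, chunk);
        start += chunk;
        nr_sectors -= chunk;
    }
}

int main(void)
{
    /* current behaviour: 504-sector cap -> 4 x 504 + 1 x 32 */
    split(1923072, 2048, 504);
    /* expected with 64 segments * 8 sectors: 4 x 512, no remainder */
    split(1923072, 2048, 512);
    return 0;
}

The first call prints the 504/504/504/504/32 sequence seen in the
trace; the second is what I would expect if the full 64-segment limit
were usable.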
In addition, going back to Michael's referenced commit and running the
test on the same setup, I got the following values:
max_sectors_kb:512
max_hw_sectors_kb:512
max_segments:127
max_segment_size:4294967295
optimal_io_size:512
logical_block_size:512
So, I'm not sure why my max_segments went from 64 -> 127, and
max_sectors_kb and max_hw_sectors_kb from 252 -> 512. Perhaps my
first assumption was wrong?
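One possible explanation, if I'm reading nvme_set_queue_limits() in
drivers/nvme/host/core.c correctly: max_segments appears to be derived
from max_hw_sectors (sectors / 8 + 1) and then clamped by the
controller's own segment limit, so the 64 might just be a consequence
of the 504-sector cap, and the 512 KB presumably comes from the
controller's own reported limit (MDTS) once the swiotlb cap is out of
the way. Rough model (treat the formula and constants as my reading,
not gospel):

/*
 * Rough model of how (I think) the nvme core derives max_segments
 * from max_hw_sectors: sectors / (ctrl page size in sectors) + 1,
 * clamped by the controller's own segment limit.
 */
#include <stdio.h>

#define NVME_CTRL_PAGE_SECTORS  (4096 / 512)   /* 8 */
#define CTRL_MAX_SEGMENTS       127            /* observed value above */

static unsigned int max_segments(unsigned int max_hw_sectors)
{
    unsigned int segs = max_hw_sectors / NVME_CTRL_PAGE_SECTORS + 1;

    return segs < CTRL_MAX_SEGMENTS ? segs : CTRL_MAX_SEGMENTS;
}

int main(void)
{
    /* broken case: 252 KB = 504 sectors -> 64 segments */
    printf("504 sectors  -> %u segments\n", max_segments(504));
    /* pre-commit case: 512 KB = 1024 sectors -> clamped to 127 */
    printf("1024 sectors -> %u segments\n", max_segments(1024));
    return 0;
}

That would at least match the 64 and 127 above, but I may well be
missing something.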
[1] test:
https://unix.stackexchange.com/questions/529529/why-is-the-size-of-my-io-requests-being-limited-to-about-512k
Daniel
>
> Michael