[PATCH] nvme-pci: fix host memory buffer allocation size
Thomas Weißschuh
linux at weissschuh.net
Tue May 10 03:20:17 PDT 2022
On 2022-05-10 09:03+0200, Christoph Hellwig wrote:
> On Thu, Apr 28, 2022 at 06:09:11PM +0200, Thomas Weißschuh wrote:
> > > > On my hardware we start with a chunk_size of 4MiB and just allocate
> > > > 8 (hmmaxd) * 4 = 32 MiB, which is worse than 1 * 200MiB.
> > >
> > > And that is because the hardware only has a limited set of descriptors.
> >
> > Wouldn't it make more sense then to allocate as much memory as possible for
> > each descriptor that is available?
> >
> > The comment in nvme_alloc_host_mem() tries to "start big".
> > But it actually starts with at most 4MiB.
>
> Compared to what other operating systems offer, that is quite large.
Ok. I only looked at FreeBSD, which uses up to 5% of total memory per
device. [0]
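
For reference, the allocation loop being discussed looks roughly like this
(paraphrased from nvme_alloc_host_mem() in drivers/nvme/host/pci.c; details
may differ between kernel versions):

	static int nvme_alloc_host_mem(struct nvme_dev *dev, u64 min, u64 preferred)
	{
		/* "start big", but capped at PAGE_SIZE * MAX_ORDER_NR_PAGES (4MiB on x86) */
		u64 min_chunk = min_t(u64, preferred, PAGE_SIZE * MAX_ORDER_NR_PAGES);
		/* HMMINDS is reported in 4KiB units; require at least two pages */
		u64 hmminds = max_t(u32, dev->ctrl.hmminds * 4096, PAGE_SIZE * 2);
		u64 chunk_size;

		/* halve the chunk size until enough memory can be allocated */
		for (chunk_size = min_chunk; chunk_size >= hmminds; chunk_size /= 2) {
			if (!__nvme_alloc_host_mem(dev, preferred, chunk_size)) {
				if (!min || dev->host_mem_size >= min)
					return 0;
				nvme_free_host_mem(dev);
			}
		}

		return -ENOMEM;
	}

This is where both the 4MiB starting cap and the hmminds lower bound come from.
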
> > And on devices that have hmminds > 4MiB the loop condition will never succeed
> > at all and HMB will not be used.
> > My fairly boring hardware is already at an hmminds of 3.3MiB.
> >
> > > Is there any real problem you are fixing with this? Do you actually
> > > see a performance difference on a relevant workload?
> >
> > I don't have a concrete problem or performance issue.
> > During some debugging I stumbled in my kernel logs upon
> > "nvme nvme0: allocated 32 MiB host memory buffer"
> > and investigated why it was so low.
>
> Until recently we could not even support these large sizes at all on
> typical x86 configs. With my fairly recent change to allow vmap
> remapped iommu allocations on x86 we can do that now. But if we
> unconditionally enabled it I'd be a little worried about using too
> much memory very easily.
This should still be limited by max_host_mem_size_mb, which defaults to 128MiB,
shouldn't it?
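
(For reference, that cap is applied where the preferred size is computed;
roughly, paraphrased from nvme_setup_host_mem() in drivers/nvme/host/pci.c:)

	u64 max = (u64)max_host_mem_size_mb * SZ_1M;      /* module parameter, default 128 */
	u64 preferred = min(dev->ctrl.hmpre * 4096, max); /* HMPRE is in 4KiB units */
	u64 min = (u64)dev->ctrl.hmmin * 4096;

So even with larger chunk sizes, the total HMB stays bounded by the module
parameter.
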
> We could look into removing the min with
> PAGE_SIZE * MAX_ORDER_NR_PAGES to try to do larger segments for
> "segment challenged" controllers now that it could work on a lot
> of iommu enabled setups. But I'd rather have a very good reason for
> that.
On my current setup (WD SN770 in a ThinkPad X1 Carbon Gen9) the NVMe controller
frequently stops responding. Switching from no scheduler to mq-deadline reduced
this, but did not eliminate it.
Since switching to an HMB of 1 * 200MiB and no scheduler, this has not happened
anymore. (But I'll need some more time to gain real confidence in this.)
Initially I assumed that PAGE_SIZE * MAX_ORDER_NR_PAGES was indeed meant as a
minimum for the DMA allocation.
As that is not the case, removing the min() completely, instead of the max() I
proposed, would obviously be the correct thing to do.
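
Concretely, the change I now have in mind would look like this (untested
sketch, not a formal patch):

	--- a/drivers/nvme/host/pci.c
	+++ b/drivers/nvme/host/pci.c
	@@ static int nvme_alloc_host_mem(struct nvme_dev *dev, u64 min, u64 preferred)
	-	u64 min_chunk = min_t(u64, preferred, PAGE_SIZE * MAX_ORDER_NR_PAGES);
	 	u64 hmminds = max_t(u32, dev->ctrl.hmminds * 4096, PAGE_SIZE * 2);
	 	u64 chunk_size;
	 
	 	/* start big and work our way down */
	-	for (chunk_size = min_chunk; chunk_size >= hmminds; chunk_size /= 2) {
	+	for (chunk_size = preferred; chunk_size >= hmminds; chunk_size /= 2) {

That way the loop really does "start big" at the preferred size and only falls
back to smaller chunks when an allocation fails.
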
[0] https://manpages.debian.org/testing/freebsd-manpages/nvme.4freebsd.en.html