[PATCH] nvme: find numa distance only if controller has valid numa id

Keith Busch kbusch at kernel.org
Mon Apr 15 09:56:07 PDT 2024


On Mon, Apr 15, 2024 at 04:39:45PM +0200, Hannes Reinecke wrote:
> > For calculating the distance between two nodes we invoke the function __node_distance().
> > This function then accesses the NUMA distance table, which is typically an array with
> > valid indices starting from 0. So accessing this table with an index of -1 would
> > dereference an incorrect memory location. Dereferencing an incorrect memory location might
> > have side effects, including a panic (though I didn't encounter one). Furthermore, in such a
> > case the calculated node distance could be incorrect, and that might cause nvme
> > multipath to choose a suboptimal IO path.
> > 
> > This patch may not help in choosing the optimal IO path (as we assume that the node distance is
> > LOCAL_DISTANCE when the nvme controller's NUMA node id is -1), but it ensures that we don't access
> > an invalid memory location when calculating the node distance.
> > 
> Hmm. One wonders: how does such a system work?
> The systems I know always have the PCI slots attached to the CPU
> sockets, so if the CPU is not present the NVMe device on that
> slot will be non-functional. In fact, it wouldn't be visible at
> all as the PCI lanes are not powered up.
> In your system the PCI lanes clearly are powered up, as the NVMe
> device shows up in the PCI enumeration.
> Which means you are running a rather different PCI configuration.
> Question now is: does the NVMe device _work_?
> If it does, shouldn't the NUMA node continue to be present (some kind of
> memory-less, CPU-less NUMA node ...)?
> As a side-note, we'll need this kind of configuration anyway once
> CXL switches become available...

I recall systems with IO controllers attached in a shared manner to all
sockets, so memory is UMA from the IO device's perspective (it may still be
NUMA from the CPU's). I don't think you need to consider memory-only NUMA
nodes unless there are additional distances to consider (at which point
it's no longer UMA).
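
For reference, the guard described in the quoted patch description could
look roughly like the user-space sketch below. This is not the actual patch:
distance_table, NR_NODES and safe_node_distance() are hypothetical names made
up for illustration, and the two-node table is an assumed topology. Only
NUMA_NO_NODE (-1), LOCAL_DISTANCE (10) and REMOTE_DISTANCE (20) mirror the
kernel's definitions.

```c
#include <assert.h>

#define NUMA_NO_NODE    (-1)  /* same value the kernel uses for "no node" */
#define LOCAL_DISTANCE  10    /* kernel's distance from a node to itself */
#define REMOTE_DISTANCE 20    /* kernel's default distance to another node */
#define NR_NODES        2     /* assumption: a two-node example topology */

/* Hypothetical stand-in for the NUMA distance table; valid indices are 0..NR_NODES-1. */
static const int distance_table[NR_NODES][NR_NODES] = {
	{ LOCAL_DISTANCE,  REMOTE_DISTANCE },
	{ REMOTE_DISTANCE, LOCAL_DISTANCE  },
};

/*
 * Hypothetical helper: return the distance from node 'from' to the
 * controller's node. If the controller has no valid node id, skip the
 * table lookup and treat the controller as local, so the table is never
 * indexed with -1.
 */
static int safe_node_distance(int from, int ctrl_node)
{
	if (ctrl_node == NUMA_NO_NODE)
		return LOCAL_DISTANCE;

	/* Both indices must be valid before touching the table. */
	assert(from >= 0 && from < NR_NODES);
	assert(ctrl_node >= 0 && ctrl_node < NR_NODES);

	return distance_table[from][ctrl_node];
}
```

With this fallback, a controller that reports no node looks equally close
from every node, which matches the UMA-from-the-device view described above.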


