[PATCH V1] nvme-pci: Fix NULL pointer dereference in nvme_pci_prp_iter_next
Robin Murphy
robin.murphy at arm.com
Mon Feb 2 09:39:14 PST 2026
On 2026-02-02 5:13 pm, Keith Busch wrote:
> On Mon, Feb 02, 2026 at 03:16:50PM +0000, Robin Murphy wrote:
>> On 2026-02-02 2:35 pm, Christoph Hellwig wrote:
>>> On Mon, Feb 02, 2026 at 06:27:38PM +0530, Pradeep P V K wrote:
>>>> Fix a NULL pointer dereference that occurs in nvme_pci_prp_iter_next()
>>>> when SWIOTLB bounce buffering becomes active during runtime.
>>>>
>>>> The issue occurs when SWIOTLB activation changes the device's DMA
>>>> mapping requirements at runtime, creating a mismatch between
>>>> iod->dma_vecs allocation and access logic.
>>>>
>>>> The problem manifests when:
>>>> 1. Device initially operates with dma_skip_sync=true
>>>>    (coherent DMA assumed)
>>>> 2. First SWIOTLB mapping occurs due to DMA address limitations,
>>>>    memory encryption, or IOMMU bounce buffering requirements
>>>> 3. SWIOTLB calls dma_reset_need_sync(), permanently setting
>>>>    dma_skip_sync=false
>>>> 4. Subsequent I/Os now have dma_need_unmap()=true, requiring
>>>> iod->dma_vecs
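
Concretely, the sequence being described amounts to something like the
following sketch - made-up demo_iod/demo_map names rather than the
actual nvme-pci code, around the real dma_need_unmap()/dma_map_page()
calls:

#include <linux/dma-mapping.h>
#include <linux/slab.h>

struct demo_iod {
	struct page	**pages;	/* pages backing the request */
	dma_addr_t	*dma_vecs;	/* unmap state, may stay NULL */
	unsigned int	nr;
};

static int demo_map(struct device *dev, struct demo_iod *iod)
{
	unsigned int i;

	/* Step 1: dma_skip_sync is still true, so this returns false */
	if (dma_need_unmap(dev)) {
		iod->dma_vecs = kcalloc(iod->nr, sizeof(*iod->dma_vecs),
					GFP_KERNEL);
		if (!iod->dma_vecs)
			return -ENOMEM;
	}

	for (i = 0; i < iod->nr; i++) {
		/* Step 2: a mapping bounces through SWIOTLB... */
		dma_addr_t addr = dma_map_page(dev, iod->pages[i], 0,
					       PAGE_SIZE, DMA_TO_DEVICE);

		if (dma_mapping_error(dev, addr))
			return -EIO;	/* unwind omitted for brevity */

		/*
		 * Steps 3-4: SWIOTLB called dma_reset_need_sync(), so
		 * any later dma_need_unmap() check - here, or on the
		 * unmap path - now disagrees with the one above and
		 * dereferences the NULL iod->dma_vecs.
		 */
		if (dma_need_unmap(dev))
			iod->dma_vecs[i] = addr;
	}
	return 0;
}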
>>>
>>> I think this patch just papers over the bug. If dma_need_unmap
>>> can't be trusted before the dma_map_* call, we've not saved
>>> the unmap information and the unmap won't work properly.
>>
>> The dma_need_unmap() kerneldoc says:
>>
>> "This function must be called after all mappings that might
>> need to be unmapped have been performed."
>>
>> Trying to infer anything from it beforehand is definitely a bug in the
>> caller.
>
> Well that doesn't really make sense. No matter how many mappings the
> driver has done, there will always be more.
But equally, the fact that none of the mappings made so far happened to
need bouncing still doesn't mean that future ones won't. This is not
guaranteed to be a static property of the device, but nor is it really a
property of the *device* at all; it's a property of a set of one or more
DMA mappings with the same lifetime. There's just no suitable generic
notion of that temporal context in the DMA API to carry around and pass
as an explicit argument, so it's left implicit in the usage model.
Whatever higher-level thing it's doing, the driver must have some
context, so within "operation A" it makes some DMA mappings, checks
dma_need_unmap() and sees it's false, so it can conclude that
"operation A" does not need to preserve DMA unmap state. However, it
may then start "operation B", do some more mappings, check
dma_need_unmap(), and see that it now returns true, so "operation B"
*does* need to keep the DMA data and explicitly unmap it when it
finishes.
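
To make that model concrete, a minimal sketch with invented names
(demo_op, demo_op_map, demo_op_done) rather than any real driver's
code, in which dma_need_unmap() is only consulted once all of an
operation's mappings are done:

#include <linux/dma-mapping.h>
#include <linux/slab.h>

struct demo_op {
	struct device	*dev;
	dma_addr_t	*addrs;		/* state for a possible unmap */
	unsigned int	nr;
	bool		must_unmap;
};

static int demo_op_map(struct demo_op *op, struct page **pages,
		       unsigned int nr)
{
	unsigned int i;

	/* Record every mapping unconditionally, for now. */
	op->addrs = kcalloc(nr, sizeof(*op->addrs), GFP_KERNEL);
	if (!op->addrs)
		return -ENOMEM;

	for (i = 0; i < nr; i++) {
		op->addrs[i] = dma_map_page(op->dev, pages[i], 0,
					    PAGE_SIZE, DMA_BIDIRECTIONAL);
		if (dma_mapping_error(op->dev, op->addrs[i]))
			return -EIO;	/* unwind omitted for brevity */
	}
	op->nr = nr;

	/*
	 * Only now, after every mapping in this operation, is the
	 * answer valid - and only for this operation; a later
	 * operation's own mappings may still bounce and flip it.
	 */
	op->must_unmap = dma_need_unmap(op->dev);
	if (!op->must_unmap) {
		kfree(op->addrs);	/* state provably not needed */
		op->addrs = NULL;
	}
	return 0;
}

static void demo_op_done(struct demo_op *op)
{
	unsigned int i;

	if (!op->must_unmap)
		return;
	for (i = 0; i < op->nr; i++)
		dma_unmap_page(op->dev, op->addrs[i], PAGE_SIZE,
			       DMA_BIDIRECTIONAL);
	kfree(op->addrs);
}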
This is essentially the point I made at the time about it not
necessarily being as useful a thing as it seems: if an "operation"
involves multiple mappings, it must still store the full state of those
mappings for at least long enough to finish them all and then call
dma_need_unmap(), and only then see whether it might be OK to throw
that state away again.
Thanks,
Robin.