[RFC v2 07/11] nvme-pci: implement dma_token backed requests

Christoph Hellwig hch at infradead.org
Thu Dec 4 03:04:08 PST 2025


> +static void nvme_sync_dma(struct nvme_dev *nvme_dev, struct request *req,
> +			  enum dma_data_direction dir)
> +{
> +	struct blk_mq_dma_map *map = req->dma_map;
> +	int length = blk_rq_payload_bytes(req);
> +	bool for_cpu = dir == DMA_FROM_DEVICE;
> +	struct device *dev = nvme_dev->dev;
> +	dma_addr_t *dma_list = map->private;
> +	struct bio *bio = req->bio;
> +	int offset, map_idx;
> +
> +	offset = bio->bi_iter.bi_bvec_done;
> +	map_idx = offset / NVME_CTRL_PAGE_SIZE;
> +	length += offset & (NVME_CTRL_PAGE_SIZE - 1);
> +
> +	while (length > 0) {
> +		u64 dma_addr = dma_list[map_idx++];
> +
> +		if (for_cpu)
> +			__dma_sync_single_for_cpu(dev, dma_addr,
> +						  NVME_CTRL_PAGE_SIZE, dir);
> +		else
> +			__dma_sync_single_for_device(dev, dma_addr,
> +						     NVME_CTRL_PAGE_SIZE, dir);
> +		length -= NVME_CTRL_PAGE_SIZE;
> +	}

This looks really inefficient.  Usually the ranges in the dmabuf should
be much larger than a controller page.


> +static void nvme_unmap_premapped_data(struct nvme_dev *dev,
> +				      struct request *req)
> +{
> +	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
> +
> +	if (rq_data_dir(req) == READ)
> +		nvme_sync_dma(dev, req, DMA_FROM_DEVICE);
> +	if (!(iod->flags & IOD_SINGLE_SEGMENT))
> +		nvme_free_descriptors(req);
> +}

This doesn't really unmap anything :)

Also the dma ownership rules say that you always need to call the
sync_to_device helpers before I/O and the sync_to_cpu helpers after I/O,
no matters if it is a read or write.  The implementations then makes
them a no-op where possible.

> +
> +	offset = bio->bi_iter.bi_bvec_done;
> +	map_idx = offset / NVME_CTRL_PAGE_SIZE;
> +	offset &= (NVME_CTRL_PAGE_SIZE - 1);
> +
> +	prp1_dma = dma_list[map_idx++] + offset;
> +
> +	length -= (NVME_CTRL_PAGE_SIZE - offset);
> +	if (length <= 0) {
> +		prp2_dma = 0;

Urgg, why is this building PRPs instead of SGLs?  Yes, SGLs are an
optional feature, but for devices where you want to micro-optimize
like this I think we should simply require them.  This should cut
down on both the memory use and the amount of special mapping code.




More information about the Linux-nvme mailing list