[PATCH] NVMe: avoid kmalloc/kfree for smaller IO

Jens Axboe axboe at fb.com
Thu Jan 22 10:59:01 PST 2015


On 01/22/2015 10:26 AM, Keith Busch wrote:
> On Wed, 21 Jan 2015, Jens Axboe wrote:
>> Currently we allocate an nvme_iod for each IO, which holds the
>> sg list, prps, and other IO related info. Set a threshold of
>> 2 pages and/or 8KB of data, below which we can just embed this
>> in the per-command pdu in blk-mq. For any IO at or below
>> NVME_INT_PAGES and NVME_INT_BYTES, we save a kmalloc and kfree.
>>
>> For higher IOPS, this saves up to 1% of CPU time.
>>
>> Signed-off-by: Jens Axboe <axboe at fb.com>
>>
>> ----
> 
>> +/*
>> + * Max size of iod being embedded in the request payload
>> + */
>> +#define NVME_INT_PAGES        2
>> +#define NVME_INT_BYTES        (NVME_INT_PAGES * PAGE_CACHE_SIZE)
> 
> I think the above needs to use what the device thinks a page size is,
> right? If there's a mismatched host-device page size, nvme_setup_prps
> could end up accessing a non-existent prp list.
> 
>   #define NVME_INT_BYTES(dev) (NVME_INT_PAGES * dev->page_size)

Good point, I missed that aspect of it. I'll make that change and repost.
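
For illustration only, here is a minimal sketch of how the threshold check
might look once the byte limit is derived from the device page size rather
than PAGE_CACHE_SIZE. The struct demo_dev and the helper
demo_iod_fits_inline() are hypothetical stand-ins, not the actual driver
code; only NVME_INT_PAGES and the NVME_INT_BYTES(dev) form come from the
thread above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Max number of pages for an iod embedded in the request payload. */
#define NVME_INT_PAGES        2

/* Hypothetical stand-in for the driver's per-device structure. */
struct demo_dev {
	unsigned int page_size;	/* page size the device uses, in bytes */
};

/* Byte limit scales with the device page size, not the host's. */
#define NVME_INT_BYTES(dev)   (NVME_INT_PAGES * (dev)->page_size)

/*
 * Hypothetical helper: returns true when an IO is small enough that
 * its iod could be embedded in the per-command pdu, so no kmalloc
 * and kfree would be needed. Both limits must hold.
 */
static bool demo_iod_fits_inline(const struct demo_dev *dev,
				 unsigned int nseg, unsigned int bytes)
{
	assert(dev != NULL);
	return nseg <= NVME_INT_PAGES && bytes <= NVME_INT_BYTES(dev);
}
```

With a 4096-byte device page, an IO of 2 segments and 8192 bytes would
qualify, while 8193 bytes or 3 segments would fall back to the allocated
iod path.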

-- 
Jens Axboe



