[PATCH v8 10/10] nvme: Atomic write support

John Garry john.g.garry at oracle.com
Mon Jun 17 11:04:23 PDT 2024


On 17/06/2024 18:24, Kanchan Joshi wrote:
> On 6/10/2024 4:13 PM, John Garry wrote:
>> +static bool nvme_valid_atomic_write(struct request *req)
>> +{
>> +	struct request_queue *q = req->q;
>> +	u32 boundary_bytes = queue_atomic_write_boundary_bytes(q);
>> +
>> +	if (blk_rq_bytes(req) > queue_atomic_write_unit_max_bytes(q))
>> +		return false;
>> +
>> +	if (boundary_bytes) {
>> +		u64 mask = boundary_bytes - 1, imask = ~mask;
>> +		u64 start = blk_rq_pos(req) << SECTOR_SHIFT;
>> +		u64 end = start + blk_rq_bytes(req) - 1;
>> +
>> +		/* If greater then must be crossing a boundary */
>> +		if (blk_rq_bytes(req) > boundary_bytes)
>> +			return false;
> 
> Nit: I'd cache blk_rq_bytes(req), since that is repeating and this
> function is called for each atomic IO.

blk_rq_bytes() is just a wrapper for rq->__data_len. I suppose that we 
could cache that value to stop re-reading that memory, but I would 
hope/expect that memory to be in the CPU cache anyway.

> 
>> +
>> +		if ((start & imask) != (end & imask))
>> +			return false;
>> +	}
>> +
>> +	return true;
>> +}
>> +
>>    static inline blk_status_t nvme_setup_rw(struct nvme_ns *ns,
>>    		struct request *req, struct nvme_command *cmnd,
>>    		enum nvme_opcode op)
>> @@ -941,6 +965,12 @@ static inline blk_status_t nvme_setup_rw(struct nvme_ns *ns,
>>    
>>    	if (req->cmd_flags & REQ_RAHEAD)
>>    		dsmgmt |= NVME_RW_DSM_FREQ_PREFETCH;
>> +	/*
>> +	 * Ensure that nothing has been sent which cannot be executed
>> +	 * atomically.
>> +	 */
>> +	if (req->cmd_flags & REQ_ATOMIC && !nvme_valid_atomic_write(req))
>> +		return BLK_STS_INVAL;
>>    
> 
> Is this validity check specific to NVMe or should this be moved up to
> block layer as it also knows the limits?

Only NVMe supports an LBA space boundary, so that part is specific to NVMe.

Regardless, the block layer already should ensure that the atomic write 
length and boundary is respected. nvme_valid_atomic_write() is just an 
insurance policy against the block layer or some other component not 
doing its job.

For SCSI, the device would error - for example - if the atomic write 
length was larger than the device supported. NVMe silently just does not 
execute the write atomically in that scenario, which we must avoid.

Thanks,
John




More information about the Linux-nvme mailing list