[PATCH v3 15/15] nvme: Ensure atomic writes will be executed atomically

John Garry john.g.garry at oracle.com
Thu Jan 25 03:28:22 PST 2024


On 25/01/2024 00:52, Keith Busch wrote:
> On Wed, Jan 24, 2024 at 11:38:41AM +0000, John Garry wrote:
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index 5045c84f2516..6a34a5d92088 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -911,6 +911,32 @@ static inline blk_status_t nvme_setup_rw(struct nvme_ns *ns,
>>   	if (req->cmd_flags & REQ_RAHEAD)
>>   		dsmgmt |= NVME_RW_DSM_FREQ_PREFETCH;
>>   
>> +	/*
>> +	 * Ensure that nothing has been sent which cannot be executed
>> +	 * atomically.
>> +	 */
>> +	if (req->cmd_flags & REQ_ATOMIC) {
>> +		struct nvme_ns_head *head = ns->head;
>> +		u32 boundary_bytes = head->atomic_boundary;
>> +
>> +		if (blk_rq_bytes(req) > ns->head->atomic_max)
>> +			return BLK_STS_IOERR;
>> +
>> +		if (boundary_bytes) {
>> +			u32 mask = boundary_bytes - 1, imask = ~mask;
>> +			u32 start = blk_rq_pos(req) << SECTOR_SHIFT;
>> +			u32 end = start + blk_rq_bytes(req);
>> +
>> +			if (blk_rq_bytes(req) > boundary_bytes)
>> +				return BLK_STS_IOERR;
>> +
>> +			if (((start & imask) != (end & imask)) &&
>> +			    (end & mask)) {
>> +				return BLK_STS_IOERR;
>> +			}
>> +		}
>> +	}
> Aren't these new fields, atomic_max and atomic_boundary, duplicates of
> the equivalent queue limits? Let's just use the queue limits instead.

I was making a copy of these limits in the driver out of an abundance of 
caution. I suppose the atomic_max and atomic_boundary values won't be 
modified in the block layer, so I could use them instead.

> 
> And couldn't we generically validate the constraints are not violated in
> submit_bio_noacct() instead of doing that in the low level driver? The
> driver assumes all other requests are already sanity checked, so I don't
> think we should change the responsibility for that just for this flag.

As a safety mechanism, we want to ensure the complete stack is not 
giving us out-of-limits atomic writes.

We have limits checks in XFS iomap and fops.c, but we would also want to 
ensure that the the block layer is not doing anything it shouldn't be 
doing after submit_bio_noacct(), like merging atomic write BIOs which 
straddle a boundary or exceed atomic_max (if there were any merging).

The SCSI standard already has provision for error'ing an atomic write 
command which exceeds the target atomic write capabilities, while NVMe 
doesn't.

BTW, Christoph did mention that he would like to see this:
https://lore.kernel.org/linux-nvme/20231109153603.GA2188@lst.de/

Thanks,
John




More information about the Linux-nvme mailing list