[PATCH v6 00/10] block atomic writes
Luis Chamberlain
mcgrof at kernel.org
Thu Apr 11 12:07:40 PDT 2024
On Wed, Apr 10, 2024 at 09:34:36AM +0100, John Garry wrote:
> On 08/04/2024 18:50, Luis Chamberlain wrote:
> > I agree that when you don't set the sector size to 16k you are not forcing the
> > filesystem to use 16k IOs, the metadata can still be 4k. But when you
> > use a 16k sector size, the 16k IOs should be respected by the
> > filesystem.
> >
> > Do we break BIOs to below a min order if the sector size is also set to
> > 16k? I haven't seen that and it's unclear when or how that could happen.
>
> AFAICS, the only guarantee is to not split below LBS.
It would be odd to split a BIO given an inode requirement size spelled
out, but indeed I don't recall verifying this guarantee.
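One way such a guarantee could be checked from a captured IO trace is sketched below. This is only an illustration: the function name and the (offset, length) input format are hypothetical, and it assumes the trace has already been reduced to byte offsets and lengths of the IOs issued to the device.

```python
def find_violating_ios(ios, min_io_bytes):
    """Return the IOs whose offset or length breaks a minimum IO size.

    ios: iterable of (offset_bytes, length_bytes) pairs (hypothetical
         trace format, not the output of any specific tool).
    min_io_bytes: the size every IO is expected to respect, for example
         16384 when the sector size is 16k.
    """
    if min_io_bytes <= 0:
        raise ValueError("min_io_bytes must be positive")
    return [
        (off, length)
        for off, length in ios
        # Flag empty IOs and anything not a whole multiple of the minimum.
        if length == 0 or off % min_io_bytes or length % min_io_bytes
    ]


# Two well-formed 16k-multiple IOs, one short 4k IO, one misaligned IO.
trace = [(0, 16384), (16384, 32768), (49152, 4096), (4096, 16384)]
violations = find_violating_ios(trace, 16384)
```

An empty result for a full workload would suggest nothing was split below the minimum, though it proves nothing about paths the workload did not exercise.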
> > At least for NVMe we don't need to yell to a device to inform it we want
> > a 16k IO issued to it to be atomic, if we read that it has the
> > capability for it, it just does it. The IO verification can be done with
> > blkalgn [0].
> >
> > Does SCSI *require* 16k atomic prep work, or can it be done implicitly?
> > Does it need WRITE_ATOMIC_16?
>
> physical block size is what we can implicitly write atomically.
Yes, and also on flash to avoid read modify writes.
> So if you
> have a 4K PBS and 512B LBS, then WRITE_ATOMIC_16 would be required to write
> 16KB atomically.
Ugh. Why does SCSI require a special command for this?
Now we know what would be needed to bump the physical block size. It is
certainly a different feature, but I think it would be good to
evaluate that world too. For NVMe we don't have such special write
requirements.
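The rule discussed above (a write is implicitly atomic only if it fits in one physical block) can be sketched as a small predicate. This is a simplified model under that single assumption, not kernel or SCSI code; the function name is hypothetical and real devices may advertise further limits.

```python
def needs_explicit_atomic(offset_bytes, length_bytes, physical_block_bytes):
    """Return True if a write spans more than one physical block.

    Simplified model: a write contained in a single physical block is
    treated as implicitly atomic; anything larger or straddling a
    physical block boundary would need an explicit atomic write
    command (WRITE_ATOMIC_16 in the SCSI case discussed here).
    """
    if length_bytes <= 0 or physical_block_bytes <= 0:
        raise ValueError("length and physical block size must be positive")
    first_block = offset_bytes // physical_block_bytes
    last_block = (offset_bytes + length_bytes - 1) // physical_block_bytes
    return first_block != last_block


# 4K PBS, 16KB write: spans four physical blocks, explicit command needed.
pbs_4k = needs_explicit_atomic(0, 16384, 4096)
# 16K PBS, aligned 16KB write: fits in one physical block.
pbs_16k = needs_explicit_atomic(0, 16384, 16384)
```

With these inputs the first call yields True and the second False, which matches the 4K PBS / 512B LBS example in the quoted text.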
I put together this kludge with the latest LBS patch series + the
bdev cache aops stuff (which as I said before needs an alternative
solution) and just the SCSI atomics topology + physical block size
change to easily experiment to see what would break:
https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=20240408-lbs-scsi-kludge
Using a larger sector size works, but it does not use the special SCSI
atomic write command.
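To see which logical and physical block sizes a device ends up advertising in an experiment like this, the queue limits can be read from sysfs. The sketch below assumes the standard `logical_block_size` and `physical_block_size` attributes under `/sys/block/<dev>/queue/`; the helper name and the `sysfs_root` parameter (there only so the function can be pointed at a test directory) are hypothetical.

```python
from pathlib import Path


def read_block_sizes(dev, sysfs_root="/sys/block"):
    """Return the logical and physical block sizes of a block device.

    dev: device name as it appears under sysfs_root, e.g. "sda".
    sysfs_root: base directory to read from (overridable for testing).
    Raises FileNotFoundError if the device or attribute does not exist.
    """
    queue = Path(sysfs_root) / dev / "queue"
    return {
        name: int((queue / name).read_text().strip())
        for name in ("logical_block_size", "physical_block_size")
    }
```

For example, `read_block_sizes("sda")` would return a dictionary with both sizes in bytes on a system where that device exists.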
> > > To me, O_ATOMIC would be required for buffered atomic writes IO, as we want
> > > a fixed-sized IO, so that would mean no mixing of atomic and non-atomic IO.
> > Would using the same min and max order for the inode work instead?
>
> Maybe, I would need to check further.
I'd be happy to help review too.
Luis
More information about the Linux-nvme
mailing list