[LSF/MM/BPF TOPIC] Cloud storage optimizations
Chaitanya Kulkarni
chaitanyak at nvidia.com
Wed Mar 1 19:13:25 PST 2023
(+linux-nvme)
On 2/28/2023 7:52 PM, Theodore Ts'o wrote:
> Emulated block devices offered by cloud VMs can provide functionality
> to guest kernels and applications that traditionally has not been
> available to users of consumer-grade HDDs and SSDs. For example,
> today it’s possible to create a block device in Google’s Persistent
> Disk with a 16k physical sector size, which promises that aligned 16k
> writes will be atomic. With NVMe, it is possible for a storage
> device to promise this without requiring read-modify-write updates for
> sub-16k writes. All that is necessary are some changes in the block
> layer so that the kernel does not inadvertently tear a write request
> when splitting a bio because it is too large (perhaps because it got
> merged with some other request, and then it gets split at an
> inconvenient boundary).
>
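One way to picture the split problem described above is the following user-space sketch. It is not block-layer code: the struct, the field names, and the helper are hypothetical, and it assumes the device advertises a single atomic unit (32 sectors of 512 bytes for the 16k case) plus a maximum request size. The idea is only that a split point gets rounded down to an atomic-unit boundary so that no 16k unit is cut in two.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical limits, all in 512-byte sectors (not a kernel structure). */
struct atomic_limits {
	uint32_t atomic_unit;	/* e.g. 32 sectors == 16 KiB */
	uint32_t max_sectors;	/* largest request the device accepts */
};

/*
 * Return how many sectors of a write starting at 'start' and spanning
 * 'len' sectors may go into the first request without cutting through
 * an atomic unit.  Returns 'len' when no split is needed, and 0 when
 * no tear-free split point exists within the size limit.
 */
static uint32_t safe_split_sectors(const struct atomic_limits *lim,
				   uint64_t start, uint32_t len)
{
	uint64_t limit_end, cut;

	assert(lim->atomic_unit > 0);

	if (len <= lim->max_sectors)
		return len;		/* fits as a single request */

	/* Furthest sector the first request could reach... */
	limit_end = start + lim->max_sectors;
	/* ...rounded down to the previous atomic-unit boundary. */
	cut = limit_end - (limit_end % lim->atomic_unit);

	if (cut <= start)
		return 0;		/* no boundary inside the limit */

	return (uint32_t)(cut - start);
}
```

With a 32-sector unit and a 100-sector limit, a 200-sector write at sector 0 would be cut at sector 96 rather than at sector 100, so the unit covering sectors 96 to 127 stays in one request.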
> There are also more interesting, advanced optimizations that might be
> possible. For example, Jens had observed that passing hints about
> journaling writes (either from file systems or databases) could be
> potentially useful. Unfortunately, most common storage devices have
> not supported write hints, and support for write hints was ripped out
> last year. That can be easily reversed, but there are some other
> interesting related subjects that are very much suited for LSF/MM.
>
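As a rough illustration of what a journaling hint could look like, here is a small sketch. Every name in it is hypothetical and it does not reproduce the interface that was removed. It assumes that journal blocks are short-lived because a journal is typically overwritten in a circular fashion, and that other writes carry no hint.

```c
#include <assert.h>

/* Hypothetical classification of a write by its issuer. */
enum hyp_write_class {
	HYP_WRITE_DATA,
	HYP_WRITE_JOURNAL,
};

/* Hypothetical lifetime hint attached to a write request. */
enum hyp_life_hint {
	HYP_LIFE_NOT_SET = 0,
	HYP_LIFE_SHORT,
};

/*
 * Pick a hint for a write.  Assumption: only journal writes are
 * tagged, and only when the device has said it understands hints.
 */
static enum hyp_life_hint hint_for_write(enum hyp_write_class cls,
					 int device_supports_hints)
{
	if (!device_supports_hints)
		return HYP_LIFE_NOT_SET;

	switch (cls) {
	case HYP_WRITE_JOURNAL:
		return HYP_LIFE_SHORT;
	case HYP_WRITE_DATA:
		return HYP_LIFE_NOT_SET;
	}

	assert(0 && "unknown write class");
	return HYP_LIFE_NOT_SET;
}
```

Keeping the device capability check in the same helper means a device without hint support sees exactly the requests it sees today.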
> For example, most cloud storage devices are doing read-ahead to try to
> anticipate read requests from the VM. This can interfere with the
> read-ahead being done by the guest kernel. So it would be useful to
> be able to tell the cloud storage device whether a particular read
> request stems from a read-ahead or not. At the moment, as Matthew
> Wilcox has pointed out, we use the read-ahead code path for
> synchronous buffered reads. So plumbing this information so that it
> can be passed through multiple levels of the mm, fs, and block layers
> will probably be needed.
>
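The plumbing problem above can be sketched in user space as follows. This is only an illustration with hypothetical names, not mm or block-layer code. It assumes the caller knows how much of a read window the application actually asked for, and it tags only the remainder as speculative so that a device could tell the two apart.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag: the request is speculative read-ahead. */
#define HYP_REQ_SPECULATIVE	(1u << 0)

/* Hypothetical read request, in 512-byte sectors. */
struct hyp_read_req {
	uint64_t sector;
	uint32_t nr_sectors;
	unsigned int flags;
};

/*
 * Turn one read window into at most two requests: the part the
 * application is waiting for (untagged) and the part fetched ahead
 * of demand (tagged).  Returns the number of requests written.
 */
static int tag_reads(uint64_t start, uint32_t window_len,
		     uint32_t demand_len, struct hyp_read_req out[2])
{
	int n = 0;

	assert(window_len > 0);

	if (demand_len > window_len)
		demand_len = window_len;

	if (demand_len > 0) {
		out[n].sector = start;
		out[n].nr_sectors = demand_len;
		out[n].flags = 0;	/* synchronous, on-demand part */
		n++;
	}
	if (demand_len < window_len) {
		out[n].sector = start + demand_len;
		out[n].nr_sectors = window_len - demand_len;
		out[n].flags = HYP_REQ_SPECULATIVE;
		n++;
	}
	return n;
}
```

The hard part in a real kernel is the 'demand_len' argument: that knowledge lives at the top of the stack, which is why it would have to be carried through each layer rather than inferred at the bottom.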