[PATCH 00/10] Read/Write with meta/integrity
Kanchan Joshi
joshi.k at samsung.com
Thu Apr 25 11:39:33 PDT 2024
This adds a new io_uring interface to specify meta along with
read/write. Beyond reading/writing meta, the interface also enables
(a) flags to control data-integrity checks, (b) application tag.
Block path (direct IO) and NVMe driver are modified to support
this.
The first 5 patches are enhancements/fixes in block/nvme so that the user meta
buffer is handled correctly, mostly for the case where it gets split.
Patch 8 adds the io_uring support.
Patch 9 adds the support for block direct IO, and patch 10 for NVMe.
Interface:
Two new opcodes in io_uring: IORING_OP_READ/WRITE_META.
The leftover space in SQE is used to send meta buffer, its length,
apptag, and meta flags (guard/reftag/apptag check for now). An example
program showing how to use the interface is appended below [1].
An alternative design would be to not introduce the new opcodes and to add
a new RWF_META flag instead. Open to that in the next version.
As for the new meta flags, RWF_* bits seemed too precious to use. Hence took
the route of carving those out within the SQE itself.
Performance:
Non-meta IO performance is not affected by these patches.
Testing:
Done by modifying fio to use this interface.
https://github.com/SamsungDS/fio/commits/feat/test-meta-v2
Changes since RFC:
- modify io_uring plumbing based on recent async handling state changes
- fixes/enhancements to correctly handle the split for meta buffer
- add flags to specify guard/reftag/apptag checks
- add support to send apptag
[1]
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <linux/io_uring.h>
#include <linux/types.h>
#include "liburing.h"

/*
 * write data/meta. read both. compare. send apptag too.
 * prerequisite:
 * unprotected xfer: format namespace with 4KB + 8b, pi_type = 0
 * protected xfer: format namespace with 4KB + 8b, pi_type = 1
 */
#define DATA_LEN 4096
#define META_LEN 8

struct t10_pi_tuple {
	__be16 guard;
	__be16 apptag;
	__be32 reftag;
};

int main(int argc, char *argv[])
{
	struct io_uring ring;
	struct io_uring_sqe *sqe = NULL;
	struct io_uring_cqe *cqe = NULL;
	void *wdb, *rdb;
	char wmb[META_LEN], rmb[META_LEN];
	char *data_str = "data buffer";
	char *meta_str = "meta";
	int fd, ret, blksize;
	struct stat fstat;
	unsigned long long offset = DATA_LEN;
	struct t10_pi_tuple *pi;

	if (argc != 2) {
		fprintf(stderr, "Usage: %s <block-device>\n", argv[0]);
		return 1;
	}

	if (stat(argv[1], &fstat) == 0) {
		blksize = (int)fstat.st_blksize;
	} else {
		perror("stat");
		return 1;
	}

	/* O_DIRECT needs suitably aligned data buffers */
	if (posix_memalign(&wdb, blksize, DATA_LEN)) {
		perror("posix_memalign failed");
		return 1;
	}
	if (posix_memalign(&rdb, blksize, DATA_LEN)) {
		perror("posix_memalign failed");
		return 1;
	}

	strcpy(wdb, data_str);
	strcpy(wmb, meta_str);

	fd = open(argv[1], O_RDWR | O_DIRECT);
	if (fd < 0) {
		perror("Error in opening device");
		return 1;
	}

	ret = io_uring_queue_init(8, &ring, 0);
	if (ret) {
		fprintf(stderr, "ring setup failed: %d\n", ret);
		return 1;
	}

	/* write data + meta-buffer to device */
	sqe = io_uring_get_sqe(&ring);
	if (!sqe) {
		fprintf(stderr, "get sqe failed\n");
		return 1;
	}

	io_uring_prep_write(sqe, fd, wdb, DATA_LEN, offset);
	sqe->opcode = IORING_OP_WRITE_META;
	sqe->meta_addr = (__u64)(unsigned long)wmb;
	sqe->meta_len = META_LEN;
	/* flags to ask for guard/reftag/apptag checks; only apptag here */
	sqe->meta_flags = META_CHK_APPTAG;
	sqe->apptag = 0x1234;

	/*
	 * The tuple in the meta buffer is big-endian: 0x3412 is apptag
	 * 0x1234 as stored by a little-endian host.
	 */
	pi = (struct t10_pi_tuple *)wmb;
	pi->apptag = 0x3412;

	ret = io_uring_submit(&ring);
	if (ret <= 0) {
		fprintf(stderr, "sqe submit failed: %d\n", ret);
		return 1;
	}

	ret = io_uring_wait_cqe(&ring, &cqe);
	if (ret < 0 || !cqe) {
		fprintf(stderr, "cqe is NULL :%d\n", ret);
		return 1;
	}
	if (cqe->res < 0) {
		fprintf(stderr, "write cqe failure: %d\n", cqe->res);
		return 1;
	}
	io_uring_cqe_seen(&ring, cqe);

	/* read data + meta-buffer back from device */
	sqe = io_uring_get_sqe(&ring);
	if (!sqe) {
		fprintf(stderr, "get sqe failed\n");
		return 1;
	}

	io_uring_prep_read(sqe, fd, rdb, DATA_LEN, offset);
	sqe->opcode = IORING_OP_READ_META;
	sqe->meta_addr = (__u64)(unsigned long)rmb;
	sqe->meta_len = META_LEN;
	sqe->meta_flags = META_CHK_APPTAG;
	sqe->apptag = 0x1234;

	ret = io_uring_submit(&ring);
	if (ret <= 0) {
		fprintf(stderr, "sqe submit failed: %d\n", ret);
		return 1;
	}

	ret = io_uring_wait_cqe(&ring, &cqe);
	if (ret < 0 || !cqe) {
		fprintf(stderr, "cqe is NULL :%d\n", ret);
		return 1;
	}
	if (cqe->res < 0) {
		fprintf(stderr, "read cqe failure: %d\n", cqe->res);
		return 1;
	}
	io_uring_cqe_seen(&ring, cqe);

	if (strncmp(wmb, rmb, META_LEN))
		printf("Failure: meta mismatch!, wmb=%s, rmb=%s\n", wmb, rmb);
	if (strncmp(wdb, rdb, DATA_LEN))
		printf("Failure: data mismatch!\n");

	io_uring_queue_exit(&ring);
	free(rdb);
	free(wdb);
	return 0;
}

Anuj Gupta (6):
  block: set bip_vcnt correctly
  block: copy bip_max_vcnt vecs instead of bip_vcnt during clone
  block: copy result back to user meta buffer correctly in case of split
  block: avoid unpinning/freeing the bio_vec incase of cloned bio
  block: modify bio_integrity_map_user argument
  io_uring/rw: add support to send meta along with read/write

Kanchan Joshi (4):
  block, nvme: modify rq_integrity_vec function
  block: define meta io descriptor
  block: add support to send meta buffer
  nvme: add separate handling for user integrity buffer

 block/bio-integrity.c         | 69 +++++++++++++++++++++++--------
 block/fops.c                  |  9 +++++
 block/t10-pi.c                |  6 +++
 drivers/nvme/host/core.c      | 36 ++++++++++++++++-
 drivers/nvme/host/ioctl.c     | 11 ++++-
 drivers/nvme/host/pci.c       |  9 +++--
 include/linux/bio.h           | 23 +++++++++--
 include/linux/blk-integrity.h | 13 +++---
 include/linux/fs.h            |  1 +
 include/uapi/linux/io_uring.h | 15 +++++++
 io_uring/io_uring.c           |  4 ++
 io_uring/opdef.c              | 30 ++++++++++++++
 io_uring/rw.c                 | 76 +++++++++++++++++++++++++++++++++--
 io_uring/rw.h                 | 11 ++++-
 14 files changed, 276 insertions(+), 37 deletions(-)

base-commit: 24c3fc5c75c5b9d471783b4a4958748243828613
--
2.25.1