[PATCH v9 06/11] io_uring: introduce attributes for read/write and PI support

Anuj Gupta anuj20.g at samsung.com
Thu Nov 21 00:59:35 PST 2024


On Mon, Nov 18, 2024 at 04:59:22PM +0000, Pavel Begunkov wrote:
> On 11/18/24 12:50, Christoph Hellwig wrote:
> > On Sat, Nov 16, 2024 at 12:32:25AM +0000, Pavel Begunkov wrote:
> > > We can also reuse your idea from your previous iterations and
> > > use the bitmap to list all attributes. Then preamble and
> > > the explicit attr_type field are not needed, type checking
> > > in the loop is removed and packing is better. And just
> > > by looking at the map we can calculate the size of the
> > > array and remove all size checks in the loop.
> > 
> > Can we please stop overdesigning the f**k out of this?  Really,
> 
> Please stop it, it doesn't add weight to your argument. The design
> requirement has never changed, at least not during this patchset
> iterations.
> 
> > either we're fine using the space in the extended SQE, or
> > we're fine using a separate opcode, or if we really have to just
> > make it uring_cmd.  But stop making thing being extensible for
> > the sake of being extensible.
> 
> It's asked to be extendible because there is a good chance it'll need to
> be extended, and no, I'm not suggesting anyone to implement the entire
> thing, only PI bits is fine.
> 
> And no, it doesn't have to be "this or that" while there are other
> options suggested for consideration. And the problem with the SQE128
> option is not even about SQE128 but how it's placed inside, i.e.
> at a fixed spot.
> 
> Do we have technical arguments against the direction in the last
> suggestion? It's extendible and _very_ simple. The entire (generic)
> handling for the bitmask approach for this set would be sth like:
> 
> struct sqe {
> 	u64 attr_type_mask;
> 	u64 attr_ptr;
> };
> if (sqe->attr_type_mask) {
> 	if (sqe->attr_type_mask != TYPE_PI)
> 		return -EINVAL;
> 
> 	struct uapi_pi_structure pi;
> 	copy_from_user(&pi, sqe->attr_ptr, sizeof(pi));
> 	hanlde_pi(&pi);
> }
> 
> And the user side:
> 
> struct uapi_pi_structure pi = { ... };
> sqe->attr_ptr = π
> sqe->attr_type_mask = TYPE_PI;
>

How about using this, but also have the ability to keep PI inline. 
Attributes added down the line can take one of these options:
1. If available space in SQE/SQE128 is sufficient for keeping attribute
fields, one can choose to keep them inline and introduce a TYPE_XYZ_INLINE
attribute flag.
2. If the available space is not sufficient, pass the attribute information
as pointer and introduce a TYPE_XYZ attribute flag.
3. One can choose to support a attribute via both pointer and inline scheme.
The pointer scheme can help with scenarios where user wants to avoid SQE128
for whatever reasons (e.g. mixed workload).

Something like this:

// uapi/.../io_uring.h

struct sqe {
	...
	u64 attr_type_mask;
	u64 attr_ptr;
};

// kernel space

if (sqe->attr_type_mask) {
	struct uapi_pi_struct pi, *piptr;

	if ((sqe->attr_type_mask & TYPE_PI_INLINE) &&
	    (sqe->attr_type_mask & TYPE_PI))
		return -EINVAL;
	/* inline PI case */
	if (sqe->attr_type_mask & TYPE_PI_INLINE) {
		if (!(ctx->flags & IORING_SETUP_SQE128))
			return -EINVAL;
		piptr = (sqe + 1);
	} else if (sqe->attr_type_mask & TYPE_PI) {
	/* indirect PI case */
		copy_from_user(&pi, sqe->attr_ptr, sizeof(pi));
		piptr = π
	}

	handle_pi(piptr);
	
}

And the user side:

// send PI using pointer:

struct uapi_pi_structure pi = { ... };
sqe->attr_ptr = π
sqe->attr_type_mask = TYPE_PI;

// send PI inline:

/* setup a big-sqe ring */
struct uapi_pi_structure *pi = (sqe+1);
prepare_pi(pi);
sqe->attr_type_mask = TYPE_PI_INLINE;



More information about the Linux-nvme mailing list