[PATCH v5 3/5] fcntl: add F_{SET/GET}_RW_HINT_EX

Kanchan Joshi joshi.k at samsung.com
Thu Sep 12 08:53:35 PDT 2024


On 9/12/2024 6:31 PM, Christoph Hellwig wrote:
> On Tue, Sep 10, 2024 at 08:31:58PM +0530, Kanchan Joshi wrote:
>> This is similar to existing F_{SET/GET}_RW_HINT but more
>> generic/extensible.
>>
>> F_SET/GET_RW_HINT_EX take a pointer to a struct rw_hint_ex as argument:
>>
>> struct rw_hint_ex {
>>          __u8    type;
>>          __u8    pad[7];
>>          __u64   val;
>> };
>>
>> With F_SET_RW_HINT_EX, the user passes the hint type and its value.
>> Hint type can be either lifetime hint (TYPE_RW_LIFETIME_HINT) or
>> placement hint (TYPE_RW_PLACEMENT_HINT). The interface allows to add
>> more hint add more hint types in future.
> 
> What is the point of multiplexing these into a single call vs having
> one fcntl for each?  It's not like the code points are a super
> limited resource.

Do you mean new fcntl code only for placement hint?
I thought folks will prefer the user-interface to be future proof so 
that they don't have to add a new fcntl opcode.
Had the existing fcntl accepted "hint type" as argument, I would not 
have resorted to add a new one now.

You may have noticed that in io_uring metadata series also, even though 
current meta type is 'integrity', we allow user interface to express 
other types of metadata too.

> And the _EX name isn't exactly descriptive either and screams of horrible
> Windows APIs :)

I can change to what you prefer.
But my inspiration behind this name was Linux F_GET/SET_OWN_EX (which is 
revamped version of F_GET/SET_OWN).

>> +	WRITE_ONCE(inode->i_write_hint, hint);
>> +	if (file->f_mapping->host != inode)
>> +		WRITE_ONCE(file->f_mapping->host->i_write_hint, hint);
> 
> This doesn't work.  You need a file system method for this so that
> the file system can intercept it, instead of storing it in completely
> arbitrary inodes without any kind of checking for support or intercetion
> point.
> 

I don't understand why will it not work. The hint is being set in the 
same way how it is done in the current code (in existing fcntl handlers 
for temperature hints).

>> --- a/include/linux/rw_hint.h
>> +++ b/include/linux/rw_hint.h
>> @@ -21,4 +21,17 @@ enum rw_lifetime_hint {
>>   static_assert(sizeof(enum rw_lifetime_hint) == 1);
>>   #endif
>>   
>> +#define WRITE_HINT_TYPE_BIT	BIT(7)
>> +#define WRITE_HINT_VAL_MASK	(WRITE_HINT_TYPE_BIT - 1)
>> +#define WRITE_HINT_TYPE(h)	(((h) & WRITE_HINT_TYPE_BIT) ? \
>> +				TYPE_RW_PLACEMENT_HINT : TYPE_RW_LIFETIME_HINT)
>> +#define WRITE_HINT_VAL(h)	((h) & WRITE_HINT_VAL_MASK)
>> +
>> +#define WRITE_PLACEMENT_HINT(h)	(((h) & WRITE_HINT_TYPE_BIT) ? \
>> +				 WRITE_HINT_VAL(h) : 0)
>> +#define WRITE_LIFETIME_HINT(h)	(((h) & WRITE_HINT_TYPE_BIT) ? \
>> +				 0 : WRITE_HINT_VAL(h))
>> +
>> +#define PLACEMENT_HINT_TYPE	WRITE_HINT_TYPE_BIT
>> +#define MAX_PLACEMENT_HINT_VAL	(WRITE_HINT_VAL_MASK - 1)
> 
> That's a whole lot of undocumented macros.  Please turn these into proper
> inline functions and write documentation for them.

I can try doing that.



More information about the Linux-nvme mailing list