[PATCH v4 07/28] block: Introduce zone write plugging

Jens Axboe axboe at kernel.dk
Tue Apr 2 18:01:21 PDT 2024


On 4/2/24 5:38 PM, Damien Le Moal wrote:
> On 4/3/24 01:12, Christoph Hellwig wrote:
>>> +static inline struct blk_zone_wplug *
>>> +disk_lookup_zone_wplug(struct gendisk *disk, sector_t sector)
>>> +{
>>> +	unsigned int zno = disk_zone_no(disk, sector);
>>> +	unsigned int idx = hash_32(zno, disk->zone_wplugs_hash_bits);
>>> +	struct blk_zone_wplug *zwplug;
>>> +
>>> +	rcu_read_lock();
>>> +	hlist_for_each_entry_rcu(zwplug, &disk->zone_wplugs_hash[idx], node) {
>>> +		if (zwplug->zone_no == zno)
>>> +			goto unlock;
>>> +	}
>>> +	zwplug = NULL;
>>> +
>>> +unlock:
>>> +	rcu_read_unlock();
>>> +	return zwplug;
>>> +}
>>
>> Did we lose an atomic_inc_unless_zero here?  This now just does a lookup
>> under RCU, but nothing to prevent the zwplug from beeing freed?
> 
> Nope. When disk_lookup_zone_wplug() is called directly, it is always
> for handling requests/bios which are holding a reference on the plug
> and because there are requests/BIOs in-flight, the plug is marked as
> busy (BLK_ZONE_WPLUG_PLUGGED or BLK_ZONE_WPLUG_ERROR are set). In such
> state, the plug is always hashed given that
> disk_should_remove_zone_wplug() retturns false for busy plugs. So
> there is no reference increase here. The atomic_inc_not_zero() is in
> disk_get_zone_wplug() which calls disk_lookup_zone_wplug() +
> atomic_inc_not_zero() within an rcu_read_lock()/rcu_read_unlock()
> section.

But doing a lookup under rcu, dropping it, and then returning the unit
without an increment is just a horrible pattern. Regardless of whether
it's safe or not. And as most callers do the atomic_inc anyway, some of
then outside rcu lock, this looks horrible buggy.

Please just unify it all and have it follow the idiomatic rcu lookup
pattern, which is:

rcu_read_lock();
item = lookup();
if (atomic_inc_not_zero(item->ref)) {
	rcu_read_unlock();
	return item;
}

rcu_read_unlock();
return NULL;

as that is well understood, and your code most certainly does not look
safe nor sane in that regard.

And probably kill that atomic_inc() helper you have as well while at it.

-- 
Jens Axboe




More information about the Linux-nvme mailing list