[PATCH v3 09/30] block: Pre-allocate zone write plugs
Damien Le Moal
dlemoal at kernel.org
Wed Mar 27 22:28:40 PDT 2024
On 3/28/24 13:30, Christoph Hellwig wrote:
> I think this should go into the previous patch, splitting it
> out just causes confusion.
>
>> +static void disk_free_zone_wplug(struct blk_zone_wplug *zwplug)
>> +{
>> + struct gendisk *disk = zwplug->disk;
>> + unsigned long flags;
>> +
>> + if (zwplug->flags & BLK_ZONE_WPLUG_NEEDS_FREE) {
>> + kfree(zwplug);
>> + return;
>> + }
>> +
>> + spin_lock_irqsave(&disk->zone_wplugs_lock, flags);
>> + list_add_tail(&zwplug->link, &disk->zone_wplugs_free_list);
>> + spin_unlock_irqrestore(&disk->zone_wplugs_lock, flags);
>> +}
>> +
>> static bool disk_insert_zone_wplug(struct gendisk *disk,
>> struct blk_zone_wplug *zwplug)
>> {
>> @@ -630,18 +665,24 @@ static struct blk_zone_wplug *disk_get_zone_wplug(struct gendisk *disk,
>> return zwplug;
>> }
>>
>> +static void disk_free_zone_wplug_rcu(struct rcu_head *rcu_head)
>> +{
>> + struct blk_zone_wplug *zwplug =
>> + container_of(rcu_head, struct blk_zone_wplug, rcu_head);
>> +
>> + disk_free_zone_wplug(zwplug);
>> +}
>
> Please verify my idea carefully, but I think we can do without the
> RCU grace period and thus the rcu_head in struct blk_zone_wplug:
>
> When the zwplug is removed from the hash, we set the
> BLK_ZONE_WPLUG_UNHASHED flag under disk->zone_wplugs_lock. Once
> caller see that flag any lookup that modifies the structure
> will fail/wait. If we then just clear BLK_ZONE_WPLUG_UNHASHED after
> the final put in disk_put_zone_wplug when we know the bio list is
> empty and no other state is kept (if there might be flags left
> we should clear them before), it is perfectly fine for the
> zwplug to get reused for another zone at this point.
That was my thinking initially as well, which is why I did not have the grace
period. However, getting a reference on a plug is a not done under
disk->zone_wplugs_lock and is thus racy, albeit with a super tiny time window:
the hash table lookup may "see" a plug that has already been removed and has a
refcount dropped to 0 already. The use of atomic_inc_not_zero() prevents us from
trying to keep using that stale plug, but we *are* referencing it. So without
the grace period, I think there is a risk (again, super tiny window) that we
start reusing the plug, or kfree it while atomic_inc_not_zero() is executing...
I am overthinking this ?
--
Damien Le Moal
Western Digital Research
More information about the Linux-nvme
mailing list