[PATCH v3 09/30] block: Pre-allocate zone write plugs

Damien Le Moal dlemoal at kernel.org
Wed Mar 27 23:02:54 PDT 2024


On 3/28/24 14:46, Christoph Hellwig wrote:
> On Thu, Mar 28, 2024 at 02:28:40PM +0900, Damien Le Moal wrote:
>> That was my thinking initially as well, which is why I did not have the
>> grace period. However, getting a reference on a plug is not done under
>> disk->zone_wplugs_lock and is thus racy, albeit with a super tiny time
>> window: the hash table lookup may "see" a plug that has already been
>> removed and has a refcount dropped to 0 already. The use of
>> atomic_inc_not_zero() prevents us from trying to keep using that stale
>> plug, but we *are* referencing it. So without the grace period, I think
>> there is a risk (again, super tiny window) that we start reusing the
>> plug, or kfree it while atomic_inc_not_zero() is executing...
>> Am I overthinking this?
> 
> Well.  All the lookups fail (or should fail) when BLK_ZONE_WPLUG_UNHASHED
> is set, probably before even trying to grab a reference.  So all
> the lookups for a zone that is being torn down will fail.  Now once
> the actual final reference is dropped, we'll now need to clear
> BLK_ZONE_WPLUG_UNHASHED and lookups can happen again.  We'd have a race
> window there, but I guess we can plug it by checking for the right
> zone number?  If we check it while it already got reused that'll still fail
> the lookup.

But that is the problem: "checking the zone number again" means dereferencing
the plug struct again from the lookup context while the last ref drop context
is freeing the plug. The lookup context can lose that race and end up
referencing freed memory. So your solution would be OK for pre-allocated plugs
only. For kmalloc()-ed plugs, we still need the RCU grace period before the
free. So we can only optimize for the pre-allocated plugs...
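
To make the race concrete, here is a minimal userspace sketch of the lookup
pattern discussed above. It is not the patch code: struct wplug,
ref_inc_not_zero() and wplug_get() are hypothetical names, and C11 atomics
stand in for the kernel's atomic_inc_not_zero(). The comments mark the two
places where the lookup context touches the plug memory.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a zone write plug, not the kernel struct. */
struct wplug {
	atomic_int ref;
	unsigned int zone_no;
};

/* Userspace equivalent of atomic_inc_not_zero(): never revive a 0 ref. */
static bool ref_inc_not_zero(atomic_int *ref)
{
	int old = atomic_load(ref);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(ref, &old, old + 1))
			return true;
	}
	return false;
}

/*
 * Lookup side. Both the refcount access and the zone number check
 * dereference the plug, so they are only safe if the memory is still
 * valid: always true for a pre-allocated plug, but for a kmalloc()-ed
 * plug this needs the free to be deferred past an RCU grace period.
 */
static struct wplug *wplug_get(struct wplug *plug, unsigned int zone_no)
{
	if (!ref_inc_not_zero(&plug->ref))	/* touches plug memory */
		return NULL;

	if (plug->zone_no != zone_no) {		/* touches plug memory again */
		atomic_fetch_sub(&plug->ref, 1);
		return NULL;
	}
	return plug;
}
```

With a plug at ref 1 for zone 5, wplug_get(plug, 5) returns the plug and bumps
the ref to 2, while a lookup for another zone number or on a plug whose ref
already dropped to 0 returns NULL and leaves the ref unchanged. None of that
helps if the struct itself has already been handed back to the allocator.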

-- 
Damien Le Moal
Western Digital Research



