[PATCH v3 08/30] block: Introduce zone write plugging
Damien Le Moal
dlemoal at kernel.org
Thu Mar 28 15:38:37 PDT 2024
On 3/29/24 07:20, Bart Van Assche wrote:
>> +static void disk_zone_wplugs_work(struct work_struct *work)
>> +{
>> + struct gendisk *disk =
>> + container_of(work, struct gendisk, zone_wplugs_work);
>> + struct blk_zone_wplug *zwplug;
>> + unsigned long flags;
>> +
>> + spin_lock_irqsave(&disk->zone_wplugs_lock, flags);
>> +
>> + while (!list_empty(&disk->zone_wplugs_err_list)) {
>> + zwplug = list_first_entry(&disk->zone_wplugs_err_list,
>> + struct blk_zone_wplug, link);
>> + list_del_init(&zwplug->link);
>> + blk_get_zone_wplug(zwplug);
>> + spin_unlock_irqrestore(&disk->zone_wplugs_lock, flags);
>> +
>> + disk_zone_wplug_handle_error(disk, zwplug);
>> + disk_put_zone_wplug(zwplug);
>> +
>> + spin_lock_irqsave(&disk->zone_wplugs_lock, flags);
>> + }
>> +
>> + spin_unlock_irqrestore(&disk->zone_wplugs_lock, flags);
>> +}
>
> What is the maximum number of iterations the above while loop can
> perform? I'm wondering whether a cond_resched() call should be added.
The loop will go on as long as there are plugged BIOs and they can be merged,
that is, as long as the request does not exceed the queue limits. So this is all
limited naturally by the queue limits.
>> + /* Wait for the zone write plugs to be RCU-freed. */
>> + rcu_barrier();
>> +}
>
> It is not clear to me why the above rcu_barrier() call is necessary? I'm
> not aware of any other kernel code where kfree_rcu() is followed by an
> rcu_barrier() call.
Right after that, the mempool (in v4, free list here) is destroyed. So the
rcu_barrier() is needed to ensure that the grace period is past and that all
plugs are back in the pool/freelist. Without this, I saw problems/crashes when
removing devices.
>> +static int disk_alloc_zone_resources(struct gendisk *disk,
>> + unsigned int max_nr_zwplugs)
>> +{
>> + unsigned int i;
>> +
>> + disk->zone_wplugs_hash_bits =
>> + min(ilog2(max_nr_zwplugs) + 1, BLK_ZONE_MAX_WPLUG_HASH_BITS);
>
> If max_nr_zwplugs is a power of two, the above formula will result in a
> hash table with a size that is twice the size of max_nr_zwplugs.
Yes, that is in purpose, to avoid hash collisions as much as possible.
> Shouldn't ilog2(max_nr_zwplugs) + 1 be changed into
> ilog2(roundup_pow_of_two(max_nr_zwplugs))?
I think it should be:
ilog2(roundup_pow_of_two(max_nr_zwplugs)) + 1
--
Damien Le Moal
Western Digital Research
More information about the Linux-nvme
mailing list