[PATCH 4/6] UBI: Fastmap: Fix races in ubi_wl_get_peb()

Fri Dec 5 05:20:32 PST 2014

Tanya,

Am 05.12.2014 um 14:09 schrieb Tanya Brokhman:
> On 11/24/2014 3:20 PM, Richard Weinberger wrote:
>> ubi_wl_get_peb() has two problems. First, it reads the pool
>> size and usage counters without any protection.
>> While reading a single value would be perfectly fine, it reads multiple
>> values and compares them. This is racy and can lead to incorrect
>> pool handling.
>> Second, ubi_update_fastmap() is called without wl_lock held, so
>> the used counter needs to be checked again before it is incremented.
> 
> I didn't see where you fixed the ubi_update_fastmap issue you just mentioned.

This is exactly what you're questioning below.
We have to recheck because the pool counter could have changed while wl_lock was dropped.

>> Otherwise another thread could consume all PEBs from the
>> pool in the meantime, and the counter would go beyond ->size.
>>
>> Signed-off-by: Richard Weinberger <richard at nod.at>
>> ---
>>   drivers/mtd/ubi/ubi.h |  3 ++-
>>   drivers/mtd/ubi/wl.c  | 34 +++++++++++++++++++++++-----------
>>   2 files changed, 25 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/mtd/ubi/ubi.h b/drivers/mtd/ubi/ubi.h
>> index 04c4c05..d672412 100644
>> --- a/drivers/mtd/ubi/ubi.h
>> +++ b/drivers/mtd/ubi/ubi.h
>> @@ -439,7 +439,8 @@ struct ubi_debug_info {
>>    * @pq_head: protection queue head
>>    * @wl_lock: protects the @used, @free, @pq, @pq_head, @lookuptbl, @move_from,
>>    *         @move_to, @move_to_put @erase_pending, @wl_scheduled, @works,
>> - *         @erroneous, @erroneous_peb_count, and @fm_work_scheduled fields
>> + *         @erroneous, @erroneous_peb_count, @fm_work_scheduled, @fm_pool,
>> + *         and @fm_wl_pool fields
>>    * @move_mutex: serializes eraseblock moves
>>    * @work_sem: used to wait for all the scheduled works to finish and prevent
>>    * new works from being submitted
>> diff --git a/drivers/mtd/ubi/wl.c b/drivers/mtd/ubi/wl.c
>> index cb2e571..7730b97 100644
>> --- a/drivers/mtd/ubi/wl.c
>> +++ b/drivers/mtd/ubi/wl.c
>> @@ -629,24 +629,36 @@ void ubi_refill_pools(struct ubi_device *ubi)
>>    */
>>   int ubi_wl_get_peb(struct ubi_device *ubi)
>>   {
>> -    int ret;
>> +    int ret, retried = 0;
>>       struct ubi_fm_pool *pool = &ubi->fm_pool;
>>       struct ubi_fm_pool *wl_pool = &ubi->fm_wl_pool;
>>
>> -    if (!pool->size || !wl_pool->size || pool->used == pool->size ||
>> -        wl_pool->used == wl_pool->size)
>> +again:
>> +    spin_lock(&ubi->wl_lock);
>> +    /* We check here also for the WL pool because at this point we can
>> +     * refill the WL pool synchronous. */
>> +    if (pool->used == pool->size || wl_pool->used == wl_pool->size) {
>> +        spin_unlock(&ubi->wl_lock);
>>           ubi_update_fastmap(ubi);
>> -
>> -    /* we got not a single free PEB */
>> -    if (!pool->size)
>> -        ret = -ENOSPC;
>> -    else {
>>           spin_lock(&ubi->wl_lock);
>> -        ret = pool->pebs[pool->used++];
>> -        prot_queue_add(ubi, ubi->lookuptbl[ret]);
>> +    }
>> +
>> +    if (pool->used == pool->size) {
> 
> I'm confused about this "if" condition. You just tested pool->used == pool->size in the previous "if". If in the previous "if" pool->used != pool->size and wl_pool->used !=
> wl_pool->size, you didn't enter it; the lock is still held, so pool->used != pool->size still holds. If in the previous "if" wl_pool->used == wl_pool->size was true and you released the lock,
> ubi_update_fastmap(ubi) was called, which refills the pools. So again, if the pools were refilled, pool->used would be 0 here and pool->size > 0.
> 
> So in both cases I don't see how pool->used == pool->size could ever be true at this point.

If we enter the "if (pool->used == pool->size || wl_pool->used == wl_pool->size) {" branch, we unlock wl_lock and call ubi_update_fastmap().
While the lock is dropped, another thread can enter ubi_wl_get_peb() and alter the pool counters. So we have to recheck the counter after taking wl_lock again.
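To make that interleaving concrete, here is a hand-replayed version of it in plain C, with no real threads. It is only a sketch: struct pool_counters, POOL_SIZE and replay() are hypothetical names, and the refill done by ubi_update_fastmap() is reduced to resetting the used counter. Without the recheck, thread A ends up reading pebs[size], one past the end of the pool; with it, A notices that the pool is empty again.

```c
#include <assert.h>

#define POOL_SIZE 4

/* Hypothetical model: only the two counters of a fastmap pool. */
struct pool_counters {
	int size;
	int used;
};

/*
 * Replays one interleaving by hand, without real threads:
 *  1. Thread A holds the lock, sees used == size, drops the lock and refills.
 *  2. Thread B runs while the lock is free and drains the refilled pool.
 *  3. Thread A re-takes the lock and takes one PEB.
 * Returns the pool index A would read, or -1 if A notices the pool is
 * empty again (where the patch retries instead).
 */
static int replay(int recheck)
{
	struct pool_counters p = { .size = POOL_SIZE, .used = POOL_SIZE };

	/* Step 1: A refills the pool outside the lock. */
	p.used = 0;

	/* Step 2: B consumes every PEB before A gets the lock back. */
	while (p.used < p.size)
		p.used++;

	/* Step 3: A holds the lock again. */
	if (recheck) {
		if (p.used == p.size)
			return -1;
		assert(p.used < p.size); /* safe to index pebs[p.used] */
	}

	/* Without the recheck this returns POOL_SIZE: one past the last valid index. */
	return p.used++;
}
```

In this replay, replay(0) returns POOL_SIZE (an out-of-bounds index) and replay(1) returns -1.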

>>           spin_unlock(&ubi->wl_lock);
>> +        if (retried) {
>> +            ubi_err(ubi, "Unable to get a free PEB from user WL pool");
>> +            ret = -ENOSPC;
>> +            goto out;
>> +        }
>> +        retried = 1;
> 
> Why did you decide to retry in this function? And why only one retry attempt? I'm not against it, just trying to understand the logic.

Because failing immediately with -ENOSPC is not nice. Before we do that, I give UBI a second chance to produce a free PEB.
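The overall shape of the patched function can be modeled in userspace roughly as below: check under the lock, drop it to refill, re-take it and check again, and retry once before returning -ENOSPC. This is a sketch under stated assumptions, not the UBI code. A pthread mutex stands in for wl_lock, the hypothetical refill() stands in for ubi_update_fastmap(), struct pool and its refills_left/next_peb fields are invented for the model, and the WL pool is left out.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

#define POOL_MAX 8

/* Hypothetical userspace model of struct ubi_fm_pool: PEB numbers plus
 * the 'used' and 'size' counters, all guarded by 'lock'. */
struct pool {
	pthread_mutex_t lock; /* stands in for ubi->wl_lock */
	int pebs[POOL_MAX];
	int size;
	int used;
	int next_peb;         /* source of fresh PEB numbers for refill() */
	int refills_left;     /* how many refills can still produce PEBs */
};

/* Hypothetical stand-in for ubi_update_fastmap(): get_peb() calls it
 * WITHOUT the lock held, so it takes the lock itself. It may fail to
 * produce any PEB, leaving an empty pool (size == used == 0). */
static void refill(struct pool *p)
{
	pthread_mutex_lock(&p->lock);
	if (p->refills_left > 0) {
		p->refills_left--;
		p->size = POOL_MAX;
		for (int i = 0; i < p->size; i++)
			p->pebs[i] = p->next_peb++;
	} else {
		p->size = 0;
	}
	p->used = 0;
	pthread_mutex_unlock(&p->lock);
}

/* Mirrors the patched ubi_wl_get_peb(): check, refill outside the lock,
 * re-check under the lock, and retry at most once. */
static int get_peb(struct pool *p)
{
	int ret, retried = 0;

again:
	pthread_mutex_lock(&p->lock);
	if (p->used == p->size) {
		pthread_mutex_unlock(&p->lock);
		refill(p); /* other threads may run here */
		pthread_mutex_lock(&p->lock);
	}

	/* Re-check: another thread may have drained the pool meanwhile. */
	if (p->used == p->size) {
		pthread_mutex_unlock(&p->lock);
		if (retried)
			return -ENOSPC;
		retried = 1;
		goto again;
	}

	assert(p->used < p->size); /* guaranteed by the re-check above */
	ret = p->pebs[p->used++];
	pthread_mutex_unlock(&p->lock);
	return ret;
}
```

With one successful refill available and next_peb starting at 100, the first POOL_MAX calls return PEBs 100 to 107 and the next call returns -ENOSPC after one retry. In this model, once refills are exhausted every further attempt fails the same way, so the single retry bounds the work done before reporting -ENOSPC.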

Thanks,
//richard