UBI leb_write_unlock NULL pointer Oops (continuation)
Bill Pringlemeir
bpringlemeir at nbsps.com
Mon Feb 24 10:09:56 EST 2014
On 22 Feb 2014, richard at nod.at wrote:
> Am 22.02.2014 01:49, schrieb Bill Pringlemeir:
>> I think the 'ubi_eba_copy_leb()' may have an issue.
>>
>> int ubi_eba_copy_leb(struct ubi_device *ubi, int from, int to,
>> struct ubi_vid_hdr *vid_hdr)
>> {
>> ...
>>
>> err = leb_write_trylock(ubi, vol_id, lnum);
>>
>> static int leb_write_trylock(struct ubi_device *ubi, int vol_id, int lnum)
>> {
>> ..
>> le = ltree_add_entry(ubi, vol_id, lnum); /* users + 1 */
>> if (IS_ERR(le))
>> return PTR_ERR(le);
>> if (down_write_trylock(&le->mutex))
>> return 0;
>>
>> /* Contention, cancel */
>> spin_lock(&ubi->ltree_lock);
>> le->users -= 1; /* user - 1 */
>> ...
>> spin_unlock(&ubi->ltree_lock);
>>
>> return 1;
>> }
>>
>> if (err)...
>>
if (vol->eba_tbl[lnum] != from) {
dbg_wl("LEB %d:%d is no longer mapped to PEB %d, mapped to PEB %d, cancel",
vol_id, lnum, from, vol->eba_tbl[lnum]);
err = MOVE_CANCEL_RACE;
goto out_unlock_leb;
}
...
out_unlock_leb:
leb_write_unlock(ubi, vol_id, lnum); /* user - 1 */
>> Didn't the count have to bump up in this case? This isn't in Thorsten's
>> traces, but neither are any 'down_read' or 'up_read' traces; maybe
>> everything is in cache?
> Hmm, I'm not sure whether I was able to follow your thought. But
> leb_write_unlock() is balanced with leb_write_trylock() in
> ubi_eba_copy_leb() which makes perfectly sense to me. What exactly is
> the problem?
There are two things that must be balanced. The 'reference count'
ubi_ltree_entry -> users and the rw_semaphore down/up. You are right,
the trylock needs to be balanced by the 'leb_write_unlock'. However,
the 'leb_write_trylock()' has already decremented 'users' in preperation
to move the 'lnum'. However, in the case of contention,
'ubi_eba_copy_leb' bails and does the 'leb_write_unlock()', which
balances the 'trylock', but unbalances the 'users' reference count (I
added some comments on the lines).
I was postulating that the down/up paths are not on the same semaphore.
There is nothing in the code to check this. It is possible for someone
to come in and recycle the 'lnum' and create a new rwsemaphore. Here,
the down/up are done on different 'rwsemaphores' and this will cause
issues.
I was looking at using a 'kref' instead of 'users' and then I found this
path.
Fwiw,
Bill Pringlemeir.
More information about the linux-mtd
mailing list