[PATCH] vfs: Don't evict inode under the inode lru traversing context
Jan Kara
jack at suse.cz
Mon Aug 5 08:30:18 PDT 2024
On Mon 05-08-24 09:34:46, Zhihao Cheng wrote:
> From: Zhihao Cheng <chengzhihao1 at huawei.com>
>
> The inode reclaiming process(See function prune_icache_sb) collects all
> reclaimable inodes and mark them with I_FREEING flag at first, at that
> time, other processes will be stuck if they try getting these inodes
> (See function find_inode_fast), then the reclaiming process destroy the
> inodes by function dispose_list(). Some filesystems(eg. ext4 with
> ea_inode feature, ubifs with xattr) may do inode lookup in the inode
> evicting callback function, if the inode lookup is operated under the
> inode lru traversing context, deadlock problems may happen.
>
> Case 1: In function ext4_evict_inode(), the ea inode lookup could happen
> if ea_inode feature is enabled, the lookup process will be stuck
> under the evicting context like this:
>
> 1. File A has inode i_reg and an ea inode i_ea
> 2. getfattr(A, xattr_buf) // i_ea is added into lru // lru->i_ea
> 3. Then, following three processes running like this:
>
> PA PB
> echo 2 > /proc/sys/vm/drop_caches
> shrink_slab
> prune_dcache_sb
> // i_reg is added into lru, lru->i_ea->i_reg
> prune_icache_sb
> list_lru_walk_one
> inode_lru_isolate
> i_ea->i_state |= I_FREEING // set inode state
> inode_lru_isolate
> __iget(i_reg)
> spin_unlock(&i_reg->i_lock)
> spin_unlock(lru_lock)
> rm file A
> i_reg->nlink = 0
> iput(i_reg) // i_reg->nlink is 0, do evict
> ext4_evict_inode
> ext4_xattr_delete_inode
> ext4_xattr_inode_dec_ref_all
> ext4_xattr_inode_iget
> ext4_iget(i_ea->i_ino)
> iget_locked
> find_inode_fast
> __wait_on_freeing_inode(i_ea) ----→ AA deadlock
> dispose_list // cannot be executed by prune_icache_sb
> wake_up_bit(&i_ea->i_state)
>
> Case 2: In deleted inode writing function ubifs_jnl_write_inode(), file
> deleting process holds BASEHD's wbuf->io_mutex while getting the
> xattr inode, which could race with inode reclaiming process(The
> reclaiming process could try locking BASEHD's wbuf->io_mutex in
> inode evicting function), then an ABBA deadlock problem would
> happen as following:
>
> 1. File A has inode ia and a xattr(with inode ixa), regular file B has
> inode ib and a xattr.
> 2. getfattr(A, xattr_buf) // ixa is added into lru // lru->ixa
> 3. Then, following three processes running like this:
>
> PA PB PC
> echo 2 > /proc/sys/vm/drop_caches
> shrink_slab
> prune_dcache_sb
> // ib and ia are added into lru, lru->ixa->ib->ia
> prune_icache_sb
> list_lru_walk_one
> inode_lru_isolate
> ixa->i_state |= I_FREEING // set inode state
> inode_lru_isolate
> __iget(ib)
> spin_unlock(&ib->i_lock)
> spin_unlock(lru_lock)
> rm file B
> ib->nlink = 0
> rm file A
> iput(ia)
> ubifs_evict_inode(ia)
> ubifs_jnl_delete_inode(ia)
> ubifs_jnl_write_inode(ia)
> make_reservation(BASEHD) // Lock wbuf->io_mutex
> ubifs_iget(ixa->i_ino)
> iget_locked
> find_inode_fast
> __wait_on_freeing_inode(ixa)
> | iput(ib) // ib->nlink is 0, do evict
> | ubifs_evict_inode
> | ubifs_jnl_delete_inode(ib)
> ↓ ubifs_jnl_write_inode
> ABBA deadlock ←-----make_reservation(BASEHD)
> dispose_list // cannot be executed by prune_icache_sb
> wake_up_bit(&ixa->i_state)
>
> Fix it by forbidding inode evicting under the inode lru traversing
> context. In details, we import a new inode state flag 'I_LRU_ISOLATING'
> to pin inode without holding i_count under the inode lru traversing
> context, the inode evicting process will wait until this flag is
> cleared from i_state.
Thanks for the patch and sorry for not getting to this myself! Let me
rephrase the above paragraph a bit for better readability:
Fix the possible deadlock by using new inode state flag I_LRU_ISOLATING to
pin the inode in memory while inode_lru_isolate() reclaims its pages
instead of using ordinary inode reference. This way inode deletion cannot
be triggered from inode_lru_isolate() thus avoiding the deadlock. evict()
is made to wait for I_LRU_ISOLATING to be cleared before proceeding with
inode cleanup.
> @@ -488,6 +488,36 @@ static void inode_lru_list_del(struct inode *inode)
> this_cpu_dec(nr_unused);
> }
>
> +static void inode_lru_isolating(struct inode *inode)
Perhaps call this inode_pin_lru_isolating()
> +{
> + BUG_ON(inode->i_state & (I_LRU_ISOLATING | I_FREEING | I_WILL_FREE));
> + inode->i_state |= I_LRU_ISOLATING;
> +}
> +
> +static void inode_lru_finish_isolating(struct inode *inode)
And call this inode_unpin_lru_isolating()?
Otherwise the patch looks good so feel free to add:
Reviewed-by: Jan Kara <jack at suse.cz>
Honza
--
Jan Kara <jack at suse.com>
SUSE Labs, CR
More information about the linux-mtd
mailing list