[PATCH] vfs: Don't evict inode under the inode lru traversing context
Zhihao Cheng
chengzhihao1 at huawei.com
Mon Aug 5 20:48:47 PDT 2024
On 2024/8/6 0:10, Mateusz Guzik wrote:
> On Mon, Aug 5, 2024 at 3:24 AM Zhihao Cheng <chengzhihao at huaweicloud.com> wrote:
>>
>> From: Zhihao Cheng <chengzhihao1 at huawei.com>
>>
>> The inode reclaiming process (see prune_icache_sb()) first collects all
>> reclaimable inodes and marks them with the I_FREEING flag; from that
>> point, other processes trying to get these inodes (see
>> find_inode_fast()) will block. The reclaiming process then destroys the
>> inodes via dispose_list(). Some filesystems (e.g. ext4 with the
>> ea_inode feature, ubifs with xattrs) may perform an inode lookup in the
>> inode evicting callback; if that lookup runs under the inode lru
>> traversing context, a deadlock can occur.
>>
[...]
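To make the scenario above concrete, the ext4 ea_inode chain (condensed here as an illustration; i_reg is a regular inode whose ea inode i_ea sits earlier on the same LRU list) looks roughly like this:

```
prune_icache_sb
  list_lru_walk_one
    inode_lru_isolate(i_ea)
      i_ea->i_state |= I_FREEING    /* ea inode marked for freeing */
    inode_lru_isolate(i_reg)
      __iget(i_reg)                 /* pre-patch: pin via i_count */
      ...
      iput(i_reg)                   /* last ref gone: evict runs under
                                       the lru traversing context */
        evict(i_reg)
          ext4_evict_inode
            ext4_xattr_inode_iget   /* looks up i_ea */
              find_inode_fast
                __wait_on_freeing_inode(i_ea)  /* sleeps forever */
  dispose_list                      /* never reached, so the wakeup */
    wake_up_bit(&i_ea->i_state)     /* for i_ea never happens       */
```

Replacing the __iget()/iput() pin with I_LRU_ISOLATING avoids running evict() from inside the traversal, which is what the patch below does.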
>
> I only have some tidy-ups with stuff I neglected to mention when
> typing up my proposal.
>
>> ---
>> fs/inode.c | 37 +++++++++++++++++++++++++++++++++++--
>> include/linux/fs.h | 5 +++++
>> 2 files changed, 40 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/inode.c b/fs/inode.c
>> index 86670941884b..f1c6e8072f39 100644
>> --- a/fs/inode.c
>> +++ b/fs/inode.c
>> @@ -488,6 +488,36 @@ static void inode_lru_list_del(struct inode *inode)
>> this_cpu_dec(nr_unused);
>> }
>>
>> +static void inode_lru_isolating(struct inode *inode)
>> +{
>
> lockdep_assert_held(&inode->i_lock);
Agreed, will change this in v2.
>
>> + BUG_ON(inode->i_state & (I_LRU_ISOLATING | I_FREEING | I_WILL_FREE));
>> + inode->i_state |= I_LRU_ISOLATING;
>> +}
>> +
>> +static void inode_lru_finish_isolating(struct inode *inode)
>> +{
>> + spin_lock(&inode->i_lock);
>> + BUG_ON(!(inode->i_state & I_LRU_ISOLATING));
>> + inode->i_state &= ~I_LRU_ISOLATING;
>> + wake_up_bit(&inode->i_state, __I_LRU_ISOLATING);
>> + spin_unlock(&inode->i_lock);
>> +}
>> +
>> +static void inode_wait_for_lru_isolating(struct inode *inode)
>> +{
>> + DEFINE_WAIT_BIT(wq, &inode->i_state, __I_LRU_ISOLATING);
>> + wait_queue_head_t *wqh;
>> +
>
> Top of evict() asserts on I_FREEING being set, used to decide it's not
> legit to pin above. This dependency can be documented it in the
> routine as well:
>
> BUG_ON(!(inode->i_state & I_FREEING));
Sorry, I don't quite follow. Do you mean adding
'BUG_ON(!(inode->i_state & I_FREEING))' in
inode_wait_for_lru_isolating()? As you said, evict() already has one. So
far inode_wait_for_lru_isolating() is only called from evict(); we can
add the BUG_ON if it gets called from more places in the future.
>
>> + spin_lock(&inode->i_lock);
>
> This lock acquire is avoidable, which is always nice to do for
> single-threaded perf. Probably can be also done for the writeback code
> below. Maybe I'll massage it myself after the patch lands.
Yes, I think both inode_wait_for_writeback() and
inode_wait_for_lru_isolating() could be made lockless; an smp_rmb()
would be needed then.
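A rough lockless shape (untested sketch, not part of this patch; the
fast-path read can be a plain READ_ONCE() because new arrivals are
already blocked by I_FREEING, so the flag can only go from set to clear,
and __wait_on_bit() re-tests the bit after prepare_to_wait(), so the
wakeup cannot be missed; whether later i_state reads still want an
explicit smp_rmb() would need checking):

```c
static void inode_wait_for_lru_isolating(struct inode *inode)
{
	DEFINE_WAIT_BIT(wq, &inode->i_state, __I_LRU_ISOLATING);
	wait_queue_head_t *wqh;

	/*
	 * I_FREEING is set, so I_LRU_ISOLATING can only transition from
	 * set to clear; a racy read cannot miss a flag that matters.
	 */
	if (!(READ_ONCE(inode->i_state) & I_LRU_ISOLATING))
		return;

	wqh = bit_waitqueue(&inode->i_state, __I_LRU_ISOLATING);
	__wait_on_bit(wqh, &wq, bit_wait, TASK_UNINTERRUPTIBLE);
	/* The flag cannot be re-armed once I_FREEING is set. */
	WARN_ON(READ_ONCE(inode->i_state) & I_LRU_ISOLATING);
}
```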
>
>> + wqh = bit_waitqueue(&inode->i_state, __I_LRU_ISOLATING);
>> + while (inode->i_state & I_LRU_ISOLATING) {
>> + spin_unlock(&inode->i_lock);
>> + __wait_on_bit(wqh, &wq, bit_wait, TASK_UNINTERRUPTIBLE);
>> + spin_lock(&inode->i_lock);
>> + }
>> + spin_unlock(&inode->i_lock);
>> +}
>
> So new arrivals *are* blocked by this point thanks to I_FREEING being
> set on entry to evict(). This also means the flag can show up at most
> once.
>
> Thus instead of looping this should merely go to sleep once and assert
> the I_LRU_ISOLATING flag is no longer set after waking up.
Good improvement, will change in v2.
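For reference, the wait-once variant could look something like this
(untested sketch of what v2 might do; correctness rests on I_FREEING
being set before this runs, and on __wait_on_bit() re-testing the bit
after prepare_to_wait() so the wakeup is not missed):

```c
static void inode_wait_for_lru_isolating(struct inode *inode)
{
	lockdep_assert_held(&inode->i_lock);
	if (inode->i_state & I_LRU_ISOLATING) {
		DEFINE_WAIT_BIT(wq, &inode->i_state, __I_LRU_ISOLATING);
		wait_queue_head_t *wqh;

		wqh = bit_waitqueue(&inode->i_state, __I_LRU_ISOLATING);
		spin_unlock(&inode->i_lock);
		__wait_on_bit(wqh, &wq, bit_wait, TASK_UNINTERRUPTIBLE);
		spin_lock(&inode->i_lock);
		/* I_FREEING is set, so the flag cannot show up again. */
		WARN_ON(inode->i_state & I_LRU_ISOLATING);
	}
}
```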
>
>> +
>> /**
>> * inode_sb_list_add - add inode to the superblock list of inodes
>> * @inode: inode to add
>> @@ -657,6 +687,9 @@ static void evict(struct inode *inode)
>>
>> inode_sb_list_del(inode);
>>
>> + /* Wait for LRU isolating to finish. */
>
> I don't think this comment adds anything given the name of the func.
Agreed, will delete it in v2.
>
>> + inode_wait_for_lru_isolating(inode);
>> +
>> /*
>> * Wait for flusher thread to be done with the inode so that filesystem
>> * does not start destroying it while writeback is still running. Since
>> @@ -855,7 +888,7 @@ static enum lru_status inode_lru_isolate(struct list_head *item,
>> * be under pressure before the cache inside the highmem zone.
>> */
>> if (inode_has_buffers(inode) || !mapping_empty(&inode->i_data)) {
>> - __iget(inode);
>> + inode_lru_isolating(inode);
>> spin_unlock(&inode->i_lock);
>> spin_unlock(lru_lock);
>> if (remove_inode_buffers(inode)) {
>> @@ -867,7 +900,7 @@ static enum lru_status inode_lru_isolate(struct list_head *item,
>> __count_vm_events(PGINODESTEAL, reap);
>> mm_account_reclaimed_pages(reap);
>> }
>> - iput(inode);
>> + inode_lru_finish_isolating(inode);
>> spin_lock(lru_lock);
>> return LRU_RETRY;
>> }
>> diff --git a/include/linux/fs.h b/include/linux/fs.h
>> index fd34b5755c0b..fb0426f349fc 100644
>> --- a/include/linux/fs.h
>> +++ b/include/linux/fs.h
>> @@ -2392,6 +2392,9 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
>> *
>> * I_PINNING_FSCACHE_WB Inode is pinning an fscache object for writeback.
>> *
>> + * I_LRU_ISOLATING Inode is pinned being isolated from LRU without holding
>> + * i_count.
>> + *
>> * Q: What is the difference between I_WILL_FREE and I_FREEING?
>> */
>> #define I_DIRTY_SYNC (1 << 0)
>> @@ -2415,6 +2418,8 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
>> #define I_DONTCACHE (1 << 16)
>> #define I_SYNC_QUEUED (1 << 17)
>> #define I_PINNING_NETFS_WB (1 << 18)
>> +#define __I_LRU_ISOLATING 19
>> +#define I_LRU_ISOLATING (1 << __I_LRU_ISOLATING)
>>
>> #define I_DIRTY_INODE (I_DIRTY_SYNC | I_DIRTY_DATASYNC)
>> #define I_DIRTY (I_DIRTY_INODE | I_DIRTY_PAGES)
>> --
>> 2.39.2
>>
>
>