[PATCH] mm/vmalloc: fix spinning drain_vmap_work after reading from /proc/vmcore
Uladzislau Rezki
urezki at gmail.com
Tue Apr 5 13:28:54 PDT 2022
On Tue, Apr 05, 2022 at 12:40:31PM -0700, Omar Sandoval wrote:
> From: Omar Sandoval <osandov at fb.com>
>
> Commit 3ee48b6af49c ("mm, x86: Saving vmcore with non-lazy freeing of
> vmas") introduced set_iounmap_nonlazy(), which sets vmap_lazy_nr to
> lazy_max_pages() + 1, ensuring that any future vunmaps() immediately
> purges the vmap areas instead of doing it lazily.
>
> Commit 690467c81b1a ("mm/vmalloc: Move draining areas out of caller
> context") moved the purging from the vunmap() caller to a worker thread.
> Unfortunately, set_iounmap_nonlazy() can cause the worker thread to spin
> (possibly forever). For example, consider the following scenario:
>
> 1. Thread reads from /proc/vmcore. This eventually calls
> __copy_oldmem_page() -> set_iounmap_nonlazy(), which sets
> vmap_lazy_nr to lazy_max_pages() + 1.
> 2. Then it calls free_vmap_area_noflush() (via iounmap()), which adds 2
> pages (one page plus the guard page) to the purge list and
> vmap_lazy_nr. vmap_lazy_nr is now lazy_max_pages() + 3, so the
> drain_vmap_work is scheduled.
> 3. Thread returns from the kernel and is scheduled out.
> 4. Worker thread is scheduled in and calls drain_vmap_area_work(). It
> frees the 2 pages on the purge list. vmap_lazy_nr is now
> lazy_max_pages() + 1.
> 5. This is still over the threshold, so it tries to purge areas again,
> but doesn't find anything.
> 6. Repeat 5.
>
> If the system is running with only one CPU (which is typicial for kdump)
> and preemption is disabled, then this will never make forward progress:
> there aren't any more pages to purge, so it hangs. If there is more than
> one CPU or preemption is enabled, then the worker thread will spin
> forever in the background. (Note that if there were already pages to be
> purged at the time that set_iounmap_nonlazy() was called, this bug is
> avoided.)
>
> This can be reproduced with anything that reads from /proc/vmcore
> multiple times. E.g., vmcore-dmesg /proc/vmcore.
>
> A simple way to "fix" this would be to make set_iounmap_nonlazy() set
> vmap_lazy_nr to lazy_max_pages() instead of lazy_max_pages() + 1. But, I
> think it'd be better to get rid of this hack of clobbering vmap_lazy_nr.
> Instead, this fix makes __copy_oldmem_page() explicitly drain the vmap
> areas itself.
>
> Signed-off-by: Omar Sandoval <osandov at fb.com>
> ---
> arch/x86/include/asm/io.h | 2 +-
> arch/x86/kernel/crash_dump_64.c | 2 +-
> mm/vmalloc.c | 21 ++++++++++-----------
> 3 files changed, 12 insertions(+), 13 deletions(-)
>
> diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
> index f6d91ecb8026..da466352f27c 100644
> --- a/arch/x86/include/asm/io.h
> +++ b/arch/x86/include/asm/io.h
> @@ -210,7 +210,7 @@ void __iomem *ioremap(resource_size_t offset, unsigned long size);
> extern void iounmap(volatile void __iomem *addr);
> #define iounmap iounmap
>
> -extern void set_iounmap_nonlazy(void);
> +void iounmap_purge_vmap_area(void);
>
> #ifdef __KERNEL__
>
> diff --git a/arch/x86/kernel/crash_dump_64.c b/arch/x86/kernel/crash_dump_64.c
> index a7f617a3981d..075dd36c502d 100644
> --- a/arch/x86/kernel/crash_dump_64.c
> +++ b/arch/x86/kernel/crash_dump_64.c
> @@ -37,8 +37,8 @@ static ssize_t __copy_oldmem_page(unsigned long pfn, char *buf, size_t csize,
> } else
> memcpy(buf, vaddr + offset, csize);
>
> - set_iounmap_nonlazy();
> iounmap((void __iomem *)vaddr);
> + iounmap_purge_vmap_area();
> return csize;
> }
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index e163372d3967..48084d742688 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1671,17 +1671,6 @@ static DEFINE_MUTEX(vmap_purge_lock);
> /* for per-CPU blocks */
> static void purge_fragmented_blocks_allcpus(void);
>
> -#ifdef CONFIG_X86_64
> -/*
> - * called before a call to iounmap() if the caller wants vm_area_struct's
> - * immediately freed.
> - */
> -void set_iounmap_nonlazy(void)
> -{
> - atomic_long_set(&vmap_lazy_nr, lazy_max_pages()+1);
> -}
> -#endif /* CONFIG_X86_64 */
> -
> /*
> * Purges all lazily-freed vmap areas.
> */
> @@ -1753,6 +1742,16 @@ static void purge_vmap_area_lazy(void)
> mutex_unlock(&vmap_purge_lock);
> }
>
> +#ifdef CONFIG_X86_64
> +/* Called after iounmap() to immediately free vm_area_struct's. */
> +void iounmap_purge_vmap_area(void)
> +{
> + mutex_lock(&vmap_purge_lock);
> + __purge_vmap_area_lazy(ULONG_MAX, 0);
> + mutex_unlock(&vmap_purge_lock);
> +}
> +#endif /* CONFIG_X86_64 */
> +
> static void drain_vmap_area_work(struct work_struct *work)
> {
> unsigned long nr_lazy;
> --
> 2.35.1
>
Indeed setting the vmap_lazy_nr to lazy_max_pages()+1 in order to
force the drain sounds like a hack.
Reviewed-by: Uladzislau Rezki (Sony) <urezki at gmail.com>
--
Uladzislau Rezki
More information about the kexec
mailing list