[RFC PATCH] proc: clear_refs: do not clear reserved pages

Hugh Dickins hughd at google.com
Sat Jan 14 12:36:37 EST 2012


On Fri, 13 Jan 2012, Nicolas Pitre wrote:
> On Fri, 13 Jan 2012, Will Deacon wrote:
> 
> > /proc/pid/clear_refs is used to clear the Referenced and YOUNG bits for
> > pages and corresponding page table entries of the task with PID pid,
> > which includes any special mappings inserted into the page tables in
> > order to provide things like vDSOs and user helper functions.
> > 
> > On ARM this causes a problem because the vectors page is mapped as a
> > global mapping and since ec706dab ("ARM: add a vma entry for the user
> > accessible vector page"), a VMA is also inserted into each task for this
> > page to aid unwinding through signals and syscall restarts. Since the
> > vectors page is required for handling faults, clearing the YOUNG bit
> > (and subsequently writing a faulting pte) means that we lose the vectors
> > page *globally* and cannot fault it back in. This results in a system
> > deadlock on the next exception.
> > 
> > This patch avoids clearing the aforementioned bits for reserved pages,
> > therefore leaving the vectors page intact on ARM. Since reserved pages
> > are not candidates for swap, this change should not have any impact on
> > the usefulness of clear_refs.
> > 
> > Cc: David Rientjes <rientjes at google.com>
> > Cc: Andrew Morton <akpm at linux-foundation.org>
> > Cc: Nicolas Pitre <nico at fluxnic.net>
> > Reported-by: Moussa Ba <moussaba at micron.com>
> > Signed-off-by: Will Deacon <will.deacon at arm.com>
> 
> Given Andrew's answer, this should be fine wrt Russell's concern.
> 
> Acked-by: Nicolas Pitre <nico at linaro.org>

Yes, it should be okay as an urgent fix for -stable.
But going forward, I doubt it's the right answer: comments below.

> 
> > An aside: if you want to see this problem in action, just run:
> > 
> > $ echo 1 > /proc/self/clear_refs
> > 
> > on an ARM platform (as any user) and watch your system hang. I think this
> > has been the case since 2.6.37, so I'll CC stable once people are happy
> > with the fix.
> > 
> >  fs/proc/task_mmu.c |    3 +++
> >  1 files changed, 3 insertions(+), 0 deletions(-)
> > 
> > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> > index e418c5a..7dcd2a2 100644
> > --- a/fs/proc/task_mmu.c
> > +++ b/fs/proc/task_mmu.c
> > @@ -518,6 +518,9 @@ static int clear_refs_pte_range(pmd_t *pmd, unsigned long addr,

What got me worried was the line just above the context shown below:
    		page = vm_normal_page(vma, addr, ptent);
> >  		if (!page)
> >  			continue;

This is not a normal page, and it's worrying that vm_normal_page() did
not catch it: I wonder how many other places that could be a problem
(but I have not actually identified any).

vm_normal_page() doesn't catch it because at the time it was written,
we thought we were on the point of removing both PageReserved and
VM_RESERVED (both of whose meanings are imprecise), and there was no
need for it to check either of them.  But nobody found time to do the
final (not entirely trivial) cleanup, removing the definitions.

Maybe ec706dab added a need for it to check one of those; though you
can understand my reluctance to spread PageReserved any further than
it goes already.  I was looking for VM_ flags which might serve you
better, when I thought...

This is a horrible hack vma, which is very liable to introduce bugs
of this nature, because not many people are at all aware of it.
But we've had a horrible hack vma for years, the gate_vma (see
mm/memory.c), and that seems to share many characteristics with your
vectors page (most notably, being in kernel not user address space).

Please, going forward, can you delete your vectors page code, and
use the gate_vma for it?  Extending it a little if it somehow does
not satisfy your need.  Or else can you please explain (ec706dab
does not) why the gate_vma does not suit you.

I'm not saying the horrible hack gate_vma mechanism is any safer
than yours (the latest bug in it was fixed all of 13 days ago).
But I am saying that one horrible hack is safer than two.

> >  
> > +		if (PageReserved(page))
> > +			continue;

Let's note in passing that this does change the "behaviour" of clear_refs
on the ZERO_PAGE; but it makes no functional difference.  We just need to
be aware of it, in case someone examines /proc/pid/smaps after writing to
/proc/pid/clear_refs, and complains that some pages are left marked
referenced which were cleared before.
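
To make the ZERO_PAGE point concrete, here is a minimal userspace sketch
of the patched loop.  It is not kernel code: struct model_page, struct
model_pte and model_clear_refs() are hypothetical names invented for
illustration, and the two bool fields merely stand in for PageReserved()
and the referenced/YOUNG state.  Under those assumptions it shows that a
reserved page keeps its referenced marking after a clear_refs pass, while
an ordinary page loses it.

```c
/* Userspace model only: none of these types are the kernel's. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_page {
	bool reserved;		/* stands in for PageReserved(page) */
	bool referenced;	/* stands in for PageReferenced(page) */
};

struct model_pte {
	struct model_page *page;	/* NULL models vm_normal_page() returning NULL */
	bool young;			/* stands in for the pte's YOUNG bit */
};

/*
 * Walk n entries the way the patched loop in clear_refs_pte_range() does:
 * skip entries with no page, skip reserved pages, clear the rest.
 * Returns the number of entries whose bits were cleared.
 */
static size_t model_clear_refs(struct model_pte *ptes, size_t n)
{
	size_t cleared = 0;

	assert(ptes != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		struct model_page *page = ptes[i].page;

		if (!page)
			continue;
		if (page->reserved)	/* the check the patch adds */
			continue;

		ptes[i].young = false;		/* models ptep_test_and_clear_young() */
		page->referenced = false;	/* models ClearPageReferenced() */
		cleared++;
	}
	return cleared;
}
```

In this model an entry pointing at a reserved page (the ZERO_PAGE case, or
the ARM vectors page) comes out of the walk with young and referenced still
set, which is the smaps-visible change noted above.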

> > +
> >  		/* Clear accessed and referenced bits. */
> >  		ptep_test_and_clear_young(vma, addr, pte);
> >  		ClearPageReferenced(page);
> > -- 
> > 1.7.4.1

Hugh


