[PATCH v2 07/19] mm/fork: Accept huge pfnmap entries

Yan Zhao yan.y.zhao at intel.com
Mon Sep 9 19:52:01 PDT 2024


On Mon, Sep 09, 2024 at 08:08:16PM -0400, Peter Xu wrote:
> On Mon, Sep 09, 2024 at 04:15:39PM -0700, Andrew Morton wrote:
> > On Mon, 9 Sep 2024 18:43:22 -0400 Peter Xu <peterx at redhat.com> wrote:
> > 
> > > > > > Do we need the logic to clear dirty bit in the child as that in
> > > > > > __copy_present_ptes()?  (and also for the pmd's case).
> > > > > > 
> > > > > > e.g.
> > > > > > if (vma->vm_flags & VM_SHARED)
> > > > > > 	pud = pud_mkclean(pud);
> > > > > 
> > > > > Yeah, good question.  I remember I thought about that when initially
> > > > > working on these lines, but I forgot the details, or maybe I simply tried
> > > > > to stick with the current code base, as the dirty bit used to be kept even
> > > > > in the child here.
> > > > > 
> > > > > I'd expect there's only performance differences, but still sounds like I'd
> > > > > better leave that to whoever knows the best on the implications, then draft
> > > > > it as a separate patch but only when needed.
> > > > 
> > > > Sorry, but this vaguensss simply leaves me with nowhere to go.
> > > > 
> > > > I'll drop the series - let's revisit after -rc1 please.
> > > 
> > > Andrew, would you please explain why it needs to be dropped?
> > > 
> > > I meant in the reply that I think we should leave that as is, and I think
> > > so far nobody in real life should care much on this bit, so I think it's
> > > fine to leave the dirty bit as-is.
> > > 
> > > I still think whoever has a better use of the dirty bit and would like to
> > > change the behavior should find the use case and work on top, but only if
> > > necessary.
> > 
> > Well.  "I'd expect there's only performance differences" means to me
> > "there might be correctness issues, I don't know".  Is it or is it not
> > merely a performance thing?
> 
> There should have no correctness issue pending.  It can only be about
> performance, and AFAIU what this patch does is exactly the way where it
> shouldn't ever change performance either, as it didn't change how dirty bit
> was processed (just like before this patch), not to mention correctness (in
> regards to dirty bits).
> 
> I can provide some more details.
> 
> Here the question we're discussing is "whether we should clear the dirty
> bit in the child for a pgtable entry when it's VM_SHARED".  Yan observed
> that we don't do the same thing for pte/pmd/pud, which is true.
> 
> Before this patch:
> 
>   - For pte:     we clear dirty bit if VM_SHARED in child when copy
>   - For pmd/pud: we never clear dirty bit in the child when copy
Hi Peter,

Not sure if I missed anything.

It looks that before this patch, pmd/pud are alawys write protected without
checking "is_cow_mapping(vma->vm_flags) && pud_write(pud)". pud_wrprotect()
clears dirty bit by moving the dirty value to the software bit.

And I have a question that why previously pmd/pud are always write protected.

Thanks
Yan

> 
> The behavior of clearing dirty bit for VM_SHARED in child for pte level
> originates to the 1st commit that git history starts.  So we always do so
> for 19 years.
> 
> That makes sense to me, because clearing dirty bit in pte normally requires
> a SetDirty on the folio, e.g. in unmap path:
> 
>         if (pte_dirty(pteval))
>                 folio_mark_dirty(folio);
> 
> Hence cleared dirty bit in the child should avoid some extra overheads when
> the pte maps a file cache, so clean pte can at least help us to avoid calls
> into e.g. mapping's dirty_folio() functions (in which it should normally
> check folio_test_set_dirty() again anyway, and parent pte still have the
> dirty bit set so we won't miss setting folio dirty):
> 
> folio_mark_dirty():
>         if (folio_test_reclaim(folio))
>                 folio_clear_reclaim(folio);
>         return mapping->a_ops->dirty_folio(mapping, folio);
> 
> However there's the other side of thing where when the dirty bit is missing
> I _think_ it also means when the child writes to the cleaned pte, it'll
> require (e.g. on hardware accelerated archs) MMU setting dirty bit which is
> slower than if we don't clear the dirty bit... and on software emulated
> dirty bits it could even require a page fault, IIUC.
> 
> In short, personally I don't know what's the best to do, on keep / remove
> the dirty bit even if it's safe either way: there are pros and cons on
> different decisions.
> 
> That's why I said I'm not sure which is the best way.  I had a feeling that
> most of the people didn't even notice this, and we kept running this code
> for the past 19 years just all fine..
> 
> OTOH, we don't do the same for pmds/puds (in which case we persist dirty
> bits always in child), and I didn't check whether it's intended, or why.
> It'll have similar reasoning as above discussion on pte, or even more I
> overlooked.
> 
> So again, the safest approach here is in terms of dirty bit we keep what we
> do as before.  And that's what this patch does as of now.
> 
> IOW, if I'll need a repost, I'll repost exactly the same thing (with the
> fixup I sent later, which is already in mm-unstable).
> 
> Thanks,
> 
> -- 
> Peter Xu
> 
> 



More information about the linux-arm-kernel mailing list