[PATCH 2/7] Add various hugetlb page table fix

Catalin Marinas catalin.marinas at arm.com
Tue Feb 7 09:11:00 EST 2012


On Tue, Feb 07, 2012 at 01:24:09PM +0000, carson bill wrote:
> 2012/2/7, Catalin Marinas <catalin.marinas at arm.com>:
> > On Tue, Feb 07, 2012 at 01:42:01AM +0000, bill4carson wrote:
> >> On 2012-02-07 00:26, Catalin Marinas wrote:
> >> > On Wed, Feb 01, 2012 at 03:10:21AM +0000, bill4carson wrote:
> >> >> Why is L_PTE_HUGEPAGE needed?
> >> >>
> >> >> The hugetlb subsystem calls pte_page to derive the corresponding page
> >> >> struct from a given pte, and pte_pfn is used first to convert the pte
> >> >> into a page frame number.
> >> >
> >> > Are you sure the pte_pfn() conversion is right? Does it need to be
> >> > different from the 4K pfn?
> > ...
> >> pte_page is defined as follows to derive the page struct from a given
> >> pte. This macro is used both in generic mm and in the hugetlb subsystem,
> >> so we need to distinguish in pte_pfn between a huge-page-based Linux pte
> >> and a normal-page-based Linux pte; that's what L_PTE_HUGEPAGE is for.
> >>
> >> #define pte_page(pte)		pfn_to_page(pte_pfn(pte))
> >>
> >> So L_PTE_HUGEPAGE is *NOT* set in a normal-page-based Linux pte,
> >> and Linux pte bits[31:12] are the page frame number;
> >
> > I agree.
> >
> >> otherwise, we have a huge-page-based Linux pte, and Linux pte
> >> bits[31:20] are the page frame number for a SECTION mapping, and
> >> bits[31:24] are the page frame number for a SUPER-SECTION mapping.
> >
> > Actually it is still 31:12 but with bits 19:12 or 23:12 masked out. So
> > you do the correct shift by PAGE_SHIFT with the additional masking for
> > huge pages (harmless).
> >
> > But do we actually need this masking? Do the huge_pte_offset() or
> > huge_pte_alloc() functions return the Linux pte (pmd) for the huge page?
> > If yes, can we not ensure that bits 19:12 are already zero? This
> > shouldn't be any different from the 4K Linux pte but with an address
> > aligned to 1MB.
> 
> I'm afraid there is some misunderstanding.
> huge_pte_offset() returns the address of the huge Linux pte if it exists;
> huge_pte_alloc() allocates a location to store the huge Linux pte and
> returns this address;
> neither of the above functions returns the huge Linux pte *value*.

I agree: huge_pte_offset() returns a pointer to the Linux pte/pmd if it
exists. My point is that the values stored in Linux pte/pmd have bits
20:12 cleared already as the address is at least 2MB aligned (well,
apart from the additional L_PTE_HPAGE_* bits that you declared). Is this
correct? If yes, then you don't need any additional masking for
pte_pfn() even if it is passed a Linux pmd.
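
To make the alignment argument above concrete, here is a small user-space model; it is a sketch, not kernel code. It assumes 4K pages (PAGE_SHIFT = 12), a 2MB Linux pmd, and a plain 32-bit integer in place of pte_t. model_pte_t, model_mk_huge_pte(), model_pte_pfn() and model_bits_20_12_clear() are hypothetical names introduced only for illustration. As long as only attribute bits below bit 12 are ORed into a 2MB-aligned address, bits 20:12 stay clear and the ordinary shift already yields the pfn of the first 4K page of the huge page.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed constants: 4K pages and a 2MB Linux pmd (not copied from headers). */
#define PAGE_SHIFT      12
#define HPAGE_SHIFT     21
#define HPAGE_SIZE      (UINT32_C(1) << HPAGE_SHIFT)
#define MODEL_PAGE_MASK (~((UINT32_C(1) << PAGE_SHIFT) - 1))

/* Hypothetical stand-in for the kernel's pte_t. */
typedef uint32_t model_pte_t;

/* Hypothetical helper: a 2MB-aligned physical address plus attribute
 * bits that all live below bit 12 (no L_PTE_HPAGE_* style size flags). */
static model_pte_t model_mk_huge_pte(uint32_t phys, uint32_t attrs)
{
	assert((phys & (HPAGE_SIZE - 1)) == 0);   /* 2MB alignment */
	assert((attrs & MODEL_PAGE_MASK) == 0);   /* no attribute bit at or above 12 */
	return phys | attrs;
}

/* Ordinary 4K-style conversion: shift only, no huge-page masking. */
static uint32_t model_pte_pfn(model_pte_t pte)
{
	return pte >> PAGE_SHIFT;
}

/* Returns non-zero when bits 20:12 of the entry are all clear. */
static int model_bits_20_12_clear(model_pte_t pte)
{
	return (pte & (HPAGE_SIZE - 1) & MODEL_PAGE_MASK) == 0;
}
```

For example, 0x40200000 with attribute bits 0x0ff gives an entry whose bits 20:12 are clear, and the plain shift returns pfn 0x40200, a multiple of 512 (2MB / 4K).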

> make_huge_pte() returns a huge Linux pte for a given page and the vma
> protection bits. Please notice that pte_mkhuge is used to mark this pte
> as a huge Linux pte by setting L_PTE_HUGEPAGE; then set_huge_pte_at()
> is used to set the huge Linux pte as well as the huge hardware pte.
> 
> 
> static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page,
>                                 int writable)
> {
>         pte_t entry;
>
>         if (writable) {
>                 entry =
>                     pte_mkwrite(pte_mkdirty(mk_pte(page, vma->vm_page_prot)));
>         } else {
>                 entry = huge_pte_wrprotect(mk_pte(page, vma->vm_page_prot));
>         }
>         entry = pte_mkyoung(entry);
>         entry = pte_mkhuge(entry);
>
>         return entry;
> }
> 
> Hence, a normal Linux pte must have L_PTE_HUGEPAGE cleared, and a huge
> Linux pte must have L_PTE_HUGEPAGE (bit 11) set. Depending on the size,
> a huge Linux pte also has L_PTE_HPAGE_2M (bit 12) or L_PTE_HPAGE_16M
> (bit 13) set, which is why the masking is needed in pte_pfn.
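
For contrast, here is a user-space sketch of the masking variant described in the quoted text. The bit positions of L_PTE_HUGEPAGE (bit 11), L_PTE_HPAGE_2M (bit 12) and L_PTE_HPAGE_16M (bit 13) are taken from the message; everything else (the 32-bit model_pte_t, naive_pte_pfn(), masked_pte_pfn(), and masking below bit 20 for the 2M flag or below bit 24 for the 16M flag) is an assumption for illustration, not the patch's actual code. It shows why masking becomes necessary once bit 12 or 13 carries a size flag: a plain shift folds the flag into the pfn.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT          12
#define SECTION_SHIFT       20   /* 1MB section */
#define SUPERSECTION_SHIFT  24   /* 16MB supersection */

/* Bit positions as described in the quoted message. */
#define L_PTE_HUGEPAGE      (UINT32_C(1) << 11)
#define L_PTE_HPAGE_2M      (UINT32_C(1) << 12)
#define L_PTE_HPAGE_16M     (UINT32_C(1) << 13)

/* Hypothetical stand-in for the kernel's pte_t. */
typedef uint32_t model_pte_t;

/* Plain shift: a size flag at bit 12 or 13 leaks into the pfn. */
static uint32_t naive_pte_pfn(model_pte_t pte)
{
	return pte >> PAGE_SHIFT;
}

/* Masking variant: for a huge entry, clear everything below bit 20
 * (2M flag) or bit 24 (16M flag) before shifting; normal entries are
 * shifted unchanged. */
static uint32_t masked_pte_pfn(model_pte_t pte)
{
	if (pte & L_PTE_HUGEPAGE) {
		unsigned int shift = (pte & L_PTE_HPAGE_16M) ?
				     SUPERSECTION_SHIFT : SECTION_SHIFT;

		pte &= ~((UINT32_C(1) << shift) - 1);
	}
	return pte >> PAGE_SHIFT;
}
```

With L_PTE_HUGEPAGE and L_PTE_HPAGE_2M set on a 0x40200000 entry, naive_pte_pfn() returns 0x40201 while masked_pte_pfn() returns 0x40200; normal entries without L_PTE_HUGEPAGE are unaffected.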

But if you avoid setting L_PTE_HPAGE_*, then we don't need the masking
for pte_pfn. In which case, we don't need to differentiate between a
normal and a huge pte in pte_pfn(), so no need for L_PTE_HUGEPAGE. The
set_huge_pte_at() function is only called with a huge pte, so it doesn't
need to check the L_PTE_HUGEPAGE bit either.
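
A sketch of the simplification suggested above, again as a user-space model under stated assumptions: 4K pages, a 2MB Linux pmd backed by two 1MB hardware section entries as on classic 32-bit ARM, and a uint32_t in place of pte_t. struct model_huge_slot, model_pte_pfn() and model_set_huge_pte_at() are hypothetical, and the hardware attribute encoding is deliberately left out. With no size flags in the entry, one pte_pfn() serves both normal and huge ptes, and the set function needs no L_PTE_HUGEPAGE test because its callers only ever pass huge ptes.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT     12
#define SECTION_SIZE   (UINT32_C(1) << 20)   /* 1MB hardware section */
#define HPAGE_SIZE     (UINT32_C(2) << 20)   /* 2MB Linux pmd (assumed) */
#define LOW_ATTR_MASK  ((UINT32_C(1) << PAGE_SHIFT) - 1)

/* Hypothetical stand-in for the kernel's pte_t. */
typedef uint32_t model_pte_t;

/* Hypothetical model of one Linux pmd slot and the two 1MB hardware
 * section entries it covers. */
struct model_huge_slot {
	model_pte_t linux_pte;
	uint32_t hw_section[2];
};

/* One pte_pfn for normal and huge entries: no flag test, no masking. */
static uint32_t model_pte_pfn(model_pte_t pte)
{
	return pte >> PAGE_SHIFT;
}

/* Only ever called with a huge entry, so it tests no L_PTE_HUGEPAGE-style
 * bit. Hardware attributes are omitted; each section entry here records
 * just its 1MB-aligned physical base. */
static void model_set_huge_pte_at(struct model_huge_slot *slot, model_pte_t pte)
{
	uint32_t base = pte & ~LOW_ATTR_MASK;

	assert((base & (HPAGE_SIZE - 1)) == 0);   /* caller guarantees 2MB alignment */
	slot->linux_pte = pte;
	slot->hw_section[0] = base;
	slot->hw_section[1] = base + SECTION_SIZE;
}
```

The design choice is that the huge-page size comes from the calling context (the hugetlb code path), not from bits stored in the pte, which is what lets pte_pfn() stay a plain shift.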

-- 
Catalin



More information about the linux-arm-kernel mailing list