[PATCH RFC v3 2/4] mm/pgtable: Make pfn_pte() filter out huge page attributes
Jürgen Groß
jgross at suse.com
Thu Mar 5 02:05:50 PST 2026
On 05.03.26 10:38, Yin Tirui wrote:
>
> On 3/4/2026 3:52 PM, Jürgen Groß wrote:
>> On 28.02.26 08:09, Yin Tirui wrote:
>>> A fundamental principle of page table type safety is that `pte_t` represents
>>> the lowest level page table entry and should never carry huge page attributes.
>>>
>>> Currently, passing a pgprot with huge page bits (e.g., extracted via
>>> pmd_pgprot()) into pfn_pte() creates a malformed PTE that retains the huge
>>> attribute, leading to the necessity of the ugly `pte_clrhuge()` anti-pattern.
>>>
>>> Enforce type safety by making `pfn_pte()` inherently filter out huge page
>>> attributes:
>>> - On x86: Strip the `_PAGE_PSE` bit.
>>> - On ARM64: Mask out the block descriptor bits in `PTE_TYPE_MASK` and
>>> enforce the `PTE_TYPE_PAGE` format.
>>> - On RISC-V: No changes required, as RISC-V leaf PMDs and PTEs share the
>>> exact same hardware format and do not use a distinct huge bit.
>>>
>>> Signed-off-by: Yin Tirui <yintirui at huawei.com>
>>> ---
>>> arch/arm64/include/asm/pgtable.h | 4 +++-
>>> arch/x86/include/asm/pgtable.h | 4 ++++
>>> 2 files changed, 7 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>>> index b3e58735c49b..f2a7a40106d2 100644
>>> --- a/arch/arm64/include/asm/pgtable.h
>>> +++ b/arch/arm64/include/asm/pgtable.h
>>> @@ -141,7 +141,9 @@ static inline pteval_t __phys_to_pte_val(phys_addr_t phys)
>>> #define pte_pfn(pte) (__pte_to_phys(pte) >> PAGE_SHIFT)
>>> #define pfn_pte(pfn,prot) \
>>> - __pte(__phys_to_pte_val((phys_addr_t)(pfn) << PAGE_SHIFT) | pgprot_val(prot))
>>> + __pte(__phys_to_pte_val((phys_addr_t)(pfn) << PAGE_SHIFT) | \
>>> + ((pgprot_val(prot) & ~(PTE_TYPE_MASK & ~PTE_VALID)) | \
>>> + (PTE_TYPE_PAGE & ~PTE_VALID)))
>>> #define pte_none(pte) (!pte_val(pte))
>>> #define pte_page(pte) (pfn_to_page(pte_pfn(pte)))
>>> diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
>>> index 1662c5a8f445..a4dbd81d42bf 100644
>>> --- a/arch/x86/include/asm/pgtable.h
>>> +++ b/arch/x86/include/asm/pgtable.h
>>> @@ -738,6 +738,10 @@ static inline pgprotval_t check_pgprot(pgprot_t pgprot)
>>> static inline pte_t pfn_pte(unsigned long page_nr, pgprot_t pgprot)
>>> {
>>> phys_addr_t pfn = (phys_addr_t)page_nr << PAGE_SHIFT;
>>> +
>>> + /* Filter out _PAGE_PSE to ensure PTEs never carry the huge page bit */
>>> + pgprot = __pgprot(pgprot_val(pgprot) & ~_PAGE_PSE);
>>
>> Is it really a good idea to silently drop the bit?
>>
>> Today it can either be used for a large page (which should be a pmd,
>> of course), or - much worse - you'd strip the _PAGE_PAT bit, which is
>> at the same position in PTEs.
>>
>> So basically you are removing the ability to use some cache modes.
>>
>> NACK!
>>
>>
>> Juergen
>
> Hi Willy and Jürgen,
>
> Following up on the x86 _PAGE_PSE and _PAGE_PAT aliasing issue.
>
> To achieve the goal of keeping pfn_pte() pure and completely eradicating the
> pte_clrhuge() anti-pattern, we need a way to ensure pfn_pte() never receives a
> pgprot with the huge bit set.
>
> @Jürgen:
> Just to be absolutely certain: is there any safe way to filter out the huge page
> attributes directly inside x86's pfn_pte() without breaking PAT? Or does the
> hardware bit-aliasing make this strictly impossible at the pfn_pte() level?

There is no huge bit at the PTE level. It exists only at the PMD and PUD
levels.

So: yes, it is absolutely impossible to filter it out, as the bit has a
different meaning in "real" PTEs (with "PTE" meaning: a translation entry in
a page referenced by a PMD entry that does not have the PSE bit set).

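
To make the aliasing concrete, here is a minimal userspace sketch, not kernel
code. It assumes the usual x86 bit positions (PWT = bit 3, PCD = bit 4, and
bit 7 serving as PSE in PMD/PUD entries but as PAT in 4K PTEs). The MODEL_*
macros and model_* helpers are hypothetical names invented for this
illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace model only. Bit positions are assumed to mirror x86's
 * PWT (3), PCD (4), PSE (7) and PAT (7); the MODEL_* names and the
 * helpers below are hypothetical, not kernel symbols.
 */
#define MODEL_PAGE_PWT (UINT64_C(1) << 3)
#define MODEL_PAGE_PCD (UINT64_C(1) << 4)
#define MODEL_PAGE_PSE (UINT64_C(1) << 7) /* meaningful in PMD/PUD entries */
#define MODEL_PAGE_PAT (UINT64_C(1) << 7) /* meaningful in 4K PTEs */

/* The two flags occupy the same bit, so a mask cannot tell them apart. */
static_assert(MODEL_PAGE_PSE == MODEL_PAGE_PAT, "PSE and PAT alias at bit 7");

/* PAT index selected by a 4K PTE: PAT:PCD:PWT read as a 3-bit number. */
static inline unsigned int model_pat_index_4k(uint64_t prot)
{
	return (unsigned int)(((prot & MODEL_PAGE_PAT) ? 4 : 0) |
			      ((prot & MODEL_PAGE_PCD) ? 2 : 0) |
			      ((prot & MODEL_PAGE_PWT) ? 1 : 0));
}

/* Models the masking that the rejected pfn_pte() hunk applied to pgprot. */
static inline uint64_t model_strip_pse(uint64_t prot)
{
	return prot & ~MODEL_PAGE_PSE;
}
```

In this model, a 4K protection value with PAT and PWT set selects PAT index 5.
After model_strip_pse() it selects index 1, so the PTE silently ends up with a
different cache mode.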
>
> @Willy @Jürgen:
> Assuming it is impossible to filter this safely inside pfn_pte() on x86, we must
> translate the pgprot before passing it down. To maintain strict type-safety and
> still drop pte_clrhuge(), I plan to introduce two arch-neutral wrappers:
>
> x86:
> /* Translates large prot to 4K. Shifts PAT back to bit 7, inherently clearing
> _PAGE_PSE */
> #define pgprot_huge_to_pte(prot) pgprot_large_2_4k(prot)
> /* Translates 4K prot to large. Shifts PAT to bit 12, strictly sets _PAGE_PSE */
> #define pgprot_pte_to_huge(prot) \
>         __pgprot(pgprot_val(pgprot_4k_2_large(prot)) | _PAGE_PSE)

Seems to be okay.


Juergen
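
The translation behind the two proposed wrappers can be sketched in userspace
as follows. This is only a model under stated assumptions: PAT is taken to live
at bit 7 in 4K entries and at bit 12 in large entries, with bit 7 of a large
entry being PSE. The MODEL_* macros and model_* functions are hypothetical
stand-ins, not the kernel's pgprot_large_2_4k() or pgprot_4k_2_large().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins for the x86 bit definitions. */
#define MODEL_PAGE_PSE       (UINT64_C(1) << 7)  /* large entries only */
#define MODEL_PAGE_PAT       (UINT64_C(1) << 7)  /* 4K entries only */
#define MODEL_PAGE_PAT_LARGE (UINT64_C(1) << 12) /* large entries only */
#define MODEL_PAT_SHIFT      (12 - 7)

static_assert((MODEL_PAGE_PAT << MODEL_PAT_SHIFT) == MODEL_PAGE_PAT_LARGE,
	      "shift must map the 4K PAT bit onto the large PAT bit");

/*
 * Large -> 4K: clear bits 7 and 12, then put the old bit 12 (PAT) at bit 7.
 * Clearing bit 7 is what drops PSE as a side effect.
 */
static inline uint64_t model_pgprot_huge_to_pte(uint64_t prot)
{
	uint64_t pat = (prot & MODEL_PAGE_PAT_LARGE) >> MODEL_PAT_SHIFT;

	return (prot & ~(MODEL_PAGE_PAT | MODEL_PAGE_PAT_LARGE)) | pat;
}

/* 4K -> large: move PAT from bit 7 to bit 12, then set PSE explicitly. */
static inline uint64_t model_pgprot_pte_to_huge(uint64_t prot)
{
	uint64_t pat = (prot & MODEL_PAGE_PAT) << MODEL_PAT_SHIFT;

	return (prot & ~(MODEL_PAGE_PAT | MODEL_PAGE_PAT_LARGE)) | pat |
	       MODEL_PAGE_PSE;
}
```

In this model the two helpers are inverses for any value that is a well-formed
large protection (PSE set): converting to the 4K form and back returns the
original bits, and the cache mode selected by PAT is carried along in both
directions.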
More information about the linux-arm-kernel
mailing list