[PATCH v5 1/4] mm: modify pte format for Svnapot

Qinglin Pan panqinglin2020 at iscas.ac.cn
Tue Oct 4 21:43:01 PDT 2022


Hi Conor and Andrew,

On 10/5/22 2:33 AM, Conor Dooley wrote:
> On Tue, Oct 04, 2022 at 07:00:27PM +0200, Andrew Jones wrote:
>> On Mon, Oct 03, 2022 at 09:47:18PM +0800, panqinglin2020 at iscas.ac.cn wrote:
>>> From: Qinglin Pan <panqinglin2020 at iscas.ac.cn>
>>>
>>> This commit adds two errata entries to enable/disable svnapot support,
>>> patching code dynamically when "svnapot" is in the "riscv,isa" field of
>>> the fdt and the SVNAPOT compile option is set. This influences the
>>> behavior of the has_svnapot and pte_pfn functions. All code dependent on
>>> svnapot should first make sure that has_svnapot returns true.
>>>
>>> Also, this commit modifies the PTE definition for Svnapot and creates some
>>> functions in pgtable.h to mark a PTE as napot and to check whether it is a
>>> Svnapot PTE. So far, only the 64KB napot size is supported in the draft
>>> spec, so some macros have only a 64KB version.
>>>
>>> Signed-off-by: Qinglin Pan <panqinglin2020 at iscas.ac.cn>
>>>
>>> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
>>> index d557cc50295d..4354024aae21 100644
>>> --- a/arch/riscv/Kconfig
>>> +++ b/arch/riscv/Kconfig
>>> @@ -383,6 +383,20 @@ config RISCV_ISA_C
>>>   
>>>   	  If you don't know what to do here, say Y.
>>>   
>>> +config RISCV_ISA_SVNAPOT
>>> +	bool "SVNAPOT extension support"
>>> +	depends on 64BIT && MMU
>>> +	select RISCV_ALTERNATIVE
>>> +	default n
>>
>> What's the reason for this to be default no? If there's a trade-off to be
>> made which requires opt-in, then it should be described in the text below.
> 
> In addition, existing extensions are default y unless they depend on a
> compiler option.

Got it. I will set the default value to y :)

> 
>>> +	help
>>> +	  Say Y if you want to allow kernel to detect SVNAPOT ISA-extension
>>> +	  dynamically in boot time and enable its usage.
>>> +
>>> +	  SVNAPOT extension helps to mark contiguous PTEs as a range
>>> +	  of contiguous virtual-to-physical translations, with a naturally
>>> +	  aligned power-of-2 (NAPOT) granularity larger than the base 4KB page
>>> +	  size.
>>> +
>>>   config RISCV_ISA_SVPBMT
>>>   	bool "SVPBMT extension support"
>>>   	depends on 64BIT && MMU
>>> diff --git a/arch/riscv/include/asm/errata_list.h b/arch/riscv/include/asm/errata_list.h
>>> index 19a771085781..f3aff5ef52e4 100644
>>> --- a/arch/riscv/include/asm/errata_list.h
>>> +++ b/arch/riscv/include/asm/errata_list.h
>>> @@ -22,7 +22,8 @@
>>>   
>>>   #define	CPUFEATURE_SVPBMT 0
>>>   #define	CPUFEATURE_ZICBOM 1
>>> -#define	CPUFEATURE_NUMBER 2
>>> +#define	CPUFEATURE_SVNAPOT 2
>>> +#define	CPUFEATURE_NUMBER 3
>>
>> nit: Maybe take the opportunity to tab out the numbers to get them all
>> lined up. And tab enough for future longer names too.
> 
> s/maybe// ;)
> 

Thanks :)
Will fix the alignment here and in the defines below in the next version.
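Something like:

#define	CPUFEATURE_SVPBMT	0
#define	CPUFEATURE_ZICBOM	1
#define	CPUFEATURE_SVNAPOT	2
#define	CPUFEATURE_NUMBER	3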

>>
>>>   
>>>   #ifdef __ASSEMBLY__
>>>   
>>> @@ -142,6 +143,26 @@ asm volatile(ALTERNATIVE_2(						\
>>>   	    "r"((unsigned long)(_start) + (_size))			\
>>>   	: "a0")
>>>   
>>> +#define ALT_SVNAPOT(_val)						\
>>> +asm(ALTERNATIVE("li %0, 0", "li %0, 1", 0,				\
>>> +		CPUFEATURE_SVNAPOT, CONFIG_RISCV_ISA_SVNAPOT)		\
>>> +		: "=r"(_val) :)
>>> +
>>> +#define ALT_SVNAPOT_PTE_PFN(_val, _napot_shift, _pfn_mask, _pfn_shift)	\
>>> +asm(ALTERNATIVE("and %0, %0, %1\n\t"					\
>>> +		"srli %0, %0, %2\n\t"					\
>>> +		__nops(3),						\
>>> +		"srli t3, %0, %3\n\t"					\
>>> +		"and %0, %0, %1\n\t"					\
>>> +		"srli %0, %0, %2\n\t"					\
>>> +		"sub  t4, %0, t3\n\t"					\
>>> +		"and  %0, %0, t4",					\
>>
>> This implements
>>
>>    temp = ((pte & _PAGE_PFN_MASK) >> _PAGE_PFN_SHIFT);
>>    pfn = temp & (temp - (pte >> _PAGE_NAPOT_SHIFT));
>>
>> which for a non-napot pte just returns the same as the non-napot
>> case would, but for a napot pte we return the same as the non-napot
>> case but with its first set bit cleared, wherever that first set
>> bit was. Can you explain to me why that's what we want to do?
>>

For a 64KB napot pte, (pte >> _PAGE_NAPOT_SHIFT) yields 1, and temp will
be something like 0xabcdabcdab8, because the low bits of the PFN field
encode the napot order; the pfn we actually want is 0xabcdabcdab0. We can
get it by computing (temp & (temp - 1)), which clears that lowest set bit.
For a non-napot pte the shift yields 0, so the expression is just
temp & temp. So temp & (temp - (pte >> _PAGE_NAPOT_SHIFT)) gives the
correct pfn in both the napot and non-napot cases :)
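In plain user-space C, the whole computation looks roughly like this (the
constants mirror the patch's _PAGE_* macros, just flattened for the demo):

#include <stdint.h>
#include <stdio.h>

#define PAGE_NAPOT_SHIFT 63
#define PAGE_PFN_SHIFT   10
#define PAGE_PFN_MASK    (((1ULL << 44) - 1) << PAGE_PFN_SHIFT)   /* bits 53:10 */

/* C equivalent of the ALT_SVNAPOT_PTE_PFN assembly above. */
static uint64_t demo_pte_pfn(uint64_t pte)
{
	uint64_t napot = pte >> PAGE_NAPOT_SHIFT;                 /* 1 for a napot pte, else 0 */
	uint64_t temp  = (pte & PAGE_PFN_MASK) >> PAGE_PFN_SHIFT;

	/* napot:     temp - 1 clears the low (order-encoding) bit */
	/* non-napot: temp - 0 leaves temp unchanged               */
	return temp & (temp - napot);
}

int main(void)
{
	uint64_t pfn    = 0xabcdabcdab0ULL;
	uint64_t normal = pfn << PAGE_PFN_SHIFT;                          /* plain 4KB pte  */
	uint64_t napot  = ((pfn | 0x8) << PAGE_PFN_SHIFT) | (1ULL << 63); /* 64KB napot pte */

	printf("normal: %llx\n", (unsigned long long)demo_pte_pfn(normal)); /* abcdabcdab0 */
	printf("napot:  %llx\n", (unsigned long long)demo_pte_pfn(napot));  /* abcdabcdab0 */
	return 0;
}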

>>> +		0, CPUFEATURE_SVNAPOT, CONFIG_RISCV_ISA_SVNAPOT)	\
>>> +		: "+r"(_val)						\
>>> +		: "r"(_pfn_mask),					\
>>> +		  "i"(_pfn_shift),					\
>>> +		  "i"(_napot_shift))
>>
>> Should add t3 and t4 to clobbers list.
>>

Will do it in the next version.
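For reference, roughly what I have in mind (asm body unchanged from the
patch, just with the clobber list appended):

#define ALT_SVNAPOT_PTE_PFN(_val, _napot_shift, _pfn_mask, _pfn_shift)	\
asm(ALTERNATIVE("and %0, %0, %1\n\t"					\
		"srli %0, %0, %2\n\t"					\
		__nops(3),						\
		"srli t3, %0, %3\n\t"					\
		"and %0, %0, %1\n\t"					\
		"srli %0, %0, %2\n\t"					\
		"sub  t4, %0, t3\n\t"					\
		"and  %0, %0, t4",					\
		0, CPUFEATURE_SVNAPOT, CONFIG_RISCV_ISA_SVNAPOT)	\
		: "+r"(_val)						\
		: "r"(_pfn_mask),					\
		  "i"(_pfn_shift),					\
		  "i"(_napot_shift)					\
		: "t3", "t4")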

>>> +
>>>   #endif /* __ASSEMBLY__ */
>>>   
>>>   #endif
>>> diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
>>> index 6f59ec64175e..83b8e21a3573 100644
>>> --- a/arch/riscv/include/asm/hwcap.h
>>> +++ b/arch/riscv/include/asm/hwcap.h
>>> @@ -54,10 +54,11 @@ extern unsigned long elf_hwcap;
>>>    */
>>>   enum riscv_isa_ext_id {
>>>   	RISCV_ISA_EXT_SSCOFPMF = RISCV_ISA_EXT_BASE,
>>> +	RISCV_ISA_EXT_SSTC,
>>> +	RISCV_ISA_EXT_SVNAPOT,
>>>   	RISCV_ISA_EXT_SVPBMT,
>>>   	RISCV_ISA_EXT_ZICBOM,
>>>   	RISCV_ISA_EXT_ZIHINTPAUSE,
>>> -	RISCV_ISA_EXT_SSTC,
>>
>> Opportunistically alphabetizing an enum is a bit risky. It'd be better if
>> this was a separate patch because, even though nothing should care about
>> the order, if something does care about the order, then it'd be easier to
>> debug when this is changed alone.
> 
> I think for these things it is s/alphabetical/canonical order, given the
> patch from Palmer I linked in my previous mail. Although thinking about
> it, that means V before S, so SV before SS? Which his patch did not do:
> https://lore.kernel.org/linux-riscv/20220920204518.10988-1-palmer@rivosinc.com/
> 

I'm not very sure how to place it in canonical order :(
I will avoid reordering the other enum variants in this patch, and only add
SVNAPOT just before SVPBMT. A separate patch to reorder them is welcome :)

>>
>>>   	RISCV_ISA_EXT_ID_MAX = RISCV_ISA_EXT_MAX,
>>>   };
>>>   
>>> diff --git a/arch/riscv/include/asm/pgtable-64.h b/arch/riscv/include/asm/pgtable-64.h
>>> index dc42375c2357..25ec541192e5 100644
>>> --- a/arch/riscv/include/asm/pgtable-64.h
>>> +++ b/arch/riscv/include/asm/pgtable-64.h
>>> @@ -74,6 +74,19 @@ typedef struct {
>>>    */
>>>   #define _PAGE_PFN_MASK  GENMASK(53, 10)
>>>   
>>> +/*
>>> + * [63] Svnapot definitions:
>>> + * 0 Svnapot disabled
>>> + * 1 Svnapot enabled
>>> + */
>>> +#define _PAGE_NAPOT_SHIFT 63
>>> +#define _PAGE_NAPOT      BIT(_PAGE_NAPOT_SHIFT)
>>> +#define NAPOT_CONT64KB_ORDER 4UL
>>> +#define NAPOT_CONT64KB_SHIFT (NAPOT_CONT64KB_ORDER + PAGE_SHIFT)
>>> +#define NAPOT_CONT64KB_SIZE BIT(NAPOT_CONT64KB_SHIFT)
>>> +#define NAPOT_CONT64KB_MASK (NAPOT_CONT64KB_SIZE - 1UL)
>>> +#define NAPOT_64KB_PTE_NUM BIT(NAPOT_CONT64KB_ORDER)
>>
>> Please make three lined up columns out of these defines.
>>
>> #define DEF		VAL
>>
>>> +
>>>   /*
>>>    * [62:61] Svpbmt Memory Type definitions:
>>>    *
>>> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
>>> index 7ec936910a96..c3fc3c661699 100644
>>> --- a/arch/riscv/include/asm/pgtable.h
>>> +++ b/arch/riscv/include/asm/pgtable.h
>>> @@ -264,10 +264,39 @@ static inline pte_t pud_pte(pud_t pud)
>>>   	return __pte(pud_val(pud));
>>>   }
>>>   
>>> +static inline bool has_svnapot(void)
>>> +{
>>> +	u64 _val;
>>> +
>>> +	ALT_SVNAPOT(_val);
>>
>> Why aren't we using the isa extension static key framework for this?

I am not very sure why a static key would be better than the
errata/alternative mechanism here. If it is really necessary, I will
convert it to a static key in the next version.
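If that is the preference, I guess the helper would end up looking roughly
like this (the key name below is just an example; it would be defined once
with DEFINE_STATIC_KEY_FALSE(riscv_svnapot_available) and enabled via
static_branch_enable() from the cpufeature code when "svnapot" is found in
riscv,isa):

static __always_inline bool has_svnapot(void)
{
	return IS_ENABLED(CONFIG_RISCV_ISA_SVNAPOT) &&
	       static_branch_likely(&riscv_svnapot_available);
}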

>>
>>> +	return _val;
>>> +}
>>> +
>>> +#ifdef CONFIG_RISCV_ISA_SVNAPOT
>>> +
>>> +static inline unsigned long pte_napot(pte_t pte)
>>> +{
>>> +	return pte_val(pte) & _PAGE_NAPOT;
>>> +}
>>> +
>>> +static inline pte_t pte_mknapot(pte_t pte, unsigned int order)
>>> +{
>>> +	int pos = order - 1 + _PAGE_PFN_SHIFT;
>>> +	unsigned long napot_bit = BIT(pos);
>>> +	unsigned long napot_mask = ~GENMASK(pos, _PAGE_PFN_SHIFT);
>>> +
>>> +	return __pte((pte_val(pte) & napot_mask) | napot_bit | _PAGE_NAPOT);
>>> +}
>>> +#endif /* CONFIG_RISCV_ISA_SVNAPOT */
>>> +
>>>   /* Yields the page frame number (PFN) of a page table entry */
>>>   static inline unsigned long pte_pfn(pte_t pte)
>>>   {
>>> -	return __page_val_to_pfn(pte_val(pte));
>>> +	unsigned long _val  = pte_val(pte);
>>> +
>>> +	ALT_SVNAPOT_PTE_PFN(_val, _PAGE_NAPOT_SHIFT,
>>> +			    _PAGE_PFN_MASK, _PAGE_PFN_SHIFT);
>>> +	return _val;
>>>   }
>>>   
>>>   #define pte_page(x)     pfn_to_page(pte_pfn(x))
>>> diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
>>> index 0be8a2403212..d2a61122c595 100644
>>> --- a/arch/riscv/kernel/cpu.c
>>> +++ b/arch/riscv/kernel/cpu.c
>>> @@ -96,6 +96,7 @@ static struct riscv_isa_ext_data isa_ext_arr[] = {
>>>   	__RISCV_ISA_EXT_DATA(zicbom, RISCV_ISA_EXT_ZICBOM),
>>>   	__RISCV_ISA_EXT_DATA(zihintpause, RISCV_ISA_EXT_ZIHINTPAUSE),
>>>   	__RISCV_ISA_EXT_DATA(sstc, RISCV_ISA_EXT_SSTC),
>>> +	__RISCV_ISA_EXT_DATA(svnapot, RISCV_ISA_EXT_SVNAPOT),
>>
>> If we want to do a separate isa extension alphabetizing patch, then this
>> array would be a good one to do since it's the output order.
>>
>>>   	__RISCV_ISA_EXT_DATA("", RISCV_ISA_EXT_MAX),
>>>   };
>>>   
>>> diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
>>> index 553d755483ed..3557c5cc6f04 100644
>>> --- a/arch/riscv/kernel/cpufeature.c
>>> +++ b/arch/riscv/kernel/cpufeature.c
>>> @@ -204,6 +204,7 @@ void __init riscv_fill_hwcap(void)
>>>   				SET_ISA_EXT_MAP("zicbom", RISCV_ISA_EXT_ZICBOM);
>>>   				SET_ISA_EXT_MAP("zihintpause", RISCV_ISA_EXT_ZIHINTPAUSE);
>>>   				SET_ISA_EXT_MAP("sstc", RISCV_ISA_EXT_SSTC);
>>> +				SET_ISA_EXT_MAP("svnapot", RISCV_ISA_EXT_SVNAPOT);
>>
>> Could alphabetize this too, even though it doesn't matter.
>>
>>>   			}
>>>   #undef SET_ISA_EXT_MAP
>>>   		}
>>> @@ -284,6 +285,20 @@ static bool __init_or_module cpufeature_probe_zicbom(unsigned int stage)
>>>   	return false;
>>>   }
>>>   
>>> +static bool __init_or_module cpufeature_probe_svnapot(unsigned int stage)
>>> +{
>>> +#ifdef CONFIG_RISCV_ISA_SVNAPOT
> 
> Is there a reason that this cannot be if IS_ENABLED(CONFIG_RISCV_ISA_SVNAPOT)?

Will use IS_ENABLED() in the next version.
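Roughly the same shape as the current helper, just with IS_ENABLED()
instead of the #ifdef block:

static bool __init_or_module cpufeature_probe_svnapot(unsigned int stage)
{
	if (!IS_ENABLED(CONFIG_RISCV_ISA_SVNAPOT))
		return false;

	if (stage == RISCV_ALTERNATIVES_EARLY_BOOT)
		return false;

	return riscv_isa_extension_available(NULL, SVNAPOT);
}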

> 
>>> +	switch (stage) {
>>> +	case RISCV_ALTERNATIVES_EARLY_BOOT:
>>> +		return false;
>>> +	default:
>>> +		return riscv_isa_extension_available(NULL, SVNAPOT);
>>> +	}
>>> +#endif
>>> +
>>> +	return false;
>>> +}
>>> +
>>>   /*
>>>    * Probe presence of individual extensions.
>>>    *
>>> @@ -301,6 +316,9 @@ static u32 __init_or_module cpufeature_probe(unsigned int stage)
>>>   	if (cpufeature_probe_zicbom(stage))
>>>   		cpu_req_feature |= (1U << CPUFEATURE_ZICBOM);
> 
> While in the area, could this be converted to BIT(CPUFEATURE_ZICBOM)?

This Zicbom-related code is not introduced by this patchset. It may be
better to convert it to BIT() in another patch.

Qinglin.

> 
>>>   
>>> +	if (cpufeature_probe_svnapot(stage))
>>> +		cpu_req_feature |= BIT(CPUFEATURE_SVNAPOT);
>>> +
>>>   	return cpu_req_feature;
>>>   }
>>>   
>>> -- 
>>> 2.35.1
>>>
>>
>> Thanks,
>> drew
>>
