[PATCH v5 1/4] mm: modify pte format for Svnapot

Andrew Jones ajones at ventanamicro.com
Wed Oct 5 00:15:58 PDT 2022


On Wed, Oct 05, 2022 at 12:43:01PM +0800, Qinglin Pan wrote:
> Hi Conor and Andrew,
> 
> On 10/5/22 2:33 AM, Conor Dooley wrote:
> > On Tue, Oct 04, 2022 at 07:00:27PM +0200, Andrew Jones wrote:
> > > On Mon, Oct 03, 2022 at 09:47:18PM +0800, panqinglin2020 at iscas.ac.cn wrote:

> > > > +#define ALT_SVNAPOT_PTE_PFN(_val, _napot_shift, _pfn_mask, _pfn_shift)	\
> > > > +asm(ALTERNATIVE("and %0, %0, %1\n\t"					\
> > > > +		"srli %0, %0, %2\n\t"					\
> > > > +		__nops(3),						\
> > > > +		"srli t3, %0, %3\n\t"					\
> > > > +		"and %0, %0, %1\n\t"					\
> > > > +		"srli %0, %0, %2\n\t"					\
> > > > +		"sub  t4, %0, t3\n\t"					\
> > > > +		"and  %0, %0, t4",					\
> > > 
> > > This implements
> > > 
> > >    temp = ((pte & _PAGE_PFN_MASK) >> _PAGE_PFN_SHIFT);
> > >    pfn = temp & (temp - (pte >> _PAGE_NAPOT_SHIFT));
> > > 
> > > which for a non-napot pte just returns the same as the non-napot
> > > case would, but for a napot pte we return the same as the non-napot
> > > case but with its first set bit cleared, wherever that first set
> > > bit was. Can you explain to me why that's what we want to do?
> > > 
> 
> For a 64KB napot pte, (pte >> _PAGE_NAPOT_SHIFT) will be 1, and temp will be
> something like 0xabcdabcdab8, but the correct pfn we expect is
> 0xabcdabcdab0. We can get it by calculating (temp & (temp - 1)).
> So temp & (temp - (pte >> _PAGE_NAPOT_SHIFT)) will give the correct pfn
> in both the non-napot and napot cases :)

I understood that and it makes sense to me for your example, where
temp = 0xabcdabcdab8, as we effectively clear the lower four bits as
expected (I think) for napot-order = 4. But, what if temp = 0xabcdabcdab0,
then we'll get 0xabcdabcdaa0 for the napot case. Is that still correct?
With the (temp & (temp - 1)) approach we'll always clear the first set
bit, wherever it is, e.g. 0xabcd0000800 would be 0xabcd0000000.  Am I
missing something about the expectations of the lower PPN bits of the PTE?
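
To make the two cases above concrete, here's a quick userspace sketch
(with a made-up shift/mask layout, not the real _PAGE_* definitions) of
what the napot branch of the alternative computes:

#include <stdint.h>
#include <stdio.h>

#define NAPOT_SHIFT	63	/* assumed: N bit at PTE bit 63 */
#define PFN_SHIFT	10	/* assumed: PPN starts at bit 10 */
#define PFN_MASK	(((1ULL << 44) - 1) << PFN_SHIFT)

static uint64_t pte_pfn(uint64_t pte)
{
	uint64_t napot = pte >> NAPOT_SHIFT;	/* 1 for napot, 0 otherwise */
	uint64_t temp = (pte & PFN_MASK) >> PFN_SHIFT;

	/* napot: clears the lowest *set* bit; non-napot: leaves temp unchanged */
	return temp & (temp - napot);
}

int main(void)
{
	uint64_t n = 1ULL << NAPOT_SHIFT;

	/* Qinglin's example: 0xabcdabcdab8 -> 0xabcdabcdab0, as expected */
	printf("%llx\n", (unsigned long long)pte_pfn(n | (0xabcdabcdab8ULL << PFN_SHIFT)));

	/* The counter-example: 0xabcdabcdab0 -> 0xabcdabcdaa0, i.e. bit 4
	 * gets cleared rather than bits [3:0] */
	printf("%llx\n", (unsigned long long)pte_pfn(n | (0xabcdabcdab0ULL << PFN_SHIFT)));

	return 0;
}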

> > > >    */
> > > >   enum riscv_isa_ext_id {
> > > >   	RISCV_ISA_EXT_SSCOFPMF = RISCV_ISA_EXT_BASE,
> > > > +	RISCV_ISA_EXT_SSTC,
> > > > +	RISCV_ISA_EXT_SVNAPOT,
> > > >   	RISCV_ISA_EXT_SVPBMT,
> > > >   	RISCV_ISA_EXT_ZICBOM,
> > > >   	RISCV_ISA_EXT_ZIHINTPAUSE,
> > > > -	RISCV_ISA_EXT_SSTC,
> > > 
> > > Opportunistically alphabetizing an enum is a bit risky. It'd be better if
> > > this were a separate patch, because, even though nothing should care about
> > > the order, if something does care about the order, then it'd be easier to
> > > debug if this change were made alone.
> > 
> > I think for these things it is s/alphabetical/canonical order, given the
> > patch from Palmer I linked in my previous mail. Although thinking about
> > it, that means V before S, so SV before SS? Which his patch did not do:
> > https://lore.kernel.org/linux-riscv/20220920204518.10988-1-palmer@rivosinc.com/
> > 

I wonder if we should just go with alphabetical order (or no order) in
places where the order doesn't matter. In my experience it's hard to
enforce even alphabetical order, since nothing breaks when the order is
"wrong". Where the order does matter we should enforce whatever that order
is supposed to be, of course. So maybe Palmer's patch isn't quite right,
but the canonical order, which is almost-alphabetical, is going to make my
head explode trying to use it...

> 
> Not very sure about how to place it in canonical order :(
> I will avoid reordering other enum variants in this patch, and only add
> SVNAPOT just before SVPBMT. Another patch to reorder them is welcome :)

Or even just put SVNAPOT at the bottom of the enum for this patch.

> > > > +
> > > >   /*
> > > >    * [62:61] Svpbmt Memory Type definitions:
> > > >    *
> > > > diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> > > > index 7ec936910a96..c3fc3c661699 100644
> > > > --- a/arch/riscv/include/asm/pgtable.h
> > > > +++ b/arch/riscv/include/asm/pgtable.h
> > > > @@ -264,10 +264,39 @@ static inline pte_t pud_pte(pud_t pud)
> > > >   	return __pte(pud_val(pud));
> > > >   }
> > > > +static inline bool has_svnapot(void)
> > > > +{
> > > > +	u64 _val;
> > > > +
> > > > +	ALT_SVNAPOT(_val);
> > > 
> > > Why aren't we using the isa extension static key framework for this?
> 
> I am not very sure why a static key is better than errata? If it is
> really necessary, I will convert it to a static key in the next version.

Well, the errata framework is for errata and the static key framework is
for zero-cost CPU feature checks like the one has_svnapot() is implementing.
The question shouldn't be why a static key is better, but rather why we
would need to abuse the errata framework for a CPU feature check.
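
For reference, a zero-cost check built on the generic jump-label API
would look roughly like the sketch below. This is only an illustration
of the shape of a static-key-based feature check (the key name and the
boot-time hook are made up), not the actual isa-extension framework code:

#include <linux/init.h>
#include <linux/jump_label.h>
#include <asm/hwcap.h>

DEFINE_STATIC_KEY_FALSE(svnapot_available);	/* assumed key name */

static __always_inline bool has_svnapot(void)
{
	/* patched at boot: a nop (or unconditional branch) at runtime */
	return static_branch_likely(&svnapot_available);
}

/* hypothetical boot-time hook, run after the ISA string is parsed */
void __init svnapot_init(void)
{
	if (riscv_isa_extension_available(NULL, SVNAPOT))
		static_branch_enable(&svnapot_available);
}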

> > > >   /*
> > > >    * Probe presence of individual extensions.
> > > >    *
> > > > @@ -301,6 +316,9 @@ static u32 __init_or_module cpufeature_probe(unsigned int stage)
> > > >   	if (cpufeature_probe_zicbom(stage))
> > > >   		cpu_req_feature |= (1U << CPUFEATURE_ZICBOM);
> > 
> > While in the area, could this be converted to BIT(CPUFEATURE_ZICBOM)?
> 
> This zicbom-related code is not introduced by this patchset. It may be better
> to convert it to BIT() in another patch.
> 

Actually, to be pedantic about types, we shouldn't use BIT() below, but
rather use (1U << ...) like above, since cpu_req_feature is a u32.
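
If I'm reading include/vdso/bits.h right, BIT(nr) expands to
(UL(1) << (nr)), i.e. an unsigned long, while the probe code accumulates
into a u32, so sticking with the (1U << ...) form keeps the types
consistent:

	u32 cpu_req_feature = 0;

	/* BIT() yields unsigned long (64-bit on rv64); the |= then
	 * implicitly converts back down to u32 */
	cpu_req_feature |= BIT(CPUFEATURE_SVNAPOT);

	/* unsigned int, same width as the destination */
	cpu_req_feature |= (1U << CPUFEATURE_SVNAPOT);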

> 
> > 
> > > > +	if (cpufeature_probe_svnapot(stage))
> > > > +		cpu_req_feature |= BIT(CPUFEATURE_SVNAPOT);

Thanks,
drew


