[PATCH v6 13/37] arm64/sme: Basic enumeration support

Catalin Marinas catalin.marinas at arm.com
Fri Dec 10 02:41:18 PST 2021


On Thu, Dec 09, 2021 at 07:28:16PM +0000, Mark Brown wrote:
> On Thu, Dec 09, 2021 at 06:41:26PM +0000, Catalin Marinas wrote:
> > On Mon, Nov 15, 2021 at 03:28:11PM +0000, Mark Brown wrote:
> > > +#define HWCAP2_SME		(1 << 20)
> > > +#define HWCAP2_SME_I16I64	(1 << 21)
> > > +#define HWCAP2_SME_F64F64	(1 << 22)
> > > +#define HWCAP2_SME_I8I32	(1 << 23)
> > > +#define HWCAP2_SME_F16F32	(1 << 24)
> > > +#define HWCAP2_SME_B16F32	(1 << 25)
> > > +#define HWCAP2_SME_F32F32	(1 << 26)
> > > +#define HWCAP2_SME_FA64		(1 << 27)
> 
> > At this pace we'll need HWCAP3 pretty soon (since we only allocated
> > 32-bit in each). I wonder whether we could instead not bother at all and
> > just provide user-space emulation for ID_AA64SMFR0_EL1.
> 
> I think so if people are willing to go along with just having userspace
> check the ID register (IIRC access to it already does the right thing
> but I need to confirm).  We'll also need to think about how we handle
> any new SVE features, that's got a similar thing going on and is most of
> the existing usage of HWCAP2.

It would be good to get feedback from the libc people. IIRC the ifunc
resolver currently relies on the HWCAP bits. Could it be adapted to use
the MRS instruction instead? We'd still keep the main HWCAP2_SME bit but
drop the finer-grained ones.
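
As a rough illustration of that split (coarse HWCAP bit, fine-grained
detail from the ID register), here is a minimal C sketch. The SKETCH_*
names and sme_has_fa64() are hypothetical. The HWCAP2_SME value is the
one proposed in the quoted patch and may change before merging, and the
FA64 bit position is an assumption about the ID_AA64SMFR0_EL1 layout.
The register value is passed in as a parameter; how a resolver would
obtain it (an emulated MRS) is exactly what still needs confirming.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value taken from the quoted patch; may differ once merged. */
#define SKETCH_HWCAP2_SME	(1UL << 20)

/* Assumed position of the single-bit FA64 field in ID_AA64SMFR0_EL1. */
#define SKETCH_SMFR0_FA64_SHIFT	63

/*
 * Two-level check: the coarse HWCAP2 bit says whether SME exists at
 * all, the ID register value supplies the finer-grained detail.
 */
static bool sme_has_fa64(unsigned long hwcap2, uint64_t smfr0)
{
	if (!(hwcap2 & SKETCH_HWCAP2_SME))
		return false;	/* no SME, so do not trust smfr0 */

	return (smfr0 >> SKETCH_SMFR0_FA64_SHIFT) & 1;
}
```

A resolver would call this with the AT_HWCAP2 value it is handed and
the ID register contents, so no per-feature HWCAP bit is consumed.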

The other option is to start using the upper 32 bits of elf_hwcap. We
tried to avoid this some time back, when we still had doubts about
merging ILP32.
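
To spell out the ILP32 concern, a small sketch under the assumption
that the hwcap word reaches such a process through a 32-bit unsigned
long, so that only the low half survives. The helper name is
hypothetical and the bit numbers are arbitrary examples.

```c
#include <stdint.h>

/*
 * Hypothetical helper: what an ABI with a 32-bit long would see of a
 * 64-bit hwcap word. Anything allocated at bit 32 or above is lost.
 */
static uint32_t hwcap_seen_with_32bit_long(uint64_t elf_hwcap)
{
	return (uint32_t)elf_hwcap;
}
```

With bits 20 and 33 set in the 64-bit word, only bit 20 remains
visible after the truncation.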

> > > +	{
> > > +		.desc = "FA64",
> > > +		.type = ARM64_CPUCAP_SYSTEM_FEATURE,
> > > +		.capability = ARM64_SME_FA64,
> > > +		.sys_reg = SYS_ID_AA64SMFR0_EL1,
> > > +		.sign = FTR_UNSIGNED,
> > > +		.field_pos = ID_AA64SMFR0_FA64_SHIFT,
> > > +		.min_field_value = ID_AA64SMFR0_FA64,
> > > +		.matches = has_feature_flag,
> > > +		.cpu_enable = fa64_kernel_enable,
> > > +	},
> 
> > I'll comment here rather than the patch introducing has_feature_flag():
> > an alternative would be to add a .field_width option and in
> > feature_matches() use cpuid_feature_extract_field_width() directly. All
> > the arm64_ftr_bits entries already have a width, so just generalise it
> > for arm64_cpu_capabilities.
> 
> Sure, if people are happy with that - it's a more invasive change since
> we don't currently set the widths, I wasn't clear if that was a case of
> not needing it right now or a design decision.

We didn't have a field_width because all the fields were 4 bits wide
until SVE. If you don't want to touch every entry in the array, we can
say that a value of 0 (i.e. not explicitly initialised) means the
default width of 4 and update it during init_cpu_features().
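
A minimal userspace sketch of the idea, not the kernel code: the
struct and the sketch_* helpers are hypothetical stand-ins for
arm64_cpu_capabilities and the cpuid_feature_extract_*() helpers, and
it only handles unsigned fields. It resolves the default at lookup
time rather than patching the entries at init, purely to keep the
example short.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_DEFAULT_FIELD_WIDTH	4

/* Hypothetical, cut-down capability entry. */
struct sketch_cap {
	unsigned int field_pos;
	unsigned int field_width;	/* 0 = not explicitly initialised */
	unsigned int min_field_value;
};

/* A width of 0 falls back to the historical 4-bit field size. */
static unsigned int sketch_effective_width(const struct sketch_cap *cap)
{
	return cap->field_width ? cap->field_width
				: SKETCH_DEFAULT_FIELD_WIDTH;
}

/* Extract an unsigned field of the effective width from reg. */
static uint64_t sketch_extract_unsigned(uint64_t reg,
					const struct sketch_cap *cap)
{
	unsigned int width = sketch_effective_width(cap);
	uint64_t mask = (width >= 64) ? ~0ULL : ((1ULL << width) - 1);

	return (reg >> cap->field_pos) & mask;
}

/* The capability matches when the field reaches the minimum value. */
static bool sketch_matches(uint64_t reg, const struct sketch_cap *cap)
{
	return sketch_extract_unsigned(reg, cap) >= cap->min_field_value;
}
```

With this, a single-bit entry such as FA64 would set field_width to 1,
while existing entries that leave it unset keep behaving as 4-bit
fields.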

-- 
Catalin


