[PATCH v2] arm64: Add basic JSON register parser

Marc Zyngier maz at kernel.org
Thu Jan 2 06:43:39 PST 2025


We currently populate the sysreg file by hand from the ARM ARM,
resulting in a bunch of errors being introduced on a regular basis.
While there is an XML dump of the architecture produced on a quarterly
basis, the license that comes attached to it excludes any sort of
open-source usage.

However, ARM has recently made available a JSON dump[1] that contains
a reduced set of information under a BSD license. This has enough
data to extract what is relevant to the sysreg file.

This is achieved using a JQ script that I cobbled together over
the holiday, and while it has a number of limitations, it already
works well enough to extract useful data.

As an example, here's what the script returns for TCR_EL1:

$ jq -r --arg REG TCR_EL1 -f arch/arm64/tools/dumpreg.jq ~/Work/XML/2024-12/AARCHMRS_BSD_A_profile/Registers.json
TCR_EL1	[3,0,2,0,2]	MRS
TCR_EL1	[3,0,2,0,2]	MSRregister
TCR_EL12	[3,5,2,0,2]	MRS
TCR_EL12	[3,5,2,0,2]	MSRregister
TCRALIAS_EL1	[3,0,2,7,6]	MRS
TCRALIAS_EL1	[3,0,2,7,6]	MSRregister
Res0	63:62
Field	61	MTX1	# Field cond: (IsFeatureImplemented(FEAT_MTE_NO_ADDRESS_TAGS) || IsFeatureImplemented(FEAT_MTE_CANONICAL_TAGS))
Field	60	MTX0	# Field cond: (IsFeatureImplemented(FEAT_MTE_NO_ADDRESS_TAGS) || IsFeatureImplemented(FEAT_MTE_CANONICAL_TAGS))
Field	59	DS	# Field cond: (IsFeatureImplemented(FEAT_LPA2) && (!IsFeatureImplemented(FEAT_D128) || (AArch64 TCR2_EL1.D128 == '0')))
Field	59	DS	# Field cond: true
Field	58	TCMA1	# Field cond: IsFeatureImplemented(FEAT_MTE2)
Field	57	TCMA0	# Field cond: IsFeatureImplemented(FEAT_MTE2)
Field	56	E0PD1	# Field cond: IsFeatureImplemented(FEAT_E0PD)
Field	55	E0PD0	# Field cond: IsFeatureImplemented(FEAT_E0PD)
Field	54	NFD1	# Field cond: (IsFeatureImplemented(FEAT_SVE) || IsFeatureImplemented(FEAT_TME))
Field	53	NFD0	# Field cond: (IsFeatureImplemented(FEAT_SVE) || IsFeatureImplemented(FEAT_TME))
Field	52	TBID1	# Field cond: IsFeatureImplemented(FEAT_PAuth)
Field	51	TBID0	# Field cond: IsFeatureImplemented(FEAT_PAuth)
Field	50	HWU162	# Field cond: IsFeatureImplemented(FEAT_HPDS2)
Field	49	HWU161	# Field cond: IsFeatureImplemented(FEAT_HPDS2)
Field	48	HWU160	# Field cond: IsFeatureImplemented(FEAT_HPDS2)
Field	47	HWU159	# Field cond: IsFeatureImplemented(FEAT_HPDS2)
Field	46	HWU062	# Field cond: IsFeatureImplemented(FEAT_HPDS2)
Field	45	HWU061	# Field cond: IsFeatureImplemented(FEAT_HPDS2)
Field	44	HWU060	# Field cond: IsFeatureImplemented(FEAT_HPDS2)
Field	43	HWU059	# Field cond: IsFeatureImplemented(FEAT_HPDS2)
Field	42	HPD1	# Field cond: IsFeatureImplemented(FEAT_HPDS)
Field	41	HPD0	# Field cond: IsFeatureImplemented(FEAT_HPDS)
Field	40	HD	# Field cond: IsFeatureImplemented(FEAT_HAFDBS)
Field	39	HA	# Field cond: IsFeatureImplemented(FEAT_HAFDBS)
Field	38	TBI1
Field	37	TBI0
Field	36	AS
Res0	35
Field	34:32	IPS
Field	31:30	TG1
Field	29:28	SH1
Field	27:26	ORGN1
Field	25:24	IRGN1
Field	23	EPD1
Field	22	A1
Field	21:16	T1SZ
Field	15:14	TG0
Field	13:12	SH0
Field	11:10	ORGN0
Field	9:8	IRGN0
Field	7	EPD0
Res0	6
Field	5:0	T0SZ

I completely expect this to quickly rewritten by people who know
what they are doing (I don't) and improved as we understand more
of the data model.

[1] https://developer.arm.com/-/cdn-downloads/permalink/Exploration-Tools-OS-Machine-Readable-Data/AARCHMRS_BSD/AARCHMRS_BSD_A_profile-2024-12.tar.gz

Signed-off-by: Marc Zyngier <maz at kernel.org>
Cc: Mark Rutland <mark.rutland at arm.com>
Cc: Catalin Marinas <catalin.marinas at arm.com>
Cc: Will Deacon <will at kernel.org>
Cc: Mark Brown <broonie at kernel.org>
---

Notes:
    - From v1:
    
      - Fix the accessor encoding order
      - Handing of nesting fields, arrays, vectors
      - Plenty of additional JSON handling

 arch/arm64/tools/dumpreg.jq | 258 ++++++++++++++++++++++++++++++++++++
 1 file changed, 258 insertions(+)
 create mode 100644 arch/arm64/tools/dumpreg.jq

diff --git a/arch/arm64/tools/dumpreg.jq b/arch/arm64/tools/dumpreg.jq
new file mode 100644
index 0000000000000..efb198066820f
--- /dev/null
+++ b/arch/arm64/tools/dumpreg.jq
@@ -0,0 +1,258 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# dumpreg.jq: JSON arm64 system register data extractor
+#
+# Author: Marc Zyngier <maz at kernel.org>
+#
+# Usage: jq -r --arg REG "XZY_ELx" -f ./dumpreg.jq Registers.json
+
+# Dump a set of semi-pertinent informations (encodings, fields,
+# conditions, field position and width) about register XZY_ELx as
+# contained in ARM's AARCHMRS_BSD_A_profile JSON tarball.
+
+# Not setting REG will dump the whole register file in one go. While
+# this is entertaining, it isn't very useful.
+
+# This can/should be used to populate the arch/arm64/tools/sysreg
+# file, instead of copying things by hand.
+
+# The tool currently has a bunch of limitations that users need to be
+# aware of, but none that should have a major impact on the usability:
+
+# - All accessors are shown, irrespective of the conditions in which
+#   the accessors are actually available
+
+# - All Fields.ConstantField are displayed as UnsignedEnum,
+#   irrespective of the signess of the field (as the JSON doesn't
+#   carry this information).
+
+# - Value ranges are displayed using '[...]'.
+
+# - Fields are processed and displayed in the order of the JSON
+#   source, which may not be the order in the register.
+
+# - Conditional fields may appear multiple times.
+
+# - ... and probably more...
+
+def walknode:
+	def walkjoin(s):
+		map(walknode) | join(s);
+
+        if   (._type == "AST.Identifier" or ._type == "AST.Integer" or
+	      ._type == "Values.Value" or ._type == "AST.Bool" or
+	      ._type == "Types.String") then
+	     	.value
+	elif (._type == "Types.Field") then
+		"\(.value.name).\(.value.field)"
+	elif (._type == "AST.UnaryOp") then
+		"\(.op)(\(.expr | walknode))"
+	elif (._type == "AST.Function") then
+		"\(.name)(\(.arguments | walkjoin(", ")))"
+	elif (._type == "AST.DotAtom") then
+		.values | walkjoin(".")
+	elif (._type == "AST.BinaryOp") then
+		"(\(.left | walknode) \(.op) \(.right | walknode))"
+	elif (._type == "Types.RegisterType") then
+		.name
+	elif (._type == "AST.Type") then
+		"\(.name | walknode)"
+	elif (._type == "AST.Slice") then
+		"\(.left.value):\(.right.value)"
+	elif (._type == "AST.Set") then
+		.values | map(walknode)
+	elif (._type == "AST.Assignment")  then
+		"\(.var | walknode) = \(.val | walknode)"
+	elif (._type == "AST.TypeAnnotation")  then
+		"\(.var | walknode):\(.type | walknode)"
+	elif (._type == "AST.SquareOp") then
+		"\(.var | walknode)[\(.arguments | walkjoin(", "))]"
+	elif (._type == "AST.Return") then
+		"return"
+	elif (._type == "AST.Concat") then
+		"[\(.values | walkjoin(", "))]"
+	elif (._type == "AST.Tuple") then
+		"(\(.values | walkjoin(", ")))"
+	else	# debug catch-all
+		.
+	end;
+
+def range:
+	. as { _type: $type, start: $start, width: $width } |
+	if ($width == 1) then
+		"\($start)"
+	else
+		"\($start + $width - 1):\($start)"
+	end;
+
+def fld:
+	(if (.condstr.text) then "\t\(.condstr.text)"
+	 else "" end) as $cond |
+	"\(.type)\t\(.range | range)\t\(.name)\($cond)";
+
+def condition(source):
+	"# \(source) cond: \(.condition | walknode)";
+
+def unquote:
+	"'" as $q | (ltrimstr($q) | rtrimstr($q));
+
+def binvalue:
+	.value | unquote as $v | "\t0b\($v)\tVAL_\($v)";
+
+def dumpconstants:
+	if   (._type == "Values.Value") then
+		binvalue
+	elif (._type == "Values.ValueRange") then
+		(.start | binvalue), "\t[...]", (.end | binvalue)
+	elif (._type == "Values.ConditionalValue") then
+		"\(.values.values[] | dumpconstants)\t\(condition("Value"))"
+ 	else	# Debug catch all
+		.
+	end;
+
+def dumpenum:
+	# Things like SMIDR_EL1.Affinity do not describe
+	# the value range, hence the []? hack below.
+	(.value.constraints.values[]? | dumpconstants);
+
+def genarrayelt(n; bpf):
+	"<\(.index_variable)>" as $v |
+	(.rangeset | reverse) as $rs |
+	($rs | length) as $nrs |
+	{
+		_type: (if (bpf > 1) then "Fields.ConstantField"
+			else "Fields.Field" end),
+		name: (.name | sub($v; "\(n)")),
+		rangeset: [
+			{
+				_type: "Range",
+				start: (if ($nrs > 1) then $rs[n].start
+				        else $rs[0].start + n * bpf end),
+				width: bpf
+			}
+		],
+		value: { constraints: .values },
+		condstr: (if (.condstr) then
+			    { text: (.condstr.text | sub($v; "\(n)")) }
+			  else
+			    null
+			  end)
+	};
+
+def genarray:
+	# Oh the fun we're having: convert each element of the array
+	# into its own architectural field, warts and all. Additional
+	# fun is provided to compute the number of bits per fields,
+	# as the elements can be spread over multiple rangesets.
+	. as $field |
+	.indexes[0].width as $nr |
+	((reduce .rangeset[].width as $sz (0; . + $sz)) / $nr) as $bpf |
+	[ range(0; $nr) ] | reverse | map(. as $n | $field | genarrayelt($n; $bpf));
+
+# For each range of a field, unpack it as start and width, and
+# apply it to each range of the parent field (used as a base).
+# Although this can result in a combinatorial explosion, the
+# likely case is that one of the two sets is of size one.
+def mergerangesets(base):
+	.[] |
+	.start as $s |
+	.width as $w |
+	base | map({
+			_type: "Range",
+			start: (.start + $s),
+			width: ([ $w, .width ] | min)
+		   });
+
+def depthstr(depth):
+	[ range(0, depth) ] | map(32, 32) | implode;
+
+def walkfields(depth):
+	depthstr(depth) as $dep |
+	if   (._type == "Fields.Reserved" and .value == "RES0") then
+		{ type: "Res0", name: "", range: .rangeset[] } |
+		"\($dep)\(fld)"
+	elif (._type == "Fields.Reserved" and .value == "RES1") then
+		{ type: "Res1", name: "", range: .rangeset[] } |
+		"\($dep)\(fld)"
+	elif (._type == "Fields.ConditionalField") then
+		# Propagate the condition text over all conditional
+		# fields by injecting a new ".condstr.text" field.
+		# Also, the ranges must be combined as they nest.
+		.rangeset as $r |
+		.fields | map(condition("Field") as $c |
+			      .field.condstr |= { text: $c }) |
+			  map(.field.rangeset |= mergerangesets($r)) |
+		.[] | .field | walkfields(depth)
+	elif (._type == "Fields.Dynamic") then
+		({ type: "Field", name: .name, range: .rangeset[], condstr: .condstr } | fld),
+	     	(.rangeset as $r | .instances[] |
+		 ((.display // .name // "Instance") as $src |
+		  "\(depthstr(depth + 1))\(condition($src))",
+		  # Remap the rangesets to display the absolute range
+		  (.values | map(.rangeset |= mergerangesets($r)) |
+		   .[] | walkfields(depth + 1))))
+	elif (._type == "Fields.ConstantField") then
+		({ type: "UnsignedEnum", name: .name, range: .rangeset[], condstr: .condstr } |
+		 "\($dep)\(fld)"),
+		dumpenum,
+		"EndEnum"
+	elif (._type == "Fields.Field") then
+		{ type: "Field", name: .name, range: .rangeset[], condstr: .condstr } |
+		"\($dep)\(fld)"
+	elif (._type == "Fields.Reserved") then
+		{ type: "Field", name: .value, range: .rangeset[], condstr: .condstr } |
+		"\($dep)\(fld)"
+	elif (._type == "Fields.ImplementationDefined") then
+		{ type: "Field", name: (.name // "IMPDEF"), range: .rangeset[], condstr: .condstr } |
+		"\($dep)\(fld)"
+	elif (._type == "Fields.Array" or ._type == "Fields.Vector") then
+	     	genarray | .[] | walkfields(depth)
+ 	else	# Debug catch all
+		.
+	end;
+
+def tautology:
+	(.condition.value == true);
+
+def walkreg:
+	(.fieldsets | length) as $l |
+	.fieldsets[] |
+	  (if ($l > 1 or (tautology | not)) then condition("Fieldset") else empty end),
+	  (.values[] | walkfields(0));
+
+def bin_to_i:
+	def bintoi:
+		(length - 1) as $e |
+		((.[0] - 48) * ($e | exp2)) + (if ($e > 0) then .[1:] | bintoi
+				     	       else 0 end);
+	explode | bintoi;
+
+def computeencoding:
+	if (.) then
+		if   (._type == "Values.Value") then .value | unquote | bin_to_i
+		elif (._type == "Values.Group") then .value
+		elif (._type == "Values.EquationValue") then "\(.value)[\(.slice[] | range)]"
+		else . # Debug catch all
+		end
+	else
+		"#Imm"
+	end;
+
+def encodings:
+	.encodings | [ .op0, .op1, .CRn, .CRm, .op2 ] | map(computeencoding);
+
+def accessorencoding:
+	(.name | ltrimstr("A64.")) as $name |
+	.encoding[] | "\(.asmvalue)\t\(encodings)\t\($name)";
+
+def accessors:
+	.accessors[] |
+	accessorencoding;
+
+def regcondition:
+	if (tautology | not) then condition("Reg") else empty end;
+
+.[] | select (._type == "Register" or ._type == "RegisterArray") |
+      select (.state == "AArch64" and
+	      ($ARGS.named.REG == null or .name == $ARGS.named.REG)) |
+      "# \(.name)",accessors,regcondition,walkreg
-- 
2.39.2




More information about the linux-arm-kernel mailing list