[RFC 00/15] Device Tree schemas and validation

Thu Oct 3 09:53:35 EDT 2013

Hi David,

On 02/10/2013 16:29, David Gibson wrote:
> On Tue, Oct 01, 2013 at 04:22:24PM -0600, Stephen Warren wrote:
>> On 09/24/2013 10:52 AM, Benoit Cousson wrote:
>>> Hi All,
>>>
>>> Following the discussion that happened during LCE-2013 and the email
>>> thread started by Tomasz a few months ago [1], here is a first attempt
>>> to introduce:
>>> - a schema language to define the bindings accurately
>>> - DTS validation during device tree compilation in DTC itself
>>
>> Sorry, this is probably going to sound a bit negative. Hopefully you
>> find it constructive though.
>>
>>> The syntax for a schema is the same as the one for dts. This choice has
>>> been made to simplify its development, to maximize the code reuse and
>>> finally because the format is human-readable.
>>
>> I'm not convinced that's a good decision.
>>
>> DT is a language for representing data.
>>
>> The validation checks described by schemas are rules, or code, and not
>> static data.
>>
>> So, while I'm sure it's possible to shoe-horn at least some reasonable
>> subset of DT validation into DT syntax itself, I feel it's unlikely to
>> yield something that's scalable enough.
>
> I tend to agree.
>
>> For example, it's easy to specify that a property must be 2 cells long.
>> What if it could be any multiple of two? That's a lot of numbers to
>> explicitly enumerate as data. Sure, you can then invent syntax to
>> represent that specific rule (parameterized by 2), but what about the
>> next similar-but-different rule? The only approach I can think of to
>> that is to allow the schema to contain arbitrary expressions, which
>> would likely need to morph into arbitrary statements not just
>> expressions. Once you're there, I think the schema would be better
>> represented as a programming language rather than as a data structure
>> that could have code hooked into it.
>>
>>> How to:
>>>   * Associate a schema to one or several nodes
>>>
>>> As said earlier a schema can be used to validate one or several nodes
>>> from a dts. To do this the "compatible" properties from the nodes which
>>> should be validated must be present in the schema.
>>>
>>> 	timer1: timer@4a318000 {
>>> 		compatible = "ti,omap3430-timer";
>> ...
>>> To write a schema which will validate OMAP Timers like the one above,
>>> one may write the following schema:
>>>
>>> 	/dts-v1/;
>>> 	/ {
>>> 		compatible = "ti,omap[0-9]+-timer";
>>
>> What about DT nodes that don't have a compatible value? We certainly
>> have some of those already like /memory and /chosen. We should be able
>> to validate their schema too. This probably doesn't invalidate being
>> able to look things up by compatible value though; it just means we need
>> some additional mechanisms too.
>
> More to the point, what about the properties of a node whose format is
> defined not by this node's binding but by some other node's binding?
> e.g. the exact format of reg and ranges is at least partially
> determined by the parent bus's binding, and interrupts is defined
> partially by the interrupt parent's binding.  gpio properties are
> defined by a combination of a global binding and the gpio parent,
> IIRC.

Yeah, that's a general concern that Stephen raised several times as well.
We need to figure out some way to handle that.
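
To make the dependency concrete, here is a rough Python sketch of a check where the rule for "reg" is parameterized by the parent node rather than by the node's own binding. The dict-based tree layout and the reg_is_valid() helper are only assumptions for illustration, not anything from the patch series. The fallback values of 2 and 1 are the defaults a client assumes when the parent omits #address-cells / #size-cells.

```python
# Hypothetical in-memory tree: each node is a dict holding a "props" dict.
def reg_is_valid(parent, node):
    """Return True if reg is made of whole (address, size) entries.

    The entry width comes from the PARENT node, which is the point of
    the example: the node's own binding cannot validate reg by itself.
    """
    addr_cells = parent["props"].get("#address-cells", 2)
    size_cells = parent["props"].get("#size-cells", 1)
    entry = addr_cells + size_cells
    reg = node["props"].get("reg")
    if reg is None:
        return True  # no reg property, nothing to check here
    return entry > 0 and len(reg) > 0 and len(reg) % entry == 0


ocp = {"props": {"#address-cells": 1, "#size-cells": 1}}
timer = {"props": {"compatible": "ti,omap3430-timer",
                   "reg": [0x4a318000, 0x80]}}
truncated = {"props": {"reg": [0x4a318000]}}

print(reg_is_valid(ocp, timer))      # one complete entry
print(reg_is_valid(ocp, truncated))  # address without a size
```

The same shape would apply to interrupts (sized by the interrupt parent) and to GPIO specifiers (sized by the referenced controller), which is why a purely per-node schema lookup is not enough.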

>>>   * Define constraints on properties
>>>
>>> To define constraints on a property one has to create a node in a schema
>>> which has as name the name of the property that one wants to validate.
>>>
>>> To specify constraints on the property "ti,hwmods" of OMAP Timers one
>>> can write this schema:
>>>
>>> 	/dts-v1/;
>>> 	/ {
>>> 		compatible = "ti,omap[0-9]+-timer";
>>> 		ti,hwmods {
>>> 			...
>>> 		};
>>
>> compatible and ti,hwmods are both properties in the DT file. However, in
>> the schema above, one appears as a property, and one as a node. I don't
>> like that inconsistency. It'd be better if compatible was a node too.
>
> Essentially what's going on here is that to describe the constraint on
> a property, a node with corresponding name is defined to encode the
> parameters of that constraint.  It kind of works, but it's forced.  It
> also hits problems since nodes and properties are technically in
> different namespaces, although they rarely collide in real cases.

OK, so would you suggest keeping the same node / property mapping in the
schema as in the DTS?

>>> If one wants to use a regular expression as a property name one can write this schema:
>>>
>>> 	/dts-v1/;
>>> 	/ {
>>> 		compatible = "abc";
>>> 		def {
>>> 			name = "def[0-9]";
>>
>> Isn't it valid to have a property named "name" within the node itself?
>> How do you differentiate between specifying the node name and the name
>> property?
>
> Or to look at it another way, how do you differentiate between nodes
> representing encoded constraints for a property, and nodes
> representing nodes directly.
>
>> What if the node name needs more validation than just a regex. For
>> example, suppose we want to validate the
>> unit-name-must-match-reg-address rule. We need to write some complex
>> expression using data extracted from reg to calculate the unit address.
>> Equally, the node name perhaps has to exist in some global list of
>> acceptable node names. It would be extremely tricky if not impossible to
>> do that with a regex.
>>
>>> 			...
>>> 		};
>>> 	};
>>>
>>> Above one can see that the "name" property overrides the node name.
>>
>> Override implies that dtc would change the node name during compilation.
>> I think s/override/validate/ or s/override/overrides the validation
>> rules for/?
>
> Actually, dtc already contains checks that a "name" property (if
> present) matches the unit name.  Name properties vs. node names work a
> bit differently in the flat-tree world versus traditional OF, and this
> check ensures that flat trees don't do (at least some) things which
> would break the OF traditional approach.
>
>>>   * Require the presence of a property inside a node or inside one of its
>>> parents
>> ...
>>> /dts-v1/;
>>> / {
>>>      compatible = "ti,twl[0-9]+-rtc";
>>>      interrupt-controller {
>>>          is-required;
>>>          can-be-inherited;
>>
>> interrupt-controller isn't a good example here, since it isn't a
>> property that would typically be inherited. Why not use interrupt-parent
>> instead?
>>
>>> One can check if 'node' has the following subnodes 'subnode1', 'subnode2',
>>> and 'abc' with the schema below:
>>>
>>> /dts-v1/;
>>> / {
>>>      compatible = "comp";
>>>      children = "abc", "subnode[0-9]";
>>> };
>>
>> How is the schema for each sub-node specified?
>>
>> What if some nodes are optional and some required? The conditions where
>> a sub-node is required might be complex, and I think we'd always want to
>> be able to represent them in whatever schema language we chose.
>>
>> The most obvious way would be to make each sub-node's schema appear as a
>> sub-node within the main node's schema, but then how do you tell if a
>> schema node describes a property or a node?
>>
>> Note that the following DT file is currently accepted by dtc even if it
>> may not be the best choice of property and node names:
>>
>> ==========
>> /dts-v1/;
>>
>> / {
>> 	foo = <1>;
>> 	foo {};
>> };
>> ==========
>
> Note that node / property name collisions are not entirely theoretical
> either.  They are permitted in IEEE1275 and there are real Apple
> device trees in the wild which have them.  It's rare and discouraged,
> obviously.
>
>>>   * Constraints on array size
>>>
>>> One can specify the following constraints on array size:
>>>   - length: specify the exact length that an array must have.
>>>   - min-length: specify the minimum number of elements an array must have.
>>>   - max-length: specify the maximum number of elements an array must have.
>>
>> This seems rather inflexible; it'll cover a lot of the simple cases, but
>> hit a wall pretty soon. For example, how would it validate a property
>> that is supposed to include 3 GPIO specifiers, where the GPIO specifiers
>> are going to have DT-specific lengths, since the length of each
>> specifier is defined by the node that the phandles reference?
>>
>>
>> Overall, I believe perhaps the single most important aspect of any DT
>> schema is schema inheritance or instancing, and this proposal doesn't
>> appear to address that issue at all.
>>
>> Inheritance of schemas:
>>
>> For example, any node that is addressed must contain a reg property. The
>> constraints on that property are identical in all bindings; it must
>> consist of #address-cells + #size-cells integer values (cells). We don't
>> want to have to cut/paste that rule into every single binding
>> definition. Rather, we should simply say something like "this binding
>> uses the reg property", and the schema validation tool will look up the
>> definition of "reg property", and hence know how to validate it.
>>
>> Similarly, any binding that describes a GPIO controller will have some
>> similar requirements; the gpio-controller and #gpio-cells properties
>> must be present. The schema should simply say "I'm a GPIO controller",
>> and the schema tool should add some extra requirements to nodes of that
>> type.
>>
>> Instancing of schemas:
>>
>> Any binding that uses GPIOs should be able to say that a particular
>> property (e.g. "enable-gpios") is-a GPIO-specifier (with parameters
>> "enable" for the property name, min/max/expression length, etc.), and
>> then the schema validation tool would know to apply rules for a
>> specifier list to that property (and be able to check the property name).
>
> Yes, I agree both of those are important.
>
>
> So, here's a counter-proposal of at least a rough outline of how I
> think schemas could work, in a way that's still based generally on dt
> syntax.

That seems to be well aligned with what we tried to achieve, so I'm not 
considering that as a counter-proposal but as a refinement. :-)

> First, define the notion of dt "patterns" or "templates".  A dt
> pattern is to a dt node or subtree as a regex is to a string - it
> provides a reasonably expressive way of defining a family of dt
> nodes.  These would be defined in an extension / superset of dt
> syntax.

OK, makes sense. Are you considering a syntax similar to xpath in order 
to match any node in the path, using the full path information instead 
of the individual node?

I'm a little bit rusty on xpath but AFAIR it could be something like this.

match any node containing reg:

"//reg/.."

match only nodes containing reg with the ti,omap4-gpio compatible:

"//*[@compatible='ti,omap4-gpio']/reg/.."

match only nodes containing reg below the ocp parent node:

"//ocp/*/reg/.."
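
To check my memory of the syntax, here is a small Python sketch that runs those three expressions with the limited XPath support in the standard xml.etree.ElementTree module. The XML rendering of the tree is purely an assumption for the test: properties are child elements, "compatible" is additionally an attribute so that the [@compatible=...] predicate can match, and unit addresses are dropped because "@" is not valid in an XML element name. ElementTree also wants relative paths, hence the "." prefix.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML view of a small device tree (assumed mapping, see above).
XML = """
<root>
  <ocp>
    <gpio compatible="ti,omap4-gpio"><reg/></gpio>
    <timer compatible="ti,omap3430-timer"><reg/></timer>
    <rtc compatible="ti,twl4030-rtc"/>
  </ocp>
  <memory><reg/></memory>
</root>
"""

tree = ET.fromstring(XML)


def match(path):
    """Return the sorted tag names of the nodes selected by an xpath."""
    return sorted(node.tag for node in tree.findall("." + path))


# Any node containing reg.
with_reg = match("//reg/..")
# Only nodes containing reg with the ti,omap4-gpio compatible.
gpio_with_reg = match("//*[@compatible='ti,omap4-gpio']/reg/..")
# Only nodes containing reg directly below the ocp node.
under_ocp = match("//ocp/*/reg/..")

print(with_reg, gpio_with_reg, under_ocp)
```

The rtc node has no reg and is never selected, and memory is selected only by the first expression since it does not sit below ocp.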

> A schema would then be defined as a set of implications:
> 	If node X matches pattern A, => it must also match pattern B
>
> For example:
> 	If a node has a compatible property with string "foodev"
> 	 => it must have various foodev properties.
>
> 	If a node has a "reg" property (at all)
> 	 => it must have the format required by reg
>
> 	If a node has an "interrupts" property
> 	 => it must have either "interrupt-parent" or "interrupt-map"
>

That part is similar to what we had in mind. So we just need to find out 
how to express it properly using the DTS syntax.
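
Before settling on a syntax, the "A => B" model itself can be sketched in a few lines of Python. This is only an illustration under my own assumptions: a node is a flat dict of properties, a pattern is a predicate on that dict, and the helper names (has, compatible_with, any_of, validate) as well as the "foodev,speed" property are made up for the example. It also ignores that interrupt-parent may be inherited from a parent node.

```python
def has(prop):
    """Pattern: the node has the given property."""
    return lambda node: prop in node


def compatible_with(string):
    """Pattern: the compatible list contains the given string."""
    return lambda node: string in node.get("compatible", [])


def any_of(*patterns):
    """Pattern: at least one of the given patterns matches."""
    return lambda node: any(p(node) for p in patterns)


# Each rule reads: if the node matches A, it must also match B.
RULES = [
    ("foodev needs foodev,speed",
     compatible_with("foodev"), has("foodev,speed")),
    ("interrupts needs interrupt-parent or interrupt-map",
     has("interrupts"), any_of(has("interrupt-parent"), has("interrupt-map"))),
]


def validate(node):
    """Return the names of all rules the node violates."""
    return [name for name, a, b in RULES if a(node) and not b(node)]


good = {"compatible": ["foodev"], "foodev,speed": [100],
        "interrupts": [5], "interrupt-parent": [1]}
bad = {"compatible": ["foodev"], "interrupts": [5]}

print(validate(good))
print(validate(bad))
```

A node that matches no "A" pattern is simply not constrained, which also covers nodes without a compatible property such as /memory and /chosen.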

Thanks,
Benoit