[PATCH v4 05/45] drm/connector: Check drm_connector_init pointers arguments

Wed Nov 29 03:40:38 PST 2023

On Wed, 29 Nov 2023, Maxime Ripard <mripard at kernel.org> wrote:
> On Wed, Nov 29, 2023 at 11:38:42AM +0200, Jani Nikula wrote:
>> On Wed, 29 Nov 2023, Maxime Ripard <mripard at kernel.org> wrote:
>> > Hi Ville,
>> >
>> > On Tue, Nov 28, 2023 at 03:49:08PM +0200, Ville Syrjälä wrote:
>> >> On Tue, Nov 28, 2023 at 02:29:40PM +0100, Maxime Ripard wrote:
>> >> > On Tue, Nov 28, 2023 at 02:54:02PM +0200, Jani Nikula wrote:
>> >> > > On Tue, 28 Nov 2023, Maxime Ripard <mripard at kernel.org> wrote:
>> >> > > > All the drm_connector_init variants take at least a pointer to the
>> >> > > > device, connector and hooks implementation.
>> >> > > >
>> >> > > > However, none of them check their value before dereferencing those
>> >> > > > pointers which can lead to a NULL-pointer dereference if the author
>> >> > > > isn't careful.
>> >> > > 
>> >> > > Arguably oopsing on the spot is preferrable when this can't be caused by
>> >> > > user input. It's always a mistake that should be caught early during
>> >> > > development.
>> >> > > 
>> >> > > Not everyone checks the return value of drm_connector_init and friends,
>> >> > > so those cases will lead to more mysterious bugs later. And probably
>> >> > > oopses as well.
>> >> > 
>> >> > So maybe we can do both then, with something like
>> >> > 
>> >> > if (WARN_ON(!dev))
>> >> >    return -EINVAL
>> >> > 
>> >> > if (drm_WARN_ON(dev, !connector || !funcs))
>> >> >    return -EINVAL;
>> >> > 
>> >> > I'd still like to check for this, so we can have proper testing, and we
>> >> > already check for those pointers in some places (like funcs in
>> >> > drm_connector_init), so if we don't cover everything we're inconsistent.
>> >> 
>> >> People will invariably cargo-cult this kind of stuff absolutely
>> >> everywhere and then all your functions will have tons of dead
>> >> code to check their arguments.
>> >
>> > And that's a bad thing because... ?
>> >
>> > Also, are you really saying that checking that your arguments make sense
>> > is cargo-cult?
>> 
>> It's a powerful thing to be able to assume a NULL argument is always a
>> fatal programming error on the caller's side, and should oops and get
>> caught immediately. It's an assertion.
>
> Yeah, but we're not really doing that either. We have no explicit
> assertion anywhere. We take a pointer in, and just hope that it will be
> dereferenced later on and that the kernel will crash. The pointer to the
> functions especially is only deferenced very later on.
>
> And assertions might be powerful, but being able to notice errors and
> debug them is too. A panic takes away basically any remote access to
> debug. If you don't have a console, you're done.
>
>> We're not talking about user input or anything like that here.
>> 
>> If you start checking for things that can't happen, and return errors
>> for them, you start gracefully handling things that don't have anything
>> graceful about them.
>
> But there's nothing graceful to do here: you just return from your probe
> function that you couldn't probe and that's it. Just like you do when
> you can't map your registers, or get your interrupt, or register into
> any framework (including drm_dev_register that pretty much every driver
> handles properly if it returns an error, without being graceful about
> it).

Those are all dynamic things that can fail.

Quite different from passing NULL dev, connector, or funcs to
drm_connector_init() and friends.

I think it's wrong to set the example that everything needs to be
checked, everything needs to return an error, every call needs to check
for error return, all the time, everywhere. People absolutely will cargo
cult that, and that's what Ville is referring to.

If you pass NULL dev, connector, or funcs to drm_connector_init() I
think you absolutely deserve to get an oops.

For dev, you could possibly not have reached the function with NULL
dev. (And __drm_connector_init() has dev->mode_config before the check,
so you'll get a static analyzer warning about dereference before the
check.) If you have NULL connector, you didn't check for allocation
failure earlier. If you have NULL funcs, you just passed NULL, because
it's generally supposed to be a pointer to a static const struct.

>> Having such checks in place trains people to think they *may* happen.
>
> In most cases, kmalloc can't fail. We seem to have a very different
> policy towards it.

Again, dynamic in nature and can fail.

>> While it should fail fast and loud at the developer's first smoke test,
>> and get fixed then and there.
>
> Returning an error + a warning also qualifies for "fail fast and loud".
> But keeps the system alive for someone to notice in any case.

But where do you draw the line? If we keep adding these checks to things
that actually can't happen, we teach developers we need to check for
impossible things. And we teach them not to trust anything.

I scroll down the file and reach
drm_connector_attach_edid_property(). Should we NULL check connector?
Should we change the function to int and return a value? Should the
caller check the value? Then there's drm_connector_attach_encoder(). And
drm_connector_has_possible_encoder(). And so on and so forth.

Where do you draw the line?

BR,
Jani.

-- 
Jani Nikula, Intel