TDA998x crash on HDLCD probe failure

Robin Murphy robin.murphy at arm.com
Thu Nov 24 05:49:25 PST 2016


On 24/11/16 13:29, Russell King - ARM Linux wrote:
> On Thu, Nov 24, 2016 at 01:18:39PM +0000, Robin Murphy wrote:
>> Hi Liviu, Russell,
>>
>> I'd been meaning to try digging into this if it hadn't gone away since I
>> first noticed it, but I don't really have the time and it still happens
>> with 4.9-rc and today's -next. Representative splat below, but in
>> summary what happens is that if the HDLCD fails to probe, the TDA998x
>> connector seems to get cleaned up twice, resulting in a NULL dereference
>> the second time. I got as far as sketching out the following flow from a
>> debug session (on the same 4.8-rc2 kernel), but I don't know nearly
>> enough to tell which driver is at fault:
>>
>> hdlcd_drm_bind
>> -> drm_fbdev_cma_init (fails)
>> ...
>> -> drm_mode_config_cleanup
>>    ...
>>    -> drm_connector_cleanup
>> -> component_unbind_all
>>    ...
>>    -> tda998x_unbind
>>       -> drm_connector_cleanup (NULL connector)
>>
>> It's easily reproduced on Juno by booting arm64 defconfig with
>> CONFIG_CMA_SIZE_MBYTES=1 and a sufficiently large monitor connected to
>> warrant a >1MB framebuffer.
> 
> It looks to me like a hdlcd bug.
> 
> The probe path operates in this order:
> 
> - allocates hdlcd - 1
> - allocates drm device - 2
> - drm_mode_config_init - 3
> - hdlcd_load - 4
> - binds all components - 5
> - enables runtime PM - 6
> - drm_vblank_init - 7
> - drm_mode_config_reset - 8
> - drm_kms_helper_poll_init - 9
> - drm_fbdev_cma_init - 10
> - drm_dev_register - 11
> 
> However, the cleanup operates in this order:
> - drm_fbdev_cma_fini - undoes 10
> - drm_kms_helper_poll_fini - undoes 9
> - drm_mode_config_cleanup - undoes 3
> - drm_vblank_cleanup - undoes 7
> - pm_runtime_disable - undoes 6
> - component_unbind_all - undoes 5
> - drm_irq_uninstall - undoes 4
> - of_reserved_mem_device_release - undoes other half of 4
> - drm_dev_unref - undoes 2
> 
> Spot the step which is out of the correct order - drm_mode_config_cleanup()
> is misplaced - it's reversing the actions of drm_mode_config_init(), not
> drm_mode_config_reset().

Thanks for the explanation - that saves at least a day's worth of me
trying to understand DRM code :)

> So, drm_mode_config_cleanup() should be much later, after step 4 has
> been undone, otherwise there are paths that leave various DRM objects
> (created by drm_mode_create_standard_properties()) referenced, and
> will cause problems exactly like you're seeing here.
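
For my own benefit, the ordering rule above can be sketched as a toy
model. This is not the hdlcd code: model_probe(), do_step(), undo_step()
and struct probe_log are hypothetical names, and the numbered steps are
just stand-ins for the eleven probe actions listed earlier. All it shows
is that when step k fails, steps k-1 down to 1 are undone strictly in
reverse, so the undo of step 3 (mode config) always comes after the undo
of step 5 (components).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: step n stands in for numbered probe action n. */
enum { NSTEPS = 11 };

struct probe_log {
	int undone[NSTEPS];	/* step numbers, in the order they were undone */
	size_t count;
};

/* Pretend to perform step n; returns -1 when n == fail_at, else 0. */
static int do_step(int n, int fail_at)
{
	return n == fail_at ? -1 : 0;
}

/* Record that step n has been undone. */
static void undo_step(struct probe_log *log, int n)
{
	assert(log->count < NSTEPS);
	log->undone[log->count++] = n;
}

/*
 * Run steps 1..NSTEPS in order. If step k fails, undo k-1 .. 1 in
 * exactly the reverse of the order in which they were performed.
 */
int model_probe(struct probe_log *log, int fail_at)
{
	int n;

	log->count = 0;
	for (n = 1; n <= NSTEPS; n++) {
		if (do_step(n, fail_at) < 0)
			goto unwind;
	}
	return 0;

unwind:
	while (--n >= 1)
		undo_step(log, n);
	return -1;
}
```

Calling model_probe(&log, 10) models the drm_fbdev_cma_init failure: the
log then holds the undone steps from 9 down to 1, with 5 ahead of 3.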

Liviu, can I leave this with you then?

Robin.


