NULL pointer exception in drivers/memory/mtk-smi.c

AngeloGioacchino Del Regno angelogioacchino.delregno at collabora.com
Mon Jul 7 01:47:53 PDT 2025


Il 07/07/25 09:24, Yong Wu (吴勇) ha scritto:
> On Thu, 2025-07-03 at 18:41 +0200, Uwe Kleine-König wrote:
>> Hello,
>>
>> [expanding the audience a bit according to the drivers that are
>> involved
>> now that the problem is better understood]
>>
>> On Tue, Jun 17, 2025 at 05:18:30PM +0200, Uwe Kleine-König wrote:
>>> on a 6.16-rc2 kernel running on an mt8365-evk I occasionally hit
>>> the
>>> following during boot:
>>
>> I invested some time now to understand the issue. So here comes what
>> I
>> understood, maybe that helps someone to spot the fix to the described
>> problem.
>>
>> With a configuration that has all drivers built-in but
>>
>> 	CONFIG_DRM_MEDIATEK=m
>> 	CONFIG_MTK_IOMMU=m
>>
>> and
>>
>> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
>> index cb95fecf6016..d4320db7cd2d 100644
>> --- a/drivers/iommu/mtk_iommu.c
>> +++ b/drivers/iommu/mtk_iommu.c
>> @@ -1387,14 +1387,15 @@ static int mtk_iommu_probe(struct
>> platform_device *pdev)
>>   		goto out_list_del;
>>   
>>   	ret = iommu_device_register(&data->iommu, &mtk_iommu_ops, dev);
>>   	if (ret)
>>   		goto out_sysfs_remove;
>>   
>>   	if (MTK_IOMMU_IS_TYPE(data->plat_data, MTK_IOMMU_TYPE_MM)) {
>> +		msleep(10000);
>>   		ret = component_master_add_with_match(dev,
>> &mtk_iommu_com_ops, match);
>>   		if (ret)
>>   			goto out_device_unregister;
>>   	}
>>   	return ret;
>>   
>>   out_device_unregister:
>>
>> I can reliably trigger the race.
>>
>> With that sleep in place iommu_device_register() completes quickly
>> which
>> enables probing of the devices with drivers contained in the
>> drm_mediatek module (because the modules are loaded in parallel on
>> a different CPU).
>>
>> Then generic driver code calls resume on all suppliers for devices to
>> bind, among them the four larb devices. However due to
>> component_master_add_with_match() not being called yet, the larb
>> devices
>> are not yet bound to the iommu device and so larb->mmu is still NULL.
>> The latter is a problem in mtk_smi_larb_config_port_gen2_general()
>> which
>> is called from mtk_smi_larb_resume().
> 
> Hi Uwe,
> 
> Thanks for your help. In this case, it looks like the disp probe occurs
> before the smi_larb_bind operation, we need to let disp wait for the
> bind to complete.
> 
> --- a/drivers/memory/mtk-smi.c
> +++ b/drivers/memory/mtk-smi.c
> @@ -666,6 +666,10 @@ static int __maybe_unused
> mtk_smi_larb_resume(struct device *dev)
>          if (MTK_SMI_CAPS(larb->larb_gen->flags_general,
> MTK_SMI_FLAG_SLEEP_CTL))
>                  mtk_smi_larb_sleep_ctrl_disable(larb);
> 
> +     /* The larb_bind operation may be later than the master probe. */
> +       if (!larb->mmu)
> +               return -EPROBE_DEFER;
> +
>          /* Configure the basic setting for this larb */
> 
> 
> Hi Angelo,
>   Do you have any suggestion?
> 

I don't have the necessary bandwidth to really broadly research about this, but
there's two ideas:

1. What if we disable PM OPS until MMU is bound? That would avoid calls to
    larb_resume().....
2. Instead of component, we could register a device - which carries its own suspend
    and resume ops; the device would be registered only after all local arbiters
    are bound, so there would be no suspend/resume ops until everything is in place.
    That's more or less how it's done in drivers/usb/core/port.c

Probably N.1 would work best - we could...

probe:
pm_runtime_set_active(larb->smi.dev);
pm_runtime_forbid(larb->smi.dev);

bind:
larb->mmu = ...
larb->bank = ...

pm_runtime_set_suspended(larb->smi.dev);
pm_runtime_allow(larb->smi.dev);


...the next usage/wakeup should then call resume() but at that point we're sure
that larb->mmu is actually initialized, because we allowed PM resume only after
binding.

I want to repeat - I didn't make any extensive research, just a fast read and
threw some ideas on the table without any testing or anything else :-)

Cheers,
Angelo

> 
> Thanks.
> 
>>
>> I don't know what the right fix here is, but maybe someone else does?
>>
>> Best regards
>> Uwe




More information about the Linux-mediatek mailing list