NULL pointer exception in drivers/memory/mtk-smi.c
u.kleine-koenig at baylibre.com
u.kleine-koenig at baylibre.com
Mon Jul 7 06:41:00 PDT 2025
On Mon, Jul 07, 2025 at 10:47:53AM +0200, AngeloGioacchino Del Regno wrote:
> Il 07/07/25 09:24, Yong Wu (吴勇) ha scritto:
> > On Thu, 2025-07-03 at 18:41 +0200, Uwe Kleine-König wrote:
> > > Hello,
> > >
> > > [expanding the audience a bit according to the drivers that are
> > > involved
> > > now that the problem is better understood]
> > >
> > > On Tue, Jun 17, 2025 at 05:18:30PM +0200, Uwe Kleine-König wrote:
> > > > on a 6.16-rc2 kernel running on an mt8365-evk I occasionally hit
> > > > the
> > > > following during boot:
> > >
> > > I invested some time now to understand the issue. So here comes what
> > > I
> > > understood, maybe that helps someone to spot the fix to the described
> > > problem.
> > >
> > > With a configuration that has all drivers built-in but
> > >
> > > CONFIG_DRM_MEDIATEK=m
> > > CONFIG_MTK_IOMMU=m
> > >
> > > and
> > >
> > > diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> > > index cb95fecf6016..d4320db7cd2d 100644
> > > --- a/drivers/iommu/mtk_iommu.c
> > > +++ b/drivers/iommu/mtk_iommu.c
> > > @@ -1387,14 +1387,15 @@ static int mtk_iommu_probe(struct
> > > platform_device *pdev)
> > > goto out_list_del;
> > > ret = iommu_device_register(&data->iommu, &mtk_iommu_ops, dev);
> > > if (ret)
> > > goto out_sysfs_remove;
> > > if (MTK_IOMMU_IS_TYPE(data->plat_data, MTK_IOMMU_TYPE_MM)) {
> > > + msleep(10000);
> > > ret = component_master_add_with_match(dev,
> > > &mtk_iommu_com_ops, match);
> > > if (ret)
> > > goto out_device_unregister;
> > > }
> > > return ret;
> > > out_device_unregister:
> > >
> > > I can reliably trigger the race.
> > >
> > > With that sleep in place iommu_device_register() completes quickly
> > > which
> > > enables probing of the devices with drivers contained in the
> > > drm_mediatek module (because the modules are loaded in parallel on
> > > a different CPU).
> > >
> > > Then generic driver code calls resume on all suppliers for devices to
> > > bind, among them the four larb devices. However due to
> > > component_master_add_with_match() not being called yet, the larb
> > > devices
> > > are not yet bound to the iommu device and so larb->mmu is still NULL.
> > > The latter is a problem in mtk_smi_larb_config_port_gen2_general()
> > > which
> > > is called from mtk_smi_larb_resume().
> >
> > Hi Uwe,
> >
> > Thanks for your help. In this case, it looks like the disp probe occurs
> > before the smi_larb_bind operation, we need to let disp wait for the
> > bind to complete.
> >
> > --- a/drivers/memory/mtk-smi.c
> > +++ b/drivers/memory/mtk-smi.c
> > @@ -666,6 +666,10 @@ static int __maybe_unused
> > mtk_smi_larb_resume(struct device *dev)
> > if (MTK_SMI_CAPS(larb->larb_gen->flags_general,
> > MTK_SMI_FLAG_SLEEP_CTL))
> > mtk_smi_larb_sleep_ctrl_disable(larb);
> >
> > + /* The larb_bind operation may be later than the master probe. */
> > + if (!larb->mmu)
> > + return -EPROBE_DEFER;
> > +
A .resume callback isn't supposed to return -EPROBE_DEFER.
> > /* Configure the basic setting for this larb */
> >
> >
> > Hi Angelo,
> > Do you have any suggestion?
> >
>
> I don't have the necessary bandwidth to really broadly research about this, but
> there's two ideas:
>
> 1. What if we disable PM OPS until MMU is bound? That would avoid calls to
> larb_resume().....
I think the key question here is: Is the iommu functional without the
larbs bound? If not I'd claim that iommu_device_register() should only
happen if all larbs are present and ready.
> 2. Instead of component, we could register a device - which carries its own suspend
> and resume ops; the device would be registered only after all local arbiters
> are bound, so there would be no suspend/resume ops until everything is in place.
> That's more or less how it's done in drivers/usb/core/port.c
>
> Probably N.1 would work best - we could...
>
> probe:
> pm_runtime_set_active(larb->smi.dev);
> pm_runtime_forbid(larb->smi.dev);
>
> bind:
> larb->mmu = ...
> larb->bank = ...
>
> pm_runtime_set_suspended(larb->smi.dev);
> pm_runtime_allow(larb->smi.dev);
>
>
> ...the next usage/wakeup should then call resume() but at that point we're sure
> that larb->mmu is actually initialized, because we allowed PM resume only after
> binding.
Not knowing the details of the hardware that feels wrong. IMHO a device
that is operational should be able to do PM ops.
Best regards
Uwe
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 488 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-mediatek/attachments/20250707/ad9f5ed9/attachment.sig>
More information about the Linux-mediatek
mailing list