[PATCH v2 00/10] Refine the locking for dev->iommu_group

Marek Szyprowski m.szyprowski at samsung.com
Tue Aug 8 06:00:30 PDT 2023

Hi All,

On 08.08.2023 14:32, Marek Szyprowski wrote:
> On 08.08.2023 12:31, Chen-Yu Tsai wrote:
>> On Mon, Aug 7, 2023 at 8:54 PM Joerg Roedel <joro at 8bytes.org> wrote:
>>> On Mon, Jul 31, 2023 at 02:50:23PM -0300, Jason Gunthorpe wrote:
>>>> Jason Gunthorpe (10):
>>>>    iommu: Remove useless group refcounting
>>>>    iommu: Add a lockdep assertion for remaining dev->iommu_group reads
>>>>    iommu: Add generic_single_device_group()
>>>>    iommu/sun50i: Convert to generic_single_device_group()
>>>>    iommu/sprd: Convert to generic_single_device_group()
>>>>    iommu/rockchip: Convert to generic_single_device_group()
>>>>    iommu/ipmmu-vmsa: Convert to generic_single_device_group()
>>>>    iommu/omap: Convert to generic_single_device_group()
>>>>    iommu: Complete the locking for dev->iommu_group
>>>>    iommu/intel: Fix missing locking for show_device_domain_translation()
>>>>   drivers/iommu/intel/debugfs.c  |  34 ++++----
>>>>   drivers/iommu/iommu.c          | 155 +++++++++++++++++++++------------
>>>>   drivers/iommu/ipmmu-vmsa.c     |  22 ++---
>>>>   drivers/iommu/omap-iommu.c     |  30 +------
>>>>   drivers/iommu/omap-iommu.h     |   2 +-
>>>>   drivers/iommu/rockchip-iommu.c |  22 +----
>>>>   drivers/iommu/sprd-iommu.c     |  24 +----
>>>>   drivers/iommu/sun50i-iommu.c   |  29 ++----
>>>>   include/linux/iommu.h          |   3 +
>>>>   9 files changed, 138 insertions(+), 183 deletions(-)
>>> Applied, thanks for the nice cleanup!
>> This series seems to cause a hung task during boot on MediaTek platforms.
>> It hangs with next-20230808. Reverting the 10 commits from this series
>> makes the system boot up again.
> I confirm that next-20230808 is broken on ARM 32bit based Exynos 
> boards too. Boards lock up very early during boot. I will try to 
> investigate this soon.

Hmm, this turned out to be Exynos IOMMU specific, but the underlying 
issue is probably generic.

The deadlock happens early in __iommu_probe_device() on 
device_lock(dev). Here is a stack dump of that call:

CPU: 1 PID: 1 Comm: swapper/0 Not tainted 6.5.0-rc5-next-20230808-dirty 
Hardware name: Samsung Exynos (Flattened Device Tree)
  unwind_backtrace from show_stack+0x10/0x14
  show_stack from dump_stack_lvl+0x58/0x70
  dump_stack_lvl from __iommu_probe_device+0x3d8/0x4ac
  __iommu_probe_device from probe_iommu_group+0x8/0x14
  probe_iommu_group from bus_for_each_dev+0x60/0xb4
  bus_for_each_dev from bus_iommu_probe+0x34/0x118
  bus_iommu_probe from iommu_device_register+0x98/0x100
  iommu_device_register from exynos_sysmmu_probe+0x238/0x3c0
  exynos_sysmmu_probe from platform_probe+0x80/0xc0
  platform_probe from really_probe+0x154/0x3d4
  really_probe from __driver_probe_device+0xa0/0x1e8
  __driver_probe_device from driver_probe_device+0x30/0xd0
  driver_probe_device from __device_attach_driver+0xbc/0x11c
  __device_attach_driver from bus_for_each_drv+0x74/0xc0
  bus_for_each_drv from __device_attach+0xec/0x1b4
  __device_attach from bus_probe_device+0x8c/0x90
  bus_probe_device from device_add+0x5b8/0x78c
  device_add from of_platform_device_create_pdata+0x94/0xcc
  of_platform_device_create_pdata from of_platform_bus_create+0x1ac/0x4d8
  of_platform_bus_create from of_platform_bus_create+0x214/0x4d8
  of_platform_bus_create from of_platform_populate+0x80/0x114
  of_platform_populate from of_platform_default_populate_init+0xcc/0xe4
  of_platform_default_populate_init from do_one_initcall+0x6c/0x318
  do_one_initcall from kernel_init_freeable+0x1c4/0x214
  kernel_init_freeable from kernel_init+0x18/0x12c
  kernel_init from ret_from_fork+0x14/0x2c

The problem here is that exynos_sysmmu_probe() is, by design, called with 
device_lock held. It then calls iommu_device_register(), which in turn 
calls __iommu_probe_device() on all platform devices in the system, and 
the sysmmu device that is still being probed is one of them. Taking 
device_lock(dev) on that device a second time can never succeed.
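
To make the self-deadlock described above concrete, here is a minimal 
user-space model in Python. It is only a sketch, not kernel code: the 
Device class and the function bodies are hypothetical stand-ins named 
after the kernel functions in the stack dump, and a non-blocking acquire 
is used so that the model reports the spot where the real code would 
block forever instead of actually hanging.

```python
import threading


class Device:
    """Hypothetical stand-in for struct device; .lock models device_lock()."""

    def __init__(self, name):
        self.name = name
        # threading.Lock is not reentrant: a second acquire by the same
        # thread does not succeed while the lock is already held.
        self.lock = threading.Lock()


def iommu_probe_device(dev):
    """Model of __iommu_probe_device(): it needs the device's lock.

    Returns False where a blocking lock attempt would hang forever.
    """
    if not dev.lock.acquire(blocking=False):
        return False  # lock already held: this is the deadlock point
    try:
        return True  # the real probing work would happen here
    finally:
        dev.lock.release()


def iommu_device_register(bus):
    """Model of the bus walk: probe every device on the bus."""
    return {dev.name: iommu_probe_device(dev) for dev in bus}


def driver_probe(dev, bus):
    """Model of the driver core: the probe callback runs under dev's lock."""
    with dev.lock:
        return iommu_device_register(bus)


sysmmu = Device("sysmmu")
other = Device("other-platform-device")
result = driver_probe(sysmmu, [sysmmu, other])
print(result)  # {'sysmmu': False, 'other-platform-device': True}
```

Only the device whose probe callback is currently running fails to take 
its lock; every other device on the bus is probed normally, which matches 
the hang appearing as soon as the bus walk reaches the sysmmu device.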

Frankly speaking, I have no idea how to defer the call to 
iommu_device_register() to avoid this deadlock. Any ideas?

Best regards
Marek Szyprowski, PhD
Samsung R&D Institute Poland
