snd-cmipci oops during probe on arm64 (current mainline, pre-6.6-rc1)

Robin Murphy robin.murphy at arm.com
Wed Sep 6 05:49:16 PDT 2023


On 2023-09-06 07:10, Takashi Iwai wrote:
> On Wed, 06 Sep 2023 00:01:01 +0200,
> Antonio Terceiro wrote:
>>
>> Hi,
>>
>> I'm using an arm64 workstation, and wanted to add a sound card to it. I bought
>> one that is pretty popular where I live, and it is supported by the
>> snd-cmipci driver.
>>
>> It's this one:
>>
>> 0005:02:00.0 Multimedia audio controller: C-Media Electronics Inc CMI8738/CMI8768 PCI Audio (rev 10)
>>
>> After building a mainline kernel (post-v6.5, pre-rc1) on Debian testing arm64
>> with localmodconfig + CONFIG_SND_CMIPCI=m, it crashes with "Unable to handle
>> kernel paging request at virtual address fffffbfffe80000c", and the system
>> never finishes booting. The login manager never shows up and the serial console
>> never gets to a login prompt. I observed the same issue on a 6.3 Debian kernel,
>> after rebuilding with CONFIG_SND_CMIPCI=m.
>>
>> If I stop the module from being automatically loaded by adding
>> `blacklist snd-cmipci` to /etc/modprobe.d/snd-cmipci.conf (or if I
>> remove the card from the PCIe slot), I get the system to boot. But trying
>> to load the module manually causes the same crash (I only tested this
>> with the card installed):
>>
>> [  +4,501093] snd_cmipci 0005:02:00.0: stream 512 already in tree
>> [  +0,000155] Unable to handle kernel paging request at virtual address fffffbfffe80000c
>> [  +0,007927] Mem abort info:
>> [  +0,002793]   ESR = 0x0000000096000006
>> [  +0,003743]   EC = 0x25: DABT (current EL), IL = 32 bits
>> [  +0,005307]   SET = 0, FnV = 0
>> [  +0,003049]   EA = 0, S1PTW = 0
>> [  +0,003134]   FSC = 0x06: level 2 translation fault
>> [  +0,004872] Data abort info:
>> [  +0,002873]   ISV = 0, ISS = 0x00000006, ISS2 = 0x00000000
>> [  +0,005479]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
>> [  +0,005047]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
>> [  +0,000003] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000080519fe9000
>> [  +0,000004] [fffffbfffe80000c] pgd=000008051a979003, p4d=000008051a979003, pud=000008051a97a003, pmd=0000000000000000
>> [  +0,000009] Internal error: Oops: 0000000096000006 [#1] SMP
>> [  +0,028142] Modules linked in: snd_cmipci(+) snd_mpu401_uart snd_opl3_lib xt_conntrack xt_MASQUERADE nf_conntrack_netlink xfrm_user xfrm_algo xt_addrtype nft_compat br_netfilter nft_masq nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bridge stp llc nf_tables nfnetlink uvcvideo videobuf2_vmalloc videobuf2_memops uvc videobuf2_v4l2 videodev videobuf2_common snd_seq_dummy snd_hrtimer snd_seq qrtr rfkill overlay ftdi_sio usbserial snd_usb_audio snd_usbmidi_lib snd_pcm aes_ce_blk aes_ce_cipher snd_hwdep polyval_ce snd_rawmidi polyval_generic snd_seq_device joydev snd_timer ghash_ce hid_generic gf128mul snd usbhid sha2_ce ipmi_ssif soundcore hid mc sha256_arm64 ipmi_devintf arm_spe_pmu ipmi_msghandler sha1_ce sbsa_gwdt binfmt_misc nls_ascii nls_cp437 vfat fat xgene_hwmon cppc_cpufreq arm_cmn arm_dsu_pmu evdev nfsd auth_rpcgss nfs_acl lockd grace dm_mod fuse loop efi_pstore dax sunrpc configfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 btrfs efivarfs raid10 raid456 async_raid6_recov async_memcpy
>> [  +0,000142]  async_pq async_xor async_tx libcrc32c crc32c_generic xor xor_neon raid6_pq raid1 raid0 multipath linear md_mod nvme nvme_core ast t10_pi drm_shmem_helper xhci_pci drm_kms_helper xhci_hcd crc64_rocksoft crc64 drm crc_t10dif usbcore crct10dif_generic igb crct10dif_ce crct10dif_common usb_common i2c_algo_bit i2c_designware_platform i2c_designware_core
>> [  +0,121670] CPU: 0 PID: 442 Comm: kworker/0:4 Not tainted 6.5.0+ #2
>> [  +0,006259] Hardware name: ADLINK AVA Developer Platform/AVA Developer Platform, BIOS TianoCore 2.04.100.07 (SYS: 2.06.20220308) 09/08/2022
>> [  +0,012506] Workqueue: events work_for_cpu_fn
>> [  +0,004353] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> [  +0,006953] pc : logic_inl+0xa0/0xd8
>> [  +0,003570] lr : snd_cmipci_probe+0x7a4/0x1140 [snd_cmipci]
>> [  +0,005578] sp : ffff80008287bc70
>> [  +0,003303] x29: ffff80008287bc70 x28: ffff08008af9d6a0 x27: 0000000000000000
>> [  +0,007128] x26: ffffc4818263c228 x25: 0000000000000000 x24: 0000000000000001
>> [  +0,007127] x23: ffff07ff81a9e000 x22: ffff07ff81a9e0c0 x21: ffff08008af9d080
>> [  +0,007127] x20: ffffc4818263c000 x19: 0000000000000000 x18: ffffffffffffffff
>> [  +0,007127] x17: 0000000000000000 x16: ffffc4819ac3cd38 x15: ffff80008287ba80
>> [  +0,007127] x14: 0000000000000001 x13: ffff80008287bbc4 x12: 0000000000000000
>> [  +0,007126] x11: ffff07ff834616d0 x10: ffffffffffffffc0 x9 : ffffc4819a61dd18
>> [  +0,007127] x8 : 0000000000000228 x7 : 0000000000000001 x6 : 00000000000000ff
>> [  +0,007127] x5 : ffffc4819adb7998 x4 : 0000000000000000 x3 : 00000000000000ff
>> [  +0,007127] x2 : 0000000000ffbffe x1 : 000000000000000c x0 : fffffbfffe80000c
>> [  +0,007126] Call trace:
>> [  +0,002436]  logic_inl+0xa0/0xd8
>> [  +0,003221]  local_pci_probe+0x48/0xb8
>> [  +0,003744]  work_for_cpu_fn+0x24/0x40
>> [  +0,003741]  process_one_work+0x170/0x3a8
>> [  +0,004002]  worker_thread+0x23c/0x460
>> [  +0,003742]  kthread+0xe8/0xf8
>> [  +0,003047]  ret_from_fork+0x10/0x20
>> [  +0,003569] Code: d2bfd000 f2df7fe0 f2ffffe0 8b000020 (b9400000)
>> [  +0,006083] ---[ end trace 0000000000000000 ]---
>>
>> Because this sound card chipset seems to be popular (pretty much all PCI cards
>> I can find to buy locally use that), I'm thinking this might be specific to
>> arm64, otherwise someone would have seen this before.
> 
> There is only one change in this driver code itself since 6.5 (commit
> b6ba0aa46138), and judging from the stack trace, it's unrelated to
> your problem. It's more likely a regression in the lower-level code,
> e.g. the PCI layer or arch/arm64 stuff.
> 
> Could you try git bisect?

Hmm, but has this combination of card and machine *ever* actually worked?

It's blowing up trying to access PCI I/O space, which has apparently 
ended up in the indirect access mechanism without that being configured 
correctly. That is definitely an issue down somewhere between the PCI 
layer and the system firmware. Does the system even have an I/O space 
window? Some arm64 machines don't. I guess we might not have got as far 
as probing a driver if the I/O BAR couldn't be assigned at all, but 
either way something's not gone right.
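
For anyone who wants to read the fault details out of the log, here is a
rough Python sketch that decodes the logged ESR value and splits the
faulting address into a base and an offset. It uses only numbers from the
oops above. The bit positions are the standard AArch64 ESR_ELx layout. The
"base + offset" split is an assumption taken from the register dump (x0
holds the fault address, x1 holds 0xc), and the function and variable names
are made up for illustration.

```python
# Values copied from the oops in the quoted report.
ESR = 0x96000006
FAULT_ADDR = 0xFFFFFBFFFE80000C   # x0 at the time of the fault
PORT_OFFSET = 0xC                 # x1 at the time of the fault


def decode_esr(esr):
    """Split an AArch64 ESR_ELx value into the fields the oops prints."""
    iss = esr & 0x1FFFFFF              # ISS: bits [24:0]
    return {
        "ec": (esr >> 26) & 0x3F,      # exception class: bits [31:26]
        "il": (esr >> 25) & 0x1,       # 1 means a 32-bit instruction
        "iss": iss,
        "wnr": (iss >> 6) & 0x1,       # data aborts: 0 = read, 1 = write
        "fsc": iss & 0x3F,             # fault status code: bits [5:0]
    }


def split_fault_address(addr, offset):
    """Assumption: the access was 'base + offset'; return the base."""
    return addr - offset


fields = decode_esr(ESR)
base = split_fault_address(FAULT_ADDR, PORT_OFFSET)

print(f"EC={fields['ec']:#x} IL={fields['il']} WnR={fields['wnr']} "
      f"FSC={fields['fsc']:#04x}")
print(f"base={base:#x} offset={PORT_OFFSET:#x}")
```

The decoded fields agree with what the kernel printed: a data abort from the
current EL, a read, and a level 2 translation fault, i.e. nothing is mapped
at that address. The remaining base of 0xfffffbfffe800000 is presumably
where the kernel expects PCI I/O space to be mapped, and a port number as
small as 0xc would be consistent with the I/O BAR never having been given a
usable address, though that part is a guess from the dump.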

Thanks,
Robin.

