PCIe probe failure on AmLogic A311D after 6.18-rc1
Linnaea Lavia
linnaea-von-lavia at live.com
Fri Oct 31 19:26:17 PDT 2025
On 11/1/2025 1:47 AM, Manivannan Sadhasivam wrote:
> On Fri, Oct 31, 2025 at 11:13:23AM -0500, Bjorn Helgaas wrote:
>> On Fri, Oct 31, 2025 at 08:26:42PM +0800, Linnaea Lavia wrote:
>>> On 10/31/2025 4:50 PM, Neil Armstrong wrote:
>>>> On 10/31/25 06:34, Linnaea Lavia wrote:
>>>>> On 10/30/2025 1:15 AM, Bjorn Helgaas wrote:
>>>>>> On Wed, Oct 29, 2025 at 06:50:46PM +0800, Linnaea Lavia wrote:
>>>>>>> On 10/29/2025 6:16 AM, Bjorn Helgaas wrote:
>>>>>>
>>>>>>>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>>>>>>>> index 214ed060ca1b..9cd12924b5cb 100644
>>>>>>>> --- a/drivers/pci/quirks.c
>>>>>>>> +++ b/drivers/pci/quirks.c
>>>>>>>> @@ -2524,6 +2524,7 @@ static void quirk_disable_aspm_l0s_l1(struct pci_dev *dev)
>>>>>>>> * disable both L0s and L1 for now to be safe.
>>>>>>>> */
>>>>>>>> DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ASMEDIA, 0x1080, quirk_disable_aspm_l0s_l1);
>>>>>>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_SYNOPSYS, 0xabcd, quirk_disable_aspm_l0s_l1);
>>>>>>>> /*
>>>>>>>> * Some Pericom PCIe-to-PCI bridges in reverse mode need the PCIe Retrain
>>>>>>>
>>>>>>> I have applied the patch on 6.18-rc3 but it's still trying to enable ASPM for some reason.
>>>>>>
>>>>>> Sorry, my fault, I should have made that fixup run earlier, so the
>>>>>> patch should be this instead:
>>>>>>
>>>>>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>>>>>> index 214ed060ca1b..4fc04015ca0c 100644
>>>>>> --- a/drivers/pci/quirks.c
>>>>>> +++ b/drivers/pci/quirks.c
>>>>>> @@ -2524,6 +2524,7 @@ static void quirk_disable_aspm_l0s_l1(struct pci_dev *dev)
>>>>>> * disable both L0s and L1 for now to be safe.
>>>>>> */
>>>>>> DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ASMEDIA, 0x1080, quirk_disable_aspm_l0s_l1);
>>>>>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_SYNOPSYS, 0xabcd, quirk_disable_aspm_l0s_l1);
>>>>>
>>>>> L1 still got enabled
>>
>> Is that based on the output below?
>>
>> [ 5.445853] [ T48] pci 0000:00:00.0: Disabling ASPM L0s/L1
>> [ 5.560448] [ T48] pci 0000:01:00.0: ASPM: default states L1
>>
>> If so, this doesn't necessarily mean L1 was enabled. It means the
>> quirk marked the 00:00.0 Root Port so we shouldn't ever enable L0s or
>> L1, and when we enumerated 01:00.0, we set its default ASPM state to
>> L1.
>>
>> But I don't *think* L1 should actually be enabled unless we can enable
>> it for both 00:00.0 and 01:00.0, and the quirk should mean that we
>> can't enable it for 00:00.0.
>>
>> This muddle of "capable" (per Link Capabilities) vs "disabled" (either
>> the Link Control shows disabled, or software said "don't ever use L1")
>> is part of what makes aspm.c so confusing.
>>
>>>>> The card works just fine. I'm thinking the ASPM issue is
>>>>> probably from the glue driver reporting the link to be down when
>>>>> it's really just in low power state.
>>>>
>>>> You're probably right, the meson_pcie_link_up() not only checks
>>>> the LTSSM but also the speed, which is probably wrong.
>>>>
>>>> Can you try removing the test for speed?
>>>>
>>>> - if (smlh_up && rdlh_up && ltssm_up && speed_okay)
>>>> + if (smlh_up && rdlh_up && ltssm_up)
>>>>
>>>> The other drivers just check the link, and some only the smlh_up
>>>> && rdlh_up. So you can also probably drop ltssm_up as well.
>>>
>>> I can confirm that removing the check for ltssm_up and speed_okay
>>> made ASPM work.
>>
>> I don't think meson_pcie_link_up() should have the loop in it, so the
>> ltssm_up and speed_okay checks, the loop, the delay, and the timeout
>> message should probably all be removed. That method is supposed to be
>> a simple true/false check, and any waiting required should be done in
>> dw_pcie_wait_for_link().
>>
>> The link was clearly up when we discovered 01:00.0, so the "wait
>> linkup timeout" messages from meson_pcie_link_up() after that must be
>> from dw_pcie_link_up() being called via the .map_bus() call in
>> pci_generic_config_read() or pci_generic_config_write().
>>
>> When meson_pcie_link_up() returns false in those config accessors,
>> the config accesses will fail (they won't even be attempted), so we'll
>> see things like this:
>>
>> pci 0000:01:00.0: BAR 0: error updating (0xfc700004 != 0xffffffff)
>>
>> and "Unknown header type 7f" from lspci.
>>
>> Can you drop the ASPM quirk patch and instead try the
>> meson_pcie_link_up() patch below on top of v6.18-rc3?
>>
>>> We still need a solution to the original issue that's preventing the
>>> controller from being initialized.
>>>
>>> My kernel has the following patch applied, but I think it's not
>>> suitable for upstream as this changes device tree bindings for PCIe
>>> controller on meson.
>>
>> I assume the original issue is this:
>>
>> meson-pcie fc000000.pcie: error -EBUSY: can't request region for resource [mem 0xfc000000-0xfc3fffff]
>>
>> and you confirmed that it wasn't fixed by a1978b692a39 ("PCI: dwc: Use
>> custom pci_ops for root bus DBI vs ECAM config access"), which
>> appeared in v6.18-rc3?
>>
>> If it's still broken in v6.18-rc3, and the dtsi and
>> meson_pcie_get_mems() patch below makes it work, we have more work to
>> do, and maybe Krishna has some ideas.
>>
>
> We have two issues on this platform:
>
> 1. The DT represents the 'dbi' region as 'elbi', which is wrong because the two
> are different address spaces. ELBI is an optional region, whereas DBI holds the
> Root Port and controller configuration registers and is mandatory.
>
> 2. The driver parses/maps the 'elbi' region and stores it in 'pci->dbi_base', so
> code that depends on 'pci->dbi_base' worked as expected.
>
> Commit c96992a24bec moved the ELBI parsing logic to the core code, and it also
> removed the custom parsing from glue drivers. But since this driver was using
> 'pci->dbi_base' instead of 'pci->elbi_base', it was not caught during the move.
>
> I've submitted a series [1] that hopefully would fix this resource parsing
> issue. Please test it out.
>
> - Mani
>
> [1] https://lore.kernel.org/linux-pci/20251031-pci-meson-fix-v1-0-ed29ee5b54f9@oss.qualcomm.com
>
Applied your series on vanilla 6.18-rc3 along with Bjorn's patch, and PCIe now works out of the box on the board.
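
For reference, here is a minimal standalone sketch of the split Bjorn describes above: the link-up method is a plain true/false check on smlh_up and rdlh_up, and any waiting happens in the caller (the role dw_pcie_wait_for_link() plays in the real code). This is not the actual meson or DWC driver code. The bit positions, the sketch_* names, and the fake status source are all made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, for illustration only; this is not the
 * real Amlogic status register layout. */
#define SKETCH_SMLH_UP	(1u << 0)
#define SKETCH_RDLH_UP	(1u << 1)

/* Simple true/false check: no loop, no delay, no timeout message. */
static bool sketch_link_up(uint32_t status)
{
	return (status & SKETCH_SMLH_UP) && (status & SKETCH_RDLH_UP);
}

/* Caller-side polling. read_status is a hypothetical callback that
 * stands in for reading the controller status register. */
static bool sketch_wait_for_link(uint32_t (*read_status)(void *ctx),
				 void *ctx, int retries)
{
	for (int i = 0; i < retries; i++) {
		if (sketch_link_up(read_status(ctx)))
			return true;
		/* A real driver would sleep between polls here. */
	}
	return false;
}

/* Fake hardware: reports only smlh_up for the first polls_until_up
 * reads, then reports both smlh_up and rdlh_up. */
struct sketch_fake_hw {
	int polls_until_up;
};

static uint32_t sketch_fake_read(void *ctx)
{
	struct sketch_fake_hw *hw = ctx;

	if (hw->polls_until_up > 0) {
		hw->polls_until_up--;
		return SKETCH_SMLH_UP;
	}
	return SKETCH_SMLH_UP | SKETCH_RDLH_UP;
}
```

With this shape, a config accessor that calls sketch_link_up() gets an immediate answer instead of sitting in a retry loop, and only the probe path pays for the polling.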