[PATCH V5 3/3] PCI: xilinx-xdma: Add Xilinx XDMA Root Port driver

Wed Jul 19 23:37:03 PDT 2023

Hi Bjorn,

> -----Original Message-----
> From: Bjorn Helgaas <helgaas at kernel.org>
> Sent: Saturday, July 1, 2023 4:49 AM
> To: Havalige, Thippeswamy <thippeswamy.havalige at amd.com>
> Cc: krzysztof.kozlowski at linaro.org; devicetree at vger.kernel.org; linux-
> pci at vger.kernel.org; linux-kernel at vger.kernel.org; robh+dt at kernel.org;
> bhelgaas at google.com; lorenzo.pieralisi at arm.com; linux-arm-
> kernel at lists.infradead.org; Gogada, Bharat Kumar
> <bharat.kumar.gogada at amd.com>; Simek, Michal
> <michal.simek at amd.com>
> Subject: Re: [PATCH V5 3/3] PCI: xilinx-xdma: Add Xilinx XDMA Root Port driver
> 
> On Wed, Jun 28, 2023 at 02:58:12PM +0530, Thippeswamy Havalige wrote:
> > Add support for Xilinx XDMA Soft IP core as Root Port.
> > ...
> 
> > |Reported-by: kernel test robot <lkp at intel.com>
> > |Reported-by: Dan Carpenter <error27 at gmail.com>
> > |Closes:
> > |https://lore.kernel.org/r/202305261250.2cs1phTS-lkp@intel.com/
> 
> Remove these.  I mentioned this before:
> https://lore.kernel.org/r/ZHd/7AaLaGyr1jNA@bhelgaas
- Agreed, I'll remove this in next patch
> > + * struct pl_dma_pcie - PCIe port information
> > + * @dev: Device pointer
> > + * @reg_base: IO Mapped Register Base
> > + * @irq: Interrupt number
> > + * @cfg: Holds mappings of config space window
> > + * @phys_reg_base: Physical address of reg base
> > + * @intx_domain: Legacy IRQ domain pointer
> > + * @pldma_domain: PL DMA IRQ domain pointer
> > + * @resources: Bus Resources
> > + * @msi: MSI information
> > + * @irq_misc: Legacy and error interrupt number
> > + * @intx_irq: legacy interrupt number
> > + * @lock: lock protecting shared register access
> 
> Capitalize the intx_irq and lock descriptions so they match the others.
- Agreed, I'll fix it in the next patch
> "Legacy and error interrupt number" and "legacy interrupt number"
> sound like they overlap -- "legacy interrupt number" is part of both.
> Is that an error?
- Agreed, I'll modify this comment to legacy interrupt number. (This irq line is for both legacy interrupts and error interrupt bits)
> > +static bool xilinx_pl_dma_pcie_valid_device(struct pci_bus *bus,
> > +unsigned int devfn) {
> > +	struct pl_dma_pcie *port = bus->sysdata;
> > +
> > +	/* Check if link is up when trying to access downstream ports */
> > +	if (!pci_is_root_bus(bus)) {
> > +		/*
> > +		 * If the link goes down after we check for link-up, we have a
> problem:
> > +		 * if a PIO request is initiated while link-down, the whole
> controller
> > +		 * hangs, and even after link comes up again, previous PIO
> requests
> > +		 * won't work, and a reset of the whole PCIe controller is
> needed.
> > +		 * Henceforth we need link-up check here to avoid sending
> PIO request
> > +		 * when link is down.
> 
> Wrap this comment so it fits in 80 columns like the rest of the file.
> 
> I think the comment was added because I pointed out that this is racy.
> Obviously the comment doesn't *fix* the race, and it actually doesn't even
> describe the race.
- Agreed, I'll add comments regarding race condition.
> Even with the xilinx_pl_dma_pcie_link_up() check, this is racy because
> xilinx_pl_dma_pcie_link_up() may tell you the link is up, but the link may go
> down before the driver attempts the config transaction.  THAT is the race.
> 
> If the controller hangs in that situation, that's a hardware defect, and from
> your comment, it sounds like it's unrecoverable.
> 
> > +		 */
> > +		if (!xilinx_pl_dma_pcie_link_up(port))
> > +			return false;
> 
> > +static int xilinx_pl_dma_pcie_intx_map(struct irq_domain *domain,
> unsigned int irq,
> > +				       irq_hw_number_t hwirq)
> 
> Wrap to fit in 80 columns like the rest of the file.
> 
> > +static struct irq_chip xilinx_msi_irq_chip = {
> > +	.name = "pl_dma_pciepcie:msi",
> 
> Why does this name have two copies of "pcie" in it?  This driver has four
> irq_chip structs; maybe the names could be more similar?
- Agreed, I'll modify all irq_chip names this in next patch 
Example: 
static struct irq_chip xilinx_msi_irq_chip = {
	.name = "pl_dma:PCIe MSI",

>   xilinx_leg_irq_chip			INTx
>   xilinx_msi_irq_chip		 	pl_dma_pciepcie:msi
>   xilinx_irq_chip			Xilinx MSI
>   xilinx_pl_dma_pcie_event_irq_chip	RC-Event
> 
> > +	/* Plug the INTx chained handler */
> > +	irq_set_chained_handler_and_data(port->intx_irq,
> > +					 xilinx_pl_dma_pcie_intx_flow, port);
> > +
> > +	/* Plug the main event chained handler */
> > +	irq_set_chained_handler_and_data(port->irq,
> > +					 xilinx_pl_dma_pcie_event_flow,
> port);
> 
> What's the reason for using chained IRQs?  Can this be done without them?  I
> don't claim to understand all the issues here, but it seems better to avoid
> chained IRQ handlers when possible:
- As per the comments in this https://lkml.kernel.org/lkml/alpine.DEB.2.20.1705232307330.2409@nanos/T/
"It is fine to have chained interrupts when bootloader, device tree and kernel under control. Only if BIOS/UEFI comes into
play the user is helpless against interrupt storm which will cause system to hangs."

We are using ARM embedded platform with Bootloader, Devicetree flow.

> https://lore.kernel.org/all/877csohcll.ffs@tglx/
> 
> > +	/*set the Bridge enable bit */
- Agreed, I ll modify it in next patch.
> Space before "Set".  I mentioned this before at
> https://lore.kernel.org/r/ZHd/7AaLaGyr1jNA@bhelgaas
> 
> > +	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> > +	if (!res) {
> > +		dev_err(dev, "missing \"reg\" property\n");
> 
> All your other error messages are capitalized.  Make this one match.
> 
> > +	bridge->ops = (struct pci_ops *)&xilinx_pl_dma_pcie_ops.pci_ops;
> 
> I don't think this cast is needed.
-Agreed, will modify it in next patch.
> 
> Bjorn