[v6,3/4] PCI: mediatek-gen3: Add MediaTek Gen3 driver for MT8192

Mon Dec 28 10:12:41 EST 2020

On Mon, 28 Dec 2020 12:01:57 +0000,
Jianjun Wang <jianjun.wang at mediatek.com> wrote:
> 
> On Fri, 2020-12-25 at 19:22 +0000, Marc Zyngier wrote:

Dropped	<Project_Global_Chrome_Upstream_Group at mediatek.com>, as it
bounces:

<Project_Global_Chrome_Upstream_Group at mediatek.com>: host
    mailgw01.mediatek.com[216.200.240.184] said: 550 Relaying mail to
    project_global_chrome_upstream_group at mediatek.com is not allowed (in reply
    to RCPT TO command)

Please make sure you don't Cc email addresses that cannot accept email
form the outside world.

[...]

> > > +/**
> > > + * struct mtk_pcie_msi - MSI information for each set
> > > + * @base: IO mapped register base
> > > + * @irq: MSI set Interrupt number
> > > + * @index: MSI set number
> > > + * @msg_addr: MSI message address
> > > + * @domain: IRQ domain
> > > + */
> > > +struct mtk_pcie_msi {
> > > +	void __iomem *base;
> > > +	unsigned int irq;
> > > +	int index;
> > > +	phys_addr_t msg_addr;
> > > +	struct irq_domain *domain;
> > > +};
> > 
> > This looks odd. You seem to say that this covers a set if MSIs, and
> > yet the irq field here clearly isn't part of a set. Is that per MSI
> > instead? Either way, something is not quite as it should be.
> > 
> 
> Appreciate all these comments, please allow me to explain the MSI
> interrupt design in this HW.
> 
> The HW design of MSI interrupts will be like the following:
> 
>                   +-----+
>                   | GIC |
>                   +-----+
>                      ^
>                      |
>                      |[port->irq]
>                      |
>              +-+-+-+-+-+-+-+-+
>              |0|1|2|3|4|5|6|7|[PCIe intc]
>              +-+-+-+-+-+-+-+-+
>               ^ ^           ^
>               | |    ...    |[msi_info->irq]
>       +-------+ +------+    +-----------+
>       |                |                |
> +-+-+---+--+--+  +-+-+---+--+--+  +-+-+---+--+--+
> |0|1|...|30|31|  |0|1|...|30|31|  |0|1|...|30|31|[MSI sets]
> +-+-+---+--+--+  +-+-+---+--+--+  +-+-+---+--+--+
>  ^ ^      ^  ^    ^ ^      ^  ^    ^ ^      ^  ^
>  | |      |  |    | |      |  |    | |      |  | [MSI irq]
>  | |      |  |    | |      |  |    | |      |  |
> 
>   (MSI SET0)       (MSI SET1)  ...   (MSI SET7)
> 

Thanks, that's really helpful. You should consider adding this to the
driver code, as part of the documentation.

> In software parts, the port->msi_top_domain is created to maintains 8
> MSI IRQs from the PCIe intc layer, its hardware IRQ will be mapped to
> msi_info->irq by irq_create_mapping.
> 
> The port->msi_domain contains 256 MSI IRQs in total, it consist of 8 MSI
> sets, and each MSI set contains 32 MSI IRQs.
> 
> The structure of mtk_pcie_msi is used to describe the MSI set, I think
> it will be more convenient and comply with the HW design when use this
> structure, we can get the information of MSI set directly, instead of
> calculated by port->base.
> 
> When a MSI interrupt is received, the interrupt handle flow will be like
> the following:
> 
>               mtk_pcie_irq_handler (port->irq)
>                         |
>                         |(find mapping in msi_top_domain)
>                         |
>                         v
>                 mtk_pcie_msi_handler (msi_info->irq)
>                         |
>                         |(find mapping in msi_domain)
>                         |
>                         v
>                  handle_edge_irq  (MSI irq)
>                         |
>                         |
>                         v
>             (dispatch to device handler)
> 
> Yes, I had to admit that it's not a quite good solution of irqdomains,
> since the local irq domain is partial coupled with the standard PCI MSI
> irqdomain.
> 
> Should I need to create another irqdomain to maintain the MSI sets
> layer, and set it as the parent domain of PCI MSI domain? I really need
> your suggestions, thanks a lot.

<rant>
My first suggestion would be to go back to whoever put this HW block
together and make sure they never write a line of RTL ever again.
Programming using copy/paste is bad enough, but doing the same thing
with HW is just terrible.
</rant>

Ideally, you would model the whole thing as 8 distinct MSI
controllers, as that is effectively what they are. Evidently, this
would cause all kind of issues when associating PCI devices with their
MSI domain, so let's not do that.

So what we want is to have a single MSI domain (with 256 entries,
hence covering all sets), and compute the hwirq from the level below,
depending on the interrupt PCIe interrupt, and the index of the MSI in
the status register for this PCIe interrupt. I think you have most of
the information already, you just need to simplify the way you keep
track of it, and maybe come up with better names for the various
blocks.

I expect you would need something like this:

struct mtk_msi_set {
	void __iomem		*base;
	phys_addr_t		msg_addr;
};

struct mtk_pcie_port {
[...]
	int 			base_irq;
	struct mtk_msi_set	msi_sets[8];
[...]
};

The assumption is that you can build the bottom MSI domain hwirq as:

    hwirq = set_idx * 32 + set_status_bit;

From that, you can directly go back from a hwirq to a set, and index
the msi_sets[] array to find the relevant information.

Another assumption is that all 8 cascade interrupts are contiguous
(which is pretty easy to guarantee), meaning that you can directly use
the PCIe interrupt to compute the set number.

With all that, you can directly keep a pointer from the bottom MSI
domain irq_data structures to the mtk_pcie_port structure, providing
you with all the information you need.

Once you will have moved the various bits at the right place, and used
the established MSI abstractions, things should just work.

[...]

> > > +static void mtk_intx_eoi(struct irq_data *data)
> > > +{
> > > +	struct mtk_pcie_port *port = irq_data_get_irq_chip_data(data);
> > > +	unsigned long hwirq;
> > > +
> > > +	/**
> > > +	 * As an emulated level IRQ, its interrupt status will remain
> > > +	 * until the corresponding de-assert message is received; hence that
> > > +	 * the status can only be cleared when the interrupt has been serviced.
> > > +	 */
> > > +	hwirq = data->hwirq + PCIE_INTX_SHIFT;
> > > +	writel(BIT(hwirq), port->base + PCIE_INT_STATUS_REG);
> > 
> > All of this is the description of a level interrupt, so this is pretty
> > much devoid of any information as to *why* you need to write to clear
> > this bit. What happens if the interrupt is still asserted because
> > nothing has handled it? Without any further information, this looks
> > terribly wrong.
> 
> Sorry, this comment should be used to describe the mtk_intx_eoi
> function, it cause misunderstandings at this place. I just want to add
> the comment to explain that why this interrupt needs to be acked at the
> end of interrupt. I will move it to the front of mtk_intx_eoi in the
> next version.

Is it an ack or an eoi? These are two different things, and really
depends on your HW. Please find out which one it is, and we will work
out a way to handle it.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.