[PATCH v3 18/18] infiniband: cxgb4: Eliminate duplicate barriers on weakly-ordered archs
Steve Wise
swise at opengridcomputing.com
Sat Mar 17 06:23:13 PDT 2018
>
> On 3/17/2018 12:03 AM, Sinan Kaya wrote:
> > On 3/16/2018 11:40 PM, Sinan Kaya wrote:
> >> I'll change writel_relaxed() with __raw_writel() in the series like you
> suggested
> >> and also look at your other comments.
> >
> > I spoke too soon.
> >
> > Now that I realized, code needs to follow one of the following patterns
> for correctness
> >
> > 1)
> > wmb()
> > writel()/writel_relaxed()
> >
> > or
> >
> > 2)
> > wmb()
> > __raw_wrltel()
> > mmiowb()
> >
> > but definitely not
> >
> > wmb()
> > __raw_wrltel()
> >
> > Since #1 == #2, I'll stick to my current implementation of writel_relaxed()
> >
> > Changing writel() to writel_relaxed() or __raw_writel() isn't enough.
> PowerPC needs mmiowb()
> > for correctness. ARM's mmiowb() implementation is empty.
> >
> > So, there is no one size fits all solution with the current state of affairs.
> >
> >
>
> I think I finally got what you mean.
>
> Code seems to have
>
> wmb()
> writel()/writeq()
> wmb()
>
> this can be safely replaced with
>
> wmb()
> __raw_writel()/__raw_writeq()
> wmb()
>
> This will work on all arches. Below is the new version. Let me know if this is
> OK.
>
> +++ b/drivers/infiniband/hw/cxgb4/t4.h
> @@ -457,7 +457,7 @@ static inline void pio_copy(u64 __iomem *dst, u64
> *src)
> int count = 8;
>
> while (count) {
> - writeq(*src, dst);
> + __raw_writeq(*src, dst);
> src++;
> dst++;
> count--;
> @@ -477,15 +477,16 @@ static inline void t4_ring_sq_db(struct t4_wq *wq,
> u16 inc, union t4_wr *wqe)
> (u64 *)wqe);
> } else {
> pr_debug("DB wq->sq.pidx = %d\n", wq->sq.pidx);
> - writel(PIDX_T5_V(inc) | QID_V(wq->sq.bar2_qid),
> - wq->sq.bar2_va + SGE_UDB_KDOORBELL);
> + __raw_writel(PIDX_T5_V(inc) | QID_V(wq->sq.bar2_qid),
> + wq->sq.bar2_va + SGE_UDB_KDOORBELL);
> }
>
> /* Flush user doorbell area writes. */
> wmb();
> return;
> }
> - writel(QID_V(wq->sq.qid) | PIDX_V(inc), wq->db);
> + __raw_writel(QID_V(wq->sq.qid) | PIDX_V(inc), wq->db);
> + mmiowmb()
> }
>
> static inline void t4_ring_rq_db(struct t4_wq *wq, u16 inc,
> @@ -502,15 +503,16 @@ static inline void t4_ring_rq_db(struct t4_wq *wq,
> u16 inc,
> (void *)wqe);
> } else {
> pr_debug("DB wq->rq.pidx = %d\n", wq->rq.pidx);
> - writel(PIDX_T5_V(inc) | QID_V(wq->rq.bar2_qid),
> - wq->rq.bar2_va + SGE_UDB_KDOORBELL);
> + __raw_writel(PIDX_T5_V(inc) | QID_V(wq->rq.bar2_qid),
> + wq->rq.bar2_va + SGE_UDB_KDOORBELL);
> }
>
> /* Flush user doorbell area writes. */
> wmb();
> return;
> }
> - writel(QID_V(wq->rq.qid) | PIDX_V(inc), wq->db);
> + __raw_writel(QID_V(wq->rq.qid) | PIDX_V(inc), wq->db);
> + mmiowmb();
> }
>
>
Yes, this is what chelsio recommended to me.
Reviewed-by: Steve Wise <swise at opengridcomputing.com>
More information about the linux-arm-kernel
mailing list