Need some pointers to debug a KASAN splat in NVMe over Fabrics with rdma-rxe
Moni Shoua
monis at mellanox.com
Wed Mar 8 08:33:22 PST 2017
On Wed, Mar 8, 2017 at 5:35 PM, Johannes Thumshirn <jthumshirn at suse.de> wrote:
> Hi Moni et al.,
>
> I'm getting a KASAN stack-out-of-bounds in rxe_post_send+0xdfe/0x1830
> [rdma_rxe] at addr ffff8800187072e8 with v4.11-rc1
>
> rxe_post_send+0xdfe is the following (note: the pr_err was inserted by
> me to aid debugging).
>
> (gdb) list *(rxe_post_send+0xdfe)
> 0x1dc3e is in rxe_post_send (drivers/infiniband/sw/rxe/rxe_verbs.c:765).
> 760		pr_err("%s: *_wr(ibwr): %p\n",
> 761		       __func__, (void *)(mask & WR_ATOMIC_MASK ? atomic_wr(ibwr)
> 762		       : rdma_wr(ibwr)));
> 763
> 764		wqe->iova = (mask & WR_ATOMIC_MASK) ?
> 765			atomic_wr(ibwr)->remote_addr :
> 766			rdma_wr(ibwr)->remote_addr;
> 767 wqe->mask = mask;
> 768 wqe->dma.length = length;
> 769 wqe->dma.resid = length;
>
> Coincidentally, ffff8800187072e8 = ibwr + 0x28. ibwr comes from
> nvme_rdma_post_send() and has an opcode of IB_WR_SEND (verified). So the
> rdma_wr(ibwr) call cannot return a correct/valid parent object (neither
> could the atomic_wr(ibwr)).
>
> So much for the easy/mechanical part.
>
> I can special case IB_WR_SEND in rxe's init_send_wqe() but I neither
> know if it is correct nor how the wqe elements (especially wqe->iova)
> should be set up.
>
> So any help would be appreciated here.
>
> Thanks in advance,
> Johannes
> --
Hi Johannes,
Your report and analysis seem to be accurate (regarding the value of
wqe->iova).
Unfortunately we haven't had a chance yet to run kernel application
tests, but I will try to add them soon so that I can debug it myself.
In the meantime:
1. Did the test fail completely, or is it just the KASAN error that
made you look at init_send_wqe()?
2. You can take a look at librxe implementation of init_send_wqe() (it
looks slightly different from kernel's implementation) and see what
happens if you change implementation accordingly.
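
For illustration only, here is a rough userspace sketch of the idea behind
point 2: read remote_addr through the container struct only when the opcode
says the work request really is embedded in one. This is not the librxe or
kernel code. The types, the opcode names and wqe_iova() are simplified,
hypothetical stand-ins, and returning 0 for a plain send is a placeholder
assumption, since the right value for wqe->iova in that case is exactly the
open question.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the kernel verbs types. */
enum wr_opcode {
	WR_SEND,
	WR_RDMA_WRITE,
	WR_RDMA_READ,
	WR_ATOMIC_CMP_AND_SWP,
};

/* Base work request; a plain send is only this large. */
struct send_wr {
	enum wr_opcode opcode;
};

/* RDMA and atomic requests embed the base request as their first member. */
struct rdma_wr_mock {
	struct send_wr wr;
	uint64_t remote_addr;
};

struct atomic_wr_mock {
	struct send_wr wr;
	uint64_t remote_addr;
	uint64_t compare_add;
};

/* Userspace equivalent of the kernel's container_of() for const pointers. */
#define container_of(ptr, type, member) \
	((const type *)((const char *)(ptr) - offsetof(type, member)))

/*
 * Only convert to the container type when the opcode guarantees that the
 * base request is embedded in one. For a plain send there is no container,
 * so reading remote_addr would run past the end of the object, which is
 * what KASAN flags in the report above.
 */
static uint64_t wqe_iova(const struct send_wr *wr)
{
	switch (wr->opcode) {
	case WR_RDMA_WRITE:
	case WR_RDMA_READ:
		return container_of(wr, struct rdma_wr_mock, wr)->remote_addr;
	case WR_ATOMIC_CMP_AND_SWP:
		return container_of(wr, struct atomic_wr_mock, wr)->remote_addr;
	default:
		return 0; /* placeholder: no remote address for a send */
	}
}
```

With a guard like this, a request that has only the base struct on the stack
is never dereferenced beyond its own size.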
thanks
Moni