[PATCH v3 0/9] Introduce per-device completion queue pools
Max Gurtovoy
maxg at mellanox.com
Tue Nov 14 02:06:16 PST 2017
On 11/9/2017 7:31 PM, Bart Van Assche wrote:
> On Thu, 2017-11-09 at 19:22 +0200, Sagi Grimberg wrote:
>> But I'm afraid I don't understand how the fact that ULPs will run on
>> different ports matters. How would having two different pools on
>> different ports make a difference?
>
> If each RDMA port is only used by a single ULP then the ULP driver can provide
> a better value for the CQ size than IB_CQE_BATCH. If CQ pools were created
> by ULPs then it would be easy for ULPs to pass their choice of CQ size to the
> RDMA core.
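Right, and to make that concrete, here is a rough sketch of what a
ULP-created pool with a ULP-chosen CQ size could look like (the
ib_ulp_cq_pool_* names and fields are purely illustrative, not an
existing RDMA core interface):

#include <rdma/ib_verbs.h>

/*
 * Hypothetical per-ULP CQ pool: the ULP, not the RDMA core, picks the
 * CQ size and the number of CQs (typically one per completion vector).
 */
struct ib_ulp_cq_pool {
	struct ib_device	*device;
	int			num_cqs;	/* usually nr of comp vectors */
	int			cqe_per_cq;	/* ULP choice, not IB_CQE_BATCH */
	struct ib_cq		**cqs;
};

/* The ULP creates its own pool and passes its preferred CQ size. */
struct ib_ulp_cq_pool *ib_ulp_cq_pool_create(struct ib_device *device,
					     int num_cqs, int cqe_per_cq,
					     enum ib_poll_context poll_ctx);

/* Pick the least-loaded CQ in the ULP's own pool for a new QP. */
struct ib_cq *ib_ulp_cq_pool_get(struct ib_ulp_cq_pool *pool);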
I also prefer the per-ULP CQ pool approach (like we did with the
per-QP MR pools) as a first stage. For example, we saw a big
improvement in NVMEoF performance when we added CQ moderation
(currently a local implementation in our labs). If we moderate a
shared CQ (an iSER + NVMEoF CQ, say) we can ruin the other ULP's
performance. iSER/SRP/NVMEoF/NFS have different needs and different
architectures, so even adaptive moderation would not deliver the best
performance in that case.
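For illustration, a minimal moderation sketch on a ULP-owned CQ;
rdma_set_cq_moderation() is the in-kernel helper for CQ event
moderation (assuming a kernel that already has it), and the 32/64
values are made-up numbers each ULP would tune for its own workload:

#include <rdma/ib_verbs.h>

/* Coalesce completion events: fire after 32 CQEs or 64 usecs,
 * whichever comes first.  On a CQ shared between ULPs these values
 * are a compromise; on a per-ULP CQ they can match that ULP's
 * latency/throughput trade-off.
 */
static int nvmf_moderate_cq(struct ib_cq *cq)
{
	return rdma_set_cq_moderation(cq, 32, 64);
}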
We can (I mean I can :)) also implement an SRQ pool per ULP, and then
push my NVMEoF target SRQ-per-completion-vector feature, which saves
resource allocations and still gives very good numbers, almost the
same as a non-shared RQ.
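A rough sketch of that SRQ-per-completion-vector idea (names and sizes
are illustrative and error unwinding is trimmed; only ib_create_srq()
and struct ib_srq_init_attr are the real verbs interface):

#include <linux/err.h>
#include <rdma/ib_verbs.h>

/* One SRQ per completion vector instead of one RQ per QP: receive
 * buffers are shared by all QPs whose CQ lives on the same vector,
 * which cuts allocations while keeping per-vector locality.
 */
static int nvmet_rdma_alloc_srqs(struct ib_pd *pd, struct ib_srq **srqs,
				 int nr_vectors, u32 srq_size, u32 max_sge)
{
	struct ib_srq_init_attr attr = {};
	int i;

	attr.attr.max_wr = srq_size;
	attr.attr.max_sge = max_sge;

	for (i = 0; i < nr_vectors; i++) {
		srqs[i] = ib_create_srq(pd, &attr);
		if (IS_ERR(srqs[i]))
			return PTR_ERR(srqs[i]);
	}
	return 0;
}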
>
> In case multiple ULPs share an RDMA port then which CQ is chosen for the ULP
> will depend on the order in which the ULP drivers are loaded. This may lead to
> hard-to-debug performance issues, e.g. due to different lock contention
> behavior. That's another reason why per-ULP CQ pools look more interesting to
> me than one CQ pool per HCA.
Debuggability is also a good point.
>
> Bart.
-Max.