Allocating more RX descriptors than can fit in their related rings

Karthikeyan Periyasamy quic_periyasa at quicinc.com
Fri Sep 6 19:39:15 PDT 2024



On 9/6/2024 9:39 PM, Remi Pommarel wrote:
> On Fri, Sep 06, 2024 at 09:30:33AM +0530, Karthikeyan Periyasamy wrote:
>> On 9/6/2024 9:27 AM, Karthikeyan Periyasamy wrote:
>>>
>>>
>>> On 9/4/2024 11:31 PM, Remi Pommarel wrote:
>>>> Hello,
>>>>
>>>> As far as I understand, a bunch (ATH12K_RX_DESC_COUNT) of rx descriptors
>>>> gets allocated, then CMEM is configured for those descriptors' cookie
>>>> conversion, and they are kept available in the dp->rx_desc_free_list pool.
>>>>
>>>> Those descriptors seem to be used to feed two different rings: the
>>>> rx_refill_buf_ring ring via ath12k_dp_rx_bufs_replenish() and the
>>>> reo_reinject_ring one via ath12k_dp_rx_h_defrag_reo_reinject(). While
>>>> the former is kept fully used if possible, the latter is only used on
>>>> demand (i.e. reinjection of defragmented MPDUs).
>>>>
>>>> It seems that the number of RX descriptors, ATH12K_RX_DESC_COUNT (12288),
>>>> is higher than what those two rings can fit (DP_REO_REINJECT_RING_SIZE +
>>>> DP_RXDMA_BUF_RING_SIZE = 32 + 4096 = 4128).
>>>>
>>>> My question is: why are we allocating that many (12288) buffers if only a
>>>> small part (4128) can be used in the worst case?
>>>>
>>>> Wouldn't it be ok to only allocate just enough RX descriptors to fill
>>>> both rings (with proper 512 alignment to ease CMEM configuration) as
>>>> below?
>>>>
>>>>    #define ATH12K_RX_DESC_COUNT   ALIGN(DP_REO_REINJECT_RING_SIZE + \
>>>>                                         DP_RXDMA_BUF_RING_SIZE, \
>>>>                                         ATH12K_MAX_SPT_ENTRIES)
>>>>
>>>> Or am I missing something, and this is going to impact performance?
>>>>
> 
> ...
> 
>>
>> Yes, it will impact performance.
>>
>> The host replenishes RxDMA buffers to the HW, and after processing they
>> are given back through the Rx path (REO, WBM error, Rx error). So it
>> cannot be reduced to a one-to-one direct mapping. How many Rx buffers
>> the HW holds in flight depends on the data rate. If RxDMA buffers are
>> not available, performance suffers from out-of-order Rx errors caused
>> by RxDMA buffer unavailability.
> 
> Thanks for the clarification.
> 
> I think I do see your point. I thought the only way to feed descriptors
> to the HW was in ath12k_dp_rx_process(), by giving back the rx desc after
> it has been used. In that case having extra buffers wouldn't be needed,
> as it wouldn't be possible to refill faster than processing those
> descriptors.
> 

There is an explicit HW IRQ to request Rx buffer refill; it is handled by
the host2rxdma[grp_id] path under ath12k_dp_service_srng().

Whenever the HW needs a refill, it raises this explicit HW IRQ.
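
To make the pool-vs-ring relationship concrete, here is a standalone toy
model (plain C; only the two size constants come from the driver, the
counters and the replenish/consume split are illustrative and not the
actual ath12k code):

  /* Toy model of the Rx descriptor pool feeding the refill ring. */
  #include <stdio.h>

  #define DESC_POOL_SIZE   12288  /* ATH12K_RX_DESC_COUNT */
  #define REFILL_RING_SIZE  4096  /* DP_RXDMA_BUF_RING_SIZE */

  static int free_list = DESC_POOL_SIZE; /* descriptors the host can post */
  static int ring_fill;                  /* descriptors posted to the HW */

  /* Host side: top the refill ring up from the free list, as
   * ath12k_dp_rx_bufs_replenish() does with real buffers. */
  static void replenish(void)
  {
          while (ring_fill < REFILL_RING_SIZE && free_list > 0) {
                  free_list--;
                  ring_fill++;
          }
  }

  /* HW side: consume up to n posted buffers; they return to the free
   * list only after REO/WBM/Rx-error processing completes. */
  static int hw_consume(int n)
  {
          int used = n < ring_fill ? n : ring_fill;

          ring_fill -= used;
          return used;
  }

  int main(void)
  {
          int in_flight;

          replenish();
          in_flight = hw_consume(3000); /* burst of small frames */

          /* The host2rxdma IRQ fires while those 3000 buffers are
           * still in flight: the refill can only be served because
           * the pool is larger than the ring. */
          replenish();
          printf("in flight %d, ring %d, free %d\n",
                 in_flight, ring_fill, free_list);
          return 0;
  }

With a pool only as large as the rings, the second replenish() here would
stall until the in-flight buffers came back through Rx processing.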

> But it seems that there is a distinct irq group (i.e. pci*_wlan_dp_3)
> that is used to process REO, WBM Error, Rx error, but also to replenish
> buffers if the refill ring is 3/4 empty (called host2rxdma).

Yes, that is the host2rxdma refill path described above.

> 
> So hypothetically here, if you isolate this irq to a specific CPU (e.g.
> having more than 4 CPUs, one for each RX data ring and an extra one for
> error and host2rxdma refill), you could have scenarios where the data
> ring processing in ath12k_dp_rx_process() could lag enough for this
> extra buffer refilling to be needed, is that correct?

That depends on the data rate of the traffic you pump.

But you can experiment with the reduced Rx buffer count, observe the
behavior, and draw your own conclusion about the performance impact.
Also consider small-size frame traffic at the highest data rate; that
scenario keeps more Rx descriptors in use for the traffic.
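
For reference, the reduced count proposed earlier works out as follows
(assuming ATH12K_MAX_SPT_ENTRIES is the 512-entry alignment mentioned in
the original mail):

  ATH12K_RX_DESC_COUNT = ALIGN(32 + 4096, 512)
                       = ALIGN(4128, 512)
                       = 4608

That is a bit over a third of the current 12288, so any descriptor
shortfall under small-frame, high-rate traffic should show up clearly in
that experiment.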

> 
> If this is right, then that would explain why I didn't see any
> performance difference: with only 4 CPUs (one RX ring processed per
> CPU) the extra buffer refilling couldn't be faster than just giving the
> used descriptors back.
> 
> Thanks
> 

-- 
Karthikeyan Periyasamy
--
கார்த்திகேயன் பெரியசாமி


