Allocating more RX descriptors than can fit in their related rings
Remi Pommarel
repk at triplefau.lt
Mon Sep 9 06:12:39 PDT 2024
On Sat, Sep 07, 2024 at 08:09:15AM +0530, Karthikeyan Periyasamy wrote:
>
>
> On 9/6/2024 9:39 PM, Remi Pommarel wrote:
> > On Fri, Sep 06, 2024 at 09:30:33AM +0530, Karthikeyan Periyasamy wrote:
> > > On 9/6/2024 9:27 AM, Karthikeyan Periyasamy wrote:
> > > >
> > > >
> > > > On 9/4/2024 11:31 PM, Remi Pommarel wrote:
> > > > > Hello,
> > > > >
> > > > > As far as I understand, a bunch (ATH12K_RX_DESC_COUNT) of rx
> > > > > descriptors gets allocated, then CMEM is configured for those
> > > > > descriptors' cookie conversion, and they are kept available in the
> > > > > dp->rx_desc_free_list pool.
> > > > >
> > > > > Those descriptors seem to be used to feed two different rings: the
> > > > > rx_refill_buf_ring ring via ath12k_dp_rx_bufs_replenish() and the
> > > > > reo_reinject_ring one with ath12k_dp_rx_h_defrag_reo_reinject(). While
> > > > > the former is kept fully used if possible, the latter is only used on
> > > > > demand (i.e. reinjection of defragmented MPDUs).
> > > > >
> > > > > It seems that the number of RX descriptors ATH12K_RX_DESC_COUNT (12288)
> > > > > is higher than what those two rings can fit (DP_RXDMA_BUF_RING_SIZE +
> > > > > DP_REO_REINJECT_RING_SIZE = 4096 + 32 = 4128).
> > > > >
> > > > > My question is: why are we allocating that many (12288) buffers if
> > > > > only a small part (4128) can be used in the worst case?
> > > > >
> > > > > Wouldn't it be ok to only allocate just enough RX descriptors to fill
> > > > > both rings (with proper 512 alignment to ease CMEM configuration) as
> > > > > below?
> > > > >
> > > > > #define ATH12K_RX_DESC_COUNT ALIGN(DP_REO_REINJECT_RING_SIZE + \
> > > > >                                    DP_RXDMA_BUF_RING_SIZE, \
> > > > >                                    ATH12K_MAX_SPT_ENTRIES)
> > > > >
> > > > > Or am I missing something, and is this going to impact performance?
> > > > >
> >
> > ...
> >
> > >
> > > Yes, it will impact performance.
> > >
> > > The host replenishes RxDMA buffers to the HW, and after processing
> > > they are given back through the Rx paths (REO, WBM Error, Rx Error).
> > > So it cannot be reduced to a one-to-one direct mapping. How many Rx
> > > buffers the HW holds in flight depends on the data rate. If RxDMA
> > > buffers are not available, performance suffers from out-of-order Rx
> > > errors caused by RxDMA buffer unavailability.
> >
> > Thanks for the clarification.
> >
> > I think I do see your point. I thought the only way to feed descriptors
> > to the HW was in ath12k_dp_rx_process(), by giving back each rx desc
> > after it has been used. In that case the extra buffers wouldn't be
> > needed, as it wouldn't be possible to refill faster than those
> > descriptors are processed.
> >
>
> There is an explicit hw irq to request an Rx buffer refill, processed by
> host2rxdma[grp_id] under ath12k_dp_service_srng().
>
> Whenever the HW needs a refill, it raises this explicit hw irq.
>
> > But it seems that there is a distinct irq group (i.e. pci*_wlan_dp_3)
> > that is used to process REO, WBM Error and Rx Error, but also to
> > replenish buffers if the refill ring is 3/4 empty (called host2rxdma).
>
> It is the above one.
I do think it is when the refill ring is 3/4 empty, if I understand the
following excerpt from ath12k_dp_rx_bufs_replenish() correctly:

	if (!req_entries && (num_free > (rx_ring->bufs_max * 3) / 4))
		req_entries = num_free;
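
For completeness, this is how I read the whole replenish path. A heavily
trimmed sketch only, not the verbatim driver code (locking, skb
allocation, DMA mapping and error handling are elided, and
srng_src_num_free() is a hypothetical stand-in for whatever helper
really computes num_free):

	static void rx_bufs_replenish_sketch(struct ath12k_dp *dp,
					     struct dp_rxdma_ring *rx_ring,
					     int req_entries)
	{
		int num_free = srng_src_num_free(rx_ring); /* hypothetical */

		/* Opportunistic refill: even with no explicit request,
		 * top the ring up once it is more than 3/4 empty.
		 */
		if (!req_entries && (num_free > (rx_ring->bufs_max * 3) / 4))
			req_entries = num_free;

		while (req_entries--) {
			/* Each ring entry takes one descriptor from the
			 * same shared pool that
			 * ath12k_dp_rx_h_defrag_reo_reinject() uses.
			 */
			struct ath12k_rx_desc_info *desc =
				list_first_entry_or_null(&dp->rx_desc_free_list,
							 struct ath12k_rx_desc_info,
							 list);
			if (!desc)
				break; /* descriptor pool exhausted */
			list_del(&desc->list);
			/* ... allocate an skb, dma map it, write the
			 * descriptor cookie into the ring entry ... */
		}
	}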
>
> >
> > So hypothetically here, if you isolate this irq to a specific CPU (e.g.
> > having more than 4 CPUs, one for each RX data ring and an extra one for
> > error and host2rxdma refill), you could have scenarios where the data
> > ring processing in ath12k_dp_rx_process() lags enough for this extra
> > buffer refilling to be needed, is that correct?
>
> Depends on the data rate of the traffic you pump.
>
> But you can experiment with a reduced Rx buffer count, observe the
> behavior and conclude on the performance impact. Also consider
> small-size frame traffic at the highest data rate; there, more Rx
> descriptors are used for the traffic.
You're right, small-packet traffic could increase the pressure on the rx
descriptor rings, and rx buffer starvation could then happen with a
one-to-one mapping.

Thanks for your clarifications.
This question came from the fact that we are using a Qualcomm internal
patch (not applied to mainline yet) that introduces a 512MB memory
profile config for which the opposite situation happens (fewer RX
descriptors than room in DP_RXDMA_BUF_RING_SIZE), causing fragmented
packets to be dropped (all rx descriptors being used for rxdma buf
reception and none left for ath12k_dp_rx_h_defrag_reo_reinject()).
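
To illustrate with purely hypothetical numbers (these are not the actual
values from the internal patch):

	/* Hypothetical 512MB profile values, for illustration only. */
	#define ATH12K_RX_DESC_COUNT       4096 /* whole descriptor pool */
	#define DP_RXDMA_BUF_RING_SIZE     4096 /* refill ring alone can absorb it all */
	#define DP_REO_REINJECT_RING_SIZE    32

	/* ath12k_dp_rx_bufs_replenish() can hand all 4096 descriptors to
	 * rx_refill_buf_ring, leaving none in dp->rx_desc_free_list for
	 * ath12k_dp_rx_h_defrag_reo_reinject(), so defragmented MPDUs are
	 * dropped on reinjection.
	 */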
So as long as ATH12K_RX_DESC_COUNT is higher than the sum of the
rx_refill_buf_ring and reo_reinject_ring sizes, that is fine with me.
Maybe that is worth asserting at build time?
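
Something like this minimal sketch would do, assuming all three macros
are visible from the same header (the guard itself is hypothetical, it
is not in the driver today):

	#include <linux/build_bug.h>

	/* Make sure the shared RX descriptor pool can always fill both
	 * of its consumers at the same time.
	 */
	static_assert(ATH12K_RX_DESC_COUNT >=
		      DP_RXDMA_BUF_RING_SIZE + DP_REO_REINJECT_RING_SIZE,
		      "RX descriptor pool too small for rx_refill_buf_ring + reo_reinject_ring");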
--
Remi