[PATCH v2] wifi: mt76: fix potential memory leakage

Sujuan Chen (陈素娟) Sujuan.Chen at mediatek.com
Thu Jan 5 20:04:56 PST 2023


On Fri, 2022-12-23 at 00:57 +0100, lorenzo at kernel.org wrote:
> > > On Thu, 2022-12-22 at 10:44 +0100, lorenzo at kernel.org wrote:
> > > > > On Wed, 2022-12-21 at 19:02 +0100, Lorenzo Bianconi wrote:
> > > > > > On Dec 21, Felix Fietkau wrote:
> > > > > > > Hi Sujuan,
> > > > > > > 
> > > > > > > > Yes, it is so expensive, but if no memcopy, it will
> > > > > > > > casue
> > > > > > > > memory
> > > > > > > > fragmentation (we hit this issue in internal SQC).
> > > > > > > > 
> > > > > > > > as we know, wed needs to exchange rx pkt(belongs to wed
> > > > > > > > rx
> > > > > > > > buffer
> > > > > > > > manager) with wifi driver(dma rx data queue) by
> > > > > > > > exchanging
> > > > > > > > wfdma
> > > > > > > > dmad
> > > > > > > > to ensure the free wed rx buffer.
> > > > > > > > 
> > > > > > > > it is possiable that a large number of buffer has been
> > > > > > > > exchanged
> > > > > > > > to wed
> > > > > > > > and can not come back to wlan driver. So, the memory
> > > > > > > > from the
> > > > > > > > same 32K
> > > > > > > > page cache is unable to be released, and it will be
> > > > > > > > failed at
> > > > > > > > page_frag_alloc in mt76_dma_rx_fill.
> > > > > > > > 
> > > > > > > > Any ideas but memcopy?
> > > > > > > 
> > > > > > > A simple solution would be to simply allocate single
> > > > > > > pages, or
> > > > > > > half-page fragments.
> > > > > > > 
> > > > > > > - Felix
> > > > > > > 
> > > > > > 
> > > > > > A simple approach would be allocating a single page for
> > > > > > each rx
> > > > > > buffer.
> > > > > > 
> > > > > > @Sujuan: can you please double check if it is ok from
> > > > > > performance
> > > > > > and
> > > > > > memory
> > > > > > 	 fragmentation point of view? If not I guess we can try
> > > > > > to
> > > > > > optimize it
> > > > > > 	 and allocate multiple buffers in the same page
> > > > > > tweeking page
> > > > > > refcount.
> > > > > > 
> > > > > > (this patch must be applied on top of Felix's dma fix).
> > > > > > 
> > > > > 
> > > > > Allocating single page for each rx buffer avoids memory
> > > > > fragmentation,
> > > > > but it always uses 4K for one rx pkt which only needs 2K,
> > > > > right?
> > > > 
> > > > correct, we can optimize it allocating multiple buffers (in
> > > > this case
> > > > 2,
> > > > assuming 4K pages) in a single page and recycling the page.
> > > > 
> > > > > 
> > > > > I guess performance would be worse without page cache.
> > > > 
> > > > I think it is a trade-off
> > > > 
> > > > > We have tested on the mtk private driver, 7% drop in
> > > > > throughput
> > > > > when
> > > > > setting the 4K page cache compared to the 32K page cache.
> > > > > and 10% drop when use slab to allocate buffer.
> > > > 
> > > > IIUC you tested with PAGE_FRAG_CACHE_MAX_SIZE set to 4K (or
> > > > with a
> > > > private
> > > > page_frag_alloc() implementation) and you avoided memory
> > > > allocation
> > > > failures due to fragmentation but you got ~ 7% drop in
> > > > throughput,
> > > > correct?
> > > > I think this is quite expected since we need to refill ~ 8
> > > > times more
> > > > the
> > > > page cache.
> > > > 
> > > > Not considering memory fragmentation, have you measured the
> > > > impact of
> > > > the
> > > > memcpy of a full buffer?
> > > > 
> > > 
> > > well, for pure sw path, it maybe ~300M drop (not sure) when using
> > > memcpy(5G/HE80/4*4).
> > > but we do memcpy only when wed on, atfer a few unbinding pkts,
> > > then all
> > > flows are offloaded by hw. it is also a trade-off~
> > 
> > I guess we have quite a big impact if we enable wed and we do not
> > actually
> > offload any flow (or we do not enable hw flowtable in netfilter),
> > dont' we?
> 
> Hi Sujuan,
> 
> here ([0],[1]) I switched mt76 to page_pool allocator [2]. page_pool
> allows us to
> allocate pages of a requested order (order 0 for mt76) and split them
> between
> multiple buffers (2 buffers per page for mt7915). Moreover pp
> allocator recycles
> the pages in order to avoid the page allocator overhead and keep the
> pages DMA mapped
> if requested (I implemented the support for mt7915).
> The code is stable (with and without WED) but I have not run any
> performance
> tests. Can you please help with them to see if this code helps
> solving the
> discussed issue?
> 

Hi Lore,


I tried to apply your patches into my codebase(kernel-5.x). However
since page pool api coming from kernel-6.x does not exist in kernel-
5.x, I can not build a proper image for testing (even thought we
applied 30+ page pool patches).

Sorry for not helping you.

Reagrds,
Sujuan

> Regards,
> Lorenzo
> 
> [0] 
> https://github.com/LorenzoBianconi/wireless-drivers-next/commit/61ce8e9be04b40fa8a1c0f2778d266982be32d7a
> [1] 
> https://github.com/LorenzoBianconi/wireless-drivers-next/commit/f54eefce06d389b155574092d246b21631e42d47
> [2] https://docs.kernel.org/networking/page_pool.html
> 
> > 
> > Regards,
> > Lorenzo
> > 
> > > 
> > > Regards,
> > > Sujuan
> > > 
> > > > > 
> > > > > A single page per rx buffer may cause a throughput drop of
> > > > > over 7%
> > > > > and
> > > > > waste memory, what do you think?
> > > > 
> > > > Implementing the page recycles as it is done in
> > > > page_frag_alloc() we
> > > > should get
> > > > the same results you got with PAGE_FRAG_CACHE_MAX_SIZE set to
> > > > 4K.
> > > > 
> > > > Regards,
> > > > Lorenzo
> > > > 
> > > > > 
> > > > > Regards,
> > > > > Sujuan
> > > > > 
> > > > > > Regards,
> > > > > > Lorenzo
> > > > > > 
> > > > > > diff --git a/drivers/net/wireless/mediatek/mt76/dma.c
> > > > > > b/drivers/net/wireless/mediatek/mt76/dma.c
> > > > > > index 28a7fe064313..1d9e580977fc 100644
> > > > > > --- a/drivers/net/wireless/mediatek/mt76/dma.c
> > > > > > +++ b/drivers/net/wireless/mediatek/mt76/dma.c
> > > > > > @@ -580,6 +580,20 @@ mt76_dma_tx_queue_skb(struct mt76_dev
> > > > > > *dev,
> > > > > > struct mt76_queue *q,
> > > > > >  	return ret;
> > > > > >  }
> > > > > >  
> > > > > > +static void *
> > > > > > +mt76_dma_get_rx_buf(struct mt76_queue *q)
> > > > > > +{
> > > > > > +	if ((q->flags & MT_QFLAG_WED) &&
> > > > > > +	    FIELD_GET(MT_QFLAG_WED_TYPE, q->flags) ==
> > > > > > MT76_WED_Q_RX) {
> > > > > > +		/* WED RX queue */
> > > > > > +		struct page *page = dev_alloc_page();
> > > > > > +
> > > > > > +		return page ? page_address(page) : NULL;
> > > > > > +	}
> > > > > > +
> > > > > > +	return page_frag_alloc(&q->rx_page, q->buf_size,
> > > > > > GFP_ATOMIC);
> > > > > > +}
> > > > > > +
> > > > > >  static int
> > > > > >  mt76_dma_rx_fill(struct mt76_dev *dev, struct mt76_queue
> > > > > > *q)
> > > > > >  {
> > > > > > @@ -596,7 +610,7 @@ mt76_dma_rx_fill(struct mt76_dev *dev,
> > > > > > struct
> > > > > > mt76_queue *q)
> > > > > >  		struct mt76_queue_buf qbuf;
> > > > > >  		void *buf = NULL;
> > > > > >  
> > > > > > -		buf = page_frag_alloc(&q->rx_page, q->buf_size,
> > > > > > GFP_ATOMIC);
> > > > > > +		buf = mt76_dma_get_rx_buf(q);
> > > > > >  		if (!buf)
> > > > > >  			break;
> > > > > >  
> > > > > > diff --git
> > > > > > a/drivers/net/wireless/mediatek/mt76/mt7915/mmio.c
> > > > > > b/drivers/net/wireless/mediatek/mt76/mt7915/mmio.c
> > > > > > index 1a2e4df8d1b5..2924e71e4fbe 100644
> > > > > > --- a/drivers/net/wireless/mediatek/mt76/mt7915/mmio.c
> > > > > > +++ b/drivers/net/wireless/mediatek/mt76/mt7915/mmio.c
> > > > > > @@ -594,13 +594,9 @@ static void
> > > > > > mt7915_mmio_wed_offload_disable(struct mtk_wed_device *wed)
> > > > > >  static void mt7915_mmio_wed_release_rx_buf(struct
> > > > > > mtk_wed_device
> > > > > > *wed)
> > > > > >  {
> > > > > >  	struct mt7915_dev *dev;
> > > > > > -	u32 length;
> > > > > >  	int i;
> > > > > >  
> > > > > >  	dev = container_of(wed, struct mt7915_dev,
> > > > > > mt76.mmio.wed);
> > > > > > -	length = SKB_DATA_ALIGN(NET_SKB_PAD + wed->wlan.rx_size 
> > > > > > +
> > > > > > -				sizeof(struct
> > > > > > skb_shared_info));
> > > > > > -
> > > > > >  	for (i = 0; i < dev->mt76.rx_token_size; i++) {
> > > > > >  		struct mt76_txwi_cache *t;
> > > > > >  
> > > > > > @@ -610,7 +606,7 @@ static void
> > > > > > mt7915_mmio_wed_release_rx_buf(struct
> > > > > > mtk_wed_device *wed)
> > > > > >  
> > > > > >  		dma_unmap_single(dev->mt76.dma_dev, t-
> > > > > > >dma_addr,
> > > > > >  				 wed->wlan.rx_size,
> > > > > > DMA_FROM_DEVICE);
> > > > > > -		__free_pages(virt_to_page(t->ptr),
> > > > > > get_order(length));
> > > > > > +		free_page(virt_to_page(t->ptr));
> > > > > >  		t->ptr = NULL;
> > > > > >  
> > > > > >  		mt76_put_rxwi(&dev->mt76, t);
> > > > > > @@ -621,13 +617,9 @@ static u32
> > > > > > mt7915_mmio_wed_init_rx_buf(struct
> > > > > > mtk_wed_device *wed, int size)
> > > > > >  {
> > > > > >  	struct mtk_rxbm_desc *desc = wed->rx_buf_ring.desc;
> > > > > >  	struct mt7915_dev *dev;
> > > > > > -	u32 length;
> > > > > >  	int i;
> > > > > >  
> > > > > >  	dev = container_of(wed, struct mt7915_dev,
> > > > > > mt76.mmio.wed);
> > > > > > -	length = SKB_DATA_ALIGN(NET_SKB_PAD + wed->wlan.rx_size 
> > > > > > +
> > > > > > -				sizeof(struct
> > > > > > skb_shared_info));
> > > > > > -
> > > > > >  	for (i = 0; i < size; i++) {
> > > > > >  		struct mt76_txwi_cache *t = mt76_get_rxwi(&dev-
> > > > > > >mt76);
> > > > > >  		dma_addr_t phy_addr;
> > > > > > @@ -635,7 +627,7 @@ static u32
> > > > > > mt7915_mmio_wed_init_rx_buf(struct
> > > > > > mtk_wed_device *wed, int size)
> > > > > >  		int token;
> > > > > >  		void *ptr;
> > > > > >  
> > > > > > -		page = __dev_alloc_pages(GFP_KERNEL,
> > > > > > get_order(length));
> > > > > > +		page = __dev_alloc_page(GFP_KERNEL);
> > > > > >  		if (!page)
> > > > > >  			goto unmap;
> > > > > >  
> > > > > > @@ -644,7 +636,7 @@ static u32
> > > > > > mt7915_mmio_wed_init_rx_buf(struct
> > > > > > mtk_wed_device *wed, int size)
> > > > > >  					  wed->wlan.rx_size,
> > > > > >  					  DMA_TO_DEVICE);
> > > > > >  		if (unlikely(dma_mapping_error(dev->mt76.dev,
> > > > > > phy_addr))) {
> > > > > > -			__free_pages(page, get_order(length));
> > > > > > +			free_page(page);
> > > > > >  			goto unmap;
> > > > > >  		}
> > > > > >  
> > > > > > @@ -653,7 +645,7 @@ static u32
> > > > > > mt7915_mmio_wed_init_rx_buf(struct
> > > > > > mtk_wed_device *wed, int size)
> > > > > >  		if (token < 0) {
> > > > > >  			dma_unmap_single(dev->mt76.dma_dev,
> > > > > > phy_addr,
> > > > > >  					 wed->wlan.rx_size,
> > > > > > DMA_TO_DEVICE);
> > > > > > -			__free_pages(page, get_order(length));
> > > > > > +			free_page(page);
> > > > > >  			goto unmap;
> > > > > >  		}
> > > > > >  
> 
> 


More information about the Linux-mediatek mailing list