build_skb() and data corruption

Ben Hutchings bhutchings at solarflare.com
Mon Jan 13 08:42:19 EST 2014


On Mon, 2014-01-13 at 12:47 +0100, Jonas Jensen wrote:
> Please help,
> 
> I think I see memory corruption with a driver recently added to linux-next.
> 
> The following error occurs when downloading a large file with wget (or
> ncftp): "read error: Bad address"
> wget exits and leaves the file unfinished.
> 
> The error goes away when build_skb() is patched out, in this case by
> allocating pages directly in the RX loop.

The problem is you're assuming build_skb() does rather more than it
does.
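
To make that concrete: build_skb() only allocates the struct sk_buff
and wraps it around a buffer you already own.  It does not unmap the
buffer or reserve headroom, and it places struct skb_shared_info in the
last SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) bytes of the
frag_size you pass in.  The following is a rough userspace model of
that size arithmetic, not driver code; the MODEL_* constants and
function names are made up for illustration, and the real values depend
on the architecture and kernel configuration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for illustration only. */
#define MODEL_CACHE_BYTES 64u   /* stands in for SMP_CACHE_BYTES */
#define MODEL_SHINFO_SIZE 320u  /* stands in for sizeof(struct skb_shared_info) */

/* Round x up to a multiple of MODEL_CACHE_BYTES, like SKB_DATA_ALIGN(). */
static size_t model_data_align(size_t x)
{
	return (x + MODEL_CACHE_BYTES - 1) & ~(size_t)(MODEL_CACHE_BYTES - 1);
}

/*
 * Bytes of a frag_size-byte buffer left for headroom plus packet data
 * once the tail has been claimed for skb_shared_info.
 */
static size_t model_usable_bytes(size_t frag_size)
{
	size_t shinfo = model_data_align(MODEL_SHINFO_SIZE);

	assert(frag_size > shinfo);
	return frag_size - shinfo;
}

/*
 * Buffer size needed so that headroom + max_frame never reaches the
 * skb_shared_info area at the end of the buffer.
 */
static size_t model_rx_frag_size(size_t headroom, size_t max_frame)
{
	return model_data_align(headroom + max_frame) +
	       model_data_align(MODEL_SHINFO_SIZE);
}
```

If the hardware is allowed to write up to the full size you pass as
frag_size, the end of a large frame and skb_shared_info could overlap,
so that may be worth ruling out in your driver.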

> I currently have two branches with code placed under ifdef USE_BUILD_SKB:
> https://bitbucket.org/Kasreyn/linux-next/src/faa7c408a655fdfd7c383f79259031da5a05d61e/drivers/net/ethernet/moxa/moxart_ether.c#cl-472

Quoting from there:

> #define USE_BUILD_SKB
> #ifdef USE_BUILD_SKB
> 
> 
> 	skb = build_skb(page_address(priv->rx_pages[priv->rx_head]), priv->rx_buf_size);
> 
> 
> 	if (unlikely(!skb)) {
> 		net_info_ratelimited("RX build_skb failed\n");

Don't log; increment the rx_dropped counter.

> 		moxart_rx_drop_packet(ndev, desc0);
> 		return true;
> 	}
>
> 
> 	if (likely(skb))

Redundant condition.

> 		get_page(priv->rx_pages[priv->rx_head]);
> 
> 
> 	skb_put(skb, len);
> 
> 
> #else
> 
> 
> 	dma_unmap_page(&ndev->dev, priv->rx_buf_phys[priv->rx_head], RX_BUF_SIZE, DMA_FROM_DEVICE);

dma_unmap_page() is always needed, so move this above the #ifdef
USE_BUILD_SKB.

> 
> 	skb = netdev_alloc_skb_ip_align(ndev, 128);

Missing check for NULL.

> 	skb_fill_page_desc(skb, 0, priv->rx_pages[priv->rx_head], 0, len);
> 	skb->len += len;
> 	skb->data_len += len;
> 
> 
> 	if (len > 128) {
> 		skb->truesize += PAGE_SIZE;
> 		__pskb_pull_tail(skb, ETH_HLEN);
> 	} else {
> 		__pskb_pull_tail(skb, len);
> 	}
> 
> 
> 	p = priv->rx_pages[priv->rx_head];

Missing check for NULL.

> 	moxart_alloc_rx_page(ndev, priv->rx_head);

A new buffer is always required, so move this below the #endif... well,
except that if moxart_alloc_rx_page() fails then things go horribly
wrong with the code as you have it.

Best practice is to try allocating the next buffer before passing up the
current one; if that fails then you recycle the current buffer and
increment rx_dropped.
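
Roughly, the ordering looks like the sketch below.  This is a userspace
model of the idea only, not code for your driver: the model_* names, the
ring layout and the allocator hook are all hypothetical, and DMA
mapping/unmapping is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_RING_SIZE 4
#define MODEL_BUF_SIZE  2048

struct model_rx_ring {
	void *buf[MODEL_RING_SIZE];  /* buffers currently owned by the ring */
	unsigned int head;           /* next descriptor to process */
	unsigned long rx_packets;
	unsigned long rx_dropped;
};

/* Hypothetical allocator hook so that failure can be simulated. */
typedef void *(*model_alloc_fn)(size_t size);

/*
 * Process one completed descriptor.  Returns the filled buffer to pass
 * up (the caller takes ownership), or NULL if the packet was dropped
 * and the old buffer was recycled in place.
 */
static void *model_rx_one(struct model_rx_ring *ring, model_alloc_fn alloc)
{
	unsigned int i = ring->head;
	void *filled = ring->buf[i];
	void *fresh = alloc(MODEL_BUF_SIZE);  /* try the replacement first */

	assert(filled != NULL);               /* slots are never left empty */
	ring->head = (i + 1) % MODEL_RING_SIZE;

	if (!fresh) {
		/* Recycle: the slot keeps its buffer and the packet is dropped. */
		ring->rx_dropped++;
		return NULL;
	}

	ring->buf[i] = fresh;
	ring->rx_packets++;
	return filled;
}

/* Allocator that always fails, for exercising the recycle path. */
static void *model_alloc_fail(size_t size)
{
	(void)size;
	return NULL;
}
```

The point of the ordering is that a ring slot never ends up without a
buffer, so an allocation failure costs one packet rather than leaving
the ring in a broken state.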

> 
> 
> #endif
[end quoted code]

> If build_skb() is the cause, is there something the driver can do about it?
> 
> A quick search on "build_skb memory corruption" reveals the following
> commit, "igb: Revert support for build_skb in igb"
> f9d40f6a9921cc7d9385f64362314054e22152bd:
[...]

The problem I identified in igb occurs only when splitting a page into
multiple buffers (or when recycling DMA-mapped pages).  build_skb() is
fine to use whenever each RX buffer is a single page that's unmapped on
completion (though using a whole page per buffer is memory-inefficient).

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.




More information about the linux-arm-kernel mailing list