[PATCH v2] ethernet:arc: Fix racing of TX ring buffer

Lino Sanfilippo LinoSanfilippo at gmx.de
Mon May 23 18:09:49 PDT 2016


On 23.05.2016 00:36, Francois Romieu wrote:
> Lino Sanfilippo <LinoSanfilippo at gmx.de> :
> [...]
>> --- a/drivers/net/ethernet/arc/emac_main.c
>> +++ b/drivers/net/ethernet/arc/emac_main.c
>> @@ -159,12 +159,22 @@ static void arc_emac_tx_clean(struct net_device *ndev)
>>  		unsigned int *txbd_dirty = &priv->txbd_dirty;
>>  		struct arc_emac_bd *txbd = &priv->txbd[*txbd_dirty];
>>  		struct buffer_state *tx_buff = &priv->tx_buff[*txbd_dirty];
>> -		struct sk_buff *skb = tx_buff->skb;
>>  		unsigned int info = le32_to_cpu(txbd->info);
>> +		struct sk_buff *skb;
>>  
> 
> Insert a smp_rmb() here to close one window for an outdated txbd_dirty value
> (the "arc_emac_tx_clean wrote txbd_curr and issued smp_wmb" one).
> 
> Actually, insert smp_rmb() at the start of arc_emac_tx_clean() as it
> does not need to be performed withing the loop and would penalize it.

I agree, we should place the barrier before the loop. I also think it is indeed more
appropriate to use the SMP versions for both barriers.

> 
> Given the implicit smp barriers in the non-driver code, I consider
> "arc_emac_tx_clean on one CPU does not read latest txbd_dirty value written
> by previous arc_emac_tx_clean on different CPU" as utter onanism but
> issueing smp_rmb() at the start of arc_emac_tx_clean() nails it as well.
> 
>> -		if ((info & FOR_EMAC) || !txbd->data || !skb)
>> +		if (*txbd_dirty == priv->txbd_curr)
>>  			break;
> 
> Ok, it's just the "while (priv->txbd_dirty != priv->txbd_curr) {" loop
> in disguise.

I cant deny that :)

> 
>>  
>> +		/* Make sure curr pointer is consistent with info */
>> +		rmb();
>> +
>> +		info = le32_to_cpu(txbd->info);
>> +
>> +		if (info & FOR_EMAC)
>> +			break;
> 
> With proper ordering + barrier in arc_emac_tx() you can relax it to smp_rmb().

Ok.

>> +
>> +		skb = tx_buff->skb;
>> +
>>  		if (unlikely(info & (DROP | DEFR | LTCL | UFLO))) {
>>  			stats->tx_errors++;
>>  			stats->tx_dropped++;
>> @@ -195,8 +205,8 @@ static void arc_emac_tx_clean(struct net_device *ndev)
>>  		*txbd_dirty = (*txbd_dirty + 1) % TX_BD_NUM;
>>  	}
>>  
>> -	/* Ensure that txbd_dirty is visible to tx() before checking
>> -	 * for queue stopped.
>> +	/* Ensure that txbd_dirty is visible to tx() and we see the most recent
>> +	 * value for txbd_curr.
>>  	 */
>>  	smp_mb();
>>  
>> @@ -680,35 +690,29 @@ static int arc_emac_tx(struct sk_buff *skb, struct net_device *ndev)
>>  	dma_unmap_len_set(&priv->tx_buff[*txbd_curr], len, len);
>>  
>>  	priv->txbd[*txbd_curr].data = cpu_to_le32(addr);
>> -
>> -	/* Make sure pointer to data buffer is set */
>> -	wmb();
>> +	priv->tx_buff[*txbd_curr].skb = skb;
>>  
>>  	skb_tx_timestamp(skb);
>>  
>>  	*info = cpu_to_le32(FOR_EMAC | FIRST_OR_LAST_MASK | len);
> 
> No.
> 
> You need dma_wmb() after skb_tx_timestamp() to commit skb_tx_timestamp() [*]
> and data = cpu_to_le32(addr).

I dont agree here. A dma_wmb would have an effect to "data" and "info", yes, but it 
would have absolutely no effect to skb_tx_timestamp(), since there is no dma access
involved here. In fact skb_tx_timestamp() could probably be even reordered to appear
 after the dma_wmb.
Anyway, there is the wmb() directly after the assignment to "info". _This_ barrier
 should ensure that skb_tx_timestamp() (along with a flush of data and info to DMA)
is executed before "txbd_curr" is advanced.
This means that the corresponding skb cant be freed prematurely by tx_clean().

 
> 
> [*] I doubt anyone want a dma_sync_single_...() here.
> 
>>  
>> -	/* Make sure info word is set */
>> +	/* 1. Make sure that with respect to tx_clean everything is set up
>> +	 * properly before we advance txbd_curr.
>> +	 * 2. Make sure writes to DMA descriptors are completed before we inform
>> +	 * the hardware.
>> +	 */
>>  	wmb();
> 
> Yes, either wmb() or smp_wmb() + dma_wmb().
> 

I really prefer one generic barrier over combos of 2 or more special barriers.

>>  
>> -	priv->tx_buff[*txbd_curr].skb = skb;
>> -
>>  	/* Increment index to point to the next BD */
>>  	*txbd_curr = (*txbd_curr + 1) % TX_BD_NUM;
>>  
>> -	/* Ensure that tx_clean() sees the new txbd_curr before
>> -	 * checking the queue status. This prevents an unneeded wake
>> -	 * of the queue in tx_clean().
>> +	/* Ensure we see the most recent value of txbd_dirty and tx_clean() sees
>> +	 * the updated value of txbd_curr.
>>  	 */
>>  	smp_mb();
> 
> Nit: s/the most/a/
> 
> "a" as in "arc_emac_tx_clean() _is_ racing with arc_emac_tx"
> 
>>  
>> -	if (!arc_emac_tx_avail(priv)) {
>> +	if (!arc_emac_tx_avail(priv))
>>  		netif_stop_queue(ndev);
>> -		/* Refresh tx_dirty */
>> -		smp_mb();
>> -		if (arc_emac_tx_avail(priv))
>> -			netif_start_queue(ndev);
>> -	}
> 
> No.
> 
> I may sound like an old record but the revalidation part must be kept.
> 
> txbd_dirty may change in the arc_emac_tx_avail.. netif_stop_queue window
> (the race requires a different CPU as arc_emac_tx() runs in locally
> disabled BH context).
> 

Ok, I can see the race now. So yes, this should be kept.

I will prepare a patch with the discussed changes tomorrow.

Regards,
Lino






More information about the Linux-rockchip mailing list