[PATCH v2] net: dsa: tag_mtk: add padding for tx packets

Felix Fietkau nbd at nbd.name
Wed May 11 05:24:19 PDT 2022


On 11.05.22 11:32, Vladimir Oltean wrote:
> On Wed, May 11, 2022 at 10:50:17AM +0200, Felix Fietkau wrote:
>> Hi Vladimir,
>> 
>> On 11.05.22 00:21, Vladimir Oltean wrote:
>> > It sounds as if this is masking a problem on the receiver end, because
>> > not only does my enetc port receive the packet, it also replies to the
>> > ARP request.
>> > 
>> > pc # sudo tcpreplay -i eth1 arp-broken.pcap
>> > root at debian:~# ip addr add 192.168.42.1/24 dev eno0
>> > root at debian:~# tcpdump -i eno0 -e -n --no-promiscuous-mode arp
>> > tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
>> > listening on eno0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
>> > 22:18:58.846753 f4:d4:88:5e:6f:d2 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has 192.168.42.1 tell 192.168.42.173, length 46
>> > 22:18:58.846806 00:04:9f:05:f4:ab > f4:d4:88:5e:6f:d2, ethertype ARP (0x0806), length 42: Reply 192.168.42.1 is-at 00:04:9f:05:f4:ab, length 28
>> > ^C
>> > 2 packets captured
>> > 2 packets received by filter
>> > 0 packets dropped by kernel
>> > 
>> > What MAC/driver has trouble with these packets? Is there anything wrong
>> > in ethtool stats? Do they even reach software? You can also use
>> > "dropwatch -l kas" for some hints if they do.
>> 
>> For some reason I can't reproduce the issue of ARPs not getting replies
>> anymore.
>> The garbage data is still present in the ARP packets without my patch
>> though. So regardless of whether ARP packets are processed correctly or if
>> they just trip up on some receivers under specific conditions, I believe my
>> patch is valid and should be applied.
> 
> I don't have a very strong opinion regarding whether to apply the patch or not.
> I think we've removed it from bug fix territory now, until proven otherwise.
I strongly disagree. Without my fix we're relying on undefined behavior 
of the hardware, since the switch requires padding that accounts for the 
special tag.

> I do care about the justification (commit message, comments) being
> correct though. If you cannot reproduce now, someone one year from now
> surely cannot reproduce it either, and won't know why the code is there.
I think there is some misunderstanding here. I absolutely can reproduce 
the corrupted padding reliably, and it matches what I put into commit 
message and comments.

The issue that I can't reproduce reliably at the moment (ARP reception 
failure) is something that I only pointed out in a reply to this thread.
This is what prompted me to look into the padding issue in the first 
place, and it also matches reports about connectivity issues that I got 
from other people.

> FYI, the reason why you call __skb_put_padto() is not the reason why
> others call __skb_put_padto().
It matches the call in tag_brcm.c (because I copied it from there), it's 
just that the symptoms that I'm fixing are different (undefined behavior 
instead of hard packet drop in the switch logic).

>> Who knows, maybe the garbage padding even leaks some data from previous
>> packets, or some other information from within the switch.
> 
> I mean, the padding has to come from somewhere, no? Although I'd
> probably imagine non-scrubbed buffer cells rather than data structures...
> 
> Let's see what others have to say. I've been wanting to make the policy
> of whether to call __skb_put_padto() standardized for all tagging protocol
> drivers (similar to what is done in dsa_realloc_skb() and below it).
> We pad for tail taggers, maybe we can always pad and this removes a
> conditional, and simplifies taggers. Side note, I already dislike that
> the comment in tag_brcm.c is out of sync with the code. It says that
> padding up to ETH_ZLEN is necessary, but proceeds to pad up until
> ETH_ZLEN + tag len, only to add the tag len once more below via skb_push().
> It would be nice if we could use the simple eth_skb_pad().
> 
> But there will be a small performance degradation for small packets due
> to the memset in __skb_pad(), which I'm not sure is worth the change.
I guess we have different views on this. In my opinion, correctness 
matters more in this case than the tiny performance degradation.

- Felix



More information about the Linux-mediatek mailing list