net: macb: fail when there's no PHY

Grant Edwards grant.b.edwards at gmail.com
Fri Dec 4 08:42:55 EST 2020


On 2020-12-04, Alexander Dahl <ada at thorsis.com> wrote:

> sorry if I just hijack your conversation, but I'm curious, because
> we are using the same SoC.  Adding linux-arm-kernel to Cc for the
> general performance issues and linux-mtd for the ECC questions. O:-)

No worries. I tried to ask about performance issues on
linux-arm-kernel, but AFAICT, my post wasn't allowed by the moderator.

> Am Donnerstag, 3. Dezember 2020, 23:20:38 CET schrieb Grant Edwards:
>> On 2020-12-03, Andrew Lunn <andrew at lunn.ch> wrote:

>>>> I don't think there's any way I could justify using a kernel that
>>>> doesn't have long-term support.
>>> 
>>> 5.10 is LTS. Well, it will be, once it is actually released!
>> 
>> Convincing people to ship an unreleased kernel would be a whole
>> 'nother bucket of worms.
>
> +1
>
> Judging just from the release dates of the last LTS kernels, I would
> have guessed v5.11 will get LTS.  But there has been no announcement
> yet and I suppose there will be none before release?  For ordinary
> users it's just like staring into a crystal ball, so we aim at v5.4
> for our more recent hardware platforms. ¯\_(ツ)_/¯

Exactly. We're already behind schedule, and assuming that
5.<whatever> is going to be LTS and will be released in time for
shipment just won't fly.

>> A simple user-space multi-threaded TCP echo server benchmark showed
>> a 30-50% (depending on number of parallel connections) drop in max
>> throughput. Our typical applications also show a 15-25% increase in
>> CPU usage for an equivalent workload.  Another problem is high
>> latencies with 5.4. A thread that is supposed to wake up every 1ms
>> works reliably on 2.6.33, but is a long ways from working on 5.4.
>
> We use the exact same SoC with kernel 4.9.220-rt143 (PREEMPT RT)
> currently, after being stuck on 2.6.36.4 for quite a while.  I did
> not notice significant performance issues, but I have to admit, we
> never did extensive benchmarks on network or CPU performance,
> because the workload also changed for that target.

The performance hit varied quite a bit depending on the application,
but seemed to be a minimum of about a 15% increase in CPU usage.
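
For reference, the 1ms thread mentioned above follows the usual
absolute-deadline pattern. A minimal sketch of that kind of
wakeup-jitter test (illustrative values, not our production code)
looks like this:

/* Sleep to an absolute 1 ms deadline and record how late each wakeup is. */
#include <stdio.h>
#include <time.h>

#define PERIOD_NS    1000000L           /* 1 ms */
#define NSEC_PER_SEC 1000000000L

int main(void)
{
        struct timespec next, now;
        long late, max_late_ns = 0;

        clock_gettime(CLOCK_MONOTONIC, &next);

        for (int i = 0; i < 10000; i++) {
                /* Advance the absolute deadline by one period. */
                next.tv_nsec += PERIOD_NS;
                if (next.tv_nsec >= NSEC_PER_SEC) {
                        next.tv_nsec -= NSEC_PER_SEC;
                        next.tv_sec++;
                }
                clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);

                clock_gettime(CLOCK_MONOTONIC, &now);
                late = (now.tv_sec - next.tv_sec) * NSEC_PER_SEC
                       + (now.tv_nsec - next.tv_nsec);
                if (late > max_late_ns)
                        max_late_ns = late;
        }

        printf("worst-case wakeup latency: %ld us\n", max_late_ns / 1000);
        return 0;
}

(Build with gcc -O2; older toolchains need -lrt for clock_nanosleep.)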

We discussed trying various older LTS kernels to see if we could find
one that performed acceptably, but it would take a lot of engineering
time to port and benchmark each version.

> However what gave us a lot less dropped packages was using the
> internal SRAM as DMA buffer for RX packages received by macb.  That
> did not make it in mainline however, I did not put enough effort in
> polishing that patch back when we migrated from 2.6 to 4.x.  If
> you're curious, it's on GitHub:
> https://github.com/LeSpocky/linux/tree/net-macb-sram-rx

We haven't identified the source of the drop in network throughput,
but the increased CPU usage and problems with latency affect
applications that don't use the network at all (and there is no
network traffic).
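
For anyone else following along, the general shape of the
SRAM-for-RX-buffers approach is something like the sketch below. This
assumes the on-chip SRAM is exposed as an mmio-sram genalloc pool
referenced by an "sram" phandle in the driver's DT node; it's an
illustration of the idea, not Alexander's actual patch.

/* Sketch only: carve macb RX buffers out of on-chip SRAM instead of DDR. */
#include <linux/device.h>
#include <linux/genalloc.h>
#include <linux/of.h>

static void *macb_alloc_rx_buf_from_sram(struct device *dev, size_t size,
                                         dma_addr_t *dma)
{
        struct gen_pool *pool;

        /* Look up the SRAM pool via an "sram" property in our DT node. */
        pool = of_gen_pool_get(dev->of_node, "sram", 0);
        if (!pool)
                return NULL;    /* caller falls back to the normal DMA alloc */

        /* Returns a CPU pointer and fills in the bus address the macb
         * RX descriptors need. */
        return gen_pool_dma_alloc(pool, size, dma);
}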

>> We've already disabled absolutely everything we can and still have a
>> working system. With the same features enabled, the 5.4 kernel was
>> about 75% larger than a 2.6.33 kernel, so we had to trim quite a bit
>> of meat to get it to boot on existing units.
>
> Same here.  v4.9 kernel image still fits, v4.14 is already too big for some 
> devices we delivered in the early days.

Yeah, I was shocked at the massive amount of bloat.

>> We also can't get on-die ECC support for Micron NAND flash working
>> with 5.4. Even if it did work, we'd still have to add the ability to
>> fall back to SW ECC on read operations for the sake of backwards
>> compatibility on units that were initially shipped without on-die
>> ECC support enabled.
>
> IIRC the SoC itself has issues with its ECC engine? [...]

Sorry, the terminology in the kernel's nand subsystem is a bit
vague. The "on-die" ECC support refers to using the ECC hardware built
into the NAND flash device itself. In the Linux nand subsystem
"hardware" or "HW" ECC refers to the ECC hardware built into the
SoC. You're right, that's broken in the '9G20 and we don't use it.

We added "on-die" support to the 2.6.33 kernel with fallback to SW ECC
on read operations for backwards compatibility. It was pretty
straightforward and works well. The 5.4 kernel's on-die support is
several orders of magnitude more complex (I don't yet know why), and
doesn't offer SW fallback.

One of the odd things about the micron on-die support in the 5.4
kernel is that it's constantly being switched on and off. It's turned
on before every page read/write, then off afterwards. This adds a lot
of overhead to page read/write operations. After about the 6th on/off
cycle, that stops working and the "set-feature" function starts
returning an error every time. Read operations produce the correct
data, but always report uncorrectable flipped bits when there aren't
any (haven't figured out why). I haven't tried writes.
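
As far as I can tell, the toggling boils down to a helper along these
lines. This is my reading of the 5.4 driver, with the two
Micron-specific constants spelled out locally; don't take the details
as authoritative.

/* Enable/disable Micron on-die ECC via ONFI SET_FEATURES.  The 5.4
 * driver does something like this before and after every page op; in
 * our case the set-feature call starts failing after roughly the
 * sixth enable/disable cycle.
 */
#include <linux/bits.h>
#include <linux/mtd/rawnand.h>

#define MICRON_ON_DIE_ECC_FEATURE       0x90    /* ONFI feature address */
#define MICRON_ON_DIE_ECC_EN            BIT(3)  /* enable bit in param byte 0 */

static int micron_on_die_ecc_setup(struct nand_chip *chip, bool enable)
{
        u8 feature[ONFI_SUBFEATURE_PARAM_LEN] = { 0 };

        if (enable)
                feature[0] |= MICRON_ON_DIE_ECC_EN;

        return nand_set_features(chip, MICRON_ON_DIE_ECC_FEATURE, feature);
}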

In our 2.6.33 on-die support we identify the chip at boot up. If it
has on-die support, we turn it on, leave it on, and just use it that
way. If a read operation returns an error, we check to see if the page
was written with SW ECC and if it was, use that. Writes are always
done with on-die ECC.
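
In rough pseudocode (illustrative names, not our actual 2.6.33 code),
the read path is:

/* Illustrative sketch of our fallback read path.  The three helpers
 * below are hypothetical stand-ins for our driver code.
 */
#include <linux/errno.h>
#include <linux/mtd/mtd.h>

int read_page_on_die(struct mtd_info *mtd, int page, u8 *buf, u8 *oob);  /* hypothetical */
bool page_written_with_sw_ecc(const u8 *oob);                            /* hypothetical */
int sw_ecc_correct(struct mtd_info *mtd, u8 *buf, u8 *oob);              /* hypothetical */

static int read_page_with_fallback(struct mtd_info *mtd, int page,
                                   u8 *buf, u8 *oob)
{
        int ret = read_page_on_die(mtd, page, buf, oob);

        if (ret != -EBADMSG)
                return ret;     /* clean read, or some other hard error */

        /* Uncorrectable under on-die ECC: the page may predate on-die
         * support, so check the OOB layout and re-verify with SW ECC. */
        if (!page_written_with_sw_ecc(oob))
                return -EBADMSG;

        return sw_ecc_correct(mtd, buf, oob);
}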

--
Grant



