Orange PI 5 MAX: very unstable using kernel 6.19.0 and 6.18.10, 6.18.9 perfectly stable
David Arendt
admin at prnet.org
Thu Feb 12 14:23:44 PST 2026
On 2/12/26 10:05 PM, Qu Wenruo wrote:
>
>
> On 2026/2/13 06:41, David Arendt wrote:
>> Hello,
>>
>> I am using a Kubernetes Cluster with 3 Orange PI5 MAX nodes. The data
>> is stored using a btrfs filesystem as backend. If using kernel 6.19.0
>> or kernel 6.18.10 I have experienced many crashes during high IO load
>> on all 3 nodes. Reverting back to 6.18.9 solves the problems
>> completely. Unfortunately the crashes are spontaneous reboots without
>> leaving a trace in any logfile, so I have no stacktrace of them.
>> After the crashes I have sometimes incorrect btrfs csums for a file
>> but these may also be a result of a partial write due to the crash.
>> On one node I had a btrfs error logged without crashing, but I am not
>> sure if this is the root cause or a result of a prior crash. A scrub
>> after reboot returned no error with 6.19.0.
>
> The offending tree dump items are:
>
> Feb 10 13:31:07 opi02 kernel: item 92 key (13218356101120
> Feb 10 13:31:07 opi02 kernel: item 93 key (13216208642048
> Feb 10 13:31:07 opi02 kernel: item 94 key (13218356162560
>
> Obviously the key of item 93 is smaller than both the previous and
> the next item's key.
>
> hex(13218356101120) = 0xc05a36b8000
> hex(13216208642048) = 0xc05236be000
> hex(13218356162560) = 0xc05a36c7000
>
> It looks like something flipped, "0xc05a3" -> "0xc0523"
>
> 0xa -> 0x2 is exactly one bit flipped.
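The arithmetic above can be checked with a short Python sketch. This is only an illustration, not a btrfs tool: `single_bit_repairs` is a hypothetical helper name, and only the first number of each key is used because that is all the quoted log lines show. It prints the hex form of the three keys and searches for single-bit flips that would put item 93 back in order between items 92 and 94.

```python
# First key component of items 92, 93 and 94, copied from the tree dump above.
keys = {92: 13218356101120, 93: 13216208642048, 94: 13218356162560}


def single_bit_repairs(prev_key, bad_key, next_key, width=64):
    """Hypothetical helper: return (bit, repaired_key) pairs where flipping
    exactly one bit of bad_key places it strictly between its neighbours."""
    repairs = []
    for bit in range(width):
        candidate = bad_key ^ (1 << bit)  # flip a single bit
        if prev_key < candidate < next_key:
            repairs.append((bit, candidate))
    return repairs


for item, key in keys.items():
    print(item, hex(key))

repairs = single_bit_repairs(keys[92], keys[93], keys[94])
for bit, candidate in repairs:
    print(f"flip bit {bit}: {hex(keys[93])} -> {hex(candidate)}")
# prints: flip bit 31: 0xc05236be000 -> 0xc05a36be000
```

Only one candidate is found: setting bit 31 (counting from 0 at the least significant bit) gives 0xc05a36be000, which sorts between the two neighbouring keys. That is consistent with a single bit having been cleared, but it does not by itself tell hardware and software causes apart.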
>
> So either the memory hardware is faulty, resulting in a stuck bit
> (always 0), or something inside the kernel is touching memory it
> shouldn't.
>
> This matches the symptom exactly: if random bits of kernel memory
> change, crashes are to be expected.
>
>
> Can you run a memtest first to rule out a hardware problem?
Hello,
I don't know of anything like memtest86 for the arm64 platform that
tests the whole memory, so I used the user-space memtester to check
the 14G of unused RAM on all 3 machines while running kernel 6.18.10.
Here is the result of the first iteration (same on every machine):
memtester version 4.7.1 (64-bit)
Copyright (C) 2001-2024 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).
pagesize is 4096
pagesizemask is 0xfffffffffffff000
want 14000MB (14680064000 bytes)
got 14000MB (14680064000 bytes), trying mlock ...locked.
Loop 1:
Stuck Address : ok
Random Value : ok
Compare XOR : ok
Compare SUB : ok
Compare MUL : ok
Compare DIV : ok
Compare OR : ok
Compare AND : ok
Sequential Increment: ok
Solid Bits : ok
Block Sequential : ok
Checkerboard : ok
Bit Spread : ok
Bit Flip : ok
Walking Ones : ok
Walking Zeroes : ok
I don't think it is a hardware failure, as it is happening on 3
different machines. Crashes occur after somewhere between 30 minutes
and 12 hours on all 3 machines. These machines had been running without
a single crash for more than a year with older kernel versions,
including every version from 6.18.0 to 6.18.9 (6.18.9 itself for 4
days), so the problem seems to be caused by something that changed
between 6.18.9 and 6.18.10.
Thanks,
David Arendt
>
> Thanks,
> Qu
>
>
>>
>> Unfortunately I don't have more information at the moment.
>>
>> Thanks in advance,
>>
>> David Arendt
>>
>>
>