mvneta: oops in __rcu_read_lock on mirabox
Thomas Petazzoni
thomas.petazzoni at free-electrons.com
Mon Sep 16 12:24:50 EDT 2013
Russell,
On Mon, 16 Sep 2013 17:22:09 +0100, Russell King - ARM Linux wrote:
> One seemed to be a single bit error in an instruction inside the kernel
> image. The other was what seems to be an impossible abort.
>
> I still don't see how we could end up with a prefetch abort inside memset()
> due to the kernel domain being inaccessible, but still be able to get
> an oops out, especially when we dump out the memory for the faulting
> instruction by accessing that memory via that apparently inaccessible
> domain while running the code which dumps that memory also under this
> apparently inaccessible domain. If the domain containing the kernel
> really was inaccessible, the system would be completely dead.
>
> The only possibility I can come up with for that is that the abort was
> caused by something spurious happening at the hardware level causing
> corruption of the instruction TLB (corrupting the domain index stored
> in the I-TLB) or other CPU control hardware causing it to spuriously
> generate that fault.
>
> As the domain field in the page table L1 entries covers bit 8, and the
> single bit error with the instruction was also bit 8, maybe there's a
> design weakness on data line bit 8 causing marginal operation.
>
> To add to this, the abort given in this report gives an IFSR value of
> 0x409, which equates to "Synchronous parity error on memory access"
> in ARMv7. The other value (0x400) equates to "TLB conflict abort"
> which can only happen with LPAE support enabled... So this is just
> getting more weird!
Could this be caused by bitflips in the RAM due to bad timings, or
overheating or that kind of thing?
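
For anyone wanting to check the IFSR values quoted above, here is a
minimal sketch in C of how the fault status is put together, assuming
the ARMv7 short-descriptor format (FS[3:0] in bits [3:0], FS[4] in
bit 10) and the domain field in bits [8:5] of a first-level descriptor.
The function names are made up for illustration and are not kernel
APIs, and only the fault codes relevant to this thread are decoded.

```c
#include <stdint.h>

/*
 * Assumption: short-descriptor IFSR layout (LPAE not in use).
 * FS[3:0] lives in bits [3:0], FS[4] lives in bit 10.
 * Hypothetical helper, not a kernel function.
 */
unsigned ifsr_fs(uint32_t ifsr)
{
    return (ifsr & 0xf) | ((ifsr >> 6) & 0x10);
}

/* Names for the few fault status codes discussed in this thread. */
const char *ifsr_fs_name(uint32_t ifsr)
{
    switch (ifsr_fs(ifsr)) {
    case 0x09: return "Domain fault, first level";
    case 0x0b: return "Domain fault, second level";
    case 0x10: return "TLB conflict abort";
    case 0x19: return "Synchronous parity error on memory access";
    default:   return "not decoded in this sketch";
    }
}

/*
 * Domain index of a first-level descriptor: bits [8:5].
 * A flip of bit 8 therefore changes the domain index by 8.
 */
unsigned l1_domain(uint32_t desc)
{
    return (desc >> 5) & 0xf;
}
```

With this, ifsr_fs_name(0x409) gives the parity error string and
ifsr_fs_name(0x400) gives the TLB conflict string, matching the two
readings quoted above.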
Thomas
--
Thomas Petazzoni, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com