rcu-torture: Internal error: Oops: 96000006

Paul E. McKenney paulmck at kernel.org
Fri Jan 22 18:23:39 EST 2021


On Fri, Jan 22, 2021 at 09:16:38PM +0530, Naresh Kamboju wrote:
> On Fri, 22 Jan 2021 at 21:07, Paul E. McKenney <paulmck at kernel.org> wrote:
> >
> > On Fri, Jan 22, 2021 at 03:21:07PM +0530, Naresh Kamboju wrote:
> > > On Fri, 22 Jan 2021 at 03:13, Paul E. McKenney <paulmck at kernel.org> wrote:
> > > >
> > > > On Thu, Jan 21, 2021 at 09:31:10PM +0000, Will Deacon wrote:
> > > > > On Thu, Jan 21, 2021 at 10:55:21AM -0800, Paul E. McKenney wrote:
> > > > > > On Thu, Jan 21, 2021 at 10:37:21PM +0530, Naresh Kamboju wrote:
> > > > > > > > While running the rcu-torture test on qemu_arm64 and an arm64 Juno-r2
> > > > > > > > device, the following kernel crash was noticed. It has been happening on
> > > > > > > > Linux next from tag next-20210111 through next-20210121.
> > > > > > >
> > > > > > > metadata:
> > > > > > >   git branch: master
> > > > > > >   git repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next
> > > > > > >   git describe: next-20210111
> > > > > > >   kernel-config: https://builds.tuxbuild.com/1muTTn7AfqcWvH5x2Alxifn7EUH/config
> > > > > > >
> > > > > > > output log:
> > > > > > >
> > > > > > > [  621.538050] mem_dump_obj() slab test: rcu_torture_stats =
> > > > > > > ffff0000c0a3ac40, &rhp = ffff800012debe40, rhp = ffff0000c8cba000, &z
> > > > > > > = ffff8000091ab8e0
> > > > > > > [  621.546662] mem_dump_obj(ZERO_SIZE_PTR):
> > > > > > > [  621.546696] Unable to handle kernel NULL pointer dereference at
> > > > > > > virtual address 0000000000000008
> > > > >
> > > > > [...]
> > > > >
> > > > > > Huh.  I am relying on virt_addr_valid() rejecting NULL pointers and
> > > > > > things like ZERO_SIZE_PTR, which is defined as ((void *)16).  It looks
> > > > > > like your configuration rejects NULL as an invalid virtual address,
> > > > > > but does not reject ZERO_SIZE_PTR.  Is this the intent, given that you
> > > > > > are not allowed to dereference a ZERO_SIZE_PTR?
> > > > > >
> > > > > > Adding the ARM64 guys on CC for their thoughts.
> > > > >
> > > > > Spooky timing, there was a thread _today_ about that:
> > > > >
> > > > > https://lore.kernel.org/r/ecbc7651-82c4-6518-d4a9-dbdbdf833b5b@arm.com
> > > >
> > > > Very good, then my workaround (shown below for Naresh's ease of testing)
> > > > is only a short-term workaround.  Yay!  ;-)
> > >
> > > Paul, thanks for your (short-term workaround) patch.
> > >
> > > I have applied your patch and run the rcu-torture test on qemu_arm64, and
> > > the reported issue has been fixed.
> >
> > May I add your Tested-by?
> 
> Yes.  Please add Reported-by and Tested-by.

Very good!  I have added:

Tested-by: Naresh Kamboju <naresh.kamboju at linaro.org>

Because I folded the workaround into the first commit in the series,
instead of adding your Reported-by, I added the following to that commit:

[ paulmck: Explicitly check for small pointers per Naresh Kamboju. ]
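
The workaround patch itself is not reproduced in this message.  As a rough
userspace illustration of what an explicit small-pointer check might look
like, here is a minimal sketch.  ZERO_SIZE_PTR mirrors the ((void *)16)
definition quoted above; SMALL_PTR_LIMIT, is_small_pointer(), and
classify_pointer() are hypothetical names invented for this sketch, and the
4096 threshold is an assumption (a typical page size), not something taken
from the actual commit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same value as the definition quoted earlier in the thread. */
#define ZERO_SIZE_PTR ((void *)16)

/* Assumption for this sketch: anything below a typical 4K page is "small". */
#define SMALL_PTR_LIMIT ((uintptr_t)4096)

/*
 * Hypothetical helper: true for NULL, ZERO_SIZE_PTR, and any other
 * pointer value too small to refer to a real object.  The idea is to
 * run this check before asking whether the address is otherwise valid,
 * so that the result does not depend on how a given architecture
 * treats such addresses.
 */
static bool is_small_pointer(const void *p)
{
	return (uintptr_t)p < SMALL_PTR_LIMIT;
}

/* Hypothetical helper: describe a pointer without dereferencing it. */
static const char *classify_pointer(const void *p)
{
	if (p == NULL)
		return "NULL pointer";
	if (p == ZERO_SIZE_PTR)
		return "zero-size pointer";
	if (is_small_pointer(p))
		return "small pointer";
	return "needs further validation";
}
```

The point of checking first is that a caller never reaches the
architecture-specific validity test with one of these values in hand.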

> > And before I forget again, good to see the rcutorture testing on a
> > non-x86 platform!
> 
> We are running rcutorture tests on arm, arm64, i386 and x86_64.

Nice!!!

Some ARMv8 people are getting bogus (but harmless) error messages
because parts of rcutorture think that all the world is an x86.
I am looking at a fix, but need to work out what the system is.
To that end, could you please run the following on the arm, arm64,
and i386 systems and tell me what the output is?

	gcc -dumpmachine

> Happy to test !

And thank you very much for your testing efforts!!!

							Thanx, Paul


