Problems booting exynos5420 with >1 CPU
Nicolas Pitre
nicolas.pitre at linaro.org
Sat Jun 7 13:06:36 PDT 2014
On Sat, 7 Jun 2014, Lorenzo Pieralisi wrote:
> On Sat, Jun 07, 2014 at 05:10:27PM +0100, Nicolas Pitre wrote:
> > On Sat, 7 Jun 2014, Abhilash Kesavan wrote:
> >
> > > Hi Nicolas,
> > >
> > > The first man of the incoming cluster enables its snoops via the
> > > power_up_setup function. During secondary boot-up, this does not occur
> > > for the boot cluster. Hence, I enable the snoops for the boot cluster
> > > as a one-time setup from the u-boot prompt. After secondary boot-up
> > > there is no modification that I do.
> >
> > OK that's good.
> >
> > > Where should this be ideally done ?
> >
> > If I remember correctly, the CCI can be safely activated only when the
> > cache is disabled. So that means the CCI should ideally be turned on
> > for the boot cluster (and *only* for the boot CPU) by the bootloader.
>
> CCI ports are enabled per-cluster, so the boot loader must turn on
> CCI for all clusters before the respective CPUs have a chance to
> turn on their caches. It is a secure operation, this can be overriden
> and probably that's what has been done, wrongly.
Careful. By saying "for all clusters" you might be interpreted as
saying that the CCI must be turned on even for CPUs that are not powered
up.
> True, TC2 on warm-boot reenables CCI, and that's because it runs the
> kernel in secure world, and again that's _wrong_.
Let me respectfully disagree.
> It must be done in firmware, and I am totally against any attempt at
> papering over what looks like a messed up firmware implementation with
> a bunch of hacks in the kernel, because that's what the patch below is
> (soft restarting a CPU to enable CCI ? and adding generic code for that ?
> what's next ?)
Are you promoting for the removal of drivers/bus/arm-cci.c ?
You do realize that the fundamental raison d'être for MCPM is actually
to manage the race free enabling of the cache and CCI ?
> I understand there is an issue and lots at stake here, but I do not want the
> patch below to be merged in the kernel, I am sorry, it is a tad too much.
Lorenzo: Don't get me wrong. The Chromebooks, and possibly to some
extent some people at Samsung, were simply too confident in their
ability to create absolutely bug-free firmware code to the point of not
making its update easy in the field. This is completely outrageous in
my point of view. Yet one of the reactions was to call upstream kernel
people as purists because the kernel is so much easier to modify in
order to cover their mess and yet that might not be accepted. Like I
said I won't stop shaming them publicly for their own "incompetence"
just yet (no pun intended), but being excessively purist does not
benefit anyone either, and for that they have a point.
*HOWEVER* I have no choice but to say that many people at ARM, including
a couple individuals for whom I nevertheless have a lot of admiration,
also have an incredible faith in their ability to convince themselves,
and then turn around to preach to the world, that *more firmware* is
going to be so much purer and solve so many more problems than it
creates and become such a magical success that we should immediately
dedicate our soul to the cause with no hint of a doubt.
I'm sorry to rain on your parade, but I don't believe in this one iota.
Let me repeat the MCPM story again: it took 3 people, including 2 from
ARM, over *six* months to get everything right and stable on TC2. I
think you also contributed to that effort as well. Subsequent MCPM
backend contributions (yes, just the backend and not the core code) took
at least *five* rounds of reviews in one case, and after *seven* rounds
in another case it is still not right, despite the publicly available
TC2 implementation to serve as a reference.
I'm sure each time a new patch set was posted, their authors honestly
believed their code was correct. Otherwise why would they post buggy
code?
Now you are telling me that they should have put that code into firmware
instead? Can you realize what a catastrophe this would have been? Are
you _seriously_ believing that they would be up to their 5th firmware
revision by now? And that updating their firmware six months after
product launch would be as easy as updating the kernel?
Software ALWAYS has bugs, whether it is user apps, the kernel, firmware
or boot ROM. The bigger one of those parts is, the more bugs it will
have. And the cost to vendors for fixing those bugs grow exponentially
down each level. For proof, we're now considering possible workarounds
in the kernel to sidestep the difficulty with updating the firmware on a
Chromebook.
Yet you're saying that firmware should grow code with the same
complexity as the MCPM core, plus a machine specific backend that
experience has proven multiple times is evidently hard to get right,
into firmware because running Linux in secure mode is wrong? If so we
don't live in the same world indeed.
The day I see a firmware architecture that allows for 1) the same level
of peer review as what we enjoy with the Linux kernel code and 2) the
same ability to perform updates in the field as the kernel, then maybe I
could be sold on the many advantages having generic firmware might have.
In the meantime I consider complex firmware as a very suboptimal
architecture with no bearing on the reality of actual short-cycled
products, and if they prevail we'd better be ready to pile more of those
ugly hacks in the kernel.
Nicolas
More information about the linux-arm-kernel
mailing list