Problems booting exynos5420 with >1 CPU
Nicolas Pitre
nicolas.pitre at linaro.org
Sun Jun 8 10:53:43 PDT 2014
On Sun, 8 Jun 2014, Lorenzo Pieralisi wrote:
> On Sun, Jun 08, 2014 at 12:53:34AM +0100, Olof Johansson wrote:
> > Lorenzo,
> >
> > Since you're emailing from @arm.com, some of this is to the wider
> > recipient and maybe not directly to you:
>
> I am glad to reply and take blame since this is a debate definitely worth
> having.
Great. Because I would like to steer this debate a little towards the
genuine cause rather than sticking to some particular consequences.
> Guys, do not get me wrong here. There are fixes that can be deemed
> acceptable in an OS, there are fixes that can't. I just can't help thinking
> that Nicolas' patch is a nasty hack (and I am far, really really far from
> blaming him for that, because that's the only patch that can fix that
> issue in the kernel), and he perfectly knows that.
You know what? The more I think about my patch, the more I consider
this should be the standard way of setting up things unconditionally on
_all_ platforms using MCPM. Why? Because that's the most coherent thing
to do!
I really think the kernel should either be responsible for the CCI or it
should not at all. And conversely for the bootloader. Right now we
have an implicit requirement that the bootloader should turn on the CCI,
but only for cold boot, and only for the boot cluster, and not for CPU
resuming from idle, and whatever other cases we haven't thought about yet.
And as noticed this requirement is not documented.
> > > Whatever the outcome of this thread, a booting protocol update for CCI
> > > is in order, even if we have to debate it for 6 months or more to get
> > > an agreement.
> >
And in the end I don't think the CCI should have to be documented as a
boot requirement. Instead of having firmware implementers understand
when they should care about the CCI and when they shouldn't, I'd much
prefer that they didn't have to care at all. I really prefer when
responsibility for something is well encapsulated in one place and not
shared across layers like the firmware or the kernel depending on some
context. The complexity of a system (and therefore the probability for
bugs) grows with the square of the number of interrelations between
constituent parts. So we should always strive to make the boot protocol
_simpler_ not more complex.
And if complete responsibility for the CCI in the kernel had been
assumed from the beginning, we wouldn't be struggling in this power play
to determine which side should give in. Especially since the kernel has
all the necessary infrastructure to do it all already, and I must say in
a rather elegant manner.
> > I'm a very strong proponent of enabling upstream support for our
> > platform (for several reasons -- most of these are actually business
> > reasons for us, but also because it's the right thing to do). Finding
> > the trade-off for what workarounds are still reasonable to do in the
> > kernel for that situation is obviously hard and we're disagreeing. But
> > the scope for these workarounds is not large.
Will people ever realize that, if the kernel were more in control of the
hardware (isn't that the role of an OS kernel to serve as the hardware
abstraction layer?) then we wouldn't be talking about "workarounds" but
rather "standard fixes"?
> > In this case, the change we're looking at is enabling the CCI port for
> > the boot cpu. It's perfectly containable in exynos-only code, and we
> > can surround it by however many comments of never ever using it as an
> > example for how to do it as you'll want.
In this case, to state my opinion clearly, it is the general design that
was flawed and the kernel should be fixed to enable the CCI for the boot
CPU itself _when_ it knows it is going to need it. To start with, the
bootloader has no need whatsoever for using more than one CPU, unless
it wants to become an operating system, so it shouldn't have to care at
all. The kernel, if booted without the CCI information in the DTB, will
run with only one CPU and won't rely on the CCI. Logically the CCI
could be left turned off in that case, possibly increasing bus
performance and saving some power.
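To make that policy concrete, here is a minimal sketch in Python rather
than kernel code. The names BootInfo, BootPlan and plan_boot, and their
fields, are hypothetical and exist only for this illustration; the real
logic would live in the kernel's MCPM/CCI setup paths.

```python
from dataclasses import dataclass


@dataclass
class BootInfo:
    # Hypothetical inputs seen by the kernel at boot.
    dtb_has_cci: bool      # the DTB describes the CCI
    cci_on_at_entry: bool  # state the bootloader happened to leave behind


@dataclass
class BootPlan:
    enable_cci_for_boot_cpu: bool
    max_cpus: int


def plan_boot(info: BootInfo, present_cpus: int) -> BootPlan:
    """Decide what the kernel does about the CCI, ignoring firmware state."""
    if info.dtb_has_cci:
        # The kernel knows it will need coherency, so it enables the CCI
        # port for the boot CPU itself, whatever the bootloader did.
        return BootPlan(enable_cci_for_boot_cpu=True, max_cpus=present_cpus)
    # No CCI information: run on one CPU and do not rely on the CCI.
    return BootPlan(enable_cci_for_boot_cpu=False, max_cpus=1)
```

Note that cci_on_at_entry never influences the result: that is the
point of the proposal, since the bootloader's choice stops mattering.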
> I agree with what you are saying, but if for any reason someone will
> copy that code to paper over yet another firmware quirk and think that's
> the right thing to do, that would be grave IMHO.
Someone shouldn't have to copy that code because I'm getting more and
more convinced it should be made generic and unconditional, and by doing
so removing any possibility for firmware to get that part wrong again.
According to my quick experiment on TC2, this took only 271 microseconds
to perform, so it is not as if this would make a significant
difference in boot time.
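For a sense of scale, the small Python calculation below expresses the
271 microseconds as a fraction of total boot time. The one-second boot
time is purely an assumption for illustration, not a measurement, and
share_of_boot is a hypothetical helper name.

```python
CCI_SETUP_US = 271  # figure quoted above for TC2


def share_of_boot(setup_us: float, boot_time_s: float) -> float:
    """Return the fraction of boot time spent on the setup step."""
    return setup_us / (boot_time_s * 1_000_000)


# Assumed total boot time of 1 second (illustrative only).
share = share_of_boot(CCI_SETUP_US, 1.0)
print(f"{share:.4%}")
```

Under that assumption the cost is a few hundredths of a percent of the
boot, and a longer boot only makes the share smaller.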
> > > I do not think they do. The kernel should not become a place where firmware
> > > bugs are fixed, if you refuse to fix the bug and this code does not get
> > > upstream I am pretty sure next time more attention will be paid.
Again, this is coming about because firmware is an order of MAGNITUDE harder to
fix and IMPOSSIBLE to be bug free, just like any other software. So if I
may get back to the genuine cause for this debate: this came about
because of TOO MUCH firmware code and encouraging people to create more
of it is *BAD*.
Sure, in the server world you are likely to want firmware and standards
because that helps bring maintenance costs down. But server
equipment has much longer life cycles than mobile devices and somewhat
less aggressive and complex power management to perform. Firmware for
servers may take *time* to be developed, tested, certified, etc. To
illustrate this, we've been working on UEFI and ACPI for a period that
can be measured in years at this point. So, hopefully by the time
server oriented firmware is available, it would be well tested and
relied upon for a long time.
None of the above applies to consumer products with fast development and
short life cycles.
> I understand your point, and I do not want to stop people from using
> this platform with upstream code, actually I am the first who is happy
> to see power management code getting in the mainline, but not at all costs,
> because this has consequences for US.
And those consequences are?
> ARM are pushing for open trusted firmware, ARM TRMs are available to
> partners with those sequences described, and I have always been willing
> to support developers.
Ahhhh... Here we are. "ARM are pushing for open trusted firmware ..."
> We should do more, but that does not justify these bugs, really.
Bugs are never justifiable, but they happen _all_ the time.
Firmware is an order of MAGNITUDE harder to fix, and IMPOSSIBLE to be bug free
just like any other software.
> > > Where do we draw the line, that's my point.
> >
> > You draw the line by giving vendors a place to do the nasty stuff that
> > needs to be done in a place that doesn't impact others, and where
> > others don't have to look. Quirk tables, fixup functions, or function
> > pointers that can be replaced on a specific platform if needed. When
> > it affects core code, you sort it out in a different way if you have
> > to.
Again this is missing the point. No line would need to be drawn if the
core code was responsible in the first place. DMC parameters are
conceptually so trivial that no one should normally mess that up, and
the firmware must do it just so that memory is usable. So there is no
choice but to do that in firmware. It is a completely different story
with complex operations which should never ever be relegated to
firmware.
> > Maybe it's just me, but I didn't use to see this disconnected puritan
> > world view from people until DT came along. I don't think it's DTs
> > fault, but I think the requirements of DT-as-ABI has tainted the
> > mindset of many developers in a way that they treat everything as
> > needing to be polished to a perfect shine in all aspects, all the
> > time.
>
> Olof, it is not puritanism, it is all about upstreaming code. If we
> keep accepting these hacks and we end up with mach code full of them
> we have a problem, do you agree ?
Absolutely!
So once again, let's take a step back, open our eyes and look at the
fundamental reason why hacks are there, and how they could fundamentally
be avoided. And no, hoping for fewer bugs in firmware is not realistic
if people are encouraged to create more of it.
> > Expecting things to be perfect from day one is not realistic.
>
> I do not buy this I am sorry. Fair enough, CCI is a new concept, but
> SMP power management has been implemented in older platforms with
> the same requirements, nothing new and still people are getting this
> wrong.
Lorenzo: what you say is not accurate. People screwed up SMP power management
in the past for sure. And they still will because requirements are
changing all the time; they're not the same. Maybe requirements are
somewhat stable in the server space, but in the mobile space they're
not. So this must be implemented where it is cheapest to fix.
> > > Nicolas: it is not a matter of PSCI vs. MCPM, firmware vs. the kernel,
> > > that's a debate worth having, not now.
Why not? I'm saying that too much firmware is a fundamental design
mistake for consumer products. All the rest follows from that. Why
not address the source of the problem rather than constantly
suffering and debating its consequences?
Again I want to clearly state that I have nothing against PSCI the
interface spec despite the appearances. I've reviewed its draft and
provided comments, etc. My point is, when taking a step back, we may
only conclude that more firmware does not create a better system overall
because of real life costs and constraints associated with it. So PSCI is
not the problem, the problem is at another conceptual level.
> > > Adding these hacks has serious maintenance consequences (eg CPUidle
> > > code) and that's the main reason I jumped into this discussion.
Sorry, I don't see the connection.
> > > Let me reiterate my point: it is not a kernel vs firmware debate,
But of *course* it is, unless you're too invested in your firmware
strategy to be able to see all the downsides.
> > > it is about clean and maintainable code vs hackish and
> > > unmaintainable code in the kernel.
No argument there. Unfortunately, hackish code comes about because of
broken firmware in most cases. Kernel code can be cleaned at any moment
but in practice firmware code cannot.
> > No, it's about having code that runs in the real world, versus some
> > random framework that doesn't actually fill a useful purpose since
> > nobody can make use of it without a bunch of out-of-tree code.
>
> PSCI is not a random framework, it is a standard and it runs in real
> world platforms and would hide all these HW quirks where they belong.
Which real world platforms? I'm curious.
> > Wow, you're going to be really, really frustrated over how the world
> > will start to look with all the "standardized" closed firmware
> > platforms and their quirks and bug workarounds we'll have to add in
> > the kernel.
>
> Yes, and I will shout even louder when that will happen =)
That _will_ happen. Such is life. And you'll have only yourself to blame
because you pushed for bigger firmware to be created in the first place.
Nicolas