[PATCH 0/3] drm/radeon kexec fixes

Alex Deucher alexdeucher at gmail.com
Wed Sep 11 09:40:15 EDT 2013


On Wed, Sep 11, 2013 at 4:53 AM, Markus Trippelsdorf
<markus at trippelsdorf.de> wrote:
> On 2013.09.10 at 16:40 -0400, Alex Deucher wrote:
>> On Tue, Sep 10, 2013 at 2:27 PM, Eric W. Biederman
>> <ebiederm at xmission.com> wrote:
>> > Alex Deucher <alexdeucher at gmail.com> writes:
>> >
>> >> On Mon, Sep 9, 2013 at 5:21 AM, Markus Trippelsdorf
>> >> <markus at trippelsdorf.de> wrote:
>> >>
>> >>> IIRC Alex said the sanity checks are expensive and boot-time could be
>> >>> improved by dropping them. Maybe he can chime in?
>> >>
>> >> They shouldn't be necessary with a proper shutdown, but in this
>> >> particular case, they are not very expensive.  What is expensive is
>> >> having a separate sanity check functions for all the various hw blocks
>> >> to teardown everything on startup prior to starting it up in case
>> >> kexec, etc. left the system in a bad state.  It ends up amounting to a
>> >> full tear down sequence followed by a full start up sequence every
>> >> time you load the driver.
>> >>
>> >> I can't really comment on the first patch, but the rest seem fine.
>> >
>> > Let me reask the question just a little bit.
>> >
>> > Is it the sanity checks that are expensive?  Or is it the
>> > reinitialization that is triggered by the sanity checks that is
>> > expensive?
>> >
>> > From what Christian said in the other reply it sounds like this is a
>> > game we will never completely win, but it would be nice to have half a
>> > chance in the kexec on panic case to have video.  So I am curious to
>> > know if the checks are expensive when we are coming at hardware in a
>> > clean state.
>>
>> The particular sanity checks from this patch set are not expensive,
>> but we had previously discussed more extensive sanity checks for other
>> aspects of the chips in prior conversations.  Prior to this patch set,
>> the hw is not torn down properly (might have been in the middle of DMA
>> for example) when kexec happens.  That's why the sanity checks were
>> added in the first place.  With this patch set, the sanity checks
>> shouldn't be necessary.
>
> I think you're talking past each other.
> What Eric worries about is the »kexec on panic« case, where the shutdown
> method *isn't* called. In this case the sanity checks, that are only
> superfluous when the hardware was shutdown normally during kexec (the
> default case), may actually help. And because the checks aren't
> expensive, it wouldn't hurt to just leave them in place.

For the particular sanity check in this patch set, that's fine.  Even
with that in place, it only covers a small subset of GPUs and it
doesn't cover all possible problematic areas.  In general kexec on
panic seems like a recipe for trouble.  You may end up in the middle
of several dma transactions across multiple engines on the GPU.  We
could add code to tear down every block on the chip on start up in
case such an event happened, but like Christian said, some blocks
can't really be recovered depending on how things were ended, and then
we've doubled the start up time for the driver.  Additionally,  most
of the tear down is dependent on various driver state which we
wouldn't have at start up, so most of code would need to
re-architected so that it could be called independently of the
relevant driver state.

Alex



More information about the kexec mailing list