Queries on ARM SDEI Linux kernel code
Neeraj Upadhyay
neeraju at codeaurora.org
Wed Oct 21 13:31:41 EDT 2020
Hi James,
Sorry for late reply. Thanks for your comments!
On 10/16/2020 9:57 PM, James Morse wrote:
> Hi Neeraj,
>
> On 15/10/2020 07:07, Neeraj Upadhyay wrote:
>> 1. Looks like interrupt bind interface (SDEI_1_0_FN_SDEI_INTERRUPT_BIND) is not available
>> for clients to use; can you please share information on
>> why it is not provided?
>
> There is no compelling use-case for it, and its very complex to support as the driver can
> no longer hide things like hibernate.
>
> Last time I looked, it looked like the SDEI driver would need to ask the irqchip to
> prevent modification while firmware re-configures the irq. I couldn't work out how this
> would work if the irq is in-progress on another CPU.
>
Got it. I will think in this direction, on how to achieve this.
> The reasons to use bound-interrupts can equally be supported with an event provided by
> firmware.
>
>
Ok, I will explore in that direction.
>> While trying to dig information on this, I saw that [1] says:
>> Now the hotplug callbacks save nothing, and restore the OS-view of registered/enabled.
>> This makes bound-interrupts harder to work with.
>
>> Based on this comment, the changes from v4 [2], which I could understand is, cpu down path
>> does not save the current event enable status, and we rely on the enable status
>> `event->reenable', which is set, when register/unregister, enable/disable calls are made;
>> this enable status is used during cpu up path, to decide whether to reenable the interrupt.
>
>> Does this make, bound-interrupts harder to work with? how? Can you please explain? Or
>> above save/restore is not the reason and you meant something else?
>
> If you bind a level-triggered interrupt, how does firmware know how to clear the interrupt
> from whatever is generating it?
>
> What happens if the OS can't do this either, as it needs to allocate memory, or take a
> lock, which it can't do in nmi context?
>
> Ok, makes sense.
> The people that wrote the SDEI spec's answer to this was that the handler can disable the
> event from inside the handler... and firmware will do, something, to stop the interrupt
> screaming.
>
> So now an event can become disabled anytime its registered, which makes it more
> complicated to save/restore.
>
>
>> Also, does shared bound interrupts
>
> Shared-interrupts as an NMI made me jump. But I think you mean a bound interrupt as a
> shared event. i.e. and SPI not a PPI.
>
>
Sorry I should have worded properly; yes I meant SPI as shared event.
>> also have the same problem, as save/restore behavior
>> was only for private events?
>
> See above, the problem is the event disabling itself.
>
This makes sense now.
> Additionally those changes to unregister the private-event mean the code can't tell the
> difference between cpuhp and hibernate... only hibernate additionally loses the state in
> firmware.
>
>
Got it!
>> 2. SDEI_EVENT_SIGNAL api is not provided? What is the reason for it? Its handling has the
>> same problems, which are there for bound interrupts?
>
> Its not supported as no-one showed up with a use-case.
> While firmware is expected to back it with a PPI, its doesn't have the same problems as
> bound-interrupts as its not an interrupt the OS ever knows about.
>
>
>> Also, if it is provided, clients need to register event 0 ? Vendor events or other event
>> nums are not supported, as per spec.
>
> Ideally the driver would register the event, and provide a call_on_cpu() helper to trigger
> it. This should fit in with however the GIC's PMR based NMI does its PPI based
> crash/stacktrace call so that the caller doesn't need to know if its back by IRQ, pNMI or
> SDEI.
>
>
Ok; I will explore how PMR based NMIs work; I thought it was SGI based.
But will recheck.
>> 3. Can kernel panic() be triggered from sdei event handler?
>
> Yes,
>
>
>> Is it a safe operation?
>
> panic() wipes out the machine... did you expect it to keep running?
I wanted to check the case where panic triggers kexec/kdump path into
capture kernel.
> What does safe mean here?
> I think I didn't put it correctly; I meant what possible scenarios can
happen in this case and you explained one below, thanks!
> You should probably call nmi_panic() if there is the risk that the event occurred during
> panic() on the same CPU, as it would otherwise just block.
>
>
>> The spec says, synchronous exceptions should not be triggered; I think panic
>> won't do it; but anything which triggers a WARN
>> or other sync exception in that path can cause undefined behavior. Can you share your
>> thoughts on this?
>
> What do you mean by undefined behaviour?
>
I was thinking, if SDEI event preempts EL1, at the point, where EL1 has
just entered an exception, and hasn't captured the registers like
spsr_el1, elr_el1 and other registers, what will be the behavior?
> SDEI was originally to report external abort to the OS in regions where the OS can't take
> an exception because the exception-registers are live, just after and exception and just
> before eret.
>
> If you take another exception from the NMI handler, chances are you're going to go back
> round the loop again, only this time firmware can't inject the SDEI event, so it has to
> reboot.
>
Got it.
> If you know it might cause an exception, you shouldn't do it in NMI context.
>
>
Ok, I understand now.
>> "The handler code should not enable asynchronous exceptions by clearing any of the
>> PSTATE.DAIF bits, and should not cause synchronous exceptions to the client Exception level."
>
>
> What are you using this thing for?
>
>
Usecase is, a watchdog SPI interrupt, which we want to bound to a SDEI
event. Below is the flow:
wdog expiry -> SDEI event -> HLOS panic -> trigger kexec/kdump
Thanks
Neeraj
> Thanks,
>
> James
>
--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, hosted by The Linux Foundation
More information about the linux-arm-kernel
mailing list