Potential races in SIGIO vs Epoll order

Anton Ivanov anton.ivanov at cambridgegreys.com
Wed May 5 17:34:03 BST 2021



On 05/05/2021 15:49, YiFei Zhu wrote:
> On Tue, May 4, 2021 at 5:42 AM Anton Ivanov
> <anton.ivanov at cambridgegreys.com> wrote:
>>
>> Hi all,
>>
>> I got around to looking at the race discovered by YiFei Zhu in http://lists.infradead.org/pipermail/linux-um/2021-April/001396.html
>>
>> While UML works using an epoll helper to generate SIGIO, it is ~ 5 times slower compared to enabling async IO. So this is not usable as the primary mode of operation for the interrupt controller.
>>
>> I am going to play with it a bit further and try to get it to do an "epoll not serviced" check only occasionally - to handle errors and IRQ controller stalls.
>>
>> Brgds,
> 
> This might be a dumb idea (tell me if it is), but what if we still
> have a helper that runs a different epoll fd on the exact same set of
> fds as the main thread, and sends an additional SIGIO to the main
> thread if it notices an epoll event? This way, a SIGIO is guaranteed
> to be generated after the wake queue is woken up for epoll events.
> What I don't know is if the two different epoll fds will still race in
> the wake queue, if the helper's epoll returns first and sends SIGIO to
> the main thread before the main thread's epoll receives the event...
> Hmm. This would also mean more spurious SIGIOs.

You can poll the epoll fd. It has the same effect.
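
To illustrate the point: an epoll fd is itself pollable and becomes readable
when any of the fds registered on it has a ready event. The following is a
minimal sketch, not UML code. The names epoll_has_pending() and
demo_epoll_has_pending() are made up for illustration, and a pipe stands in
for a real device fd.

```c
#include <assert.h>
#include <poll.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Returns 1 if the epoll instance has ready events, 0 if not, -1 on error.
 * poll() on the epoll fd only reports readiness; it does not dequeue events.
 */
int epoll_has_pending(int epfd)
{
    struct pollfd pfd = { .fd = epfd, .events = POLLIN };
    int ret = poll(&pfd, 1, 0);   /* zero timeout: never blocks */

    if (ret < 0)
        return -1;
    return (ret > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}

/*
 * Hypothetical self-check: a pipe stands in for a UML device fd.
 * Returns 0 if the epoll fd was idle before the write and ready after it.
 */
int demo_epoll_has_pending(void)
{
    int p[2], epfd, rc, before, after;
    struct epoll_event ev = { .events = EPOLLIN };

    rc = pipe(p);
    assert(rc == 0);
    epfd = epoll_create1(0);
    assert(epfd >= 0);
    ev.data.fd = p[0];
    rc = epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev);
    assert(rc == 0);

    before = epoll_has_pending(epfd);   /* nothing written yet */
    rc = (int)write(p[1], "x", 1);
    assert(rc == 1);
    after = epoll_has_pending(epfd);    /* pipe is now readable */

    close(p[0]);
    close(p[1]);
    close(epfd);
    return (before == 0 && after == 1) ? 0 : -1;
}
```

Because the check does not consume anything, the main thread's own
epoll_wait() still sees the same events afterwards.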

I tried two approaches.

1. Do not generate SIGIO on fds at all and make the helper thread throw a
SIGIO at the main UML thread every time there is an epoll event. Bombproof,
but slow. Boot time up by a factor of 5.

2. Leave SIGIO on all fds as now and still generate SIGIOs out of the
helper thread - basically throw an extra SIGIO when the epoll becomes
active in the helper. Slow again - very high CPU usage.
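
For reference, the helper side of both approaches boils down to something
like the sketch below. This is an illustration only and not the actual
patch: helper_forward_once(), install_sigio_handler() and the counter are
hypothetical names, and kill() on the process stands in for signalling the
main UML thread. A real helper would run the forwarding step in a loop in
its own thread.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/epoll.h>
#include <sys/types.h>
#include <unistd.h>

/* Count of SIGIOs seen by the receiver (stand-in for the UML IRQ entry). */
static volatile sig_atomic_t sigio_seen;

static void on_sigio(int sig)
{
    (void)sig;
    sigio_seen++;
}

/* Install the SIGIO handler on the receiving side. */
int install_sigio_handler(void)
{
    struct sigaction sa = { 0 };

    sa.sa_handler = on_sigio;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGIO, &sa, NULL);
}

/*
 * One iteration of the helper loop: wait up to timeout_ms for an epoll
 * event and, if one arrives, send SIGIO to the target process.
 * Returns 1 if a SIGIO was sent, 0 on timeout, -1 on error.
 * The fds are level-triggered here, so the event stays ready until the
 * receiver drains the fd.
 */
int helper_forward_once(int epfd, pid_t target, int timeout_ms)
{
    struct epoll_event ev;
    int n = epoll_wait(epfd, &ev, 1, timeout_ms);

    if (n < 0)
        return -1;
    if (n == 0)
        return 0;
    return kill(target, SIGIO) == 0 ? 1 : -1;
}

/* Hypothetical self-check using a pipe and signalling our own process. */
int demo_helper_forward(void)
{
    int p[2], epfd, rc, idle, sent;
    struct epoll_event ev = { .events = EPOLLIN };

    sigio_seen = 0;
    rc = install_sigio_handler();
    assert(rc == 0);
    rc = pipe(p);
    assert(rc == 0);
    epfd = epoll_create1(0);
    assert(epfd >= 0);
    ev.data.fd = p[0];
    rc = epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev);
    assert(rc == 0);

    idle = helper_forward_once(epfd, getpid(), 0);   /* no event, no signal */
    rc = (int)write(p[1], "x", 1);
    assert(rc == 1);
    sent = helper_forward_once(epfd, getpid(), 0);   /* event, SIGIO sent */

    close(p[0]);
    close(p[1]);
    close(epfd);
    return (idle == 0 && sent == 1 && sigio_seen == 1) ? 0 : -1;
}
```

Note that with level-triggered fds a looping helper of this shape keeps
returning from epoll_wait() immediately until the main thread has drained
the fd, so it can send many SIGIOs for a single event.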

There is one more option I intend to try next.

We can poll the epollfd on timer events if there is no SIGIO pending in
the signal handler which serves as the UML "interrupt hardware". While this
is not a guaranteed approach, there are usually enough active timers in
the system to give a check roughly every 5 milliseconds on average, which
should be enough for ttys. It also narrows the check down to "only if there
is no SIGIO pending", so there should be no performance penalty if
everything works as it should.
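
Roughly, the timer-side check could look like the sketch below. It is an
assumption-laden illustration, not the eventual patch: timer_tick_check()
and the demo are hypothetical names, and sigpending() is used here as a
stand-in for whatever pending-signal bookkeeping the UML signal handler
keeps internally.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <poll.h>
#include <signal.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Called from a timer tick. Returns 1 when no SIGIO is pending yet the
 * epoll fd reports ready events (a notification may have been missed),
 * 0 when there is nothing to do, -1 on error.
 */
int timer_tick_check(int epfd)
{
    sigset_t pending;
    struct pollfd pfd = { .fd = epfd, .events = POLLIN };
    int ret;

    if (sigpending(&pending) < 0)
        return -1;
    if (sigismember(&pending, SIGIO) == 1)
        return 0;   /* normal path: SIGIO is on its way, skip the poll */

    ret = poll(&pfd, 1, 0);   /* zero timeout: never blocks */
    if (ret < 0)
        return -1;
    return (ret > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}

/* Hypothetical self-check: a pipe stands in for a tty fd. */
int demo_timer_tick_check(void)
{
    int p[2], epfd, rc, idle, missed, covered;
    sigset_t block, old;
    struct epoll_event ev = { .events = EPOLLIN };

    /* Keep SIGIO blocked so that a raised SIGIO stays pending. */
    sigemptyset(&block);
    sigaddset(&block, SIGIO);
    rc = sigprocmask(SIG_BLOCK, &block, &old);
    assert(rc == 0);

    rc = pipe(p);
    assert(rc == 0);
    epfd = epoll_create1(0);
    assert(epfd >= 0);
    ev.data.fd = p[0];
    rc = epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev);
    assert(rc == 0);

    idle = timer_tick_check(epfd);      /* no data, no SIGIO */
    rc = (int)write(p[1], "x", 1);
    assert(rc == 1);
    missed = timer_tick_check(epfd);    /* data ready, no SIGIO pending */
    raise(SIGIO);
    covered = timer_tick_check(epfd);   /* SIGIO pending: check is skipped */

    signal(SIGIO, SIG_IGN);             /* discard the pending SIGIO */
    sigprocmask(SIG_SETMASK, &old, NULL);
    signal(SIGIO, SIG_DFL);
    close(p[0]);
    close(p[1]);
    close(epfd);
    return (idle == 0 && missed == 1 && covered == 0) ? 0 : -1;
}
```

When the check returns 1, the caller would run the normal SIGIO handling
path by hand. In the common case a SIGIO is already pending and the only
extra cost per tick is the pending-signal lookup.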

Brgds,


> 
> YiFei Zhu
> 

-- 
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
https://www.cambridgegreys.com/


