Race between SIGIO and epoll from SMP host

Anton Ivanov anton.ivanov at kot-begemot.co.uk
Thu Apr 22 08:32:25 BST 2021


On 21/04/2021 16:45, Anton Ivanov wrote:
> 
> 
> On 21/04/2021 14:35, YiFei Zhu wrote:
>> On Wed, Apr 21, 2021 at 7:32 AM Anton Ivanov
>> <anton.ivanov at kot-begemot.co.uk> wrote:
>>>> Considering that this is a race on the host, what would be the best
>>>> way to fix this?
>>>
>>> Interesting one. I need to think.
>>>
>>> One option would be to wait for epoll events with a timeout which is 
>>> larger than zero - f.e. HZ.
>>
>> I was about to say I could reproduce it even with a timeout of 1ms,
>> then I realized that code I pasted above already used 1ms timeout.
>> Assertion failures using a 1ms timeout seem much rarer than with a 0
>> timeout, however.
>>
>> For reference my CONFIG_HZ on the host is 1000. I also use
>> CONFIG_NO_HZ_IDLE if that's relevant (I'm not too familiar with how
>> the kernel ticking works).
>>
>>> If we have received a SIGIO there is an epoll event on the way. The 
>>> fact that it is not in the queue right now means that we are due to 
>>> process it shortly.
> 
> This seems to be limited to ttys. Why - I need to figure it out.
> 
> If this ends up as tty specific, we can enable the work-around for ttys 
> which was there when they were not producing sigio on write correctly.
> 
> This ends up disabled on most modern machines, because modern kernels 
> produce sigio on write correctly for ttys.
> 
> With the workaround enabled there is an extra IO event which is produced 
> after the notification appears on the poll loop in a helper thread. So 
> the stall should never happen.


I now have an idea why we see this on ttys.

The TTY IO wake-up path sends SIGIO before the poll notifications, and 
it also delivers the poll notifications using a wake-up which will 
reschedule.

By comparison, a socket, say, does a sync wake-up which does not 
reschedule, and does it before SIGIO.

In either case, we stand a chance of missing an interrupt. It is just 
that in the second case the chance is extremely small, so small that I 
have never seen it in practice.

The real way of dealing with it will be to add a helper thread which 
(e)polls the epoll fd and generates a SIGIO if there is an outstanding 
EPOLL notification which has been missed. This would also take care of 
the range of conditions which are currently handled by the SIGIO fd 
helper, so that helper would become surplus to requirements.

I think that just polling the epoll fd should do the job here. So this 
will also get rid of all the motions needed to register fds with the 
async helper.

Brgds,


> 
> A.
> 
>>>
>>> A.
>>
>> YiFei Zhu
>>
> 


-- 
Anton R. Ivanov
https://www.kot-begemot.co.uk/


