Race between SIGIO and epoll from SMP host

Anton Ivanov anton.ivanov at kot-begemot.co.uk
Thu Apr 22 08:50:42 BST 2021


On 22/04/2021 08:32, Anton Ivanov wrote:
> On 21/04/2021 16:45, Anton Ivanov wrote:
>>
>>
>> On 21/04/2021 14:35, YiFei Zhu wrote:
>>> On Wed, Apr 21, 2021 at 7:32 AM Anton Ivanov
>>> <anton.ivanov at kot-begemot.co.uk> wrote:
>>>>> Considering that this is a race on the host, what would be the best
>>>>> way to fix this?
>>>>
>>>> Interesting one. I need to think.
>>>>
>>>> One option would be to wait for epoll events with a timeout which is
>>>> larger than zero, for example HZ.
>>>
>>> I was about to say I could reproduce it even with a timeout of 1ms,
>>> then I realized that the code I pasted above already used a 1ms timeout.
>>> Assertion failures with a 1ms timeout seem much rarer than with a 0
>>> timeout, however.
>>>
>>> For reference my CONFIG_HZ on the host is 1000. I also use
>>> CONFIG_NO_HZ_IDLE if that's relevant (I'm not too familiar with how
>>> the kernel ticking works).
>>>
>>>> If we have received a SIGIO, there is an epoll event on the way. The
>>>> fact that it is not in the queue right now means that we are due to
>>>> process it shortly.
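A minimal sketch of the non-zero timeout idea, assuming the SIGIO handler drains the IRQ epoll fd with epoll_wait(2). `wait_after_sigio` is a hypothetical name, and the timeout is in milliseconds. Going by the 1ms results reported above, this narrows the window rather than closing it.

```c
#include <errno.h>
#include <sys/epoll.h>

/*
 * Hypothetical helper: collect pending epoll events after a SIGIO.
 * Instead of polling with a zero timeout, which can miss an event whose
 * poll wake-up has not been delivered yet, wait up to timeout_ms
 * milliseconds for it to show up.
 */
int wait_after_sigio(int epfd, struct epoll_event *events, int maxevents,
                     int timeout_ms)
{
    int n;

    do {
        /* note: a retry after EINTR restarts the full timeout */
        n = epoll_wait(epfd, events, maxevents, timeout_ms);
    } while (n < 0 && errno == EINTR);

    return n;   /* number of ready fds, 0 on timeout, -1 on error */
}
```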
>>
>> This seems to be limited to ttys. I need to figure out why.
>>
>> If this ends up as tty specific, we can enable the work-around for
>> ttys which was there when they were not producing SIGIO on write
>> correctly.
>>
>> The work-around ends up disabled on most modern machines, because
>> modern kernels produce SIGIO on write correctly for ttys.
>>
>> With the work-around enabled, an extra IO event is produced after the
>> notification appears on the poll loop in a helper thread, so the stall
>> should never happen.
> 
> 
> I now have an idea why we see this on ttys.
> 
> TTY IO wake-up, in addition to sending SIGIO before the poll
> notifications, also delivers the poll notifications using a wake-up
> which will reschedule.
> 
> Compared to that, let's say a socket does a sync wake-up which does not
> reschedule, and does it before SIGIO.
> 
> In either case, we stand a chance of missing an interrupt. It is just
> that in the second case the chance is extremely small, so small that I
> have never seen it in practice.
> 
> The real way of dealing with it will be to add a helper thread which
> (e)polls the epoll fd and generates a SIGIO if there is an outstanding
> EPOLL notification which has been missed. This would also take care of
> the range of conditions which are currently handled by the SIGIO fd
> helper, so that helper would become surplus to requirements.
> 
> I think that just polling the epoll fd should do the job here. So this 
> will also get rid of all the motions needed to register fds with the 
> async helper.

In fact, we can kill the registration of fds for SIGIO too. The helper 
does the same job, so why bother?
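A minimal sketch of the helper-thread idea, assuming a host where nesting the IRQ epoll fd inside a second epoll set with EPOLLET yields a notification each time new events become ready in the inner set. `struct sigio_watcher`, `sigio_watcher_thread` and the stop pipe are hypothetical names for illustration, not existing UML code. Because the watcher wakes on every new readiness callback, spurious SIGIOs are possible, so the handler has to tolerate finding nothing to do.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical watcher state; not an existing UML API. */
struct sigio_watcher {
    int irq_epfd;      /* epoll fd the main IRQ loop waits on */
    int stop_fd;       /* read end of a pipe; readable means "exit" */
    pthread_t target;  /* thread that should receive SIGIO */
};

/*
 * Watch the IRQ epoll fd from a second, edge-triggered epoll instance and
 * raise SIGIO on the target thread whenever new events become ready there,
 * so a SIGIO lost to the host-side race is re-raised once the event is
 * actually visible in the epoll set.
 */
static void *sigio_watcher_thread(void *arg)
{
    struct sigio_watcher *w = arg;
    struct epoll_event ev, ready[2];
    int outer = epoll_create1(EPOLL_CLOEXEC);

    if (outer < 0)
        return NULL;

    /* edge-triggered: an undrained IRQ epoll fd does not make us spin */
    ev.events = EPOLLIN | EPOLLET;
    ev.data.fd = w->irq_epfd;
    if (epoll_ctl(outer, EPOLL_CTL_ADD, w->irq_epfd, &ev) < 0)
        goto done;

    ev.events = EPOLLIN;
    ev.data.fd = w->stop_fd;
    if (epoll_ctl(outer, EPOLL_CTL_ADD, w->stop_fd, &ev) < 0)
        goto done;

    for (;;) {
        int n = epoll_wait(outer, ready, 2, -1);

        if (n < 0) {
            if (errno == EINTR)
                continue;
            break;
        }
        for (int i = 0; i < n; i++) {
            if (ready[i].data.fd == w->stop_fd)
                goto done;
            pthread_kill(w->target, SIGIO);  /* kick the IRQ loop */
        }
    }
done:
    close(outer);
    return NULL;
}
```

Since the watcher only looks at the epoll fd, individual fds no longer need to be registered for SIGIO or with a separate async helper, which is the simplification suggested above.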

A

> 
> Brgds,
> 
> 
>>
>> A.
>>
>>>>
>>>> A.
>>>
>>> YiFei Zhu
>>>
>>> _______________________________________________
>>> linux-um mailing list
>>> linux-um at lists.infradead.org
>>> http://lists.infradead.org/mailman/listinfo/linux-um
>>>
>>
> 
> 


-- 
Anton R. Ivanov
https://www.kot-begemot.co.uk/


