Race between SIGIO and epoll from SMP host
Anton Ivanov
anton.ivanov at kot-begemot.co.uk
Thu Apr 22 08:50:42 BST 2021
On 22/04/2021 08:32, Anton Ivanov wrote:
> On 21/04/2021 16:45, Anton Ivanov wrote:
>>
>>
>> On 21/04/2021 14:35, YiFei Zhu wrote:
>>> On Wed, Apr 21, 2021 at 7:32 AM Anton Ivanov
>>> <anton.ivanov at kot-begemot.co.uk> wrote:
>>>>> Considering that this is a race on the host, what would be the best
>>>>> way to fix this?
>>>>
>>>> Interesting one. I need to think.
>>>>
>>>> One option would be to wait for epoll events with a timeout that is
>>>> larger than zero, e.g. HZ.
>>>
>>> I was about to say I could reproduce it even with a timeout of 1 ms,
>>> then I realized that the code I pasted above already used a 1 ms
>>> timeout. Assertion failures with a 1 ms timeout seem much rarer than
>>> with a 0 timeout, however.
>>>
>>> For reference, my CONFIG_HZ on the host is 1000. I also use
>>> CONFIG_NO_HZ_IDLE, if that's relevant (I'm not too familiar with how
>>> kernel ticking works).
>>>
>>>> If we have received a SIGIO there is an epoll event on the way. The
>>>> fact that it is not in the queue right now means that we are due to
>>>> process it shortly.
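A minimal C sketch of the timeout idea above, assuming a single-threaded event loop and using a pipe as a stand-in for a tty. The names arm_sigio() and collect_events() and the retry_ms/max_retries knobs are hypothetical, not UML code. When a SIGIO has been seen but a non-blocking epoll_wait() comes back empty, the sketch retries with a small positive timeout instead of treating the SIGIO as spurious.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Set by the SIGIO handler; a lock-free atomic is safe to use from a handler. */
static atomic_int sigio_pending;

static void on_sigio(int sig)
{
    (void)sig;
    atomic_store(&sigio_pending, 1);
}

/* Hypothetical helper: install the handler first (SIGIO's default action
 * terminates the process), then ask for SIGIO when fd becomes ready. */
static int arm_sigio(int fd)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigio;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGIO, &sa, NULL) < 0)
        return -1;

    int flags = fcntl(fd, F_GETFL);
    if (flags < 0 || fcntl(fd, F_SETOWN, getpid()) < 0 ||
        fcntl(fd, F_SETFL, flags | O_ASYNC | O_NONBLOCK) < 0)
        return -1;
    return 0;
}

/*
 * Hypothetical helper: collect ready events. If a SIGIO arrived since the
 * last call but the non-blocking poll is empty, assume the epoll
 * notification is still on its way and retry with retry_ms, up to
 * max_retries times. Returns the event count, or -1 on error (incl. EINTR).
 */
static int collect_events(int epfd, struct epoll_event *evs, int maxevents,
                          int retry_ms, int max_retries)
{
    int had_sigio = atomic_exchange(&sigio_pending, 0);
    int n = epoll_wait(epfd, evs, maxevents, 0);

    for (int i = 0; had_sigio && n == 0 && i < max_retries; i++)
        n = epoll_wait(epfd, evs, maxevents, retry_ms);
    return n;
}
```

As reported above, even a 1 ms timeout only makes the assertion failures rarer, so this narrows the window rather than closing it.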
>>
>> This seems to be limited to ttys. I still need to figure out why.
>>
>> If this turns out to be tty-specific, we can enable the workaround for
>> ttys that was added back when they did not produce SIGIO on write
>> correctly.
>>
>> That workaround is disabled on most modern machines because modern
>> kernels produce SIGIO on write correctly for ttys.
>>
>> With the workaround enabled, a helper thread produces an extra IO event
>> after the notification appears in its poll loop, so the stall should
>> never happen.
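A rough sketch of the kind of helper-thread workaround described above, assuming POSIX threads and a pipe in place of the tty. struct sigio_workaround and workaround_thread() are hypothetical names, not the actual UML implementation. What it illustrates is the ordering: the helper writes its wake-up byte only after poll() already reports the watched fd as ready.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical state for a one-shot workaround helper. */
struct sigio_workaround {
    int watched_fd; /* fd whose notification may arrive late, e.g. a tty */
    int kick_wr;    /* write end of a pipe whose read end the main loop watches */
};

/*
 * Block in a helper thread until poll() reports the watched fd ready, and
 * only then write one byte to the kick pipe. The main loop thus gets an
 * extra IO event that is produced after the readiness is already visible.
 * A real helper would loop over a registered set of fds; this handles one.
 */
static void *workaround_thread(void *arg)
{
    struct sigio_workaround *w = arg;
    struct pollfd pfd = { .fd = w->watched_fd, .events = POLLIN };
    int rc;

    do {
        rc = poll(&pfd, 1, -1);
    } while (rc < 0 && errno == EINTR);
    if (rc < 0)
        return NULL;

    char b = 0;
    ssize_t n;
    do {
        n = write(w->kick_wr, &b, 1);
    } while (n < 0 && errno == EINTR);
    return NULL;
}
```

Each watched fd has to be handed to such a helper explicitly, which is the registration overhead the next message proposes to drop.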
>
>
> I now have an idea why we see this on ttys.
>
> The TTY IO wake-up, in addition to sending SIGIO before the poll
> notifications, also delivers the poll notifications using a wake-up
> that will reschedule.
>
> By comparison, a socket (for example) does a sync wake-up, which does
> not reschedule, and does it before sending SIGIO.
>
> In either case, we stand a chance of missing an interrupt; in the
> second case the chance is just extremely small, so small that I have
> never seen it in practice.
>
> The real way of dealing with it will be a helper thread that (e)polls
> the epoll fd and generates a SIGIO if there is an outstanding EPOLL
> notification that has been missed. This would also take care of the
> range of conditions currently handled by the SIGIO fd helper, so that
> helper would become surplus to requirements.
>
> I think that just polling the epoll fd should do the job here. This
> will also get rid of all the steps needed to register fds with the
> async helper.
In fact, we can kill the registration of fds for SIGIO too. The helper
does the same job, so why bother?
A
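A minimal sketch of the proposed helper, assuming POSIX threads and that SIGIO is sent to one target thread with pthread_kill(). struct epoll_kicker, kicker_init(), kicker_thread(), kicker_ack() and kicker_stop() are hypothetical names. The semaphore handshake is an added assumption so the helper does not spin while the epoll fd stays readable; the stop pipe exists only to shut the helper down cleanly.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <semaphore.h>
#include <signal.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical state for a helper that watches the epoll fd itself. */
struct epoll_kicker {
    int epfd;         /* the main loop's epoll fd */
    int stop_rd;      /* read end of a pipe; readable means "exit" */
    pthread_t target; /* must handle SIGIO or have it blocked */
    sem_t drained;    /* posted by the main loop after it drains epfd */
};

/* Create the main epoll fd and handshake state; the caller is the target. */
static int kicker_init(struct epoll_kicker *k, int stop_rd)
{
    k->epfd = epoll_create1(EPOLL_CLOEXEC);
    if (k->epfd < 0)
        return -1;
    k->stop_rd = stop_rd;
    k->target = pthread_self();
    return sem_init(&k->drained, 0, 0);
}

/*
 * An epoll fd is itself pollable: poll() reports POLLIN while it has
 * events queued, so no per-fd registration is needed. When events are
 * pending, raise SIGIO on the target, then wait until the main loop has
 * consumed them, because the epoll fd stays readable until then.
 */
static void *kicker_thread(void *arg)
{
    struct epoll_kicker *k = arg;
    struct pollfd pfd[2] = {
        { .fd = k->epfd, .events = POLLIN },
        { .fd = k->stop_rd, .events = POLLIN },
    };

    for (;;) {
        if (poll(pfd, 2, -1) < 0) {
            if (errno == EINTR)
                continue;
            break;
        }
        if (pfd[1].revents)
            break;
        if (pfd[0].revents & POLLIN) {
            pthread_kill(k->target, SIGIO);
            while (sem_wait(&k->drained) < 0 && errno == EINTR)
                ;
        }
    }
    return NULL;
}

/* Main-loop side: call after handling (reading) the ready fds. */
static void kicker_ack(struct epoll_kicker *k)
{
    sem_post(&k->drained);
}

/* Wake the helper from poll() (via the pipe) or sem_wait(), then join it. */
static int kicker_stop(struct epoll_kicker *k, int stop_wr, pthread_t tid)
{
    char b = 0;
    if (write(stop_wr, &b, 1) != 1)
        return -1;
    sem_post(&k->drained);
    return pthread_join(tid, NULL);
}
```

Because the helper only watches the epoll fd, any fd added to the main epoll set is covered automatically, which is why the per-fd SIGIO registration could go.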
>
> Brgds,
>
>
>>
>> A.
>>
>>>>
>>>> A.
>>>
>>> YiFei Zhu
>>>
>>> _______________________________________________
>>> linux-um mailing list
>>> linux-um at lists.infradead.org
>>> http://lists.infradead.org/mailman/listinfo/linux-um
>>>
>>
>
>
--
Anton R. Ivanov
https://www.kot-begemot.co.uk/