linux returns EAGAIN for closed ocserv interfaces

Nikos Mavrogiannopoulos nmav at gnutls.org
Fri Sep 26 04:39:37 PDT 2014


On Fri, 2014-09-26 at 11:30 +0100, David Woodhouse wrote:
> On Sun, 2014-09-21 at 02:00 +0200, Nikos Mavrogiannopoulos wrote:
> > On Sat, 2014-09-20 at 13:05 +0200, Niels Peen wrote:
> > > Another possible clue: I upgraded from ocserv 0.3 to 0.8 on September 15th. 0.3 has never caused this problem.
> > 
> > I don't see much changes related to that issue from 0.3 to 0.8, but that
> > looks like a race issue in the kernel and could be caused by different
> > timings between calls. I've applied the suggested fix anyway as it looks
> > correct.
> 
> It looks very very wrong to me. Linus had it right at
> https://lkml.org/lkml/2002/7/17/165
> 
> The close() system call should *never* fail to close the file
> descriptor. And as Linus points out, your force_close() hack is very
> broken in a threaded environment.

That doesn't matter much for ocserv, as it does not use multiple threads.
It was added because it looked reasonable for other OSes, which may not
behave like Linux.
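
For reference, a minimal sketch of the "close exactly once" behaviour argued
for above. The name close_once() is hypothetical and is not a function in
ocserv; the sketch assumes Linux semantics, where the descriptor is released
even when close() reports an error such as EINTR.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper, not ocserv code: close fd exactly once.
 * Returns 0 on success, or -1 with errno set by close(). */
static int close_once(int fd)
{
    int ret = close(fd);

    /* Deliberately no retry loop. On Linux the descriptor is already
     * gone when close() returns, even with EINTR or EIO. Calling
     * close() again could release an unrelated descriptor that another
     * thread has just been handed with the same number. */
    return ret;
}
```

A caller would log a failure from close_once() at most, and never loop on it.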

> Niels seemed to suggest that the client had gone away, which implies
> that there's no ocserv thread servicing the tun device at all. Is that
> the case? Is the device in question still *up*? It probably shouldn't
> be. (You do have a separate tundev per client?)
> 
> I'd like to see more information about the state of the system when this
> failure happens. Reproducing with dnsmasq seems like it would be hard
> because to hit the race condition you have to disconnect the VPN after
> sending a query but before receiving the reply. I suppose you could hack
> openconnect to disconnect after sending a DNS request :)  Or use a
> different UDP sender on the server side.
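
A rough sketch of such a stand-alone UDP sender follows. It is only an
illustration: send_probe() is a hypothetical name, and the address and port
would be those of the disconnected client. MSG_DONTWAIT is used so that a
full queue is reported as EAGAIN rather than blocking the sender.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical test helper: send one UDP datagram to ip:port.
 * Returns 0 on success, otherwise the errno value (e.g. EAGAIN). */
static int send_probe(const char *ip, uint16_t port,
                      const void *buf, size_t len)
{
    struct sockaddr_in dst;
    int fd, err = 0;

    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &dst.sin_addr) != 1)
        return EINVAL;          /* not a valid IPv4 address */

    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return errno;

    /* Non-blocking send: report the error instead of waiting. */
    if (sendto(fd, buf, len, MSG_DONTWAIT,
               (struct sockaddr *)&dst, sizeof(dst)) < 0)
        err = errno;

    close(fd);                  /* single close, result ignored */
    return err;
}
```

Calling send_probe() repeatedly against the client's VPN address right after
the client disconnects should show whether EAGAIN appears and for how long.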

I'd really love to get to the bottom of that issue, as I also don't
believe that force_close() is what actually fixed it.

> (Has anyone been running VoIP over ocserv connections, btw? This talk of
> buffers and EAGAIN reminds me that we need to make sure we avoid
> excessive buffering. cf.
> http://git.infradead.org/users/dwmw2/openconnect.git/commitdiff/3444f811
> )

I am. One can tune it using the output-buffer parameter. Since you
brought that up, I suspect that this particular commit was responsible
for the asymmetry in upload/download (it was in an old thread, with
openconnect upload being 4 times slower than download). Unfortunately I
no longer have the hardware to verify that theory.

regards,
Nikos




