tunnel upstream is unusably slow

Ortwin Glück odi at odi.ch
Wed Apr 2 09:46:18 EDT 2014



On 02.04.2014 13:14, David Woodhouse wrote:
> On Wed, 2014-04-02 at 12:55 +0200, Ortwin Glück wrote:
>> Also strace shows that the write() by scp returns EAGAIN very often even
>> though the selector gets a signal that the fd is ready for writing:
> 
> Thanks for the report.
> 
> It would be useful if you were to strace openconnect itself, and see
> precisely what it's doing.

Here are its fds:
total 0
lr-x------ 1 root root 64 Apr  2 15:04 0 -> /tmp/sh-thd-1396418206 (deleted)
lrwx------ 1 root root 64 Apr  2 15:04 1 -> /dev/pts/1
lrwx------ 1 root root 64 Apr  2 10:38 2 -> /dev/pts/1
lr-x------ 1 root root 64 Apr  2 15:04 3 -> pipe:[61426]
l-wx------ 1 root root 64 Apr  2 15:04 4 -> pipe:[61426]
lrwx------ 1 root root 64 Apr  2 15:04 5 -> socket:[61427]
lrwx------ 1 root root 64 Apr  2 15:04 6 -> socket:[64637]
lrwx------ 1 root root 64 Apr  2 15:04 7 -> /dev/net/tun

6: https connection
7: tun

$ strace -ttfp 29785

15:07:32.901346 select(8, [3 6 7], [], [6], {29, 999616}) = 1 (in [7],
left {29, 781601})
-> data avail in tun
-> loop starts here:
15:07:33.119660 read(6, 0x1dfc973, 5)   = -1 EAGAIN (Resource
temporarily unavailable)
-> unnecessary read from SSL, no data to write to SSL
15:07:33.119787 read(7,
"E\10\5\0b\340@\0@\6\17\201\n\363X\236\n\271UE\222:\0\26\37O\3\26\315\21q\33"...,
1280) = 1280
15:07:33.119882 read(7, 0x1e05e58, 1280) = -1 EAGAIN (Resource
temporarily unavailable)
-> tun data read, no data to write
15:07:33.119999 select(4, [3], NULL, NULL, {0, 0}) = 0 (Timeout)
-> poll control fd
15:07:33.120106 read(6, 0x1dfc973, 5)   = -1 EAGAIN (Resource
temporarily unavailable)
-> unnecessary read from SSL
15:07:33.120370 write(6,
"\27\3\1\0\30i\332\2\352j\203\262'\202\233;k\223\37/\1|b\314\247\202\313\234E\27\3\1"...,
1346) = 1346
-> write to SSL
15:07:33.120570 read(7, 0x1e05e58, 1280) = -1 EAGAIN (Resource
temporarily unavailable)
-> tries to read tun again, no write
15:07:33.120651 select(4, [3], NULL, NULL, {0, 0}) = 0 (Timeout)
-> poll control fd
15:07:33.120741 select(8, [3 6 7], [], [6], {29, 0}) = 1 (in [6], left
{28, 990785})
-> data avail in SSL
15:07:33.130092 read(6, "\27\3\1\0X", 5) = 5
15:07:33.130181 read(6,
"\312\r\246\304\32\317N\16\371\315;\373k\320\205\203\266v\207\342l\216\270T\302\314vW\362L\22\0"...,
88) = 88
15:07:33.130313 read(6, 0x1dfc973, 5)   = -1 EAGAIN (Resource
temporarily unavailable)
-> data read from SSL
15:07:33.130385 read(7, 0x1e05e58, 1280) = -1 EAGAIN (Resource
temporarily unavailable)
-> no data in tun
15:07:33.130439 write(7,
"E\10\0004\245\273@\0:\6\327q\n\271UE\n\363X\236\0\26\222:\315\21q\33\37O\7\342"...,
52) = 52
-> write data to tun
15:07:33.130526 select(4, [3], NULL, NULL, {0, 0}) = 0 (Timeout)
-> poll control fd
15:07:33.130585 read(6, 0x1dfc973, 5)   = -1 EAGAIN (Resource
temporarily unavailable)
15:07:33.130640 read(7, 0x1e05e58, 1280) = -1 EAGAIN (Resource
temporarily unavailable)
15:07:33.130688 select(4, [3], NULL, NULL, {0, 0}) = 0 (Timeout)
15:07:33.130745 select(8, [3 6 7], [], [6], {30, 0}) = 1 (in [7], left
{29, 783272})
===> HUGE of 217ms delay here
15:07:33.347681 read(6, 0x1dfc973, 5)   = -1 EAGAIN (Resource
temporarily unavailable)
15:07:33.347806 read(7,
"E\10\5\0b\343@\0@\6\17~\n\363X\236\n\271UE\222:\0\26\37O\7\342\315\21q\33"...,
1280) = 1280
15:07:33.347954 read(7, 0x1e08cb8, 1280) = -1 EAGAIN (Resource
temporarily unavailable)
15:07:33.348041 select(4, [3], NULL, NULL, {0, 0}) = 0 (Timeout)
15:07:33.348134 read(6, 0x1dfc973, 5)   = -1 EAGAIN (Resource
temporarily unavailable)
15:07:33.348378 write(6,
"\27\3\1\0\30R\351^\226\332Y\n*\300\r\354Oc\334kb%\205(\223gB\257\260\27\3\1"...,
1346) = 1346
15:07:33.348600 read(7, 0x1e08cb8, 1280) = -1 EAGAIN (Resource
temporarily unavailable)
15:07:33.348708 select(4, [3], NULL, NULL, {0, 0}) = 0 (Timeout)
15:07:33.348796 select(8, [3 6 7], [], [6], {30, 0}) = 1 (in [6], left
{29, 991582})
15:07:33.357292 read(6, "\27\3\1\0X", 5) = 5
15:07:33.357329 read(6,
"\3\263\36V\220$u\6N\352\0203\312\323\330\r+\27.g\5\256\20\0\341\325E*\316Khc"...,
88) = 88
15:07:33.357390 read(6, 0x1dfc973, 5)   = -1 EAGAIN (Resource
temporarily unavailable)
15:07:33.357425 read(7, 0x1e08cb8, 1280) = -1 EAGAIN (Resource
temporarily unavailable)
15:07:33.357453 write(7,
"E\10\0004\245\274@\0:\6\327p\n\271UE\n\363X\236\0\26\222:\315\21q\33\37O\f\256"...,
52) = 52
15:07:33.357508 select(4, [3], NULL, NULL, {0, 0}) = 0 (Timeout)
15:07:33.357543 read(6, 0x1dfc973, 5)   = -1 EAGAIN (Resource
temporarily unavailable)
15:07:33.357575 read(7, 0x1e08cb8, 1280) = -1 EAGAIN (Resource
temporarily unavailable)
15:07:33.357602 select(4, [3], NULL, NULL, {0, 0}) = 0 (Timeout)
15:07:33.357634 select(8, [3 6 7], [], [6], {30, 0}) = 1 (in [7], left
{29, 782110})
==> HUGE delay of 218ms here
15:07:33.575758 read(6, 0x1dfc973, 5)   = -1 EAGAIN (Resource
temporarily unavailable)

So it looks like openconnect is just waiting for packets to become
available on the socket. I have straced the scp/ssh process pair better
(scp just writes to a pipe to ssh). ssh's socket is its fd 3:

15:36:01.141172 select(7, [3], [3], NULL, NULL) = 1 (out [3])
15:36:01.533392 write(3,
"\t\302\273\311\25F\253\361\321\335^9\241]\v\212>1\203\23\300\313(\325\267\331~\360\216:\235\22"...,
162620) = 14736
15:36:01.533536 getsockopt(3, SOL_SOCKET, SO_RCVBUF, [324480], [4]) = 0
15:36:01.533640 select(7, [3], [3], NULL, NULL) = 1 (out [3])
15:36:01.569605 write(3,
"\323\237l\"\221E\341\362\217\366\200\354[\314a\277\364Q\300\177\4\200Ty\217\33r\26PJP\354"...,
147884) = 17192
15:36:01.569988 getsockopt(3, SOL_SOCKET, SO_RCVBUF, [324480], [4]) = 0
15:36:01.570081 select(7, [3 4], [3], NULL, NULL) = 1 (in [4])
15:36:01.570201 read(4,
"\367\1774\262t\246\330\310\2777A^\261|B\320\203\361\375@\365\345\351~\263\23\364\207\232@\373\311"...,
16384) = 16384
15:36:01.570372 getsockopt(3, SOL_SOCKET, SO_RCVBUF, [324480], [4]) = 0
15:36:01.570467 select(7, [3 4], [3], NULL, NULL) = 1 (in [4])
15:36:01.570596 read(4,
"\251\254\223\363\274G\233\270\306D\367\36\177\231\256G\345\273\214\2371\177e\350\357\7\313HO\353\355\322"...,
16384) = 16384
15:36:01.570740 mremap(0x7f7077b41000, 856064, 888832, MREMAP_MAYMOVE) =
0x7f7077b41000
15:36:01.570796 getsockopt(3, SOL_SOCKET, SO_RCVBUF, [324480], [4]) = 0
15:36:01.570845 select(7, [3], [3], NULL, NULL) = 1 (out [3])
15:36:03.485027 write(3,
"\v\374\315nl\225\371\350\2065\253\3326b\27\375\27\2\217[P\217\231s\263I~Q\321VW\237"...,
163508) = 14736
15:36:03.485214 getsockopt(3, SOL_SOCKET, SO_RCVBUF, [324480], [4]) = 0
15:36:03.485314 select(7, [3], [3], NULL, NULL) = 1 (out [3])

Seems to behave like it should with several seconds between select/write
and writes in chunks of 14KB.

Question remains: why does openconnect's select block for so long? A lot
of data should be already available in the socket buffer.

> If you could show the tcpdump both from the
> tunnel interface *and* the private network between client and server
> while this is happening, it would also be enlightening.

Here you go. I have replaced the actual IPs with names for convenience
and privacy. Timestamps are seconds since previous packet. Note the
0.214s delays before local starts sending again after last ACK.

No.     Time     Source                Destination           Protocol
Length Info
   1100 0.214    tun-local             tun-remote            SSHv2
1280   Encrypted request packet len=1228
   1101 0.000    SSL-local             SSL-remote            TLSv1
1126   Application Data
   1102 0.000    SSL-local             SSL-remote            TLSv1
328    Application Data
   1103 0.007    SSL-remote            SSL-local             TCP      60
    443 > 54547 [ACK] Seq=19758 Ack=204341 Win=32768 Len=0
   1104 0.001    SSL-remote            SSL-local             TLSv1
147    Application Data
   1105 0.000    SSL-local             SSL-remote            TCP      54
    54547 > 443 [ACK] Seq=204615 Ack=19851 Win=65535 Len=0
   1106 0.000    tun-remote            tun-local             TCP      52
    22 > 37376 [ACK] Seq=6189 Ack=184836 Win=64128 Len=0
TSval=3122435631 TSecr=13382833
   1107 0.214    tun-local             tun-remote            SSHv2
1280   Encrypted request packet len=1228
   1108 0.000    SSL-local             SSL-remote            TLSv1
1126   Application Data
   1109 0.000    SSL-local             SSL-remote            TLSv1
328    Application Data
   1110 0.007    SSL-remote            SSL-local             TCP      60
    443 > 54547 [ACK] Seq=19851 Ack=205687 Win=32768 Len=0
   1111 0.001    SSL-remote            SSL-local             TLSv1
147    Application Data
   1112 0.000    SSL-local             SSL-remote            TCP      54
    54547 > 443 [ACK] Seq=205961 Ack=19944 Win=65535 Len=0
   1113 0.000    tun-remote            tun-local             TCP      52
    22 > 37376 [ACK] Seq=6189 Ack=186064 Win=64128 Len=0
TSval=3122435855 TSecr=13382889
   1114 0.214    tun-local             tun-remote            SSHv2
1280   Encrypted request packet len=1228


> But I don't think it's the short-term solution to
> your problem.

Seems right.

Ortwin




More information about the openconnect-devel mailing list