I already send this via the UT2003 Linux mailing list, but because it
seems to have been overlooked there, I'm posting this here (a bit longer):
It seems as if the UT2k3 Webadmin is still suffering from the same problem
that already plagued UT servers since kernel 2.4.x...
The only thing that "fixes" this right now seems to be a local port redirection
(e.g. by using netcat), but this does hardly qualify as a solution..
During the last two days I tested several things that were proposed, for example
on the Infogrames Serveradmin Forum:
- putting the AdminName + AdminPassword in the command line fixed the login,
but nothing else
- setting the server to a fixed IP by using -multihome=xxx in the command line
didn't change anything except that I had to changed the port redirection from
localhost to the external IP ;)
- various other settings that I tried (ServerName, MaxConnections,..)
also had no visible effect
After that I put in a little debugging session, which led to some interesting
results:
Only full-featured Browsers seem unable to use the Webadmin (I tested with
Netscape 4, Opera and IE, same result), but I could start Lynx from any
machine and use the Webadmin without any problems (well, admittedly,
no frames then ;) )
The other browsers usually just broke off loading the webadmin after confirming
the login and password for the webadmin.
By using a packet sniffer (Ethereal) I captured the HTTP traffic between the
browsers
and the UT webserver and compared it with a equivalent query on the UT
windows version (which worked flawlessly).
Interestingly, the server responded as it should in Linux as in Windows, so
I suspect the real problem does not lie in the handling of external and internal
traffic as suggested by several people and by the fact that it works through a
port redirector. The only difference I was able to determine both versions
was that the linux server often sent an extra packet between the HTTP status
reply and the content, which contained some bytes of garbled data.
Let me illustrate this by some tcp dumps from the communication:
(CLIENT being my machine, 62.4.65.115 the ut2k3 server, not in the
same network but at my colo..)
No. Time Source Destination Protocol Info
1 0.000000 CLIENT 62.4.65.115 TCP 4411 >
20012 [SYN] Seq=1307072141 Ack=0 Win=16384 Len=0
Client tries to open connection to server.
2 0.037097 62.4.65.115 CLIENT TCP 20012 >
4411 [SYN, ACK] Seq=311328002 Ack=1307072142 Win=5840
Len=0
Server acknowledges connection.
3 0.037244 CLIENT 62.4.65.115 TCP 4411 >
20012 [ACK] Seq=1307072142 Ack=311328003 Win=16944 Len=0
Connection now established.
4 0.037900 CLIENT 62.4.65.115 TCP 4411 >
20012 [PSH, ACK] Seq=1307072142 Ack=311328003 Win=16944
Len=496
This is the GET request from the Client, Data is:
0000 47 45 54 20 2f 53 65 72 76 65 72 41 64 6d 69 6e GET /ServerAdmin
0010 2f 20 48 54 54 50 2f 31 2e 31 0d 0a 55 73 65 72 / HTTP/1.1..User
0020 2d 41 67 65 6e 74 3a 20 4d 6f 7a 69 6c 6c 61 2f -Agent: Mozilla/
0030 34 2e 30 20 28 63 6f 6d 70 61 74 69 62 6c 65 3b 4.0 (compatible;
0040 20 4d 53 49 45 20 35 2e 30 3b 20 57 69 6e 64 6f MSIE 5.0; Windo
0050 77 73 20 32 30 30 30 29 20 4f 70 65 72 61 20 36 ws 2000)
[..shortened for obvious reasons..]
5 0.106733 62.4.65.115 CLIENT TCP 20012 >
4411 [ACK] Seq=311328003 Ack=1307072638 Win=6432 Len=0
Packet has arrived, server acks that.
6 0.196073 62.4.65.115 CLIENT TCP 20012 >
4411 [PSH, ACK] Seq=311328003 Ack=1307072638 Win=6432
Len=17
First part of the server answer:
0000 48 54 54 50 2f 31 2e 31 20 32 30 30 20 4f 4b 0d HTTP/1.1 200 OK.
0010 0a .
So this is just the HTTP answer, further header data plus content should
follow..
7 0.295727 62.4.65.115 CLIENT TCP 20012 >
4411 [RST, ACK] Seq=311328713 Ack=1307072638 Win=6432
Len=0
Now this one's the prize.. How comes the server sends a packet with the RST
flag set ? Since this will cause a connection reset, it really shouldn't do
that.. If
you look into the hexdump, you will see a garbled copy of the HTTP reply
(header included):
0x0000 00 E0 7D 7D 17 1A 00 80 AD 30 83 E2 08 00 45 00 ..}}.....0....E.
0x0010 00 28 D7 39 40 00 38 06 2B 6B 3E 04 41 73 C0 A8 .(.9@.8.+k>.As..
0x0020 00 0C 4E 2C 11 3B 12 8E 7F C9 4D E8 58 7E 50 14 ..N,.;....M.X~P.
0x0030 19 20 BE 5F 00 00 48 54 54 50 2F 31 . ._..HTTP/1
9 0.305493 62.4.65.115 CLIENT TCP 20012 >
4411 [FIN, PSH, ACK] Seq=311328020 Ack=1307072638
Win=6432 Len=692
0000 53 65 72 76 65 72 3a 20 55 6e 72 65 61 6c 45 6e Server: UnrealEn
0010 67 69 6e 65 20 55 57 65 62 20 57 65 62 20 53 65 gine UWeb Web Se
0020 72 76 65 72 20 42 75 69 6c 64 20 31 30 37 37 0d rver Build 1077.
0030 0a 43 6f 6e 74 65 6e 74 2d 54 79 70 65 3a 20 74 .Content-Type: t
0040 65 78 74 2f 68 74 6d 6c 0d 0a 43 6f 6e 6e 65 63 ext/html..Connec
0050 74 69 6f 6e 3a 20 43 6c 6f 73 65 0d 0a 0d 0a 3c tion: Close....<
0060 68 74 6d 6c 3e 0d 0a 20 3c 68 65 61 64 3e 3c 54 html>.. <head><T
0070 49 54 4c 45 3e 55 6e 72 65 61 6c 20 54 6f 75 72 ITLE>Unreal Tour
0080 6e 61 6d 65 6e 74 20 32 30 30 33 20 2d 20 57 65 nament 2003 - We
0090 62 20 41 64 6d 69 6e 3c 2f 54 49 54 4c 45 3e 0d b Admin</TITLE>.
[.....]
This is the missing content, containing the remaining HTTP header fields as
well as the
HTML data of the page. But since the server has already sent a RST, the client
won't
get this, because to him the connection is already closed..
10 0.305641 CLIENT 62.4.65.115 TCP 4411 >
20012 [RST] Seq=1307072638 Ack=1307072638 Win=0 Len=0
The client acknowledges the connection reset -> connection abnormally closed,
so the
browser won't take the packet with the real data and just show a blank page.
Here is what a "well-behaved" UT server on windows does when the client does
the same thing:
1 5.992718 CLIENT 62.4.65.115 TCP 4430 >
20012 [SYN] Seq=1398341698 Ack=0 Win=16384 Len=0
2 6.037179 62.4.65.115 CLIENT TCP 20012 >
4430 [SYN, ACK] Seq=2951964545 Ack=1398341699
Win=16944 Len=0
3 6.037324 CLIENT 62.4.65.115 TCP 4430 >
20012 [ACK] Seq=1398341699 Ack=2951964546 Win=16944
Len=0
4 6.037990 CLIENT 62.4.65.115 TCP 4430 >
20012 [PSH, ACK] Seq=1398341699 Ack=2951964546
Win=16944 Len=498
..same as above
5 6.167200 62.4.65.115 CLIENT TCP 20012 >
4430 [PSH, ACK] Seq=2951964546 Ack=1398342197
Win=16446 Len=17
..this is the HTTP 200 OK
6 6.221103 62.4.65.115 CLIENT TCP 20012 >
4430 [FIN, PSH, ACK] Seq=2951964563 Ack=1398342197
Win=16446 Len=692
..and the rest of the data, no additional package here. FIN flag is set, so
connection will be closed.
7 6.221272 CLIENT 62.4.65.115 TCP 4430 >
20012 [ACK] Seq=1398342197 Ack=2951965256 Win=16235
Len=0
..client acknowledges data and FIN, so connection is closed normally.
Now as I see it, this behaviour of the linux server causes most browsers to
shut the connection before
the data arrives, which most definitely is desired. Ryan, could you please
verify (maybe it's just me ?) this
and/or check this in the webserver's netcode whether there's some condition
which might cause the
connection reset ?
Ah yes, the linux was running SuSE 7.3 on a Dual-P3 1Ghz with 1 gig of ram,
uname -a gives:
Linux tp115 2.4.10-64GB-SMP #1 SMP Mon Nov 5 11:33:30 GMT 2001 i686 unknown.
(Same problem occurs on Suse 8.0 w/ kernel 2.4.18 on a Dual Athlon machine)
-- Teclador
teclador@teamplay.de
Ok some test results:
Failed ( broken / missing pages / images )
kernel 2.4.0 with IE 5.5
Fine
kernel 2.2.2.14 with IE 5.5
kernel 2.2.2.14 with IE 6.0 SP1
kernel 2.4.0 with IE 5.5
kernel 2.4.0 with IE 6.0 SP1
kernel 2.4.19 with IE 6.0 SP1
So there is two solutions
1. Use IE 6.0 SP1 ( none SP1 may also work not tested )
2. Use kernel 2.4.19 ( versions above 2.4.16 may also be fine )
N.B. 2.2 kernel generates:
CheckConnectionQueue: Failed to accept queued connection: 11
Close: Error while attempting to close socket.
When a page is being served an the browser disconnects
I think I have made some real progress with this issue.
I've setup some c code which does what I think your
web code is doing and it replicates the behaviour we
are seeing. i.e. fine under 2.2 broken under 2.4
##############
// setup socket
// Accept connection
struct linger li;
li.l_onoff = 1;
li.l_linger = 0;
if (setsockopt(client->client_socket, SOL_SOCKET,
SO_LINGER, (char *) &li,sizeof(struct linger)) < 0) {
fprintf( stderr, "setsockopt: (SO_LINGER) (%s)\n",
strerror( errno ) );
}
// send response ( unauthorised )
// send response ( authenticate )
shutdown( client->client_socket, 2 );
close( client->client_socket );
###############
With under 2.4:
li.l_onoff = 1;
li.l_linger = 0;
I get:
Connection setup:
gromit.2477 > steven2.8080: S 1909252052:1909252052(0)
steven2.8080 > gromit.2477: S 1642848390:1642848390(0) ack 1909252053
gromit.2477 > steven2.8080: . ack 1
Request:
gromit.2477 > steven2.8080: P 1:186(185) ack 1
Request ACK:
steven2.8080 > gromit.2477: . ack 186
Response ( Unauthorised )
steven2.8080 > gromit.2477: P 1:28(27) ack 186
Response ( WWW-authenticate )
steven2.8080 > gromit.2477: FP 28:264(236) ack 186
Response ( Unauthorised ) ACK
gromit.2477 > steven2.8080: . ack 265
Reset:
steven2.8080 > gromit.2477: R 265:265(0) ack 186
steven2.8080 > gromit.2477: R 1642848655:1642848655(0)
Now the client sees the Reset and aborts causing the broken
images / pages.
Under 2.2 SO_LINGER with:
li.l_onoff = 1;
li.l_linger = 0;
appears to get ignored ( incorrect behaviour ) either way it never
sends the Reset.
If this does mirror your code the solution is to set l_linger > 0
Hope this helps
Steven: That is some impressive remote debugging. That seems to be exactly
what's going on here.
I am genuinely impressed.
And to verify this:
http://groups.google.com/groups?q=SO_LINGER&hl=en&lr=&ie=UTF-8&oe=UTF-8&selm=oup4rsie8zt.fsf%40pigdrop.muc.suse.de&rnum=6
Let me check for ramifications (specifically, does the close() call block on a
lingering socket, or does the call return and the kernel deals with tossing the
remaining packets behind the scenes? If it blocks, we've got a problem) and
patch this appropriately.
Great work. I really mean that.
--ryan.
Now thats a good question. As far as I remember the correct implementation of
SO_LINGER is the kernel should do it async but it is common for some tcp stacks
to implement it syncronously hence it will block if until the server reads the
FIN2 from the client.
A good doc about this issue is:
http://httpd.apache.org/docs/misc/perf-tuning.html
See the 'Lingering Close' section.
It does indicate that linux may be one of the stacks which does this correctly
but seems like quite a few people use *BSD ( me included :) ) so it would be
worth checking on linux and say FreeBSD.
If you want me to do anymore digging / testing shout.
Just to keep you posted, we talked about this, and it is feasible to deal with
this correctly (basically shutdown() the socket and read until all the data is
sent before closing, and everything gets to where it should be and we never
block in a syscall). Jack's going to patch this on win32, and I'm going to
verify the winsock calls translate correctly to BSD sockets (which they should).
Look for it in the patch (and open a new bug after the next patch if it still
happens). For now, I'm flagging this bug as "FIXED".
Thanks again to everyone that did the real debugging on this one. :)
--ryan.
I already send this via the UT2003 Linux mailing list, but because it seems to have been overlooked there, I'm posting this here (a bit longer): It seems as if the UT2k3 Webadmin is still suffering from the same problem that already plagued UT servers since kernel 2.4.x... The only thing that "fixes" this right now seems to be a local port redirection (e.g. by using netcat), but this does hardly qualify as a solution.. During the last two days I tested several things that were proposed, for example on the Infogrames Serveradmin Forum: - putting the AdminName + AdminPassword in the command line fixed the login, but nothing else - setting the server to a fixed IP by using -multihome=xxx in the command line didn't change anything except that I had to changed the port redirection from localhost to the external IP ;) - various other settings that I tried (ServerName, MaxConnections,..) also had no visible effect After that I put in a little debugging session, which led to some interesting results: Only full-featured Browsers seem unable to use the Webadmin (I tested with Netscape 4, Opera and IE, same result), but I could start Lynx from any machine and use the Webadmin without any problems (well, admittedly, no frames then ;) ) The other browsers usually just broke off loading the webadmin after confirming the login and password for the webadmin. By using a packet sniffer (Ethereal) I captured the HTTP traffic between the browsers and the UT webserver and compared it with a equivalent query on the UT windows version (which worked flawlessly). Interestingly, the server responded as it should in Linux as in Windows, so I suspect the real problem does not lie in the handling of external and internal traffic as suggested by several people and by the fact that it works through a port redirector. The only difference I was able to determine both versions was that the linux server often sent an extra packet between the HTTP status reply and the content, which contained some bytes of garbled data. Let me illustrate this by some tcp dumps from the communication: (CLIENT being my machine, 62.4.65.115 the ut2k3 server, not in the same network but at my colo..) No. Time Source Destination Protocol Info 1 0.000000 CLIENT 62.4.65.115 TCP 4411 > 20012 [SYN] Seq=1307072141 Ack=0 Win=16384 Len=0 Client tries to open connection to server. 2 0.037097 62.4.65.115 CLIENT TCP 20012 > 4411 [SYN, ACK] Seq=311328002 Ack=1307072142 Win=5840 Len=0 Server acknowledges connection. 3 0.037244 CLIENT 62.4.65.115 TCP 4411 > 20012 [ACK] Seq=1307072142 Ack=311328003 Win=16944 Len=0 Connection now established. 4 0.037900 CLIENT 62.4.65.115 TCP 4411 > 20012 [PSH, ACK] Seq=1307072142 Ack=311328003 Win=16944 Len=496 This is the GET request from the Client, Data is: 0000 47 45 54 20 2f 53 65 72 76 65 72 41 64 6d 69 6e GET /ServerAdmin 0010 2f 20 48 54 54 50 2f 31 2e 31 0d 0a 55 73 65 72 / HTTP/1.1..User 0020 2d 41 67 65 6e 74 3a 20 4d 6f 7a 69 6c 6c 61 2f -Agent: Mozilla/ 0030 34 2e 30 20 28 63 6f 6d 70 61 74 69 62 6c 65 3b 4.0 (compatible; 0040 20 4d 53 49 45 20 35 2e 30 3b 20 57 69 6e 64 6f MSIE 5.0; Windo 0050 77 73 20 32 30 30 30 29 20 4f 70 65 72 61 20 36 ws 2000) [..shortened for obvious reasons..] 5 0.106733 62.4.65.115 CLIENT TCP 20012 > 4411 [ACK] Seq=311328003 Ack=1307072638 Win=6432 Len=0 Packet has arrived, server acks that. 6 0.196073 62.4.65.115 CLIENT TCP 20012 > 4411 [PSH, ACK] Seq=311328003 Ack=1307072638 Win=6432 Len=17 First part of the server answer: 0000 48 54 54 50 2f 31 2e 31 20 32 30 30 20 4f 4b 0d HTTP/1.1 200 OK. 0010 0a . So this is just the HTTP answer, further header data plus content should follow.. 7 0.295727 62.4.65.115 CLIENT TCP 20012 > 4411 [RST, ACK] Seq=311328713 Ack=1307072638 Win=6432 Len=0 Now this one's the prize.. How comes the server sends a packet with the RST flag set ? Since this will cause a connection reset, it really shouldn't do that.. If you look into the hexdump, you will see a garbled copy of the HTTP reply (header included): 0x0000 00 E0 7D 7D 17 1A 00 80 AD 30 83 E2 08 00 45 00 ..}}.....0....E. 0x0010 00 28 D7 39 40 00 38 06 2B 6B 3E 04 41 73 C0 A8 .(.9@.8.+k>.As.. 0x0020 00 0C 4E 2C 11 3B 12 8E 7F C9 4D E8 58 7E 50 14 ..N,.;....M.X~P. 0x0030 19 20 BE 5F 00 00 48 54 54 50 2F 31 . ._..HTTP/1 9 0.305493 62.4.65.115 CLIENT TCP 20012 > 4411 [FIN, PSH, ACK] Seq=311328020 Ack=1307072638 Win=6432 Len=692 0000 53 65 72 76 65 72 3a 20 55 6e 72 65 61 6c 45 6e Server: UnrealEn 0010 67 69 6e 65 20 55 57 65 62 20 57 65 62 20 53 65 gine UWeb Web Se 0020 72 76 65 72 20 42 75 69 6c 64 20 31 30 37 37 0d rver Build 1077. 0030 0a 43 6f 6e 74 65 6e 74 2d 54 79 70 65 3a 20 74 .Content-Type: t 0040 65 78 74 2f 68 74 6d 6c 0d 0a 43 6f 6e 6e 65 63 ext/html..Connec 0050 74 69 6f 6e 3a 20 43 6c 6f 73 65 0d 0a 0d 0a 3c tion: Close....< 0060 68 74 6d 6c 3e 0d 0a 20 3c 68 65 61 64 3e 3c 54 html>.. <head><T 0070 49 54 4c 45 3e 55 6e 72 65 61 6c 20 54 6f 75 72 ITLE>Unreal Tour 0080 6e 61 6d 65 6e 74 20 32 30 30 33 20 2d 20 57 65 nament 2003 - We 0090 62 20 41 64 6d 69 6e 3c 2f 54 49 54 4c 45 3e 0d b Admin</TITLE>. [.....] This is the missing content, containing the remaining HTTP header fields as well as the HTML data of the page. But since the server has already sent a RST, the client won't get this, because to him the connection is already closed.. 10 0.305641 CLIENT 62.4.65.115 TCP 4411 > 20012 [RST] Seq=1307072638 Ack=1307072638 Win=0 Len=0 The client acknowledges the connection reset -> connection abnormally closed, so the browser won't take the packet with the real data and just show a blank page. Here is what a "well-behaved" UT server on windows does when the client does the same thing: 1 5.992718 CLIENT 62.4.65.115 TCP 4430 > 20012 [SYN] Seq=1398341698 Ack=0 Win=16384 Len=0 2 6.037179 62.4.65.115 CLIENT TCP 20012 > 4430 [SYN, ACK] Seq=2951964545 Ack=1398341699 Win=16944 Len=0 3 6.037324 CLIENT 62.4.65.115 TCP 4430 > 20012 [ACK] Seq=1398341699 Ack=2951964546 Win=16944 Len=0 4 6.037990 CLIENT 62.4.65.115 TCP 4430 > 20012 [PSH, ACK] Seq=1398341699 Ack=2951964546 Win=16944 Len=498 ..same as above 5 6.167200 62.4.65.115 CLIENT TCP 20012 > 4430 [PSH, ACK] Seq=2951964546 Ack=1398342197 Win=16446 Len=17 ..this is the HTTP 200 OK 6 6.221103 62.4.65.115 CLIENT TCP 20012 > 4430 [FIN, PSH, ACK] Seq=2951964563 Ack=1398342197 Win=16446 Len=692 ..and the rest of the data, no additional package here. FIN flag is set, so connection will be closed. 7 6.221272 CLIENT 62.4.65.115 TCP 4430 > 20012 [ACK] Seq=1398342197 Ack=2951965256 Win=16235 Len=0 ..client acknowledges data and FIN, so connection is closed normally. Now as I see it, this behaviour of the linux server causes most browsers to shut the connection before the data arrives, which most definitely is desired. Ryan, could you please verify (maybe it's just me ?) this and/or check this in the webserver's netcode whether there's some condition which might cause the connection reset ? Ah yes, the linux was running SuSE 7.3 on a Dual-P3 1Ghz with 1 gig of ram, uname -a gives: Linux tp115 2.4.10-64GB-SMP #1 SMP Mon Nov 5 11:33:30 GMT 2001 i686 unknown. (Same problem occurs on Suse 8.0 w/ kernel 2.4.18 on a Dual Athlon machine) -- Teclador teclador@teamplay.deI think I have made some real progress with this issue. I've setup some c code which does what I think your web code is doing and it replicates the behaviour we are seeing. i.e. fine under 2.2 broken under 2.4 ############## // setup socket // Accept connection struct linger li; li.l_onoff = 1; li.l_linger = 0; if (setsockopt(client->client_socket, SOL_SOCKET, SO_LINGER, (char *) &li,sizeof(struct linger)) < 0) { fprintf( stderr, "setsockopt: (SO_LINGER) (%s)\n", strerror( errno ) ); } // send response ( unauthorised ) // send response ( authenticate ) shutdown( client->client_socket, 2 ); close( client->client_socket ); ############### With under 2.4: li.l_onoff = 1; li.l_linger = 0; I get: Connection setup: gromit.2477 > steven2.8080: S 1909252052:1909252052(0) steven2.8080 > gromit.2477: S 1642848390:1642848390(0) ack 1909252053 gromit.2477 > steven2.8080: . ack 1 Request: gromit.2477 > steven2.8080: P 1:186(185) ack 1 Request ACK: steven2.8080 > gromit.2477: . ack 186 Response ( Unauthorised ) steven2.8080 > gromit.2477: P 1:28(27) ack 186 Response ( WWW-authenticate ) steven2.8080 > gromit.2477: FP 28:264(236) ack 186 Response ( Unauthorised ) ACK gromit.2477 > steven2.8080: . ack 265 Reset: steven2.8080 > gromit.2477: R 265:265(0) ack 186 steven2.8080 > gromit.2477: R 1642848655:1642848655(0) Now the client sees the Reset and aborts causing the broken images / pages. Under 2.2 SO_LINGER with: li.l_onoff = 1; li.l_linger = 0; appears to get ignored ( incorrect behaviour ) either way it never sends the Reset. If this does mirror your code the solution is to set l_linger > 0 Hope this helps