Bug 107 - Webadmin not working, possible cause inside
Status: RESOLVED FIXED
Alias: None
Product: Unreal Tournament 2003
Classification: Unclassified
Component: WebAdmin
Version: Demo (Build 1077)
Hardware: PC Linux
: P2 normal
Assignee: Ryan C. Gordon
QA Contact:
URL:
: 95 102
Depends on:
Blocks:
 
Reported: 2002-09-17 12:12 EDT by Christian Lauterbach
Modified: 2002-10-19 12:10:42 EDT
4 users (show)

See Also:



Description Christian Lauterbach 2002-09-17 12:12:49 EDT
I already send this via the UT2003 Linux mailing list, but because it 
seems to have been overlooked there, I'm posting this here (a bit longer):


It seems as if the UT2k3 Webadmin is still suffering from the same problem
that already plagued UT servers since kernel 2.4.x...

The only thing that "fixes" this right now seems to be a local port redirection
(e.g. by using netcat), but this does hardly qualify as a solution..
During the last two days I tested several things that were proposed, for example
on the Infogrames Serveradmin Forum:

- putting the AdminName + AdminPassword in the command line fixed the login,
but nothing else

- setting the server to a fixed IP by using -multihome=xxx in the command line
  didn't change anything except that I had to changed the port redirection from
  localhost to the external IP ;)

- various other settings that I tried (ServerName, MaxConnections,..)
  also had no visible effect

After that I put in a little debugging session, which led to some interesting 
results:

Only full-featured Browsers seem unable to use the Webadmin (I tested with
Netscape 4, Opera and IE, same result), but I could start Lynx from any
machine and use the Webadmin without any problems (well, admittedly,
no frames then ;) )
The other browsers usually just broke off loading the webadmin after confirming
the login and password for the webadmin.

By using a packet sniffer (Ethereal) I captured the HTTP traffic between the 
browsers
and the UT webserver and compared it with a equivalent query on the UT
windows version (which worked flawlessly).
Interestingly, the server responded as it should in Linux as in Windows, so
I suspect the real problem does not lie in the handling of external and internal
traffic as suggested by several people and by the fact that it works through a
port redirector. The only difference I was able to determine both versions
was that the linux server often sent an extra packet between the HTTP status
reply and the content, which contained some bytes of garbled data.
Let me illustrate this by some tcp dumps from the communication:
(CLIENT being my machine, 62.4.65.115 the ut2k3 server, not in the
 same network but at my colo..)

No. Time        Source                Destination           Protocol Info
  1 0.000000    CLIENT                62.4.65.115           TCP      4411 > 
20012 [SYN] Seq=1307072141 Ack=0 Win=16384 Len=0

Client tries to open connection to server.

  2 0.037097    62.4.65.115           CLIENT                TCP      20012 > 
4411 [SYN, ACK] Seq=311328002 Ack=1307072142 Win=5840
Len=0

Server acknowledges connection.

  3 0.037244    CLIENT                62.4.65.115           TCP      4411 > 
20012 [ACK] Seq=1307072142 Ack=311328003 Win=16944 Len=0

Connection now established.

  4 0.037900    CLIENT                62.4.65.115           TCP      4411 > 
20012 [PSH, ACK] Seq=1307072142 Ack=311328003 Win=16944
Len=496

This is the GET request from the Client, Data is:
0000  47 45 54 20 2f 53 65 72 76 65 72 41 64 6d 69 6e   GET /ServerAdmin
0010  2f 20 48 54 54 50 2f 31 2e 31 0d 0a 55 73 65 72   / HTTP/1.1..User
0020  2d 41 67 65 6e 74 3a 20 4d 6f 7a 69 6c 6c 61 2f   -Agent: Mozilla/
0030  34 2e 30 20 28 63 6f 6d 70 61 74 69 62 6c 65 3b   4.0 (compatible;
0040  20 4d 53 49 45 20 35 2e 30 3b 20 57 69 6e 64 6f    MSIE 5.0; Windo
0050  77 73 20 32 30 30 30 29 20 4f 70 65 72 61 20 36   ws 2000)
[..shortened for obvious reasons..]

  5 0.106733    62.4.65.115           CLIENT                TCP      20012 > 
4411 [ACK] Seq=311328003 Ack=1307072638 Win=6432 Len=0

Packet has arrived, server acks that.

  6 0.196073    62.4.65.115           CLIENT                TCP      20012 > 
4411 [PSH, ACK] Seq=311328003 Ack=1307072638 Win=6432
Len=17

First part of the server answer:

0000  48 54 54 50 2f 31 2e 31 20 32 30 30 20 4f 4b 0d   HTTP/1.1 200 OK.
0010  0a                                                .

So this is just the HTTP answer, further header data plus content should 
follow..

  7 0.295727    62.4.65.115           CLIENT                TCP      20012 > 
4411 [RST, ACK] Seq=311328713 Ack=1307072638 Win=6432
Len=0

Now this one's the prize.. How comes the server sends a packet with the RST
flag set ? Since this will cause a connection reset, it really shouldn't do 
that.. If
you look into the hexdump, you will see a garbled copy of the HTTP reply 
(header included):

0x0000  00 E0 7D 7D 17 1A 00 80 AD 30 83 E2 08 00 45 00   ..}}.....0....E.
0x0010  00 28 D7 39 40 00 38 06 2B 6B 3E 04 41 73 C0 A8   .(.9@.8.+k>.As..
0x0020  00 0C 4E 2C 11 3B 12 8E 7F C9 4D E8 58 7E 50 14   ..N,.;....M.X~P.
0x0030  19 20 BE 5F 00 00 48 54 54 50 2F 31               . ._..HTTP/1


  9 0.305493    62.4.65.115           CLIENT                TCP      20012 > 
4411 [FIN, PSH, ACK] Seq=311328020 Ack=1307072638
Win=6432 Len=692

0000  53 65 72 76 65 72 3a 20 55 6e 72 65 61 6c 45 6e   Server: UnrealEn
0010  67 69 6e 65 20 55 57 65 62 20 57 65 62 20 53 65   gine UWeb Web Se
0020  72 76 65 72 20 42 75 69 6c 64 20 31 30 37 37 0d   rver Build 1077.
0030  0a 43 6f 6e 74 65 6e 74 2d 54 79 70 65 3a 20 74   .Content-Type: t
0040  65 78 74 2f 68 74 6d 6c 0d 0a 43 6f 6e 6e 65 63   ext/html..Connec
0050  74 69 6f 6e 3a 20 43 6c 6f 73 65 0d 0a 0d 0a 3c   tion: Close....<
0060  68 74 6d 6c 3e 0d 0a 20 3c 68 65 61 64 3e 3c 54   html>.. <head><T
0070  49 54 4c 45 3e 55 6e 72 65 61 6c 20 54 6f 75 72   ITLE>Unreal Tour
0080  6e 61 6d 65 6e 74 20 32 30 30 33 20 2d 20 57 65   nament 2003 - We
0090  62 20 41 64 6d 69 6e 3c 2f 54 49 54 4c 45 3e 0d   b Admin</TITLE>.
[.....]

This is the missing content, containing the remaining HTTP header fields as 
well as the
HTML data of the page. But since the server has already sent a RST, the client 
won't
get this, because to him the connection is already closed..

 10 0.305641    CLIENT                62.4.65.115           TCP      4411 > 
20012 [RST] Seq=1307072638 Ack=1307072638 Win=0 Len=0

The client acknowledges the connection reset -> connection abnormally closed, 
so the
browser won't take the packet with the real data and just show a blank page.


Here is what a "well-behaved" UT server on windows does when the client does 
the same thing:

1 5.992718    CLIENT                 62.4.65.115             TCP      4430 > 
20012 [SYN] Seq=1398341698 Ack=0 Win=16384 Len=0
2 6.037179    62.4.65.115             CLIENT                 TCP      20012 > 
4430 [SYN, ACK] Seq=2951964545 Ack=1398341699
Win=16944 Len=0
3 6.037324    CLIENT                 62.4.65.115             TCP      4430 > 
20012 [ACK] Seq=1398341699 Ack=2951964546 Win=16944
Len=0
4 6.037990    CLIENT                 62.4.65.115             TCP      4430 > 
20012 [PSH, ACK] Seq=1398341699 Ack=2951964546
Win=16944 Len=498

..same as above

5 6.167200    62.4.65.115             CLIENT                 TCP      20012 > 
4430 [PSH, ACK] Seq=2951964546 Ack=1398342197
Win=16446 Len=17

..this is the HTTP 200 OK

6 6.221103    62.4.65.115             CLIENT                 TCP      20012 > 
4430 [FIN, PSH, ACK] Seq=2951964563 Ack=1398342197
Win=16446 Len=692

..and the rest of the data, no additional package here. FIN flag is set, so 
connection will be closed.

7 6.221272    CLIENT                 62.4.65.115             TCP      4430 > 
20012 [ACK] Seq=1398342197 Ack=2951965256 Win=16235
Len=0

..client acknowledges data and FIN, so connection is closed normally.



Now as I see it, this behaviour of the linux server causes most browsers to 
shut the connection before
the data arrives, which most definitely is desired. Ryan, could you please 
verify (maybe it's just me ?) this
and/or check  this in the webserver's netcode whether there's some condition 
which might cause the
connection reset ?

Ah yes, the linux was running SuSE 7.3 on a Dual-P3 1Ghz with 1 gig of ram, 
uname -a gives:
Linux tp115 2.4.10-64GB-SMP #1 SMP Mon Nov 5 11:33:30 GMT 2001 i686 unknown.
(Same problem occurs on Suse 8.0 w/ kernel 2.4.18 on a Dual Athlon machine)

 -- Teclador
    teclador@teamplay.de
Comment 1 Steven Hartland 2002-09-19 06:19:23 EDT
I wonder if tcpdump -tt will show the offending packets timestamp jumping back? 
If so setting setsockopt(SO_LINGER) may solve this.
Comment 2 Ryan C. Gordon 2002-09-19 15:17:16 EDT
*** Bug 95 has been marked as a duplicate of this bug. ***
Comment 3 Ryan C. Gordon 2002-09-19 18:10:07 EDT
*** Bug 102 has been marked as a duplicate of this bug. ***
Comment 4 Steven Hartland 2002-09-20 08:31:03 EDT
Ok some test results:
Failed ( broken / missing pages / images )
kernel 2.4.0 with IE 5.5

Fine
kernel 2.2.2.14 with IE 5.5
kernel 2.2.2.14 with IE 6.0 SP1
kernel 2.4.0 with IE 5.5
kernel 2.4.0 with IE 6.0 SP1
kernel 2.4.19 with IE 6.0 SP1

So there is two solutions
1. Use IE 6.0 SP1 ( none SP1 may also work not tested )
2. Use kernel 2.4.19 ( versions above 2.4.16 may also be fine )

N.B. 2.2 kernel generates:
CheckConnectionQueue: Failed to accept queued connection: 11
Close: Error while attempting to close socket.
When a page is being served an the browser disconnects
Comment 5 Steven Hartland 2002-09-20 22:21:11 EDT
I think I have made some real progress with this issue.

I've setup some c code which does what I think your
web code is doing and it replicates the behaviour we
are seeing. i.e. fine under 2.2 broken under 2.4


##############
// setup socket
// Accept connection
struct linger li;
li.l_onoff = 1;
li.l_linger = 0;

if (setsockopt(client->client_socket, SOL_SOCKET,
    SO_LINGER, (char *) &li,sizeof(struct linger)) <  0) {
    fprintf( stderr, "setsockopt: (SO_LINGER) (%s)\n",
        strerror( errno ) );
}
// send response ( unauthorised )
// send response ( authenticate )
shutdown( client->client_socket, 2 );
close( client->client_socket );
###############

With under 2.4:
li.l_onoff = 1;
li.l_linger = 0;

I get:

Connection setup:
gromit.2477 > steven2.8080: S 1909252052:1909252052(0)
steven2.8080 > gromit.2477: S 1642848390:1642848390(0) ack 1909252053
gromit.2477 > steven2.8080: . ack 1
Request:
gromit.2477 > steven2.8080: P 1:186(185) ack 1
Request ACK:
steven2.8080 > gromit.2477: . ack 186
Response ( Unauthorised )
steven2.8080 > gromit.2477: P 1:28(27) ack 186
Response ( WWW-authenticate )
steven2.8080 > gromit.2477: FP 28:264(236) ack 186
Response ( Unauthorised ) ACK
gromit.2477 > steven2.8080: . ack 265
Reset:
steven2.8080 > gromit.2477: R 265:265(0) ack 186
steven2.8080 > gromit.2477: R 1642848655:1642848655(0)

Now the client sees the Reset and aborts causing the broken
images / pages.
Under 2.2 SO_LINGER with:
li.l_onoff = 1;
li.l_linger = 0;
appears to get ignored ( incorrect behaviour ) either way it never
sends the Reset.

If this does mirror your code the solution is to set l_linger > 0

Hope this helps
Comment 6 Ryan C. Gordon 2002-09-21 01:42:47 EDT
Steven: That is some impressive remote debugging. That seems to be exactly
what's going on here.

I am genuinely impressed.

And to verify this:
http://groups.google.com/groups?q=SO_LINGER&hl=en&lr=&ie=UTF-8&oe=UTF-8&selm=oup4rsie8zt.fsf%40pigdrop.muc.suse.de&rnum=6


Let me check for ramifications (specifically, does the close() call block on a
lingering socket, or does the call return and the kernel deals with tossing the
remaining packets behind the scenes? If it blocks, we've got a problem) and
patch this appropriately.

Great work. I really mean that.

--ryan.
Comment 7 Steven Hartland 2002-09-21 08:05:46 EDT
Now thats a good question. As far as I remember the correct implementation of 
SO_LINGER is the kernel should do it async but it is common for some tcp stacks 
to implement it syncronously hence it will block if until the server reads the 
FIN2 from the client.
A good doc about this issue is:
http://httpd.apache.org/docs/misc/perf-tuning.html
See the 'Lingering Close' section.
It does indicate that linux may be one of the stacks which does this correctly 
but seems like quite a few people use *BSD ( me included :) ) so it would be 
worth checking on linux and say FreeBSD.
If you want me to do anymore digging / testing shout.
Comment 8 Ryan C. Gordon 2002-09-27 08:17:43 EDT
Just to keep you posted, we talked about this, and it is feasible to deal with
this correctly (basically shutdown() the socket and read until all the data is
sent before closing, and everything gets to where it should be and we never
block in a syscall). Jack's going to patch this on win32, and I'm going to
verify the winsock calls translate correctly to BSD sockets (which they should).
Look for it in the patch (and open a new bug after the next patch if it still
happens). For now, I'm flagging this bug as "FIXED".

Thanks again to everyone that did the real debugging on this one.  :)

--ryan.