Bug 4258 - TCPLink functions fail to establish an outbound connection on the linux dedicated server.
Status: RESOLVED WONTFIX
Alias: None
Product: Unreal Tournament 3
Classification: Unclassified
Component: dedicated server
Version: don't know
Hardware: All Linux
: P3 blocker
Assignee: Ryan C. Gordon
QA Contact: Ryan C. Gordon
URL:
Depends on:
Blocks:
 
Reported: 2009-08-14 19:57 EDT by kernelpanic
Modified: 2013-07-23 20:20:16 EDT
2 users (show)

See Also:


Attachments
strace log, captured from about 10 seconds before TCPLink creation and up to process' death (376.09 KB, application/x-gzip)
2012-01-22 14:31 EST, da-wgh

Description kernelpanic 2009-08-14 19:57:20 EDT
<-- OVERVIEW -->

Using the TCPLink class the Linux dedicated server v3808 never establishes a successful outbound connection.

<-- STEPS TO REPRODUCE -->

Attempt ANY outbound client connection using TCPLink Open() and Event Opened().

<-- ACTUAL RESULTS -->

When attempting to use the TCPLink client (outgoing) functions in Unreal Tournament 3 the linux dedicated server will fail to connect with the following message filling you log repeatedly. 

"Log: CheckConnectionAttempt: Connection attempt has not yet completed."

Utilizing netstat i observed the ut3-bin process establish a connection and then immediately goto a close_wait state. Tcpdump does not see any actual network connection attempt.

<-- EXPECTED RESULTS -->

The expected result would be a successful outbound TCP connection.

<-- BUILD & PLATFORM INFORMATION -->

Build: Linux 2.6.9-78.0.8.ELsmp #1 SMP Wed Nov 19 19:42:38 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
UT3 - Init: Version: 3808

<-- ADDITIONAL INFORMATION -->

It may be prudent to confirm if the Macintosh version has this issue as well since it is based on the linux kernel.

<-- EXTRA ADDITIONAL INFORMATION -->

USCRIPT TCPLink TESTLOG:

ScriptLog: WebLink::PostBeginPlay()
ScriptLog: ReceiveMode: RMODE_Event
ScriptLog: LinkMode: MODE_Text <-- Irrelavant it seems to happen via all modes
ScriptLog: Going to resolve [dnsname]
Log: Resolved [dnsname] [ip:port]
ScriptLog: WebLink::Resolved()
ScriptLog: Binding a port
ScriptLog: Setting Addr.Port to [port]
ScriptLog: Addr.Addr: [ip]
ScriptLog: Addr.Port: [port]
ScriptLog: Opening Addr
ScriptLog: Open returned: True
Log: CheckConnectionAttempt: Connection attempt has not yet completed.
Comment 1 da-wgh 2012-01-14 14:30:46 EST
I can confirm this. Moreover, it looks like destroying TCPLink actor during vain connection attempts crashes entire game.

pure virtual method called!
this is a bug! crashing program...
Comment 2 da-wgh 2012-01-22 14:28:39 EST
This is a strace output. First column is process ID.

25914 write(2, "Log: Resolved status.torlan.ru ("..., 47) = 47
// very well
25914 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 53
// note that file descriptor is 53
25914 setsockopt(53, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
// nothing interesting
25914 bind(53, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
// bound
25914 getsockname(53, {sa_family=AF_INET, sin_port=htons(47623), sin_addr=inet_addr("0.0.0.0")}, [16]) = 0
// this system call is without side-effect, so let's ignore it
25914 fcntl64(53, F_GETFL)              = 0x2 (flags O_RDWR)
// obvious
25914 fcntl64(53, F_SETFL, O_RDWR|O_NONBLOCK) = 0
// putting socket to non-blocking mode
25914 connect(53, {sa_family=AF_INET, sin_port=htons(8080), sin_addr=inet_addr("95.26.20.48")}, 16) = -1 EINPROGRESS (Operation now in progress)
// OK, non-blocking connect()
// socket is supposed to be checked for writability with select() or something similar later, but as we will see, it'll never happen

25914 select(0, NULL, NULL, [], {0, 0}) = 0 (Timeout)
25914 select(0, NULL, [], NULL, {0, 0}) = 0 (Timeout)
25914 write(2, "Log: CheckConnectionAttempt: Con"..., 71) = 71
25914 select(0, NULL, NULL, [], {0, 0}) = 0 (Timeout)
25914 select(0, NULL, [], NULL, {0, 0}) = 0 (Timeout)
// select() called with NULL/empty arguments is VERY odd
// probably *the* bug

25914 select(53, NULL, NULL, [52], {0, 0}) = 0 (Timeout)
25914 select(53, [52], NULL, NULL, {0, 0}) = 0 (Timeout)
25914 recv(52, 0xffbf0cc0, 999, 0)      = -1 ENOTCONN (Transport endpoint is not connected)
// note that this select() has *nothing* to do with our socket

25922 //more gettimeofday calls
25922 gettimeofday({1327259446, 277220}, NULL) = 0
25922 gettimeofday({1327259446, 277262}, NULL) = 0
25922 futex(0x18c69540, FUTEX_WAIT_PRIVATE, 1679, NULL <unfinished ...>
25914 <... futex resumed> )             = 1
25914 select(0, NULL, NULL, [], {0, 0}) = 0 (Timeout)
25914 select(0, NULL, [], NULL, {0, 0}) = 0 (Timeout)
25914 select(0, NULL, NULL, [], {0, 0}) = 0 (Timeout)
25914 select(0, NULL, [], NULL, {0, 0}) = 0 (Timeout)
25914 write(2, "Log: CheckConnectionAttempt: Con"..., 71) = 71
25914 select(0, NULL, NULL, [], {0, 0}) = 0 (Timeout)
25914 select(0, NULL, [], NULL, {0, 0}) = 0 (Timeout)
25914 gettimeofday({1327259446, 278080}, NULL) = 0
25914 gettimeofday({1327259446, 278884}, NULL) = 0
25914 // more gettimeofday calls
// that's how it looks like

25914 write(2, "ScriptWarning: StatusReporter WA"..., 169) = 169
25914 close(53)                         = 0
25914 write(2, "\n\n\npure virtual method called!\nt"..., 68) = 68
25914 exit_group(22)
// it's `warn() macro executed just before TCPLink actor is destroyed
// file descriptor number 53 is never mentioned anywhere else

My guess is that select() is just called with incorrect arguments.
Comment 3 da-wgh 2012-01-22 14:31:03 EST
Created attachment 3073 [details]
strace log, captured from about 10 seconds before TCPLink creation and up to process' death
Comment 4 da-wgh 2012-02-26 10:46:39 EST
Disregard the part about crash. It can be also reproduced on Windows.
Comment 5 da-wgh 2013-07-23 09:29:03 EDT
Any news on this?
Comment 6 Ryan C. Gordon 2013-07-23 20:20:16 EDT
(In reply to comment #5)
> Any news on this?

No one is working on UT3 at this point, sorry.

--ryan.