Describing NQConnect

by **Spike** » Thu Feb 09, 2012 1:39 am

If you're looking at this purely as a connection-establishment problem, then you need to look in the net code itself. You also need to fix the server to report the same port that the client is already attempting to connect to. The client is fine as it is (mostly, anyway), so ZQuake's nqconnect command will not help you.

checkforresend sends a 'ccreq_connect' request to the server (for QW it sends a getchallenge request, with an extra anti-smurf handshake).
The server will reply with a CCREP_ACCEPT, which contains the port number the client should continue sending packets to. If we ignore QuakeWorld's challenge handshake, the only real difference between the protocols is that NQ sends a port number. The problem is that the client actually uses that port number, and it is not the same port the connection request was sent to. So your NAT/router sees packets bound for the new port as a separate connection from the one that requested it, and generates a new source port for them too. That confuses the server: there's now some random client somewhere on the internet sending packets from some random port it doesn't know about. It's not the client it expects, so it ignores it. And that's assuming the server's NAT even knows to forward those other ports (which are neither user-visible nor user-configurable) to the game server, which is unlikely if it's on a home user's LAN.
As soon as you start sending IPs and port numbers, you will have NAT issues.
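To make the handshake concrete, here's a sketch of the two control packets as I understand vanilla NQ lays them out: a 4-byte big-endian header holding NETFLAG_CTL | length, then a command byte. The constant values are from memory of id's net.h/net_dgrm.c and should be checked against your tree; the helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values as recalled from id's net.h / net_dgrm.c; verify before relying on them. */
#define NETFLAG_CTL          0x80000000u
#define NETFLAG_LENGTH_MASK  0x0000ffffu
#define CCREQ_CONNECT        0x01
#define CCREP_ACCEPT         0x81
#define NET_PROTOCOL_VERSION 3

/* Hypothetical helper: write the 4-byte big-endian control header. */
static void put_ctl_header(uint8_t *buf, size_t len)
{
    uint32_t h = NETFLAG_CTL | ((uint32_t)len & NETFLAG_LENGTH_MASK);
    buf[0] = (uint8_t)(h >> 24);
    buf[1] = (uint8_t)(h >> 16);
    buf[2] = (uint8_t)(h >> 8);
    buf[3] = (uint8_t)h;
}

/* Client side: CCREQ_CONNECT, "QUAKE", protocol version.
 * buf must hold at least 12 bytes. Returns the packet length. */
size_t build_connect_request(uint8_t *buf)
{
    size_t len = 4;                   /* leave room for the header */
    buf[len++] = CCREQ_CONNECT;
    memcpy(buf + len, "QUAKE", 6);    /* 6 bytes: includes the NUL */
    len += 6;
    buf[len++] = NET_PROTOCOL_VERSION;
    put_ctl_header(buf, len);
    return len;
}

/* Client side: recognise a CCREP_ACCEPT and extract the port the server
 * wants all further traffic sent to (a little-endian long).
 * Returns 0 on success, -1 if this isn't a well-formed accept. */
int parse_accept(const uint8_t *buf, size_t len, int *port_out)
{
    if (len < 9)
        return -1;
    uint32_t h = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
                 ((uint32_t)buf[2] << 8) | buf[3];
    if (!(h & NETFLAG_CTL) || (h & NETFLAG_LENGTH_MASK) != len)
        return -1;
    if (buf[4] != CCREP_ACCEPT)
        return -1;
    *port_out = (int)((uint32_t)buf[5] | ((uint32_t)buf[6] << 8) |
                      ((uint32_t)buf[7] << 16) | ((uint32_t)buf[8] << 24));
    return 0;
}
```

The port that parse_accept() hands back is the whole problem: once the client re-targets its traffic at it, the server-side NAT sees a port it has never been told to forward.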

The netchan protocol itself (the way unreliable and reliable packets travel from client to server) is okay, and an NQ engine certainly needs to support fragmentation in order to cope with the large svc_serverinfo message. The problem is that each client is polled individually, so each one needs its own socket to be polled. A QuakeWorld server is different: when it receives a (connected) packet from its socket, it loops through the current clients to see who sent it. NetQuake, on the other hand, loops through the clients checking which ones have any packets to be read.
Note that Tonik appears to use the same netchan API for both NQ and QW, as does FTE.
NetQuake is actually kinda similar to a TCP server... and it would certainly be less invasive to implement NQ-over-TCP than to rewrite the core server code to fix NAT: calling accept() on your listening socket instead of creating a new socket on a random port would avoid any 'new connection' issues, assuming the client ignores the port number and keeps using the existing socket without reconnecting. But that's more of a side note...
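The QuakeWorld approach described above (one socket, then loop through the clients to find the sender) can be sketched like this. The client_t struct and find_client() are hypothetical, cut down to just the fields the lookup needs, and POSIX sockaddr_in is assumed.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical per-client record: only what the lookup needs. */
typedef struct {
    int active;
    struct sockaddr_in remote;   /* where this client's packets come from */
} client_t;

/* QW-style demultiplexing: every connected packet arrives on the one server
 * socket, and we walk the client list comparing the sender's address and
 * port. Returns the client index, or -1 for an unknown sender (a real server
 * would hand that to its connectionless-packet handler instead). */
int find_client(const client_t *clients, size_t n, const struct sockaddr_in *from)
{
    for (size_t i = 0; i < n; i++) {
        if (!clients[i].active)
            continue;
        if (clients[i].remote.sin_addr.s_addr == from->sin_addr.s_addr &&
            clients[i].remote.sin_port == from->sin_port)
            return (int)i;
    }
    return -1;
}
```

Because there's only one socket, the server never has to tell the client about a second port, which is what keeps NATs happy.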

Regarding ZQuake's LinkEntities, NQ and QW have quite different models for entity interpolation, or in other words QW doesn't have any. I guess Tonik found it easier to copy over NQ's LinkEntities than to rewrite the QW code to support interpolation in the same way. The lack of prediction and the often lower packet rates of NQ mean that you *need* full interpolation, or the movement of the player's view feels like it's running at 10fps.
Tonik's approach more closely matches vanilla NQ than FTE's does, in that NQ entities are parsed and linked from some global array.
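For reference, here's a simplified sketch of NQ-style interpolation, loosely modelled on what I remember of vanilla's CL_LerpPoint: the client renders between the two most recent server messages, using the fraction of that interval that has elapsed. Vanilla also caps the interval at 0.1s and nudges the client clock when the fraction goes out of range; that's left out here, and all names are hypothetical.

```c
/* Hypothetical vector type for the sketch. */
typedef struct { float x, y, z; } vec3;

/* How far 'now' lies between the previous server message (msg_time_old)
 * and the newest one (msg_time_new), clamped to [0, 1]. */
float lerp_fraction(double now, double msg_time_new, double msg_time_old)
{
    double span = msg_time_new - msg_time_old;
    if (span <= 0.0)
        return 1.0f;              /* no usable interval: snap to the newest state */
    double f = (now - msg_time_old) / span;
    if (f < 0.0)
        f = 0.0;
    if (f > 1.0)
        f = 1.0;
    return (float)f;
}

/* Blend an entity's origin from its previous to its latest received position. */
vec3 lerp_origin(vec3 old_origin, vec3 new_origin, float frac)
{
    vec3 r;
    r.x = old_origin.x + frac * (new_origin.x - old_origin.x);
    r.y = old_origin.y + frac * (new_origin.y - old_origin.y);
    r.z = old_origin.z + frac * (new_origin.z - old_origin.z);
    return r;
}
```

Each frame the renderer would call lerp_fraction() once and then lerp_origin() per entity; QW skips this and instead predicts the local player's movement.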

by **Baker** » Thu Feb 09, 2012 2:06 am

Interesting comments there.

I think I overlooked something, then, in thinking this would fix the NAT issue. However, if I do a stage 2 combing through FTEQW's source for more clues, I think I'll be OK.

It didn't occur to me that part of the NAT weakness is on the server side. Ultimately I want to change that too, but I have to start somewhere to get a bit of a handle on this. I get about 80% of the NetQuake winsock stuff now, but I'm a bit vague on the QuakeWorld stuff.

by **Spike** » Thu Feb 09, 2012 4:43 am

I could try explaining how a NAT works, if you want...
Note that when I say NAT, I mean 'NAPT' or masquerading as linux refers to it.
True NAT doesn't have nearly as many issues, but it's also a bit more pointless. ^^

If you're sending a packet from host A to D, through NATs B and C, then the only guaranteed way for it to work is if C is set up to explicitly forward packets bound for C:26000 on to D:26000. This is a common situation with home firewalls/users.
A sends 'hey, gimme a connection' towards C, via B as a gateway. B sees a packet leaving the NAT and recognises this as a new connection. It generates a new random source port which isn't used by any other host on the lan and sends it out over the internet to host C (this means that eg 192.168.0.2 and 192.168.0.3 can both bind on the same port [this includes auto-assigned ports] and send to the same destination).
C sees a packet that appears to come from B, bound for port 26000. There are no services running on C; let's say it's just a NAT/firewall box (it might have stuff listening on 192.168.0.1, but that's not open to the internet). So either the user has set up port forwarding to forward 26000 on to D, or the packet gets dropped. Assuming they did set up port forwarding (if they're hosting the game behind a NAT, they'll need to), C will register this as a new connection (with a random port), update the destination address, and forward the packet on to D:26000. D receives the packet.
In this example, the packet has so far been sent from A to D... It no longer has either the original source or the original destination address, of course...
As this is a Quake-specific example, the game server on host D receives the packet, understands that it's a connection request from B:RNDNATB, finds a free qsocket object with a system socket bound on an automatic port (probably in the 4k-8k range if it's Windows) and replies to B:RNDNATB via gateway C saying 'you're accepted, by the way use port AUTOSV'. C receives the packet, checks its connection tracker, verifies that it's not a new connection, and changes the src to C:26000 before forwarding it to B via the greater internet.
B receives the packet, checks its connection tracker, sees that the local side is actually A:AUTOCL (whatever port the client got auto-assigned when it bound to port 0), updates the destination to A:AUTOCL and forwards the packet over the LAN to A.
A then receives a packet from C:26000.
Which tells it to send input packets to C:AUTOSV.
Except that C doesn't know about that port.
And it won't forward the packet. The connection will time out. (The 'workaround' is to disable the firewall/router on host C, and just forward everything to D).
Meanwhile, this is NQ... so D has already started trying to send packets to B:RNDNATB from its AUTOSV socket. Router C has no tracking entry for that source address and port, so it creates a new outbound connection, with a new random port.
Guess what...
B doesn't know it... Connection accepted... Connection timed out. (This particular issue is fixed by the ProQuake NAT fix: the server doesn't send until it receives a packet from the client to the correct port.)
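To check the walkthrough, here's a toy model of router C in plain C. It's heavily simplified and entirely hypothetical: mappings are keyed only on the LAN host:port, there's no filtering on the remote address and no timeouts, and the "random" ports come from a counter. The port numbers used with it below (26000, 4567 standing in for AUTOSV, 40000 for the router's pick) are made up.

```c
#include <stdint.h>

#define NAPT_MAX_MAPS 32

typedef struct { uint32_t ip; uint16_t port; } endpoint;

typedef struct {
    endpoint inside;        /* LAN host:port that opened the mapping */
    uint16_t public_port;   /* port it appears as on the router's public side */
} napt_map;

typedef struct {
    uint16_t forward_port;  /* user-configured port forward (0 = none) */
    endpoint forward_to;    /* e.g. the game server D:26000 */
    napt_map maps[NAPT_MAX_MAPS];
    int nmaps;
    uint16_t next_port;     /* stand-in for the router's "random" port choice */
} napt;

static int same_ep(endpoint a, endpoint b) { return a.ip == b.ip && a.port == b.port; }

/* A LAN host sends a packet out. Returns the public source port it leaves
 * with, creating a new mapping (a fresh port) if this host:port has none. */
uint16_t napt_outbound(napt *n, endpoint inside)
{
    if (n->forward_port && same_ep(inside, n->forward_to))
        return n->forward_port;          /* replies from the forwarded service */
    for (int i = 0; i < n->nmaps; i++)
        if (same_ep(n->maps[i].inside, inside))
            return n->maps[i].public_port;
    if (n->nmaps == NAPT_MAX_MAPS)
        return 0;                        /* table full: packet dropped */
    n->maps[n->nmaps].inside = inside;
    n->maps[n->nmaps].public_port = n->next_port++;
    return n->maps[n->nmaps++].public_port;
}

/* A packet arrives from the internet for public port dst_port. Returns 1 and
 * fills *to if the router knows where it goes, 0 if it gets dropped. */
int napt_inbound(const napt *n, uint16_t dst_port, endpoint *to)
{
    if (n->forward_port && dst_port == n->forward_port) {
        *to = n->forward_to;
        return 1;
    }
    for (int i = 0; i < n->nmaps; i++)
        if (n->maps[i].public_port == dst_port) {
            *to = n->maps[i].inside;
            return 1;
        }
    return 0;
}
```

Replaying the walkthrough against it: an inbound packet to 26000 reaches D, one aimed at C:AUTOSV is dropped, and D's own traffic from AUTOSV leaves with a fresh port that B has no entry for. Traffic D sends from its 26000 socket goes out as C:26000, which is the single-socket fix.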

By having the server tell the client a port number, you generate issues with NATs.
FTE's NQ server uses the same socket for the client that the connection request was sent to, so the client always sends to the same port, meaning the NATs use the same conntrack entry for both the request and the connection itself, and all is well... assuming the client doesn't decide to just use a random new socket for the sake of it. As far as I'm aware, DP is identical.
QW servers and clients use a single socket for everything.

There are a few things that I've not mentioned. B:RNDNATB may be the same for all connections from the same A:PORT, whatever the destination. This is very useful for reducing the number of connections that need tracking, but for TCP it's not an option, and the NAT may only track connections initiated from within the LAN. This means that you can get routers that focus only on TCP and are suboptimal for UDP. A NAT that doesn't reuse the port this way also prevents bouncing packets via a third party to punch a hole through both NATs, and will ruin many such techniques (e.g. Skype will have to find some other user's computer to use as a proxy instead).
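The difference between the two NAT behaviours above can be shown with a tiny mapping table. The policy names follow the terms commonly used for this (endpoint-independent vs. per-destination mapping); the structs, function, and port numbers are hypothetical, and there's no timeout handling.

```c
#include <stdint.h>

#define NAT_MAX 16

typedef struct { uint32_t ip; uint16_t port; } ep;

typedef enum {
    MAP_ENDPOINT_INDEPENDENT,  /* one public port per inside host:port */
    MAP_PER_DESTINATION        /* a fresh public port for every remote host:port */
} map_policy;

typedef struct { ep inside, remote; uint16_t public_port; } nat_entry;

typedef struct {
    map_policy policy;
    nat_entry e[NAT_MAX];
    int n;
    uint16_t next_port;        /* stand-in for the router's "random" port choice */
} nat;

static int ep_eq(ep a, ep b) { return a.ip == b.ip && a.port == b.port; }

/* Public source port used when 'inside' sends to 'remote'. Returns 0 if full. */
uint16_t nat_map(nat *t, ep inside, ep remote)
{
    for (int i = 0; i < t->n; i++) {
        if (!ep_eq(t->e[i].inside, inside))
            continue;
        if (t->policy == MAP_ENDPOINT_INDEPENDENT || ep_eq(t->e[i].remote, remote))
            return t->e[i].public_port;
    }
    if (t->n == NAT_MAX)
        return 0;
    t->e[t->n].inside = inside;
    t->e[t->n].remote = remote;
    t->e[t->n].public_port = t->next_port++;
    return t->e[t->n++].public_port;
}
```

Under the first policy, the port a third party (say a master server) observes is the same port another peer will see, so it's worth passing on. Under the second, every destination gets a different port, so the third party's observation is useless for hole punching.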

There's also a class of really idiotic NATs that do not refresh connection timeouts after the connection is established; once the entry expires and a new connection is made, they generate a new random port. This basically means that your client's port number changes every 2 minutes, killing your connection, which is what the qport field in every version of Quake since QuakeWorld exists to work around.
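Here's a rough sketch of how qport gets around a NAT that re-maps the client's port, loosely based on what I remember of QW's SV_ReadPackets: the server matches on IP address plus the qport value the client sends in each packet, and if the UDP source port has changed it just adopts the new one. The struct and function names are hypothetical, and parsing qport out of the packet is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical client record. */
typedef struct {
    int active;
    struct sockaddr_in remote;   /* last known address:port of the client */
    uint16_t qport;              /* random value the client picked at startup */
} qw_client;

/* Match on IP + qport rather than IP + UDP port. If the NAT has moved the
 * client to a new source port, update our record so replies follow it.
 * Returns the client index, or -1 if nobody matches. */
int find_client_qport(qw_client *cl, size_t n, const struct sockaddr_in *from, uint16_t qport)
{
    for (size_t i = 0; i < n; i++) {
        if (!cl[i].active)
            continue;
        if (cl[i].remote.sin_addr.s_addr != from->sin_addr.s_addr)
            continue;
        if (cl[i].qport != qport)
            continue;
        if (cl[i].remote.sin_port != from->sin_port)
            cl[i].remote.sin_port = from->sin_port;   /* translated port: fix it up */
        return (int)i;
    }
    return -1;
}
```

The trade-off is that two clients behind the same NAT are told apart only by qport, so it has to be random enough not to collide.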

Side note:

A UDP packet contains:
IP: srcip+dstip+fraginfo(id+offset+morefrags)+payload(UDP: srcport+dstport+payload(NQGame:flagsnlen+sequence+svcdata))
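Once the OS has stripped the IP and UDP headers, the innermost part of that layout is what the game sees. Here's a sketch of parsing it, assuming vanilla NQ's layout as I remember it: two big-endian 32-bit words (flags|length, then sequence) followed by the svc data. The flag values are recalled from id's net.h and should be checked; the struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Flag bits as recalled from id's net.h; verify against your source. */
#define NETFLAG_LENGTH_MASK 0x0000ffffu
#define NETFLAG_DATA        0x00010000u
#define NETFLAG_ACK         0x00020000u
#define NETFLAG_NAK         0x00040000u
#define NETFLAG_EOM         0x00080000u
#define NETFLAG_UNRELIABLE  0x00100000u
#define NETFLAG_CTL         0x80000000u

typedef struct {
    uint32_t flags;        /* high bits of the first word */
    uint32_t length;       /* whole packet length, header included */
    uint32_t sequence;
    const uint8_t *data;   /* svc/clc payload */
    size_t datalen;
} nq_header;

static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | p[3];
}

/* Returns 0 on success, -1 for short, inconsistent, or control packets
 * (control packets have no sequence word). */
int parse_nq_header(const uint8_t *pkt, size_t len, nq_header *h)
{
    if (len < 8)
        return -1;
    uint32_t word = read_be32(pkt);
    h->flags = word & ~NETFLAG_LENGTH_MASK;
    h->length = word & NETFLAG_LENGTH_MASK;
    if (h->length != len || (h->flags & NETFLAG_CTL))
        return -1;
    h->sequence = read_be32(pkt + 4);
    h->data = pkt + 8;
    h->datalen = len - 8;
    return 0;
}
```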

Obviously the NAT doesn't look into the UDP payload itself, other than to pass the data onwards, but it will peek at the UDP header.

The IP packet's payload begins with the UDP header, and the UDP header is where the actual ports are stored. This means that if a UDP packet is fragmented, *only* the first fragment contains the port numbers. This is why so many NATs have problems with fragmentation, and why you should avoid sending more than 1450 bytes or whatever it is. It's even more problematic when you have routers that are unable (or refuse) to forward ICMP packets, which means your system never even learns that there's a router somewhere refusing to fragment. (There's actually a minimum datagram size of 576 bytes, defined in the IPv4 spec, that every host must be able to cope with, but generally everyone uses Ethernet, which is where the ~1450 comes from (its 1500-byte MTU minus some headroom); beware ATM connections, though, which fragment at 2 bytes less or so.)
TCP generally depends on ICMP messages to detect the path MTU properly. Routers that do not forward ICMP (which can be a security 'feature') can thus cause TCP connections to hang when transferring data to servers behind a link with a lower MTU.
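The arithmetic behind the size limit, as a sketch assuming IPv4 with no IP options: the UDP header counts as IP payload, every fragment except the last must carry a multiple of 8 data bytes, and only the first fragment carries the UDP header (and so the ports). Function names are hypothetical.

```c
#include <stddef.h>

#define IPV4_HEADER 20   /* assumes no IP options */
#define UDP_HEADER   8

/* Largest UDP payload that fits in one unfragmented datagram. */
size_t max_unfragmented_payload(size_t mtu)
{
    return mtu - IPV4_HEADER - UDP_HEADER;
}

/* Number of IPv4 fragments needed for a UDP payload of udp_payload bytes.
 * Only fragment 0 will contain the UDP header with the port numbers. */
size_t ipv4_fragment_count(size_t udp_payload, size_t mtu)
{
    size_t ip_payload = udp_payload + UDP_HEADER;   /* UDP header rides in the IP payload */
    size_t room = mtu - IPV4_HEADER;
    if (ip_payload <= room)
        return 1;
    size_t per_frag = room & ~(size_t)7;            /* non-final fragments: multiple of 8 */
    return (ip_payload + per_frag - 1) / per_frag;
}
```

With Ethernet's 1500-byte MTU that's 1472 bytes of UDP payload before fragmenting; staying nearer 1450 leaves a little room for links with a slightly smaller MTU, such as PPPoE's 1492.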

Additional side note:
bind(sock, (struct sockaddr *)&addr, sizeof(addr)) binds the socket to an address. If the address is a sockaddr_in with INADDR_ANY, then it'll receive packets sent to any interface on the computer, and send packets from whichever interface the system thinks is best (see the system routing table).
If your client is unable to connect to a DP or FTE server via the host 'localhost', but can via 192.168.1.3 or whatever, then your client does not support multi-homed computers. (Note that FTE needs sv_listen_nq 1, and possibly sv_port 26000, or it'll fail anyway. I don't remember the cvar to get DP to use the vanilla protocols.)
It seems to me that this is the biggest cause of connection issues discussed on quakeone.com.
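Here's a minimal POSIX sketch of the bind() described above, plus a server-side reply that goes back to whatever address recvfrom() reported, from the same socket, rather than to an address learned some other way. Winsock would additionally need WSAStartup() and closesocket(); the function names are hypothetical.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

/* Open a UDP socket bound to INADDR_ANY on 'port' (0 = let the OS pick).
 * Returns the descriptor, or -1 on failure. */
int open_udp_any(unsigned short port)
{
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    if (s < 0)
        return -1;
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);   /* listen on every interface */
    addr.sin_port = htons(port);
    if (bind(s, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(s);
        return -1;
    }
    return s;
}

/* Read one packet and echo it back to the address recvfrom() reported, from
 * the same socket; the OS picks the outgoing interface via its routing table.
 * Returns the number of bytes sent, or -1. */
ssize_t echo_one(int s)
{
    char buf[1500];
    struct sockaddr_in from;
    socklen_t fromlen = sizeof from;
    ssize_t n = recvfrom(s, buf, sizeof buf, 0, (struct sockaddr *)&from, &fromlen);
    if (n < 0)
        return -1;
    return sendto(s, buf, (size_t)n, 0, (struct sockaddr *)&from, fromlen);
}
```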

Additional side note:
IPv6 can be enabled by using a hybrid IPv6 socket.
If you call setsockopt(newsocket, IPPROTO_IPV6, IPV6_V6ONLY, (char *)&_false, sizeof(_false)) on an IPv6 socket (with _false being an int set to 0), then you get a socket that can accept both IPv6 and IPv4 packets at the same time. It only deals in IPv6 addresses, but if you send to an IPv4-mapped address, e.g. ::ffff:192.168.0.1, then it'll actually send an IPv4 packet instead. Just beware that your code needs to deal exclusively in IPv6 addresses internally, while still supporting IPv4 addresses in the user-facing parts. Oh, you'll also need to bind to the IPv6 wildcard address (in6addr_any) or you won't be listening on any IPv4 interfaces anyway, but that shouldn't be an issue.
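A POSIX sketch of the hybrid socket described above: IPV6_V6ONLY turned off, bound to in6addr_any, plus the conversion from a user-supplied IPv4 address to the ::ffff:a.b.c.d form the socket deals in internally. Function names are hypothetical; on Windows you'd also need WSAStartup() and closesocket(), and the (char *) cast matches Winsock's setsockopt signature.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

/* Dual-stack UDP socket on 'port' (0 = OS picks). Returns fd or -1. */
int open_hybrid_udp(unsigned short port)
{
    int s = socket(AF_INET6, SOCK_DGRAM, 0);
    if (s < 0)
        return -1;
    int off = 0;   /* 0 = also accept IPv4 traffic via mapped addresses */
    if (setsockopt(s, IPPROTO_IPV6, IPV6_V6ONLY, (char *)&off, sizeof off) < 0) {
        close(s);
        return -1;
    }
    struct sockaddr_in6 addr;
    memset(&addr, 0, sizeof addr);
    addr.sin6_family = AF_INET6;
    addr.sin6_addr = in6addr_any;   /* wildcard: needed to hear IPv4 interfaces too */
    addr.sin6_port = htons(port);
    if (bind(s, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(s);
        return -1;
    }
    return s;
}

/* Turn an IPv4 address (e.g. typed by the user) into its IPv4-mapped IPv6
 * form, ::ffff:a.b.c.d, which is what the hybrid socket sends to. */
void ipv4_to_mapped(const struct sockaddr_in *in4, struct sockaddr_in6 *out6)
{
    memset(out6, 0, sizeof *out6);
    out6->sin6_family = AF_INET6;
    out6->sin6_port = in4->sin_port;                 /* already network order */
    out6->sin6_addr.s6_addr[10] = 0xff;
    out6->sin6_addr.s6_addr[11] = 0xff;
    memcpy(&out6->sin6_addr.s6_addr[12], &in4->sin_addr.s_addr, 4);
}
```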

by **Baker** » Mon Feb 13, 2012 7:20 am

I understand about 40% very well, maybe another 30% rather well, and I'm still thinking over the implications of the rest, but I'm coming to understand it. You sure outlined tons of things that I wasn't aware of (and thankfully now when I go to play with this, I won't be frustrated as hell fighting some hidden factor).

Some of the code in QuakeWorld (barring the horrendous exceptions and pitfalls you've mentioned) seems rather straightforward.

Dumb miscellaneous easy question:

Let's say you have:

1) Peer A (behind NAT)
2) Peer B (behind NAT)
3) Master C (some server with no NAT or port forwarding or whatever)

4) Peer A and Peer B are in constant communication with Master C.
5) Can Peer A and Peer B start sending each other packets and penetrate NAT eventually if Master C tells A and B to start talking?

This is kind of the "Is peer-to-peer even possible?" question, with a third server telling them to start talking to each other. If Peer A and Peer B start sending to each other, Peer A will eventually get Peer B's incoming packets because Peer A sent an outgoing one, right?

[The reason I care is that I kind of wonder whether public servers really are the way of the future for "friendly" games. Two players might just want to coop together with one acting as the server, and they aren't interested in anyone else playing with them. Yet everything you've told me makes it very clear that this absolutely isn't possible for just the 2 peers without router setup, which can't reasonably be expected and isn't user friendly.]

by **Spike** » Mon Feb 13, 2012 11:06 am

Read this: http://www.brynosaurus.com/pub/net/p2pnat/
