Iptables: matching outgoing traffic with conntrack and owner works, but with strange drops

In my iptables script I have been experimenting with writing as finely grained rules as possible. I limit which users are allowed to use which services, partly for security and partly as a learning exercise.

Using iptables v1.4.16.2 on Debian 6.0.6 running the 3.6.2 kernel.

However, I’ve hit an issue I don’t quite understand yet.

outgoing ports for all users

This works perfectly fine. I do not have any generic state tracking rules.

## Outgoing port 81
$IPTABLES -A OUTPUT -p tcp --dport 81 -m conntrack --ctstate NEW,ESTABLISHED -j ACCEPT
$IPTABLES -A INPUT -p tcp --sport 81 -d $MYIP -m conntrack --ctstate ESTABLISHED -j ACCEPT

outgoing ports with user matching

## outgoing port 80 for useraccount
$IPTABLES -A OUTPUT --match owner --uid-owner useraccount -p tcp --dport 80 -m conntrack --ctstate NEW,ESTABLISHED --sport 1024:65535 -j ACCEPT
$IPTABLES -A INPUT -p tcp --sport 80 --dport 1024:65535 -d $MYIP -m conntrack --ctstate ESTABLISHED -j ACCEPT

This allows port 80 out only for the account “useraccount”, but rules like this for TCP traffic have issues.

## Default outgoing log + block rules
$IPTABLES -A OUTPUT -j LOG --log-prefix "BAD OUTGOING " --log-ip-options --log-tcp-options --log-uid

The Issue

The above works: the user “useraccount” can fetch files perfectly fine, and no other users on the system can make outgoing connections to port 80.

$ wget http://cachefly.cachefly.net/10mb.test

But the wget above leaves seven dropped-packet entries like this one in my syslog:

Oct 18 02:00:35 xxxx kernel: BAD OUTGOING IN= OUT=eth0 SRC=xx.xx.xx.xx DST= LEN=40 TOS=0x00 PREC=0x00 TTL=64 ID=12170 DF PROTO=TCP SPT=37792 DPT=80 SEQ=164520678 ACK=3997126942 WINDOW=979 RES=0x00 ACK URGP=0  

I don’t get these drops for similar rules with UDP traffic. I already have rules in place that limit which users can make DNS requests.

The dropped outgoing ACK packets seem to be coming from the root account (URGP=0), which I don’t understand. This happens even when I swap useraccount for root.

I believe that ACK packets are categorised as new because conntrack starts tracking connections after the third step of the three-way handshake, but why are they being dropped?

Can these drops be safely ignored?


So I often see rules like these, which work fine for me:

$IPTABLES -A OUTPUT -s $MYIP -p tcp -m tcp --dport 80 -m state --state NEW,ESTABLISHED -j ACCEPT
$IPTABLES -A INPUT -p tcp -m tcp --sport 80 -d $MYIP -m state --state ESTABLISHED -j ACCEPT

I swapped “-m state --state” for “-m conntrack --ctstate” as the state match is apparently obsolete.

Is it best practice to have generic state tracking rules? Are the rules above not considered correct?

For tight control over users’ outgoing connections, would something like this be better?

$IPTABLES -A INPUT -m conntrack --ctstate ESTABLISHED -j ACCEPT
$IPTABLES -A OUTPUT -m conntrack --ctstate ESTABLISHED -j ACCEPT

$IPTABLES -A OUTPUT -p tcp --dport 80 -s $SERVER_IP_TUNNEL -m conntrack --ctstate NEW -m owner --uid-owner useraccount -j ACCEPT

$IPTABLES -A OUTPUT -p tcp --dport 80 -s $SERVER_IP_TUNNEL -m conntrack --ctstate NEW -m owner --uid-owner otheraccount -j ACCEPT

Method 1

To cut a long story short, that ACK was sent when the socket didn’t belong to anybody. Instead of allowing packets that pertain to a socket that belongs to user x, allow packets that pertain to a connection that was initiated by a socket from user x.

The longer story.

To understand the issue, it helps to understand how wget and HTTP requests work in general.


wget http://cachefly.cachefly.net/10mb.test

wget establishes a TCP connection to cachefly.cachefly.net and, once it is established, sends an HTTP request that says: “Please send me the content of /10mb.test (GET /10mb.test HTTP/1.1) and, by the way, could you please not close the connection after you’re done (Connection: Keep-alive)?” It does that so that, if the server replies with a redirection to a URL on the same IP address, it can reuse the connection.

Now the server can reply with either “Here comes the data you requested, beware it’s 10MB large (Content-Length: 10485760), and yes OK, I’ll leave the connection open” or, if it doesn’t know the size of the data, “Here’s the data, sorry I can’t leave the connection open, but I’ll tell you when you can stop downloading the data by closing my end of the connection”.

In the URL above, we’re in the first case.

So, as soon as wget has obtained the headers for the response, it knows its job is done once it has downloaded 10MB of data.
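The two cases can be told apart from the response headers alone. Here is a minimal sketch; the function name body_end is made up for illustration, and it deliberately ignores chunked transfer encoding, which is a third way for a server to mark the end of a body:

```shell
#!/bin/sh
# Hypothetical helper: read HTTP response headers on stdin and report how
# the client learns that the body is complete.
# Simplification: chunked transfer encoding is not handled.
body_end() {
    # Strip carriage returns, then pick the first Content-Length value, if any.
    len=$(tr -d '\r' | sed -n 's/^[Cc]ontent-[Ll]ength: *\([0-9][0-9]*\)$/\1/p' | head -n 1)
    if [ -n "$len" ]; then
        echo "after $len bytes"
    else
        echo "when the server closes the connection"
    fi
}
```

For example, feeding it the headers described for the 10MB file (printf 'HTTP/1.1 200 OK\r\nContent-Length: 10485760\r\n\r\n' | body_end) reports the first case.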

Basically, what wget does is read the data until 10MB have been received and exit. But at that point, there’s more to be done. What about the server? It’s been told to leave the connection open.

Before exiting, wget closes (close system call) the file descriptor for the socket. Upon the close, the system finishes acknowledging the data sent by the server and sends a FIN to say: “I won’t be sending any more data”. At that point close returns and wget exits. There is no socket associated with the TCP connection anymore (at least not one owned by any user). However, it’s not finished yet. Upon receiving that FIN, the HTTP server sees end-of-file when reading the next request from the client. In HTTP, that means “no more requests, I’ll close my end”. So it sends its FIN as well, to say, “I won’t be sending anything either, that connection is going away”.

Upon receiving that FIN, the client sends an ACK. But at that point wget is long gone, so that ACK is not from any user, which is why it is blocked by your firewall. Because the server doesn’t receive the ACK, it will resend the FIN over and over until it gives up, and you’ll see more dropped ACKs. That also means that by dropping those ACKs, you’re needlessly using resources of the server (which needs to maintain a socket in the LAST-ACK state) for quite some time.
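The log itself hints at this. As far as I know, with --log-uid the LOG target appends a UID= field only when the packet still has an owning socket, and the entry quoted in the question has no such field. Here is a small sketch for checking log lines; the function name log_uid is made up, and the second sample line in the usage note is invented for illustration rather than taken from a real log:

```shell
#!/bin/sh
# Hypothetical helper: print the UID recorded in a netfilter LOG line,
# or "none" when the line carries no UID= field (no owning socket).
log_uid() {
    case "$1" in
        *" UID="*) printf '%s\n' "$1" | sed -n 's/.* UID=\([0-9][0-9]*\).*/\1/p' ;;
        *)         echo none ;;
    esac
}
```

Called on a line ending in “ACK URGP=0” like the one in the question, it prints none. On an invented line ending in “SYN URGP=0 UID=1000 GID=1000” it prints 1000.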

The behavior would have been different if the client had not requested “Keep-alive” or the server had not replied with “Keep-alive”.

As already mentioned, if you’re using the connection tracker, what you want to do is let every packet in the ESTABLISHED and RELATED states through and only worry about NEW packets.

If you allow NEW packets from user x but not from user y, then the other packets of connections established by user x will go through. Because user y cannot establish any connection (we’re blocking the NEW packets that would establish it), no packets for user y’s connections will go through.
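Put together, that approach might look like the sketch below. It is an illustration based on the rules proposed in the question, not a tested firewall. The helper allow_new_for_user and the dry-run default for IPTABLES are my own additions: unless IPTABLES is already set, the script only prints the commands it would run.

```shell
#!/bin/sh
# Dry run by default: print the commands instead of executing them.
# Set IPTABLES=/sbin/iptables (as root) to apply them for real.
# $IPTABLES is left unquoted on purpose so that "echo iptables" splits
# into a command and its first argument.
IPTABLES=${IPTABLES:-"echo iptables"}

# Hypothetical helper: allow one account to open NEW TCP connections
# to one destination port. $1 = account name, $2 = destination port.
allow_new_for_user() {
    $IPTABLES -A OUTPUT -p tcp --dport "$2" \
        -m conntrack --ctstate NEW \
        -m owner --uid-owner "$1" -j ACCEPT
}

# Generic state rules: anything belonging or related to an already
# accepted connection passes, whether or not a socket still owns it.
$IPTABLES -A INPUT  -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
$IPTABLES -A OUTPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT

# The owner match is applied only to the packet that opens the connection.
allow_new_for_user useraccount 80
```

With these rules the final ACK sent after wget has exited matches the generic OUTPUT rule, so it should never reach the log rule.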

Method 2

This allows port 80 out only for the account “useraccount”

Well, at least the rules you’ve shown don’t actually imply this.

There’s also room for some advice: don’t do user checking on ESTABLISHED streams, only on NEW. I also don’t see the point of checking the source port on incoming ESTABLISHED packets: what difference does the port make when the connection is already in the ESTABLISHED state from conntrack’s point of view? A firewall should be as simple as possible while still being effective, so an Occam’s razor approach is the best fit.

All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.