Initial Congestion Windows in Linux

As part of my research I’ve spent a lot of time looking at the performance of TCP variants and options. One of the most common questions I get asked is about the congestion window and how it reacts to changes in the environment.

The congestion window (CWND) controls how many segments (the layer 4 protocol data unit) can be outstanding (unacknowledged) at any point in time. For most TCP connections we want to use as much bandwidth as we can without overwhelming the network. In most situations the CWND is constantly changing as we move through the phases of the connection lifecycle.
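To make the idea of "outstanding" data concrete, here is a minimal sketch of how much a sender may still transmit at a given moment. It assumes the standard rule that the usable window is the smaller of the CWND and the receiver's advertised window (RWND, covered next). The function name and the sample numbers are mine, purely for illustration.

```python
def sendable_bytes(cwnd, rwnd, in_flight):
    """Bytes the sender may still put on the wire right now.

    cwnd, rwnd and in_flight are all in bytes. The sender is limited by
    whichever window is smaller, minus what is already unacknowledged.
    """
    window = min(cwnd, rwnd)
    return max(0, window - in_flight)


MSS_PAYLOAD = 1448  # illustrative segment payload size in bytes

# 10 segments of CWND, a large RWND, 4 segments already in flight.
print(sendable_bytes(cwnd=10 * MSS_PAYLOAD, rwnd=65535, in_flight=4 * MSS_PAYLOAD))
```

With these numbers the CWND is the limiting factor, so the sender has room for six more segments.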

The first phase of a TCP connection is the three-way handshake, where a connection is established between two endpoints (client and server). When the connection is established, both endpoints individually set their windows for sending (CWND) and receiving (RWND) data. These values are usually set conservatively for efficiency and security purposes.

The next phase is slow start, where the CWND starts as an integer number of segments between 1 and 10 (the maximum currently allowed per RFC 6928). Slow start (despite its name) grows the window exponentially using a method similar to ABC (Appropriate Byte Counting), where the CWND is increased by the number of bytes acknowledged in an ACK segment. As bandwidth has increased, it makes sense to raise that initial CWND to 10. Fortunately, newer Linux kernels do just this.
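The exponential growth is easier to see with numbers. The sketch below is a simplified model, not the kernel's implementation: it assumes no loss, no delayed ACKs (one ACK per full segment), and a receive window that never limits the sender. The function name and parameters are my own.

```python
def slow_start_rounds(initial_segments, mss, rounds):
    """Return the CWND in bytes at the start of each round trip."""
    cwnd = initial_segments * mss
    history = [cwnd]
    for _ in range(rounds):
        segments_in_flight = cwnd // mss
        for _ in range(segments_in_flight):
            acked_bytes = mss      # each ACK covers one full segment
            cwnd += acked_bytes    # grow by the bytes acknowledged
        history.append(cwnd)
    return history


for rtt, cwnd in enumerate(slow_start_rounds(initial_segments=10, mss=1448, rounds=3)):
    print(rtt, cwnd, cwnd // 1448)
```

Under these assumptions the window doubles every round trip: 10, 20, 40, then 80 segments.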

As the connection matures it exits slow start (the trigger depends on the flavour of TCP and can be a combination of loss and latency) and moves into congestion avoidance, where the CWND is only increased by approximately one segment every round trip. Loss, latency, and ECN (Explicit Congestion Notification) may result in a return to slow start.
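Here is a rough sketch of the hand-off from slow start to congestion avoidance. It assumes Reno-style behaviour with a fixed slow start threshold (ssthresh), one ACK per segment, and no loss; other flavours of TCP (CUBIC, for example) grow the window differently. All names and values are illustrative.

```python
def on_ack(cwnd, ssthresh, mss):
    """Return the new CWND (bytes) after one ACK covering one full segment."""
    if cwnd < ssthresh:
        return cwnd + mss            # slow start: one segment per ACK
    return cwnd + mss * mss / cwnd   # congestion avoidance: about one segment per RTT


def simulate(initial_segments, ssthresh_segments, mss, rounds):
    """Return the CWND in bytes at the start of each round trip."""
    cwnd = initial_segments * mss
    ssthresh = ssthresh_segments * mss
    history = [cwnd]
    for _ in range(rounds):
        for _ in range(int(cwnd // mss)):   # one ACK per segment in flight
            cwnd = on_ack(cwnd, ssthresh, mss)
        history.append(cwnd)
    return history


for rtt, cwnd in enumerate(simulate(10, 40, 1448, 5)):
    print(rtt, round(cwnd / 1448, 2))
```

In this model the window doubles until it reaches the assumed threshold of 40 segments, then creeps up by a little under one segment per round trip.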

Unlike the receive window (which we can see in packets), we can’t directly see the congestion window in a packet capture. We can, however, infer the congestion window from the number of outstanding bytes at any point in time.
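The inference boils down to subtracting the highest acknowledged byte from the highest byte sent. The sketch below does this over a hand-made list of events rather than a real capture. The event format is hypothetical, and it ignores sequence number wraparound, retransmissions, and SACK.

```python
def bytes_in_flight(events, initial_seq=1):
    """Return the bytes in flight after each event.

    events is a list of ("data", seq, length) tuples for segments sent
    and ("ack", ack_number) tuples for ACKs received by the sender.
    """
    highest_sent = initial_seq    # next byte the sender would send
    highest_acked = initial_seq   # next byte the receiver expects
    samples = []
    for event in events:
        if event[0] == "data":
            _, seq, length = event
            highest_sent = max(highest_sent, seq + length)
        elif event[0] == "ack":
            highest_acked = max(highest_acked, event[1])
        samples.append(highest_sent - highest_acked)
    return samples


# Ten back-to-back 1448-byte segments, then an ACK covering the first two.
events = [("data", 1 + i * 1448, 1448) for i in range(10)] + [("ack", 2897)]
samples = bytes_in_flight(events)
print(max(samples), samples[-1])
```

The peak of this series before the first ACK arrives is the best estimate of the initial congestion window.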

The IO Graph above, from Wireshark, covers a 10-second trace and shows the number of bytes in flight, which roughly equates to the congestion window over the lifecycle of the connection.

I can also zoom in to the beginning of the trace and count the segments sent at the very start of the connection. There I can see that the sender transmits a maximum of 14480 bytes, or 10 segments of 1448 bytes (plus 12 bytes of TCP options, adding up to the MSS of 1460).
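The arithmetic behind those numbers, as a quick check. The 1500-byte MTU and the 20-byte IPv4 and TCP headers are my assumptions (a typical Ethernet path without IP options); the 12 bytes of TCP options and the 10-segment window come from the trace.

```python
def initial_window_bytes(mtu=1500, ip_header=20, tcp_header=20,
                         tcp_options=12, initial_segments=10):
    """Return (mss, payload_per_segment, initial_window) in bytes."""
    mss = mtu - ip_header - tcp_header   # 1500 - 20 - 20
    payload = mss - tcp_options          # options eat into each segment's payload
    return mss, payload, initial_segments * payload


mss, payload, window = initial_window_bytes()
print(mss, payload, window)  # 1460 1448 14480
```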

Besides the packet capture, there are a few other ways to confirm the initial CWND:

1. Look at the CWND of listening sockets using ‘ss -nli | grep cwnd | uniq’

2. Write a simple application that inspects the socket created.

This application takes two parameters (destination host and port). It creates a TCP connection, does not transfer any data, and prints out the socket parameters. The tcpi_snd_cwnd in this example shows the expected value: 10.

    import argparse
    import socket
    import struct
    import sys
    
    # First 104 bytes of Linux's struct tcp_info: 8 one-byte fields followed
    # by 24 unsigned 32-bit fields. socket.TCP_INFO is only available on Linux.
    TCP_INFO_FMT = "B" * 8 + "I" * 24
    # In the kernel struct, tcpi_snd_wscale and tcpi_rcv_wscale are 4-bit fields
    # packed into byte 7; byte 8 holds flag bits or padding depending on the
    # kernel version, so both bytes are reported raw here.
    TCP_INFO_KEYS = [
        'tcpi_state', 'tcpi_ca_state', 'tcpi_retransmits', 'tcpi_probes',
        'tcpi_backoff', 'tcpi_options', 'tcpi_wscale_packed', 'tcpi_flags_or_pad',
        'tcpi_rto', 'tcpi_ato', 'tcpi_snd_mss', 'tcpi_rcv_mss', 'tcpi_unacked',
        'tcpi_sacked', 'tcpi_lost', 'tcpi_retrans', 'tcpi_fackets',
        'tcpi_last_data_sent', 'tcpi_last_ack_sent', 'tcpi_last_data_recv',
        'tcpi_last_ack_recv', 'tcpi_pmtu', 'tcpi_rcv_ssthresh', 'tcpi_rtt',
        'tcpi_rttvar', 'tcpi_snd_ssthresh', 'tcpi_snd_cwnd', 'tcpi_advmss',
        'tcpi_reordering', 'tcpi_rcv_rtt', 'tcpi_rcv_space', 'tcpi_total_retrans',
    ]
    
    def print_tcp_info(host, port):
        """Connect to host:port, send nothing, and print the socket's tcp_info."""
        try:
            # The with block closes the socket once the info has been read.
            with socket.socket() as client_socket:
                client_socket.connect((host, port))
                raw = client_socket.getsockopt(
                    socket.IPPROTO_TCP, socket.TCP_INFO,
                    struct.calcsize(TCP_INFO_FMT))
        except socket.error as e:
            print(e)
            sys.exit(1)
        tcp_info = dict(zip(TCP_INFO_KEYS, struct.unpack(TCP_INFO_FMT, raw)))
        for k, v in tcp_info.items():
            print(k + " " + str(v))
    
    if __name__ == '__main__':
        parser = argparse.ArgumentParser()
        parser.add_argument("-H", "--hostname", type=str,
                            help="Host to connect to", required=True)
        parser.add_argument("-p", "--port", type=int,
                            help="Port to connect to", required=True)
        args = parser.parse_args()
        print_tcp_info(args.hostname, args.port)

Lastly, we can check where this value is set in tcp.h (the TCP_INIT_CWND constant in the kernel's include/net/tcp.h) and the commit that set it to 10.
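Going back to the first check, the ss output can also be filtered in a script instead of with grep and uniq. This is a small sketch that pulls the distinct cwnd values out of ss-style text. The sample text is hand-written to resemble ss output, not captured from a real host, and the function name is my own.

```python
import re

# ss -i reports the congestion window as "cwnd:<segments>".
CWND_RE = re.compile(r"\bcwnd:(\d+)")


def cwnd_values(ss_output):
    """Return the sorted, de-duplicated cwnd values found in ss output."""
    return sorted({int(value) for value in CWND_RE.findall(ss_output)})


# Hand-written sample resembling `ss -nli` output (illustrative only).
SAMPLE = """\
LISTEN 0 128 0.0.0.0:22 0.0.0.0:*
\t cubic cwnd:10
LISTEN 0 128 [::]:22 [::]:*
\t cubic cwnd:10
"""

print(cwnd_values(SAMPLE))  # [10]
```

To run it against a live system, the text could come from subprocess.run(["ss", "-nli"], capture_output=True, text=True).stdout instead of the sample.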

If you got this far, thanks for reading. Pick up a copy of W. Richard Stevens' TCP/IP Illustrated, Volume 1 to learn more about TCP and the protocols that built the internet.

