TCP - Observe Ability (https://leighfinch.net)

Mastering Python Networking Review

I came across Mastering Python Networking by Eric Chou about a month ago on Twitter and immediately purchased it. I was excited to see a book on programming targeted at people with a networking background, as the ability to automate becomes critical to scaling networks and reducing toil.

To say I’m a fan of this book is an understatement! I’d been expecting topics like programming with pexpect and using common APIs, but what we got was far more detailed than I could have hoped for, with deep insights spanning the background of TCP right through to building custom APIs, observability, and automating cloud networking.

Chou ramps the topics up gradually, building on each chapter, so that the learning curve for each topic is gentle enough that even someone with no Python experience could be writing basic scripts to automate network configuration changes within the first couple of chapters.
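
To give a flavour of what those first scripts look like, here is a minimal sketch of a configuration push using Netmiko, one of the libraries commonly used for this kind of work (the device type, address, credentials, and commands below are placeholders, not examples taken from the book):

```python
# Minimal config-push sketch using Netmiko (pip install netmiko).
# Device details and commands are placeholders -- adjust for your own lab.
from netmiko import ConnectHandler

device = {
    "device_type": "cisco_ios",
    "host": "192.0.2.10",
    "username": "admin",
    "password": "example-password",
}

conn = ConnectHandler(**device)
# Push a small configuration change and print the device's output.
output = conn.send_config_set([
    "interface Loopback100",
    "description provisioned-by-python",
])
print(output)
conn.disconnect()
```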

The use of Ansible to automate configuration baselining, provisioning, and changes with a scalable methodology is covered in detail, an approach that has been well received not only by me but also raved about on Twitter. If you’re not using Ansible (or something similar) you will eventually come across it, and this book gives you ready-to-run playbooks that will accelerate your adoption.

With my background in Observability I was pleasantly surprised that 2.5 chapters had been dedicated to the topic from multiple perspectives:

  1. Telemetry configuration pushes
  2. Receiving and decoding the telemetry
  3. Extending existing tools like NTOP and Cacti

Graphing and visualisation are an important part of making data consumable to multiple audiences, and the introduction and practical examples of the popular Matplotlib and PyGraphviz libraries were on point.

Packet decoding and crafting libraries like Scapy are introduced, and again the practical examples make relatively complex concepts, like writing a network scanning tool, easy to digest. I’ve used Scapy in the past to build custom protocol implementations, and I wish I’d had this book then.
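
In that spirit, a sketch of the simplest possible Scapy-based probe, a single-port TCP SYN check, might look like the following (the target address is a placeholder; only probe hosts you are authorised to test):

```python
# Single-port TCP SYN probe sketch with Scapy (pip install scapy). Run as root.
# The destination address is a placeholder.
from scapy.all import IP, TCP, sr1

reply = sr1(IP(dst="192.0.2.10") / TCP(dport=443, flags="S"), timeout=2, verbose=0)
if reply is None:
    print("no response (filtered or host down)")
elif reply.haslayer(TCP) and (int(reply[TCP].flags) & 0x12) == 0x12:  # SYN+ACK set
    print("port open")
else:
    print("port closed or reset")
```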

Today most of my research centres around using a Python framework called Mininet, and while Mininet is not covered, I would recommend this book to anyone looking to learn modelling and simulations using Python.

The future of networking is not Network Engineers logging into individual devices and running commands. This book is a primer for the network engineering community looking to scale, and conversely for programmers looking to understand how to automate networking tasks.

Topics:

  1. Review of TCP/IP Protocol Suite and Python
  2. Low-Level Network Device Interactions
  3. APIs and Intent-Driven Networking
  4. The Python Automation Framework – Ansible
  5. Docker Containers for Network Engineers
  6. Network Security with Python
  7. Network Monitoring with Python – Part 1
  8. Network Monitoring with Python – Part 2
  9. Building Network Web Services with Python
  10. Introduction to AsyncIO
  11. AWS Cloud Networking
  12. Azure Cloud Networking

If you enjoyed this article, pick up a copy of this book to support us.

XDP: Your eBPF Packet Processing Introduction!

I want to let you in on why I think XDP (eXpress Data Path) and eBPF are so awesome and will change the game when it comes to security, routing, and application delivery.

Around ten years ago a new technology called DPDK (Data Plane Development Kit) was created by Intel to enable people like you and me to create network applications (firewalls, switches, routers, load balancers etc.) in user land, bypassing the host’s kernel altogether. The benefit of this is that you are not bound by the host’s general-purpose network stack. This is very cool because it allows the user to write complex packet workflows in an optimised way.

Programmability of the Linux kernel has been a goal of eBPF (extended Berkeley Packet Filter), removing the need to create tightly coupled kernel modules or to get a change accepted by the Linux kernel team and then adopted by Linux distributions. eBPF can be used to create code that runs in the kernel to observe or modify behaviour in real time.

XDP provides a way for network applications to operate safely within the kernel before packets are processed by the host’s networking stack. In the case of Cilium (an open source eBPF Kubernetes networking, security, and observability platform), we can create a load balancer that bypasses the need for kube-proxy and cloud load balancers.

(Diagram credit: Jan Engelhardt, own work, SVG/PNG.)

How is XDP Different to DPDK?

DPDK and XDP overlap in that both are used for high-performance packet processing in network applications; however, their modes of operation and capabilities are quite different.

Operation

XDP is made up of both kernel land and user land components. Using Clang we can compile our C code targeting the BPF format, and we then load the code into the kernel using ip link set commands. This code can then modify the packet contents and return one of five actions:

  1. XDP_ABORTED – Error condition and drop the packet.
  2. XDP_DROP – Drop the packet.
  3. XDP_PASS – Allow the packet to continue to the kernel.
  4. XDP_TX – Transmit the packet out the interface it was received on.
  5. XDP_REDIRECT – Transmit the packet out another interface or to a user land application leveraging AF_XDP.
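
As a taste of what this looks like in practice, here is a minimal sketch using the BCC Python bindings rather than the Clang and ip link workflow described above; BCC compiles the embedded C for you and attaches it as an XDP program that simply counts packets and returns XDP_PASS (the interface name is a placeholder, and BCC plus kernel headers must be installed):

```python
# Minimal XDP sketch using BCC (the bcc Python package). Run as root.
# The interface name is a placeholder.
import time
from bcc import BPF

prog = r"""
#include <uapi/linux/bpf.h>

BPF_ARRAY(pkt_count, long, 1);

int xdp_count(struct xdp_md *ctx) {
    int key = 0;
    long *value = pkt_count.lookup(&key);
    if (value)
        __sync_fetch_and_add(value, 1);
    return XDP_PASS;  /* hand the packet on to the kernel stack */
}
"""

device = "eth0"
b = BPF(text=prog)
fn = b.load_func("xdp_count", BPF.XDP)
b.attach_xdp(device, fn, 0)

try:
    while True:
        time.sleep(1)
        for _, value in b["pkt_count"].items():
            print("packets seen:", value.value)
except KeyboardInterrupt:
    pass
finally:
    b.remove_xdp(device, 0)
```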

In contrast, DPDK requires that the NIC (Network Interface Card) supports DPDK, and it punts the traffic straight to a user land application. This means that the receiving application must know how to process the packet itself, rather than relying on simple drop, pass, or transmit style actions.

Use Cases

XDP is tightly linked to the Operating System kernel (both Linux and Windows) and is generally used in packet load balancing, observability, routing, and security applications (DDoS scrubbing, IPS, Firewall). Additionally, you can pass data to user land for the purposes of observing the data.

DPDK is generally used for NFV (Network Function Virtualisation) purposes, such as creating a networking application like a WAN optimiser. The ability to have the entire application running in user land means incredible flexibility.

Further Reading

In the next post on XDP I will create a fully functioning XDP kernel and user land application to observe traffic prior to the kernel processing the packet.

  1. Check out my review of Learning eBPF which includes a chapter on XDP.
  2. Check out my review of BPF Performance Tools.
  3. https://www.kernel.org/doc/html/latest/networking/af_xdp.html
  4. Cilium XDP Documentation

Performance Diagnostics Part 3 — Latency beyond Ping

I write on my personal time. Feel free to buy me a coffee or buy a copy of TCP/IP Illustrated Volume 1 to learn more about the protocols that run the internet. Check out Wireshark Network Analysis for an awesome hands-on guide to Wireshark.

Network teams often use ICMP as a mechanism to determine the latency (propagation delay etc.) and reachability between two endpoints using the trusty Ping utility. Ping appeared in late 1983, created by Mike Muuss while working at the US Ballistic Research Laboratory. What was also interesting about 1983 is that it was the year the US military converged on IP (and TCP), mandating it for any system connected to the ARPANET, which makes Ping one of the oldest IP applications still in use today.

The name PING (Packet InterNet Groper) is a backronym referencing the sonar ping used by submarines and other watercraft (as well as in nature), which makes sense when you are trying to measure latency between nodes.

Ping uses ICMP (Internet Control Message Protocol) echo (type 8) and echo-reply (type 0) messages to communicate between nodes, and support for them is even mandated for internet-connected hosts in the historic RFC1122 Requirements for Internet Hosts — Communication Layers (released in 1989). This RFC is well worth a read to understand what was happening with IP and TCP in the aftermath of the congestion collapse events of the mid 1980s.

The problem with using Ping and ICMP as a measure of latency is that it is frequently blocked or placed in scavenger queues, which distorts the latency detected (adding to it or making it appear jittery) and does not reflect the actual latency experienced by applications. The lack of ICMP prioritisation makes sense: we want the actual user traffic coming through and processed at endpoints with a higher priority than our monitoring traffic. Secondly, Ping is usually run at intervals (eg. every 5 minutes), which means we won’t be able to spot events between polling intervals.

This may have been good enough when we used IP networks for non-realtime applications (email and web browsing etc.) where changes in latency and drops are not as important, but in the 2000s we started using IP SLA to inject synthetic traffic between two devices that support IP SLA and report on metrics like jitter and latency for the class of service or QoS markings desired. This was a good step further, as now we understand how real traffic would perform while the IP SLA probe runs. It is (usually) run at intervals, however, which means we still have gaps in our visibility. The good reason for using IP SLA (and other synthetics) is that traffic is being generated even when there is none being generated by users. A lot of vendors take this approach with their observability stacks, but it still leaves a gap between intervals and doesn’t necessarily reflect a user’s experience.

We can also monitor latency passively using captured TCP packets between nodes. NPM platforms like Alluvio AppResponse do this at large scale, but we can also do it using Wireshark or tcpdump for ad-hoc reporting. The best bit is that we can now see the latency between any nodes that we can collect traffic between, which has two big benefits:

  1. We have every connection.
  2. It is all passive.

Using Wireshark we will look at how this is possible. I’ve talked about how TCP operates below the application layer and that an application has only a limited ability to influence socket behaviour. The OS kernel handles traffic acknowledgement, which has a very high priority in the Operating System scheduler. We can essentially ignore the TCP stack delay as negligible (unless it is behaving erratically which is a sign that the endpoint is over-subscribed).

The two TCP behaviours we will use to understand the latency of the connection are the 3-way handshake, and the TCP time to ACK two full segments.

Method 1 – The 3-way handshake

The 3-way handshake (also called connection setup time) is the process used to establish a reliable connection between two nodes. It involves the TCP client sending a specially marked segment called a SYN, the server responding with another specially marked segment called a SYN-ACK, and the client then sending an ACK (with or without data). The delta between the SYN and the ACK, collected anywhere between the nodes, will give us the latency between the two nodes.

In this case we have a latency of 105ms between the SYN and the ACK. I’ve set the propagation delay of the backbone of this network to 100ms, which, after we add the small overhead of socket creation on the server, puts us very much right on the latency. I deliberately chose a capture that was taken on neither the client nor the server to show that this can be applied anywhere on the path.

We can also see this value of 105ms in each subsequent packet stored in the tcp.analysis.initial_rtt variable.
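
The same calculation can be scripted; as a rough sketch, a few lines of Scapy will pull the SYN-to-ACK delta out of a capture file (the filename is a placeholder, and this simplistic version assumes one clean handshake per client address and port):

```python
# Rough handshake-latency sketch with Scapy (pip install scapy).
# capture.pcap is a placeholder; assumes one clean handshake per client flow.
from scapy.all import rdpcap, IP, TCP

SYN, ACK = 0x02, 0x10
syn_times = {}
for pkt in rdpcap("capture.pcap"):
    if IP not in pkt or TCP not in pkt:
        continue
    flags = int(pkt[TCP].flags)
    flow = (pkt[IP].src, pkt[TCP].sport, pkt[IP].dst, pkt[TCP].dport)
    if flags & SYN and not flags & ACK:
        # SYN without ACK: the client is opening the connection.
        syn_times[flow] = pkt.time
    elif flags & ACK and not flags & SYN and flow in syn_times:
        # The first ACK from the client completes the handshake.
        rtt_ms = float(pkt.time - syn_times.pop(flow)) * 1000
        print(f"{flow}: handshake RTT {rtt_ms:.1f} ms")
```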

Method 2 — Time to ACK

We know from RFC1122 that we should see an ACK at least every 2 full-sized segments without delay, or after a packet marked with the PSH flag. This behaviour is not impacted by the application’s ability to process the data, and is solely the responsibility of the TCP stack in play. This method is best used close to the client (otherwise some additional math is required).

We can even graph this in Wireshark using the Round Trip Time option in the TCP Stream Graphs menu. You will also note some spikes in acknowledgements at and over 200ms; this is a topic that will be discussed in another article.

I like to add the time to ACK (tcp.analysis.ack_rtt) as a column in the packet list, as below, when troubleshooting TCP performance.
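
If you want the same time-to-ACK values outside the GUI, one approach is to shell out to tshark and extract the tcp.analysis.ack_rtt field; a hedged sketch follows (the capture filename is a placeholder, and tshark must be on the PATH):

```python
# Sketch: extract per-packet time-to-ACK (seconds) from a capture via tshark.
# capture.pcap is a placeholder; requires tshark on the PATH.
import subprocess

result = subprocess.run(
    [
        "tshark", "-r", "capture.pcap",
        "-Y", "tcp.analysis.ack_rtt",   # only packets with a computed ACK RTT
        "-T", "fields",
        "-e", "frame.number",
        "-e", "tcp.analysis.ack_rtt",
    ],
    capture_output=True, text=True, check=True,
)

rtts = []
for line in result.stdout.splitlines():
    frame, rtt = line.split("\t")
    rtts.append(float(rtt))
    print(f"frame {frame}: {float(rtt) * 1000:.1f} ms")

if rtts:
    print(f"worst time to ACK: {max(rtts) * 1000:.1f} ms")
```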

Using TCP to monitor latency has significant advantages over synthetics.

If you made it this far, thanks for reading. Feel free to buy me a coffee or buy a copy of TCP/IP Illustrated Volume 1 to learn more about the protocols that run the internet. Check out Wireshark Network Analysis for an awesome hands-on guide to Wireshark.

Performance Diagnostics Part 4 — HTTPS Performance

Unlike HTTPS, analysing HTTP traffic with tools like Wireshark is pretty easy because everything is in clear text. Wireshark will even give you the request performance (49ms highlighted below). I can also see that the request was sent in packet 4 (after the three way handshake), and the response came in packet 6. The delta between packet 4 and packet 6 is your server response time.

But what about packet 5? Packet 5 is the acknowledgement of data at the operating system level, rather than at the application layer. Normally if the request takes more than 50ms (your OS may vary), we will see what’s called a delayed acknowledgement, which the application data may piggyback on. However, this naked acknowledgement (no application payload) came back 3ms later. The reason for this is that the request was less than a full segment size (see the MSS in the SYN packets), which meant that the OS attached the PSH flag, which the receiver must acknowledge straight away.

So what happens when we wrap this up in HTTPS? We can use the same logic as measuring the request and response cycles as we did with HTTP; it just means that we can’t see the actual payloads. In most cases we can expect that a payload will be sent to the server, and the delta between that payload and the return payload is our server response time(1).

The second interesting thing is that we will now be at the mercy of SSL/TLS setup, which involves additional round trips for the connection to establish. The below screenshot demonstrates a simple HTTPS request with connection setup, TLS establishment, HTTP, and session teardown.

If we break this down, it’s actually quite a simple request and response cycle(2).

Events

  1. The first 3 packets are the normal TCP three-way handshake.
  2. In packet 4, instead of the HTTP request, we have the ‘Client Hello’. The Client Hello is a backwards-compatible offer from the client to the server to negotiate specific TLS parameters, including pre-existing TLS session IDs and available cipher suites (eg. TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256).
  3. Packet 5 is an operating system level TCP acknowledgement, indicating that the Client Hello has been received.
  4. A ‘Server Hello’ is received in packet 6 with the server’s selection of cipher suite and other important TLS parameters, including the server’s TLS certificate.
  5. Packet 10 tells the server that all subsequent communications will be encrypted using the agreed parameters.
  6. Packet 12 tells the client that all subsequent communications will be encrypted using the agreed parameters.
  7. Packet 14 is the HTTP request from the client (encrypted using TLS).
  8. Packet 15 (8ms later) is the beginning of the HTTP response, followed by packet 17 (packet 16 is an acknowledgement at the TCP level).
  9. Packets 19 and 21 are the encrypted alert (type 21), which is the session teardown. Even though it says alert, this is normal for TLS teardown and does not indicate a problem.
  10. Packets 20 and 22 are normal TCP teardown.
  11. Packets 23 and 24 are resets (RST) towards the server. Resets are now commonly used to tear down these types of TLS communications(3).

From this we can see that even though the actual server response time for the request was only 8ms, it actually took 236ms to get to the beginning of the server response to the application due to TCP and TLS overhead. 

If this was a high-latency link (eg. satellite), it would have taken even longer (back of the envelope: Starlink would take roughly 500ms, and a geostationary satellite around 2.2 seconds).
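
You can get a feel for how these phases add up without a packet capture at all; the sketch below uses Python's standard library to time the TCP connect, the TLS handshake, and the HTTP request separately (the hostname is a placeholder, and each phase includes at least one round trip):

```python
# Sketch: time the TCP connect, TLS handshake, and HTTP request separately.
# example.com is a placeholder host.
import socket
import ssl
import time

host = "example.com"

t0 = time.perf_counter()
sock = socket.create_connection((host, 443))        # TCP three-way handshake
t1 = time.perf_counter()

ctx = ssl.create_default_context()
tls = ctx.wrap_socket(sock, server_hostname=host)    # TLS hellos and key exchange
t2 = time.perf_counter()

tls.sendall(f"GET / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n".encode())
tls.recv(4096)                                       # wait for the start of the response
t3 = time.perf_counter()
tls.close()

print(f"TCP connect:             {(t1 - t0) * 1000:.1f} ms")
print(f"TLS handshake:           {(t2 - t1) * 1000:.1f} ms")
print(f"HTTP time to first byte: {(t3 - t2) * 1000:.1f} ms")
```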

If you got this far, thanks for reading! If you want to learn more about this type of protocol analysis, pick up a copy of Wireshark Network Analysis.

  1. The exception to this is async communications like WebSockets. This is a subscribe-type model where you will see a payload go to the server and then sporadic responses back to the client from the server, or a response every 60 (or 30, 20, 10) seconds.
  2. This session only has one HTTP object fetched. Typically you would see a persistent connection which reuses the same TCP/TLS connection to make further requests, reducing overheads.
  3. The reason for RST being used in this way is to do with the behaviour of the TCP stack specified all the way back in RFC793. If a reset is sent, the socket is closed immediately, as opposed to waiting for two times the MSL (Maximum Segment Lifetime), which typically means minutes in TIME_WAIT / CLOSE_WAIT. Check out RFC1337 for some interesting commentary.

Top 5 Wireshark Tips for Network Analysis

I’ve been using Wireshark since it was named Ethereal back in the very early 2000s, and I still use it daily for research. Wireshark recently turned 25 with creator Gerald Combs announcing it on the Wireshark blog and celebrating it at Sharkfest ‘23 Asia and US.

To celebrate I’m going to offer my top 5 Wireshark tips for 2023!

Tip 1 — Use Profiles

Wireshark is an extremely flexible application, allowing you to create profiles for any use case and quickly change any preference, including which columns are displayed and colouring rules. For example, I have a profile focused on TCP sequence analysis where I display TCP-specific columns such as bytes in flight and window size, as well as frame delta time.

You can create clones of the default profile by right clicking on the Profile text button in the bottom right hand corner of the screen and selecting Manage Profiles.

In the Configuration Profiles window you can select the copy button to the right of the minus (-) symbol.

You can switch profiles while looking at a capture by left clicking on the Profile text in the bottom right hand corner.

This allows you to create as many customised profiles as you need to analyse your captures without needing to constantly change your settings back and forth.

Tip 2 — Use IO Graphs

IO Graphs provide the ability to graph and run statistics on virtually any metric and display filter you can imagine. Far more flexible than stream graphs, they let me not only choose the type of graph but also have multiple graph types in the same window, for example when comparing bytes in flight to TCP errors.

I can filter packets using display filters and change the colour and style. Where it gets interesting is that I can change the Y axis from the number of packets to bytes or bits, or perform statistics on a specific metric in the Y column.

The Y column can be anything that can be reached using a display filter, from the number of bytes in flight to the number of SMB locks.

Don’t forget to change the intervals and smooth moving average fields as needed.

Tip 3  — TCP Analysis

tcp.analysis.flags highlights anything of interest to do with TCP, from duplicate ACKs, retransmissions, suspected retransmissions, errors, window changes, and out-of-order segments, to a bunch of other great pieces of information. This gives me a visual representation of the health of TCP at a glance. I could also use Expert Information as shown in tip 4.

Simply type tcp.analysis.flags in the display filter bar at the top of the screen and you will be presented with packets of interest. It’s also worth pointing out the packets vs packets displayed text at the bottom of the window to get an idea of the percentage of packets that are of interest.

tcp.analysis.flags

Tip 4  — Expert Information

Expert Information contains the collective knowledge of protocol experts around the world. Found in the Analyze menu (as well as in the packet dissectors), it lets me find out how different protocols are behaving and isolate problems quickly.

Expert Information is part of the packet dissectors for each protocol, and entries are by default grouped by their summary (eg. Connection reset (RST)). In this case I can see many (1395) out-of-order segments, which may indicate multi-pathing of TCP.

Expanding the summary gives me the list of packets that match the summary, and selecting a packet allows me to navigate automatically to the packet in question in the main Wireshark window.

Wireshark Expert information expanded

Tip 5  — Colouring Rules

Colouring rules enable you to create custom rules for anything that can be referenced by a display filter. This is especially useful when you identify a problem and want to be able to recognise it again quickly. An example of this might be DNS requests without responses, or, in a security context, a packet that is an indicator of compromise (IOC).

Tip 6  — Stream Graphs

TCP Stream Graphs allow you to visualise typical questions in time series. I use this feature in nearly every trace I open to help me understand bottlenecks in communications. This is a Stevens graph, which shows sequence numbers over time. Note the staircase pattern, which I will go into in detail in another post.

The tcptrace graph shows a little more detail than the Stevens graph, including SACKs (Selective Acknowledgements). The green line at the bottom represents acknowledgements, while the green line above shows the available receive window.

The throughput graph allows us to see the throughput of a connection over different moving averages. This is very useful when looking at microbursts and overall efficiency. The goodput option allows us to look at the throughput perceived by the application (minus headers and retransmissions), which is especially useful for understanding protocol efficiency.

Higher throughput does not equal a better user experience.

Round trip time allows us to visualise the round trip time (or time to ACK) of a connection. This is useful when looking at an application’s perceived latency, or the latency calculation that the OS uses for measuring SRTT (hopefully with a filter for delayed ACKs and retransmissions 🙂 ).

The window scaling option allows us to look at the receive window (not the congestion window) vs the bytes in flight. Bytes in flight reaching the receive window will result in a window full event meaning that throughput is limited by the receiver as opposed to connectivity. Bytes in flight can of course exceed the bandwidth available, as it is calculated as unacknowledged data which could include dropped (policed or shaped) traffic.

For more information on window full vs zero windows, check out my video Demystifying TCP windows.

Bonus Tips

There are far too many to narrow down to 5, but these are my favourites. Apply as column has to be my number 6 must have! I want to hear from you! Comment below with your must haves.

For a great read on how to set up Wireshark for Troubleshooting, check out Wireshark 101: Essential Skills for Network Analysis – Second Edition: Wireshark Solution Series

Who’s Using My Bandwidth?

One of the questions I hate is “who’s using my bandwidth?!?”, and not at all because I was the child consuming all of the available dial-up (28.8Kbps) bandwidth downloading the latest FreeBSD or Linux distribution image. In fact this was the age of magazines with CDs that contained Mandrake, RedHat, or if I was lucky Slackware. Debian, which is my current go-to, wasn’t on my radar until many years later. I even recall ordering a copy of the latest FreeBSD by mail to run on one of the 386 boxes I’d collected(1). I digress…

“Who’s using my bandwidth” is a good question, because it implies that someone (or something) is consuming more bandwidth than they’re supposed to. If we look at the major protocols that dominate the internet, a combination of TCP and UDP (the latter in the form of DNS and QUIC), surely there is some fairness built into them?

The answer to the question has a short answer, and a much longer one.

Short Answer

If we look at TCP/IP and QUIC specifically, they do care about fairness and have congestion control built in so they back off if they detect the presence of congestion. The challenge is that EVERY TCP connection on every device manages its own congestion control, which may mean some connections never reach equilibrium, causing some endpoints to get more than their fair share.

UDP itself has no congestion control and can blast traffic at any rate, which can flood a connection or even DoS a host if it fails to throttle/coalesce CPU interrupts (e.g. ethtool InterruptThrottleRate and interrupt coalescing).

Quality of Service on uncontended connections can also help with fairness. Wendell Odom and Michael J. Cavanaugh do a fantastic job of explaining this in Cisco QOS Exam Certification Guide (IP Telephony Self-Study) (Official Cert Guide) 2nd Edition. The problem with traditional QoS is that it doesn’t work on connections that have variable or contended bandwidth, which is most consumer internet connections. The reason for this is that packets need to queue in software before they can be prioritised.

The Longer Answer

TCP has undergone radical changes over the last 40 years since RFC793 was released in September of 1981. The original RFC didn’t care about congestion, and the result was the congestion collapse events of the mid 1980s, which led to several important changes that I outline in this video.

The major changes I’ll outline here include:

  • Congestion window
  • TCP Slow Start
  • Exponential backoffs

The congestion window was introduced so that a sender backs off when congestion is detected via retransmission timeouts. Slow Start is used at the beginning of each new connection (and after an idle period); it doubles the congestion window every round trip from an initial low value (eg. 10 x the Maximum Segment Size) until congestion is detected. Exponential backoff timers were introduced for a couple of reasons: to reduce the likelihood of global synchronisation, and to give a problem time to resolve while reducing unnecessary traffic.
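
To illustrate the slow start behaviour described above, here is a toy sketch that doubles a congestion window each round trip and shows how little data fits into the first few RTTs; the numbers are illustrative only, not a model of any particular TCP stack:

```python
# Toy slow-start illustration: the congestion window doubles each RTT from a
# small initial value until it reaches an assumed path capacity.
MSS = 1460                   # bytes per segment
cwnd = 10 * MSS              # common initial window (eg. 10 x MSS)
path_capacity = 100 * MSS    # assumed bandwidth-delay product of the path

sent = 0
for rtt in range(1, 7):
    sent += cwnd
    print(f"RTT {rtt}: cwnd {cwnd // MSS} segments, {sent / 1024:.0f} KiB sent so far")
    if cwnd >= path_capacity:
        print("  reached the assumed path capacity; congestion avoidance takes over")
        break
    cwnd = min(cwnd * 2, path_capacity)
```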

Most Internet TCP connections never get out of Slow Start during their lifetime, which means they never get a chance to reach their fair share of bandwidth: they never discover the limits, and they don’t cause enough congestion to trigger other connections to slow down.

To add to this challenge I’ve also looked at 17 TCP Congestion Control Algorithms (including the most popular CUBIC, BBR, and Reno), and most struggle to achieve equilibrium with bulk transfer traffic like iPerf.

For the congestion window to increase, in either slow start or congestion avoidance, ACKs need to return to the sender to confirm that data was received. This natural flow control means that TCP is self-limiting (unlike UDP). AQM concepts like CoDel (Controlled Delay) allow routers to effectively slow a TCP sender down by injecting small amounts of delay; this is the same idea as a receiver being overwhelmed and not acknowledging a packet quickly, resulting in the sender slowing down. I’m currently using this successfully at home with the CAKE (Common Applications Kept Enhanced) implementation on my Internet router to improve user (my family and me) experience and reduce the impacts of bufferbloat. The beauty of CoDel is that we can target expected latency rather than just bandwidth (which in my case is variable).
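
For what it’s worth, applying CAKE on a Linux-based router comes down to a single tc command; a hedged sketch wrapped in Python (the interface name and bandwidth are placeholders, and the kernel must include the sch_cake module) might look like:

```python
# Sketch: apply a CAKE root qdisc to shape egress just below the link rate.
# eth0 and 40mbit are placeholders; requires root and a kernel with sch_cake.
import subprocess

subprocess.run(
    ["tc", "qdisc", "replace", "dev", "eth0", "root", "cake", "bandwidth", "40mbit"],
    check=True,
)
```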

If you got this far, thanks for reading!

  1. My favourite FreeBSD OS book The Complete FreeBSD ↩
