I’ve been thinking a lot lately about how we guarantee performance in a QUIC, DTLS, and ZTNA world from a network perspective.
Check out the NIST ZTNA paper for the gory details.
Traditional congestion-control signals like loss and delay are much less visible now that encryption is controlled by the endpoints rather than exposed to network devices. Not that the CWND was ever directly visible, but you could still pick up indicators such as retransmissions, TCP flags, and the RTT from segment to ACK.
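To give a feel for what that passive TCP analysis looks like, here is a minimal sketch using scapy that pulls retransmission counts and segment-to-ACK RTT samples out of a capture. The pcap path, the simple sequence-number matching, and the single-flow bookkeeping are illustrative assumptions, not a hardened tool:

```python
# Minimal sketch: passive TCP indicators (retransmissions, RTT to ACK) from a pcap.
from collections import defaultdict
from scapy.all import rdpcap, IP, TCP

def tcp_indicators(pcap_path):
    pkts = rdpcap(pcap_path)
    sent = {}                      # (flow, expected_ack) -> time the data segment was seen
    seen_seqs = defaultdict(set)   # flow -> sequence numbers already carrying data
    retransmissions = 0
    rtt_samples = []

    for pkt in pkts:
        if not (IP in pkt and TCP in pkt):
            continue
        ip, tcp = pkt[IP], pkt[TCP]
        flow = (ip.src, tcp.sport, ip.dst, tcp.dport)
        payload_len = len(tcp.payload)

        if payload_len > 0:
            if tcp.seq in seen_seqs[flow]:
                retransmissions += 1          # same sequence number carried data before
            seen_seqs[flow].add(tcp.seq)
            sent[(flow, tcp.seq + payload_len)] = float(pkt.time)

        # An ACK travels on the reverse flow; match it to the segment it covers.
        reverse = (ip.dst, tcp.dport, ip.src, tcp.sport)
        key = (reverse, tcp.ack)
        if tcp.flags.A and key in sent:
            rtt_samples.append(float(pkt.time) - sent.pop(key))

    return retransmissions, rtt_samples
```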
There is a lot of research into QoE using TCP indicators, and TCP is not going away any time soon. A lot of what TCP does in terms of congestion control has made its way directly into the Chromium implementation of QUIC. BBR (Bottleneck Bandwidth and Round-trip propagation time) and CUBIC algorithms are already there. Check out the Chromium source code for more details.
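As a rough illustration of what BBR is modelling (this is not the Chromium implementation; the window sizes and gain are placeholder assumptions), the core idea reduces to tracking the maximum recent delivery rate and the minimum recent RTT, and pacing against that estimate:

```python
# Toy sketch of BBR's core model: btl_bw = max delivery rate, rt_prop = min RTT.
from collections import deque

class BBRModel:
    def __init__(self, bw_window=10, rt_window=10):
        self.bw_samples = deque(maxlen=bw_window)   # recent delivery-rate samples (bytes/sec)
        self.rt_samples = deque(maxlen=rt_window)   # recent RTT samples (seconds)

    def on_ack(self, delivery_rate, rtt):
        self.bw_samples.append(delivery_rate)
        self.rt_samples.append(rtt)

    @property
    def btl_bw(self):
        # Bottleneck bandwidth estimate: max of recent delivery-rate samples.
        return max(self.bw_samples, default=0.0)

    @property
    def rt_prop(self):
        # Round-trip propagation delay estimate: min of recent RTT samples.
        return min(self.rt_samples, default=float("inf"))

    def bdp(self):
        # Bandwidth-delay product: how much data the path can hold in flight.
        return self.btl_bw * self.rt_prop

    def pacing_rate(self, gain=1.0):
        # Steady state paces at roughly the bottleneck bandwidth times a gain cycle.
        return gain * self.btl_bw
```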
Fairness and efficiency concepts don’t go away; they just move up the application stack, which has pros and cons. The main pro I see is rapid evolution of congestion control algorithms independent of the OS in use.
We can still measure performance and QoE for DTLS using packet analysis, but it becomes more opaque than with TCP. Individual encrypted streams carry multiple requests and responses, so we can no longer tie a specific object to its request and response cycle.
I see packet rates as one of the key ways to determine whether links are congested, or where congestion may be occurring. We also need to think about measuring performance both at the point of distribution (such as a load balancer) and at the point of consumption (the agent).
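Here is a small sketch of the packet-rate idea, assuming you already have packet arrival timestamps from a capture or flow export. The interval size, baseline window, and drop threshold are arbitrary placeholders, not tuned values:

```python
# Minimal sketch: bucket packet timestamps into intervals and flag intervals where
# the packet rate collapses relative to a rolling baseline (a crude congestion signal).
from collections import Counter

def packet_rate_alerts(timestamps, interval=1.0, drop_ratio=0.5, baseline_len=5):
    """timestamps: iterable of packet arrival times in seconds."""
    timestamps = sorted(timestamps)
    if not timestamps:
        return []
    start = timestamps[0]
    buckets = Counter(int((t - start) // interval) for t in timestamps)
    rates = [buckets.get(i, 0) / interval for i in range(max(buckets) + 1)]

    alerts = []
    for i in range(baseline_len, len(rates)):
        baseline = sum(rates[i - baseline_len:i]) / baseline_len
        if baseline > 0 and rates[i] < drop_ratio * baseline:
            alerts.append((i * interval, rates[i], baseline))  # (offset_sec, rate, baseline)
    return alerts
```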
While agent bloat is a genuine concern, ZTNA often requires an agent in order to build the tunnels that connect users to services. Take a look at the number of daemons/services running on a typical machine: many are running all the time, and it’s not really about the number of services (up to a point) but how many resources they consume.
Measuring network performance in transit, to find devices that add latency or otherwise disrupt performance, is still needed, as are traditional infrastructure monitoring, synthetics, and APM (Application Performance Management).
Check out my video on A Brief History of TCP Performance for more information on measuring network performance.