End User Experience is an area of constant pain for IT leaders running complex networks made up of a combination of enterprise software, including fat clients, SaaS, and internally hosted web applications. I’ve often been engaged by LOB (Line Of Business) leaders to work with IT departments to identify the root cause of End User Experience problems, as these problems have often persisted for months or years with no resolution.
Observe Ability provides performance consulting and advisory services. Please send me a message or reach out to me on LinkedIn if you need consulting on your Observability journey.
How do problems end up lasting months or years? The simple answer is that there is no single cause, but rather an array of issues that need to be addressed, including:
- End User Experience measured by the size of the Service Desk queue.
- Battle fatigue of end users.
- Subjective and hyperbolic details in Service Desk reports.
- Inability to measure performance at the point of consumption.
The service desk queue can be the canary in the coal mine, but it is not an accurate indicator of end user performance. End users are usually unable to accurately articulate the nature of the problem, which is frustrating for both users and resolving teams. Users with long-running problems will eventually stop calling the service desk except during peak frustration, resulting in hyperbole such as “It’s always slow and everyone is impacted!”.
There’s nothing quite like being on the receiving end of a call where the other person is frustrated, upset, and unable to do their work effectively. Now imagine being that end user, trying to perform their duties and having to work late or skip lunch because of an ongoing problem that is difficult for them to describe.
Some inference of end user experience can be achieved using technologies such as ITIM (IT Infrastructure Management), APM (Application Performance Management), Synthetics, and NPM (Network Performance Management). These technologies assume you own the point of distribution, but with SaaS applications you often do not control where the application is distributed from.
Enter The Pandemic
With the pandemic acting as a digital disruptor and triggering a mass exodus from the office (a tightly controlled environment), technology leaders had to support end users moving to work from home. I supported many customers whose VPN concentrators melted under the stress of moving an entire workforce remote, a problem answered with split tunnelling for common apps such as Zoom, MS Teams, and Webex. The unexpected result for end users was that their consumer-grade, contended home internet connections often yielded better results, at least subjectively.
Technology leaders were faced with two problems:
- How do we give users the same or better experience in the office as they have at home?
- How do we quantify the performance of end user experience for internally and externally hosted applications?
This requires a pivot to observability at the point of consumption to accurately quantify click-to-render performance. End user agents have been around forever and are both loved and hated by users and technologists alike, while JavaScript End User Experience beacons only really work for services you control.
End User Experience Telemetry
When looking at an End User Experience Observability solution, there are a few telemetry points that I care about most:
- Device environmentals: CPU, Memory, Disk IO
- Network utilisation and Wired vs Wireless (signal strength)
- Endpoint location
- Services being used
- Activities and timestamps
- Device Events
There are plenty of other important telemetry points, but these are my go-to points for accurately quantifying performance. In this example I’m going to use Aternity, which is a Riverbed product.
Disclaimer: This is not a sponsored post; however, I am a former Riverbed employee.
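To make the telemetry list concrete, here is a minimal sketch of how a single sample covering those points might be modelled in Python. The class and field names (`EndpointSample`, `cpu_pct`, `disk_busy_pct`, `wifi_rssi_dbm` and so on) are hypothetical and are not Aternity’s data model. Activities and device events are left out because they are better treated as separate timestamped streams than as per-sample fields.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import List, Optional


class Connection(Enum):
    WIRED = "wired"
    WIRELESS = "wireless"


@dataclass
class EndpointSample:
    """One sample from a hypothetical end user agent (not Aternity's schema)."""

    device_id: str
    timestamp: datetime
    # Device environmentals
    cpu_pct: float
    memory_pct: float
    disk_busy_pct: float
    # Network utilisation and wired vs wireless
    net_kbps: float
    connection: Connection
    wifi_rssi_dbm: Optional[int] = None  # signal strength, wireless only
    # Endpoint location, e.g. "office", "vpn" or "remote"
    location: str = "unknown"
    # Applications/services in use when the sample was taken
    services: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Signal strength is only meaningful on a wireless link.
        if self.connection is Connection.WIRED and self.wifi_rssi_dbm is not None:
            raise ValueError("wired samples should not carry a Wi-Fi RSSI")
```

For example, `EndpointSample("laptop-042", datetime(2024, 1, 8, 9, 30), 35.0, 62.5, 12.0, 850.0, Connection.WIRELESS, wifi_rssi_dbm=-61, location="remote")` would describe a remote laptop on Wi-Fi.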
Environmentals
Environmentals include CPU utilisation, memory utilisation, and Disk IO. These are important for understanding how saturated the device in play is. In the eye of the user, it’s very easy to conflate application performance with device performance. Having both real-time and historical data allows a resolving team to understand the state of the device when an issue occurred.
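As a rough illustration of using historical environmentals, the sketch below finds the sample closest to a reported incident time and lists which metrics were saturated. It assumes samples arrive as sorted `(timestamp, metrics)` tuples with hypothetical metric names, and the 90% thresholds are placeholders rather than recommendations.

```python
from bisect import bisect_left
from datetime import datetime
from typing import Dict, List, Tuple

# Assumed saturation thresholds (percent); tune these for your own fleet.
THRESHOLDS = {"cpu_pct": 90.0, "memory_pct": 90.0, "disk_busy_pct": 90.0}

Sample = Tuple[datetime, Dict[str, float]]


def nearest_sample(samples: List[Sample], when: datetime) -> Sample:
    """Return the historical sample closest in time to `when`.

    `samples` must be non-empty and sorted by timestamp.
    """
    times = [t for t, _ in samples]
    i = bisect_left(times, when)
    # Only the samples either side of the insertion point can be closest.
    candidates = samples[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda s: abs(s[0] - when))


def saturated_metrics(samples: List[Sample], when: datetime) -> List[str]:
    """Names of the environmentals at or above their threshold near `when`."""
    _, metrics = nearest_sample(samples, when)
    return sorted(name for name, limit in THRESHOLDS.items()
                  if metrics.get(name, 0.0) >= limit)
```

Binary search keeps the lookup cheap even when a device has months of history, which matters when a user reports a problem long after it happened.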
Network Utilisation and Wired vs Wireless
Network connectivity is an important part of nearly every modern application. Even MS Office uses network connectivity constantly for licensing, grammar, visualisations, and theme recommendations. If connectivity is poor, we need to understand the quality of the connection at the time of a performance incident. NPM metrics like window scaling, protocol, and retransmissions are also important for modern End User Experience (EUE) agents.
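A minimal sketch of turning raw connection counters into something a resolving team can act on is shown below. The function names are hypothetical, the 2% retransmission limit is an assumed threshold, and the -67/-75 dBm Wi-Fi cut-offs are common rules of thumb used here as assumptions, not vendor guidance.

```python
from typing import List, Optional


def retransmission_rate(retransmitted: int, total_segments: int) -> float:
    """Percentage of TCP segments that had to be retransmitted."""
    if total_segments <= 0:
        return 0.0
    return 100.0 * retransmitted / total_segments


def wifi_signal_quality(rssi_dbm: int) -> str:
    """Bucket a Wi-Fi RSSI reading in dBm (values closer to 0 are stronger).

    The -67 and -75 dBm cut-offs are rules of thumb, used here as assumptions.
    """
    if rssi_dbm >= -67:
        return "good"
    if rssi_dbm >= -75:
        return "fair"
    return "poor"


def network_findings(retransmitted: int, total_segments: int,
                     rssi_dbm: Optional[int] = None,
                     retrans_limit_pct: float = 2.0) -> List[str]:
    """Flag connection problems worth checking at the time of an incident."""
    findings = []
    if retransmission_rate(retransmitted, total_segments) > retrans_limit_pct:
        findings.append("high retransmissions")
    # rssi_dbm is None for wired connections, so only wireless links are graded.
    if rssi_dbm is not None and wifi_signal_quality(rssi_dbm) == "poor":
        findings.append("weak wireless signal")
    return findings
```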
Endpoint Location
Endpoint location tells us where the endpoint is, giving us valuable information such as whether the user is on VPN, in the office, or by a pool on Sentosa Island in Singapore.
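One simple way to infer location is to compare the endpoint’s IP address against known address ranges. The sketch below uses Python’s standard `ipaddress` module with a hypothetical address plan (`10.200.0.0/16` as the VPN pool, `10.10.0.0/16` and `10.20.0.0/16` as office subnets); real agents may also draw on geolocation or gateway data, so treat this as an illustration only.

```python
import ipaddress

# Hypothetical address plan: replace with your organisation's own ranges.
VPN_POOLS = [ipaddress.ip_network("10.200.0.0/16")]
OFFICE_SUBNETS = [
    ipaddress.ip_network("10.10.0.0/16"),
    ipaddress.ip_network("10.20.0.0/16"),
]


def classify_location(client_ip: str) -> str:
    """Return "vpn", "office" or "remote" for an endpoint's IP address."""
    addr = ipaddress.ip_address(client_ip)
    # Check the VPN pool first so VPN users are never mistaken for office users.
    if any(addr in net for net in VPN_POOLS):
        return "vpn"
    if any(addr in net for net in OFFICE_SUBNETS):
        return "office"
    # Anything else (home broadband, hotel Wi-Fi, a pool on Sentosa) is remote.
    return "remote"
```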
Services Being Used
What applications and services is the endpoint using? This includes data such as the servers involved and resource analysis.
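As a small illustration of the “servers” side of this, the sketch below groups connection records by application to show which servers each one talks to. The `(application, server)` pair format and the `example.com` hostnames are hypothetical simplifications of what an agent might collect.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def servers_by_application(
    flows: Iterable[Tuple[str, str]]
) -> Dict[str, List[str]]:
    """Map each application to the sorted list of distinct servers it contacted."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for app, server in flows:
        seen[app].add(server)  # a set drops repeat connections to the same server
    return {app: sorted(servers) for app, servers in seen.items()}
```

Knowing which servers an application depends on helps separate a slow SaaS back end from a slow internal dependency.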
Activities and Timestamps
Being able to quantify the performance of each activity, whether it is in a fat client, a web application, or Unified Communications, is important for understanding when problems occur.
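A minimal sketch of quantifying activities follows. It assumes each activity arrives as a hypothetical `(activity, start, end)` tuple, for example the click and the render of a screen, and summarises durations per activity with a nearest-rank percentile. Percentiles are a common choice here because an average can hide the occasional slow render that users remember.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple


def activity_durations(
    events: Iterable[Tuple[str, datetime, datetime]]
) -> Dict[str, List[float]]:
    """Group response times in seconds by activity name, in arrival order."""
    durations: Dict[str, List[float]] = defaultdict(list)
    for activity, start, end in events:
        durations[activity].append((end - start).total_seconds())
    return dict(durations)


def percentile(values: List[float], pct: int) -> float:
    """Nearest-rank percentile; `values` must be non-empty, pct in 1..100."""
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling of pct% of n
    return ordered[rank - 1]
```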
Device Events
Device events tell us about the health of an endpoint. For example, battery wear alarms are often accompanied by overheating and diminished CPU performance. Being able to cite when a problem occurred also gives users a sense of relief that their pain is being heard, and it builds rapport: “I saw that Outlook crashed at…”.
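To be able to say “I saw that Outlook crashed at…”, a resolving team needs the device events around the time of a complaint. The sketch below assumes events are hypothetical `(timestamp, description)` tuples and returns those within a window either side of the incident; the 15-minute default is an arbitrary assumption.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

Event = Tuple[datetime, str]


def events_near(events: Iterable[Event], incident_time: datetime,
                window: timedelta = timedelta(minutes=15)) -> List[Event]:
    """Device events within `window` before or after the incident, oldest first."""
    return sorted(
        (ts, desc) for ts, desc in events
        if abs(ts - incident_time) <= window
    )
```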
If you made it this far, thanks for reading and have an excellent day. Feel free to purchase a book from the book recommendations to support this blog.