How many times have you heard this? "Citrix is slow."
It has to be one of the biggest complaints I hear about Citrix: slow logons, sluggish sessions with typing delays, poor video and sound. This paper is all about performance considerations when using Citrix, along with some best practices to prevent steady-state performance bottlenecks.
I have a design methodology that simply states that performance can be guaranteed with good design, and to me this is simple: never share and never over-allocate backend hardware resources. Provided you do your math right in your capacity planning and you stick to the best practices covered in this paper, you can almost guarantee a fast desktop, in many cases one that will outperform a local PC. Try it…
What are the causes of slowness?
There are so many things that can cause a Citrix session to slow down, but the most obvious are covered here:
- Lack of capacity planning
- Using shared resources
- Session Sharing
- Hypervisor Choices
- Not having any spare capacity
- Active Directory Domain considerations
- Network Bottlenecks
- File Server locations
- Profile Management
- Too many startup processes
- Poor Anti-Virus settings
- Badly set HDX policies
- TCP Offload (under certain circumstances)
Let’s look at each item in isolation.
Lack of capacity planning
There is no black art when it comes to Citrix capacity planning; it is in fact very simple math. Of course, it does depend on what you are doing with Citrix, as the numbers will vary greatly, but I am talking about the masses here. I won’t be covering GPU or high-end power users; those I will cover in a later paper.
First a few golden rules:
- Do not place end user sessions or desktops on equipment that is shared with other systems where you cannot ring-fence the required resources, an example being a shared SAN on VMware*.
- Never use memory or CPU ballooning
- Don’t bother splitting up the hard drives unless they can be spread across different dedicated LUNs; ideally you won’t be using a SAN for the Citrix desktops (covered later in this paper)
- Make sure the File Servers, Print Servers and user profiles are in the same subnet as the Citrix servers, especially when using profile redirection settings
- Reduce the logon time to be as fast as it can be, because the servers are under their greatest load during logon\logoff
* Exceptions considered here are if you are using products like Atlantis ILIO or Nimble, where disk IOPS are measured in the hundreds of thousands.
Here are the magic numbers that you need to remember:
If you can allocate these resources to your Citrix sessions, I can guarantee it will be pretty fast.
SBC\XenApp Sessions (Desktop and App)
- 2GB RAM per session server, plus 200MB for each session
- No more than 14 users per CPU Core with HT considered
- 5 Disk IOPS per session
- 100 IOPS during logon
- 100 IOPS during Logoff
VDI\XenDesktop Desktop (Win 7 or above)
- 2GB RAM per desktop
- No more than 4 users per CPU Core with HT considered
- 25 Disk IOPS per session
- 200 IOPS during logon
- 200 IOPS during Logoff
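The magic numbers above can be turned into a per-host ceiling with some very simple math. Below is a minimal sketch (the host figures assume the example HP server sized later in this paper; the per-session costs are the numbers listed above):

```python
# Back-of-the-envelope Citrix capacity check: the lowest per-metric
# ceiling is the real maximum number of sessions a host can carry.
# Host figures assume the example DL380 sized later in this paper.
HOST = {"ram_mb": 128 * 1024, "cores_ht": 40, "iops": 1440}

# Per-session "magic numbers" from the lists above.
SBC = {"ram_mb": 200,  "users_per_core": 14, "iops": 5}    # XenApp\SBC
VDI = {"ram_mb": 2048, "users_per_core": 4,  "iops": 25}   # XenDesktop\VDI

def max_sessions(host, per_session):
    limits = {
        "RAM":       host["ram_mb"] // per_session["ram_mb"],
        "CPU":       host["cores_ht"] * per_session["users_per_core"],
        "Disk IOPS": host["iops"] // per_session["iops"],
    }
    bottleneck = min(limits, key=limits.get)   # the tightest constraint wins
    return limits[bottleneck], bottleneck

print(max_sessions(HOST, SBC))   # (288, 'Disk IOPS') - disk-bound
print(max_sessions(HOST, VDI))   # (57, 'Disk IOPS') - ~58 as quoted later
```

In both cases disk IOPS, not CPU or memory, sets the ceiling, which is exactly the pattern this paper keeps coming back to.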
What does this all mean?
In order to make sense of this we need to translate it down to physical hardware. For the sake of simplicity I am going to take an average, run-of-the-mill mid-range HP server, say a DL380 Gen9 with 2 x ten-core 2GHz processors, 128GB RAM, an additional 4 x 1GB NICs, 8 x 15K 300GB SAS drives and a 2GB-cache RAID controller, at a cost of around £6,500. What does this give me?
- 128GB RAM (Obvious)
- 40 CPU threads (with Hyper-Threading)
- 1440 Disk IOPS in a RAID 1+0 setup
- 1TB of usable local storage space
- 16Gb of network bandwidth at full duplex
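The 1,440 IOPS figure is simple spindle math. Here is a sketch, assuming the common rule of thumb of roughly 180 IOPS per 15K SAS spindle (an estimate, not a measured figure):

```python
# Rough spindle math for the 8 x 15K SAS drives above.
SPINDLES = 8
IOPS_PER_SPINDLE = 180   # rule-of-thumb figure for a 15K SAS drive

raw_iops = SPINDLES * IOPS_PER_SPINDLE   # 1440, the figure quoted above

# RAID 1+0 reads from all spindles but mirrors every write (write
# penalty of 2), so the effective ceiling depends on the read\write mix:
def raid10_effective_iops(read_fraction):
    write_fraction = 1 - read_fraction
    return raw_iops / (read_fraction + 2 * write_fraction)

print(raw_iops)                             # 1440
print(round(raid10_effective_iops(0.5)))    # 960 for a 50/50 workload
```

The write penalty is why a desktop workload, which is write-heavy, chews through spinning disk so quickly.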
What this all comes down to is: how many desktops\sessions can I run per physical server?
Let’s work it out:
|Hardware resources available|
|Memory: 128GB|CPU cores: 40 (with HT)|Disk IOPS: 1,440|
|Hardware resources required per SBC session|
|Memory: 200MB|CPU cores: 1/14|Disk IOPS: 5|
|Theoretical maximum SBC sessions per server, per metric|
|Memory: ~640|CPU cores: 560|Disk IOPS: 288|
|Hardware resources required per VDI desktop|
|Memory: 2GB|CPU cores: 1/4|Disk IOPS: 25|
|Theoretical maximum VDI desktops per server, per metric|
|Memory: 64|CPU cores: 160|Disk IOPS: ~58|
As you can see from the chart above, Disk IOPS is our biggest bottleneck, not CPU or memory.
So if you’re running a XenApp\SBC\RDSH solution, we have found that the optimum hardware configuration for each VM is 4 vCPUs, 12GB RAM and a 100GB HDD, with a single NIC that is part of a bonded network of four NICs. On this VM you will comfortably get 35 user sessions running concurrently, and each physical server of the specification above will run up to eight of these VMs, giving you a total of 280 sessions per HP server. For peace of mind, though, size it up with 200 user sessions in mind, knowing there is spare capacity, which will be important later. User logons are also a critical metric to consider, and these servers are capable of handling 14 simultaneous logons at 100 IOPS each against a 1440 IOPS capacity.
Making the Hardware Cost Per XenApp Desktop = £32.50 per user
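As a sanity check, the SBC sizing above works out as follows (a sketch of the arithmetic only):

```python
# Sanity-check the SBC sizing: 8 VMs x 35 sessions per host, derated
# to 200 users for headroom, with logon-storm capacity alongside.
VMS_PER_HOST = 8
SESSIONS_PER_VM = 35
HOST_IOPS = 1440
LOGON_IOPS = 100                 # IOPS cost of one SBC logon

theoretical = VMS_PER_HOST * SESSIONS_PER_VM    # 280 sessions
sized_for = 200                                 # leave spare capacity
concurrent_logons = HOST_IOPS // LOGON_IOPS     # 14 simultaneous logons

print(theoretical, sized_for, concurrent_logons)   # 280 200 14
```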
Let’s compare this with VDI\XenDesktop\VMware View. The numbers will be different because we are publishing many more operating systems; this time we will be running Windows 7\8, each with 2GB RAM, 1 vCPU and 25 IOPS. As we can see from the chart above, you theoretically get 58 desktops running off one server, but once we apply our one-third contingency we are realistically looking to achieve around 40 desktops per physical server.
Making the Hardware Cost Per VDI Desktop = £90 per user (quite a difference)
The choice of which desktop to run, VDI vs SBC, will be covered in a separate paper.
So there you have it: for SBC\XenApp desktops each server will cope with 200 desktops, and for VDI that number comes down to 40.
Using shared resources
In order to guarantee a fast desktop you need to be in control of your backend resources. So many times I have seen customers who have invested heavily in great equipment like NetApp FAS units or Dell EqualLogic, believing all of their storage-related issues will be resolved. The problem with these types of shared resources is exactly what they are: shared. If you are running the rest of your systems, like SQL, Exchange or file sharing, off this shared storage, then you cannot guarantee the performance.
Look instead to utilize local storage for the desktop provision but keep the critical core services like the controllers on the shared SAN as these generally do not affect the performance.
Hypervisor Choices
Choosing the right hypervisor to run Citrix on is very important. There are two levels of virtualisation that we need to think about here: one being the server operating system itself, and the second being the virtual desktop. So we are running a virtual desktop on a virtual server, and some things can be lost in translation, but more about that later. Whether to virtualise Citrix is an old debate today; hardware performance has increased massively over the last few years, and it makes so much sense to virtualise Citrix today, though there was a time in my life when I was dead against it.
There are really only three options when it comes to choosing the virtualisation platform to run Citrix on: VMware, Hyper-V and XenServer. Each of them has its merits, but as a rule I always recommend Citrix XenServer over VMware and Hyper-V as the platform of choice for virtual desktops, and the reason is simple. VMware and Hyper-V lend themselves very well to over-allocation, and both of them do it very well, achieving excellent consolidation ratios; oh, and they cost money… Citrix XenServer utilises the hypervisor built in to modern CPUs, using a technology called para-virtualisation instead of the billions of lines of code used by VMware and Hyper-V, which gives almost bare-metal performance, and it is free with XenDesktop. There is more to be said about this subject, but it will be covered in more detail in a separate paper at a later stage. Just leave comments if you have any questions.
Session Sharing
When publishing applications with XenApp, enable session sharing. This means that every user will only use one session, which will reduce the load on the back end and improve performance for users, as they don’t need to open multiple sessions.
Make sure all applications are available across all servers, using App-V where necessary, and publish every application with the same session settings, i.e. colour depth and sound settings. That way the user will always launch applications in their current session, will not be logged on to multiple servers, and won’t need to go through the whole logon process every time they launch an application.
Not having any spare capacity
Where possible, don’t run your equipment at its maximum capacity; try to keep a third of your capacity as contingency to cope with system failures and planned maintenance work. For example, if you are servicing a 900-user virtual desktop estate, build the backend solution to cope with up to 1,200 users (i.e. six session hypervisor servers instead of five).
This will make the ongoing maintenance of the estate much easier and provide swing equipment when doing upgrades and maintenance tasks.
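The one-third contingency rule is easy to express. A minimal sketch using the 900-user example above and the 200-sessions-per-server figure from earlier:

```python
import math

# Build for users plus one-third contingency, then round servers up.
def servers_needed(users, sessions_per_server):
    target = math.ceil(users * 4 / 3)   # 900 users -> build for 1200
    return math.ceil(target / sessions_per_server)

print(servers_needed(900, 200))    # 6 servers instead of 5
print(servers_needed(1000, 200))   # 7
```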
Active Directory Domain considerations
I do hate to state the obvious sometimes, but I have seen this issue too many times not to mention it here. Make sure Active Directory Sites and Services is set up correctly, with the right subnets defined, and that the subnet your Citrix servers sit in is assigned to the right site, to ensure authentication is local and not left to a random choice, or worse still sent over a WAN connection, as this will slow down logons.
Network Bottlenecks
The amount of bandwidth required by end user Citrix sessions is very low, but on the backend the servers will require ample bandwidth. Here are a few guidelines I would advise you to stick to:
- Use bonded networks for general network traffic, ideally four NICs per server set up with Active\Active LACP
- Separate Provisioning Services traffic onto its own VLAN, again in bonded LACP channels
- Ensure the Switch backplane has a minimum of 32GB\Sec Throughput
- Use Layer 3 switches and use these switches as the Gateway
- Use Citrix Netscalers for internal load balancing
File Server Considerations
There are a few things to consider with the file servers used with Citrix that could affect performance:
- Make sure that the file servers that are being used for Citrix Profiles and redirection are in the same subnet as the Citrix Session servers to ensure they don’t need to traverse a gateway.
- Consider the amount of storage you need and the IOPS per session especially when re-directing
- Use monitoring to keep an eye on disk queue lengths and set up alerting if the queue lengths go above 1.5 as that will significantly affect performance
- When using DFS make sure the DFS servers subnets are defined in Active Directory Sites and Services
Profile Management
Profile management is a really big subject which will be covered in a later paper. The speed of logon will dramatically affect Citrix performance, due to the excessive load that logging on places on the system. Here are some best practices:
- Use a mandatory profile for all users, stripped of all unnecessary junk, which can reduce it in size to around 256K. This makes for an almost instant logon
- Hardcode redirection directly in the mandatory profile
- Use drive letters for redirection, not UNC paths, to eliminate SMB limitations
- Make sure profiles are cached locally on the Citrix servers and deleted at log off.
Too many startup processes
This is just a basic rule that gets overlooked too often. Eliminate all unnecessary startup processes that run when a user logs on:
- Delete everything in the All Users\Startup Folder
- Delete all entries in the registry key HKLM\Software\Microsoft\Windows\CurrentVersion\Run
Poor Anti-Virus settings
Getting the anti-virus settings wrong can severely affect the speed of the Citrix servers. Follow these rules to give AV a fighting chance without killing performance:
- On-Access Scan set Only on writes
- Sensitivity set Heuristic
- Scheduled Full Scans on Session Hosts
- I highly recommend that these exclusions are set:
- \Program Files\Citrix\Group Policy\Client-Side Extension\CitrixCseEngine.exe
- \Program Files (x86)\Citrix\System32\wfshell.exe
- \Program Files (x86)\Citrix\system32\CpSvc.exe
- \Program Files (x86)\Citrix\System32\CtxSvcHost.exe
- \Program Files (x86)\Citrix\system32\mfcom.exe
- \Program Files (x86)\Citrix\System32\Citrix\Ima\ImaSrv.exe
- \Program Files (x86)\Citrix\System32\Citrix\Ima\IMAAdvanceSrv.exe
- \Program Files (x86)\Citrix\HealthMon\HCAService.exe
- \Program Files (x86)\Citrix\Streaming Client\RadeSvc.exe
- \Program Files (x86)\Citrix\Streaming Client\RadeHlprSvc.exe
- \Program Files (x86)\Citrix\XTE\bin\XTE.exe
- \Program Files\Citrix\Independent Management Architecture\RadeOffline.mdb
- %AppData%\ICAClient\Cache (if using pass-through authentication)
Badly set HDX policies
Citrix High Definition Experience (HDX) policies can deliver a large number of improvements that really enhance the virtual desktop experience, from redirecting video and Flash to be rendered on the local device for desktop-like performance, to USB and microphone optimisations, as well as providing additional tools for Microsoft Lync. There is no better virtual desktop platform than Citrix if you are looking to use Lync. Here are my recommended general guidelines:
- Configure HDX MediaStream Flash Redirection – HDX MediaStream Flash Redirection allows you to move the processing of most Adobe Flash content from Internet Explorer on the server to LAN- and WAN-connected users’ Windows and Linux devices.
- Configure Audio – You configure audio through the Policies node of Citrix Studio, and you control the following settings for the audio features through the Citrix User Policy settings:
- Audio Plug-n-Play (XenApp only)
- Audio quality
- Client audio redirection
- Client microphone redirection
- Audio redirection bandwidth limit
- Audio redirection bandwidth limit percent
- Audio over UDP Real-time Transport (XenDesktop only)
- Audio UDP Port Range (XenDesktop only)
- Configure Video Conferencing with HDX RealTime Webcam Video Compression
- Configure HDX RealTime to provide your users with a complete desktop multimedia conferencing feature.
- Configure HDX 3D – HDX 3D allows graphics-heavy applications running on XenApp to render on the server’s graphics processing unit (GPU), moving DirectX, Direct3D and Windows Presentation Foundation (WPF) rendering to the server’s GPU.
- Enable XenApp 6.5 OpenGL GPU Sharing Feature Add-on – This feature add-on to XenApp 6.5 enables graphics processing unit (GPU) hardware rendering of OpenGL applications in Remote Desktop sessions
- Assigning Priorities to Network Traffic – With XenApp and XenDesktop, priorities are assigned to network traffic across multiple connections for a session with quality of service (QoS)-supported routers.
- Add Dynamic Windows Preview Support – With the Dynamic Windows Preview feature enabled, the following Windows Aero preview options are available to XenApp users with published applications:
- Taskbar Preview – In a single-monitor configuration, when the cursor hovers over a window’s taskbar icon, an image of that window appears above the taskbar
- Windows Peek – When the cursor hovers over a taskbar preview image, a full-sized image of the window appears on the screen
- Flip – When the user presses ALT+TAB, small preview icons are shown for each open window.
- Flip 3D – When the user presses TAB+Windows logo key, large images of the open windows cascade across the screen.
- Configuring Read-Only Access to Mapped Client Drives – With the Citrix User Policy setting Read-only client drive access, you can control whether users can copy files from their virtual environments to their user devices.
TCP Offload
You may not have heard of this one, but it has caused so many problems that I am going to mention it here: problems with TOE cards.
What is a TOE Card?
A TOE card is a network adapter that has a built-in TCP Offload Engine (hence the name TOE), and pretty much every server today will have one. These are great in principle, as they can really improve network performance by taking some of the load off the operating system, but they can sometimes cause issues on virtual machines. Let me give you an example.
I had a problem at a major global bank where I was working, rolling out XenApp. All went fine until we got to Tokyo. The systems in Tokyo could not ping the Citrix servers hosted in London, yet they could ping any other servers in London; just not the Citrix servers. My immediate reaction was that it had to be a firewall, right? After a bit of to-ing and fro-ing I had to acknowledge it was a Citrix issue, so my trusty tool Wireshark was brought into action, and what it showed me was that packets were being re-transmitted thousands of times and eventually dropped. What was going on? After a bit of digging I discovered that the bank had reduced the WAN MTU for the GRE tunnel they were running to Tokyo, which is a perfectly normal and recommended thing to do for GRE. However, because the TCP offload engine was trying to negotiate the MTU size, causing thousands of re-transmits, the packets never made it to the Citrix session servers. This is an example of how things can get lost in translation from physical to virtual. I was running these Citrix servers on a VMware cluster, and once the VMware Tools are installed they will detect the TOE card and attempt to use it, as happened on this occasion; so we had a virtual NIC and a physical NIC both attempting to use the TCP offload engine, which failed. Disabling TCP offload on the VM instantly fixed the issue. I have also seen slow network issues where, again, disabling the offload engine rectified the problem.
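For context, the MTU reduction itself is standard GRE arithmetic. Here is a sketch assuming the usual GRE-over-IPv4 overhead of 24 bytes (exact values depend on the tunnel configuration):

```python
# Why a GRE tunnel forces a lower MTU: the encapsulation headers eat
# into the standard 1500-byte Ethernet payload.
ETHERNET_MTU = 1500
GRE_OVERHEAD = 24       # 20-byte outer IPv4 header + 4-byte GRE header
IP_TCP_HEADERS = 40     # 20-byte IP header + 20-byte TCP header

tunnel_mtu = ETHERNET_MTU - GRE_OVERHEAD    # 1476, the usual GRE tunnel MTU
max_tcp_mss = tunnel_mtu - IP_TCP_HEADERS   # 1436, largest safe TCP segment

print(tunnel_mtu, max_tcp_mss)   # 1476 1436
```

Anything on the path, such as an offload engine, that insists on sending full 1500-byte frames will see them dropped inside the tunnel, which is exactly the re-transmit storm Wireshark revealed.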
I hope you find this guide useful. It is not exhaustive, but it will cover most of your Citrix-related performance problems, and you will find that they come down to one of two things. The first is stupidity: the servers are not sized properly, the file server that everyone uses is on a 10MB WAN link 100 miles away, user profiles have bloated to over 100MB, and so on. The second is stuff getting lost in translation from physical to virtual: the conversion from traditional PCs to virtual desktops introduces new IT challenges that, unless you are experienced with them, you may not have considered, like profiles and printing.
Final thought: remember there are three P’s that you need to get right in Citrix: Performance, Printing and Profiles. Get these right and you will have a happy user base.