Guide to Application Acceleration and Traffic Optimization
Making Sense of a Crowded Technology Market
By David Newman, Network World Lab Alliance, 10/1/07
Confused about application acceleration? You've got company.
Dozens of vendors have entered this hot area, using another dozen or so techniques to reduce response time, cut bandwidth consumption, or both. As with any market where multiple sellers all speak at once, it's easy to get lost amid the claims and counterclaims. It's harder still when the wares for sale are new and unfamiliar to many buyers.
As always, education is key. This article describes the major types of acceleration devices; introduces the players; explains the workings of acceleration mechanisms; and looks into what the future holds for this technology.
Application acceleration products generally fall into one of two groups: data-center devices, and symmetrical appliances that sit at either end of a WAN link. A third category, acceleration client software, is emerging, but it remains in relatively early stages.
Application acceleration may be a relatively new market niche, but the technology behind it has been around for some time. For close to a decade, companies such as Allot Communications and Packeteer have sold bandwidth-optimization appliances that prioritize key applications and optimize TCP performance (Packeteer also offers a symmetrical WAN device). Other acceleration technologies, such as caches, compression devices, and server load balancers, have been around even longer. For the most part, though, today's application acceleration market is split between data-center and WAN-based devices.
The two device types differ not just in their location in the network but also in the problems they address and the mechanisms they use to solve these problems.
Data centers have high-speed pipes and numerous servers. Some also have multi-tiered designs, with Web servers arrayed in front of application and database servers. In this context, improving performance means reducing WAN bandwidth usage for outgoing and incoming traffic, offloading TCP and/or SSL overhead from servers, or eliminating servers altogether.
Prominent vendors of data-center acceleration devices include Array Networks, Cisco Systems, Citrix Systems, Coyote Point Systems, Crescendo Networks, F5 Networks, Foundry Networks, and Juniper Networks.
Data-center acceleration devices use a variety of mechanisms to achieve these ends. Weapons in their acceleration arsenal include TCP connection multiplexing, HTTP compression, caching, content load balancing, and SSL offload. Some data-center accelerators also rewrite content on the fly, though that is more a security measure than a performance feature.
Of these mechanisms, connection multiplexing and HTTP compression do the most to reduce WAN bandwidth usage. Connection multiplexing is helpful when server farms field requests from large numbers of users. Even with load balancers in place, TCP connection overhead can be very significant. Acceleration devices lighten the load by multiplexing a large number of client-side connections onto a much smaller number of server-side connections. Previous test results show reductions of 50:1 or higher are possible.
Note that 50:1 multiplexing doesn't translate into a 50-fold reduction in servers. Other factors such as server CPU and memory utilization come into play. Still, multiplexing can lower overhead and speed content delivery.
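The multiplexing idea can be sketched in a few lines of Python. This is a toy model, not any vendor's implementation: the pool size, request counts, and class names are all illustrative, and the "connections" are just pooled identifiers standing in for persistent server-side TCP sessions.

```python
from queue import Queue

class ConnectionMultiplexer:
    """Toy model of TCP connection multiplexing: many client-side
    connections share a small pool of persistent server-side connections."""

    def __init__(self, server_pool_size):
        self.pool = Queue()
        for conn_id in range(server_pool_size):
            self.pool.put(conn_id)          # pre-opened server-side connections
        self.server_connections_opened = server_pool_size
        self.client_connections_seen = 0

    def handle_request(self, payload):
        self.client_connections_seen += 1
        conn = self.pool.get()              # borrow a pooled server connection
        response = f"served {payload} on server conn {conn}"
        self.pool.put(conn)                 # return it for reuse
        return response

mux = ConnectionMultiplexer(server_pool_size=4)
for i in range(200):
    mux.handle_request(f"req-{i}")

ratio = mux.client_connections_seen / mux.server_connections_opened
print(ratio)  # 200 client connections over 4 server connections -> 50.0
```

The key point the sketch captures is that server-side connection setup and teardown happen once per pooled connection, not once per client request.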
As its name suggests, HTTP compression puts the squeeze on Web payloads. Most Web browsers can decompress content; usually the stumbling block is on the server side, where compression is often disabled to reduce delay and save CPU cycles. Offloading this function from the servers onto the acceleration devices makes compression feasible.
Obviously, results vary depending on the compressibility of content. Since most sites serve up a mix of compressible text and incompressible images, HTTP compression offers at least some bandwidth reduction, and may even reduce the number of Web servers needed. One caveat: Compression won't help at all with seemingly random data streams, such as encrypted SSL traffic, and could even hurt performance.
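The compressibility gap is easy to demonstrate with Python's standard gzip module (the same content-coding most browsers accept). The payloads below are made up for illustration: repetitive HTML shrinks dramatically, while near-random bytes, a stand-in for encrypted or already-compressed content, actually grow slightly because of gzip framing overhead.

```python
import gzip
import os

# Highly repetitive HTML: typical compressible web payload
html = (b"<html><body>"
        + b"<p>repetitive markup compresses well</p>" * 200
        + b"</body></html>")
compressed_html = gzip.compress(html)

# Near-random bytes: stand-in for SSL-encrypted or image data
opaque = os.urandom(10_000)
compressed_opaque = gzip.compress(opaque)

print(len(html), len(compressed_html))      # text shrinks dramatically
print(len(opaque), len(compressed_opaque))  # random data does not shrink
```

This is exactly why accelerators that compress everything indiscriminately can waste CPU cycles, or even add bytes, on encrypted streams.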
The remaining data-center application acceleration mechanisms help lighten the load on servers. Caching is one of the oldest tricks in the book. The acceleration device acts as a "reverse proxy," caching oft-requested objects and eliminating the need to retrieve them from origin servers every time. Caching can deliver very real performance gains, but use it with care: Real-time content such as stock quotes must never be cached. Object caching also won't help when a small part of a large object changes, for example when a single byte in a large document is deleted.
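A reverse-proxy object cache can be sketched as a dictionary keyed by URL with expiry times. Everything here is illustrative, including the `no_cache` flag standing in for whatever mechanism (headers, configuration rules) a real device would use to exempt real-time content such as stock quotes.

```python
import time

class ReverseProxyCache:
    """Minimal reverse-proxy object cache. Objects flagged no_cache
    (e.g. real-time stock quotes) are always fetched from the origin."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}          # url -> (body, expiry time)
        self.origin_fetches = 0

    def fetch_from_origin(self, url):
        self.origin_fetches += 1
        return f"<body of {url}>"

    def get(self, url, no_cache=False):
        now = time.monotonic()
        if not no_cache:
            hit = self.store.get(url)
            if hit and hit[1] > now:
                return hit[0]                     # served from cache
        body = self.fetch_from_origin(url)
        if not no_cache:
            self.store[url] = (body, now + self.ttl)
        return body

cache = ReverseProxyCache(ttl_seconds=60)
for _ in range(100):
    cache.get("/logo.png")                    # cacheable: one origin fetch
for _ in range(3):
    cache.get("/quotes/ACME", no_cache=True)  # real-time: always fetched
print(cache.origin_fetches)  # 1 + 3 = 4
```

One hundred requests for the static object cost the origin servers a single fetch, while the real-time quotes correctly bypass the cache every time.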
Content load-balancing is conceptually similar to previous generations of layer-4 load balancing, but in this case the decision about where to send each request is based on layer-7 criteria. For example, devices run SQL queries and other "health checks" on back-end databases to decide which server will provide the lowest response time.
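The routing decision can be sketched as "probe every back end, pick the fastest." The health check below is hypothetical and merely simulates latency; a real device would time an actual SQL query or HTTP request against each server.

```python
import random

def health_check(server):
    """Hypothetical probe: a real device would time an SQL query or
    HTTP request; here we simulate the measured latency with jitter."""
    return server["latency_ms"] + random.uniform(0, 1)

servers = [
    {"name": "db1", "latency_ms": 40},
    {"name": "db2", "latency_ms": 12},
    {"name": "db3", "latency_ms": 95},
]

def pick_server(servers):
    # Layer-7 decision: route the request to the server whose
    # health check came back fastest
    return min(servers, key=health_check)

print(pick_server(servers)["name"])  # db2: lowest measured response time
```

Contrast this with layer-4 balancing, which would spread requests by connection count or round-robin without ever asking the application tier how it is actually performing.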
SSL offload also helps speed delivery of secure communications. In some cases, acceleration devices act as SSL proxies; the encrypted tunnel ends on the acceleration appliance, with cleartext traffic flowing between it and the origin servers. This frees servers from computationally expensive SSL processing and, in many cases, can dramatically reduce server count in the data center. It's also possible to achieve end-to-end encryption through proxying; the acceleration device terminates a client's SSL session and then begins a new session with the server. Some performance gain is still possible through TCP multiplexing.
Because data-center acceleration devices are application-aware, they can also rewrite URLs or even traffic contents on the fly. Citrix recently announced the ability to replace credit card numbers in data streams with Xs, preventing theft by interception. Similarly, it's possible to rewrite URLs, either to make them shorter or more recognizable, or to hide possible security vulnerabilities. On this latter point, an attacker may be less likely to probe for Microsoft Active Server Page vulnerabilities if a URL ending in ".asp" gets rewritten to end with ".html".
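Both rewrites boil down to pattern matching on the byte stream. The sketch below uses simple regular expressions; the card-number pattern is deliberately naive (it matches anything shaped like 13-16 digits with optional separators) and is not how any particular vendor implements the feature.

```python
import re

# Naive illustration: 13-16 digits, optionally separated by spaces/hyphens
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,16}\b")

def mask_cards(text):
    """Replace the digits of anything card-number-shaped with X,
    keeping the separators so the layout is preserved."""
    return CARD_RE.sub(lambda m: re.sub(r"\d", "X", m.group()), text)

def hide_asp(url):
    """Rewrite .asp URLs to .html to avoid advertising the platform."""
    return re.sub(r"\.asp\b", ".html", url)

print(mask_cards("card 4111-1111-1111-1111 on file"))
# card XXXX-XXXX-XXXX-XXXX on file
print(hide_asp("/store/checkout.asp?id=7"))
# /store/checkout.html?id=7
```

In practice the device would apply such rules to response bodies and Location headers in flight, which is only possible because it already parses traffic at layer 7.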
For many enterprises, the biggest bang for the acceleration buck comes not in the data center, but on the dozens or hundreds of WAN circuits linking remote sites to data centers. A recent Nemertes Research survey found that monthly WAN fees alone account, on average, for 31 percent of total enterprise IT spending. In that context, even a small performance improvement can mean big savings.
That's not to suggest that symmetrical WAN devices provide small improvements. Recent Network World test results show WAN bandwidth reduction of up to 80 times (not 80 percent) and 20- to 40-times improvements in file-transfer rates. Considering the huge bite of the IT budget that WAN circuits take every month, symmetrical WAN acceleration devices are very much worth considering.
The technology certainly has gotten vendors' attention, with numerous companies offering this type of acceleration device. Players in this crowded field include Blue Coat Systems, Cisco Systems, Citrix Systems, Exinda Networks, Juniper Networks, Riverbed Technology, Silver Peak Systems, and Streamcore.
All these vendors offer appliances and/or acceleration modules large and small, with size depending on WAN link capacity and the number of connected sites and users. Devices generally include disks for caching (though caching may have a different meaning than the caching capability of data-center devices; more on that later). All seek to address the number one bottleneck in enterprise WAN traffic: the sluggish performance of the Microsoft Windows TCP/IP stack across the WAN.
Beyond those common capabilities, these devices may offer at least some of the following mechanisms to reduce WAN bandwidth usage or to speed data transfer: application- and transport-layer optimizations; pre-positioning (a method of storing content closer to users); data compression; read-ahead/write-behind methods; and protocol prioritization.
Application-layer awareness is the most potent WAN acceleration technique. All vendors in this area can optimize the two most common application-layer protocols in enterprise networks – CIFS (common Internet file system), used in Windows file transfers, and MAPI (messaging application program interface), used by Exchange email servers and Outlook clients.
Because CIFS is notoriously chatty, it's a terrible performer in the WAN. Even a simple operation like opening a directory and listing files can involve the transfer of hundreds or even thousands of CIFS messages, each one adding delay. Acceleration devices streamline and reduce CIFS chatter using a variety of proprietary techniques. The results are impressive: CIFS performance in Network World tests of four offerings in this space was 30 to 40 times faster than a baseline test without acceleration.
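Why chatter hurts so much is simple arithmetic: each CIFS exchange is a WAN round trip, and round trips serialize. The numbers below (a 50 ms RTT, 1,000 messages per directory listing, 25 round trips after acceleration) are assumptions chosen for illustration, not measured values.

```python
rtt_ms = 50        # assumed WAN round-trip time
messages = 1_000   # assumed CIFS round trips for one directory listing

# Unaccelerated: every message waits a full round trip
naive_seconds = messages * rtt_ms / 1000

# Accelerated: the local appliance answers most chatter itself,
# leaving (say) 25 round trips that actually cross the WAN
accelerated_seconds = 25 * rtt_ms / 1000

print(naive_seconds, accelerated_seconds)  # 50.0 vs 1.25 seconds
print(naive_seconds / accelerated_seconds)  # 40.0x speedup
```

Note that bandwidth never appears in the calculation: CIFS over the WAN is latency-bound, which is why adding a fatter pipe does little and cutting round trips does a lot.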
All vendors can optimize CIFS, MAPI, and other popular applications such as HTTP, but there's a considerable amount of specsmanship about how many applications are supported beyond the basics. Some vendors' data sheets claim to optimize more than 100 different applications, but often this means simply classifying traffic by TCP or UDP port number, and not necessarily doing anything specific with application-layer headers or payloads. Network managers are well advised to quiz prospective vendors on what specific optimizations acceleration devices offer for their organization's particular application mix.
Pre-positioning, another big bandwidth saver, is essentially an automated form of caching. Say a large electronics distributor regularly distributes a 75-Mbyte parts catalog to all 15,000 of its employees. Rather than have employees retrieve the catalog from headquarters over and over again, a better option is to load the presentation locally at each remote site's acceleration device, and then distribute it locally. Most caches can do that, but pre-positioning goes further by automating the catalog's distribution to all acceleration devices at remote sites. Especially for organizations with many large sites, the bandwidth savings can be very substantial.
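The savings are easy to quantify with the catalog example. The site count below is an assumption (the article gives only the employee count); the point is that WAN transfers scale with sites rather than users once the content is pre-positioned.

```python
catalog_mb = 75
employees = 15_000
sites = 150        # assumed number of remote sites

# Without pre-positioning: every employee pulls the catalog over the WAN
naive_mb = employees * catalog_mb

# With pre-positioning: one WAN copy per site, then local LAN delivery
prepositioned_mb = sites * catalog_mb

print(naive_mb, prepositioned_mb)          # 1,125,000 MB vs 11,250 MB
print(naive_mb // prepositioned_mb)        # 100x less WAN traffic
```

Under these assumptions WAN traffic drops a hundredfold, and the pre-positioned copies can be pushed overnight, off peak, instead of competing with business traffic.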
Caching can take two forms: object caching, discussed previously in the data-center context, and byte caching (called "network memory" by some vendors). With byte caching, each appliance inspects and caches the stream of data going by, creating an index for each block of data it sees. The index may contain some form of hash uniquely identifying that block. The first time a device forwards data, the byte cache is empty and everything crosses the WAN. On each successive transfer, the pair of devices won't send the cached data again; instead, the sender transmits just the indexes, in effect saying "use block X that you already have stored in your cache."
Byte caching has two benefits. First, like object caching, it greatly reduces the amount of data traversing the WAN. Second, unlike object caching, it chops the byte stream into relatively small blocks rather than dealing with potentially huge objects. If only a small part of a very large file changes, the acceleration device just sends the updated data, not the whole object. Some devices, such as those from Blue Coat and Cisco, employ both forms of caching (in Cisco's case, for Windows file traffic only). Others, such as those from Riverbed and Silver Peak, rely on byte caching alone.
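A toy byte cache shows both effects. The 4KB block size, SHA-256 hashing, and single shared dictionary standing in for the mirrored stores on both appliances are all simplifying assumptions; real products use proprietary fingerprinting and block-boundary schemes.

```python
import hashlib
import random

BLOCK = 4096  # assumed fixed block size

class ByteCachePair:
    """Toy model of a symmetric byte-cache pair: both ends keep a
    hash-indexed block store, so the sender can ship a 32-byte hash
    instead of any block the far side already holds."""

    def __init__(self):
        self.store = {}  # hash -> block, mirrored on both appliances

    def send(self, data):
        wire_bytes = 0
        for i in range(0, len(data), BLOCK):
            blk = data[i:i + BLOCK]
            digest = hashlib.sha256(blk).digest()
            if digest in self.store:
                wire_bytes += len(digest)   # cache hit: 32-byte reference
            else:
                self.store[digest] = blk
                wire_bytes += len(blk)      # first sight: full block
        return wire_bytes

random.seed(0)
doc = random.randbytes(40_000)
pair = ByteCachePair()
first = pair.send(doc)                  # cold cache: everything crosses the WAN
second = pair.send(doc + b"B" * 100)    # warm cache plus a small appended change
print(first, second)
```

On the second transfer only the one changed block travels in full; the unchanged blocks cost 32 bytes each, which is the "send the update, not the whole object" advantage over object caching.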
WAN acceleration devices also use data compression to reduce WAN bandwidth usage. This isn't just the HTTP compression seen in data-center devices; instead, symmetrical WAN devices compress the entire payloads of all packets, regardless of application. Compression works best for data streams consisting mainly of text or other repetitive data; for near-random byte patterns (such as images or encrypted data), it's not much help.
Cisco's WAAS acceleration devices use "read-ahead/write-behind" techniques to speed up file transfers. While these techniques aren't new (server and PC designers have employed them for years), they can speed file transfers. Both take advantage of the fact that enterprise data tends to be repetitive. Over time, devices can predict that if a user requests block A of data, a request for blocks B and C is likely to follow. With that knowledge, the device can line up the next blocks and serve them out of memory instead of performing a much slower retrieval from disk. And speaking of disk operations, it takes a relatively long time to write data to a disk. Write-behind operation defers write requests until several have accumulated and then performs them all at once. From the user's perspective, read-ahead and write-behind both translate into faster response times.
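The write-behind half of the pair can be sketched as simple batching. This is a generic illustration of the technique, not Cisco's implementation; the batch size is arbitrary, and the flush counter stands in for whatever expensive operation (disk write, WAN round trip) is being deferred.

```python
class WriteBehindBuffer:
    """Toy write-behind buffer: coalesce small writes and commit them
    in one batch once enough have accumulated."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.pending = []
        self.flushes = 0   # each flush stands in for one slow disk/WAN op

    def write(self, data):
        self.pending.append(data)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.pending:
            self.flushes += 1        # one combined operation for the batch
            self.pending.clear()

buf = WriteBehindBuffer(batch_size=8)
for i in range(64):
    buf.write(f"chunk-{i}")
buf.flush()                          # drain anything still pending
print(buf.flushes)  # 8 batched flushes instead of 64 individual writes
```

The trade-off, as with any deferred write, is a window in which acknowledged data has not yet been committed, which is why real implementations pair this with careful failure handling.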
Many acceleration devices (with the notable exception of Cisco's) also use various QoS mechanisms to prioritize key applications or flows during periods of congestion. Cisco also has a prioritization story, but it involves communication with routers, which then perform the actual queuing. For enterprises that have already enabled QoS features on their routers, this is a useful approach; for others just getting started with QoS, it may make sense to let the acceleration device handle queuing. As with application support, there is considerable variation among products as to which types of traffic acceleration devices can prioritize.
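At its simplest, the queuing these devices perform is strict-priority scheduling: during congestion, higher-priority traffic always drains first. The class names and priority values below are illustrative; real devices typically offer several disciplines (weighted fair queuing, rate shaping) beyond this minimal sketch.

```python
import heapq
from itertools import count

class PriorityQueuer:
    """Minimal strict-priority packet queue: lower priority number
    drains first; FIFO order is preserved within a priority class."""

    def __init__(self):
        self.heap = []
        self.seq = count()   # tie-breaker keeps same-priority packets FIFO

    def enqueue(self, packet, priority):
        heapq.heappush(self.heap, (priority, next(self.seq), packet))

    def dequeue(self):
        return heapq.heappop(self.heap)[2]

q = PriorityQueuer()
q.enqueue("bulk-backup-1", priority=3)   # illustrative traffic classes
q.enqueue("voip-frame", priority=0)
q.enqueue("erp-txn", priority=1)
q.enqueue("bulk-backup-2", priority=3)

order = [q.dequeue() for _ in range(4)]
print(order)
# ['voip-frame', 'erp-txn', 'bulk-backup-1', 'bulk-backup-2']
```

Strict priority is easy to reason about but can starve low-priority classes, which is one reason production QoS usually adds weighting or minimum-bandwidth guarantees on top.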
Client software, security, and device consolidation are likely to be the next major trends in application acceleration. Acceleration client software is already available from Blue Coat, and other vendors have clients in development. These software packages give PC-toting road warriors and telecommuters some, if not all, of the techniques used in acceleration appliances.
Security is another hot-button issue, with acceleration vendors adding support for SSL optimization (supported by Blue Coat and Riverbed in the recent Network World test, with plans announced by Cisco and Silver Peak). Cisco and Silver Peak devices also encrypt all user data stored on appliances, a key consideration for regulatory compliance in some industries.
If history is any guide, switch and router vendors are also likely to fold at least some acceleration features into their devices. However, the market for standalone devices is highly unlikely to disappear anytime soon. Most switches and routers aren't TCP-aware today, let alone application-aware, and getting there will take time. Moreover, the form factors and component costs of acceleration devices (many have beefy CPUs, memory, and disks) argue against rapid consolidation, especially into low-end branch office switches and routers. For fully featured acceleration, standalone pairs of devices are likely to remain the platform of choice for at least a few more years.