Web Load Balancing & Caching 101

What is Load Balancing?

Load balancing is the process by which inbound internet protocol (IP) traffic can be distributed across multiple servers. Load balancing enhances the performance of the servers, leads to their optimal utilization and ensures that no single server is overwhelmed. Load balancing is particularly important for busy networks, where it is difficult to predict the number of requests that will be issued to a server.

Typically, two or more web servers are employed in a load balancing scheme. If one of the servers begins to get overloaded, requests are forwarded to another server. Load balancing brings down the service time by allowing multiple servers to handle the requests. This service time is reduced by using a load balancer to identify which server has the appropriate availability to receive the traffic.

The process, very generally, is straightforward. A webpage request is sent to the load balancer, which forwards the request to one of the servers. That server responds back to the load balancer, which in turn sends the response on to the end user.

What is Session Persistence and Why is It Important?

An important issue when operating a load-balanced service is how to handle information that must be kept across the multiple requests in a user's session. If this information is stored locally on one backend server, then subsequent requests going to different backend servers would not be able to find it. This might be cached information that can be recomputed, in which case load-balancing a request to a different backend server just introduces a performance issue.

One solution to the session data issue is to send all requests in a user session consistently to the same backend server. This is known as persistence or stickiness. A significant downside to this technique is its lack of automatic failover: if a backend server goes down, its per-session information becomes inaccessible, and any sessions depending on it are lost. The same problem is usually relevant to central database servers; even if web servers are "stateless" and not "sticky", the central database is (see below).

Assignment to a particular server might be based on a username, on the client IP address, or on random assignment. Assignment by IP address may be unreliable, because the client's perceived address can change as a result of DHCP, network address translation, and web proxies. Random assignments must be remembered by the load balancer, which creates a burden on storage. If the load balancer is replaced or fails, this information may be lost, and assignments may need to be deleted after a timeout period or during periods of high load to avoid exceeding the space available for the assignment table. The random assignment method also requires that clients maintain some state, which can be a problem, for example when a web browser has disabled storage of cookies. Sophisticated load balancers use multiple persistence techniques to avoid some of the shortcomings of any one method.
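
As a minimal sketch of address-based persistence, the Python below hashes a client IP address to pick a backend. The backend names and the pick_backend function are hypothetical, and a real load balancer would also have to cope with servers joining or leaving the pool.

```python
import hashlib

# Hypothetical backend pool; the names are illustrative only.
BACKENDS = ["web1", "web2", "web3"]

def pick_backend(client_ip, backends=BACKENDS):
    """Map a client IP to a backend so repeat requests land on the same server."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    # Treat the first 8 bytes of the hash as an integer and reduce it
    # modulo the pool size to get a stable index.
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]

if __name__ == "__main__":
    for ip in ["203.0.113.7", "198.51.100.23", "203.0.113.7"]:
        print(ip, "->", pick_backend(ip))
```

The same address always maps to the same backend while the pool is unchanged. If the client's perceived address changes, the mapping changes with it, which is the unreliability described above.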

Another solution is to keep the per-session data in a database. Generally this is bad for performance since it increases the load on the database: the database is best used to store information less transient than per-session data. To prevent a database from becoming a single point of failure, and to improve scalability, the database is often replicated across multiple machines, and load balancing is used to spread the query load across those replicas. Microsoft's ASP.NET State Server technology is an example of a session database. All servers in a web farm store their session data on State Server, and any server in the farm can retrieve the data.

Fortunately there are more efficient approaches. In the very common case where the client is a web browser, per-session data can be stored in the browser itself. One technique is to use a browser cookie, suitably time-stamped and encrypted. Another is URL rewriting. Storing session data on the client is generally the preferred solution: then the load balancer is free to pick any backend server to handle a request. However, this method of state-data handling is not really suitable for some complex business logic scenarios, where the session state payload is very big or recomputing it with every request on a server is not feasible. URL rewriting also has major security issues, since the end user can easily alter the submitted URL and thus change session streams. Encrypted client-side cookies are arguably just as insecure: unless all transmission is over HTTPS, they are very easy to copy or decrypt in man-in-the-middle attacks.
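
The time-stamped cookie idea can be sketched roughly as follows. Note the assumptions: this example signs the cookie with an HMAC rather than encrypting it, so the payload is tamper-evident but still readable by the client, and SECRET_KEY, encode_session and decode_session are hypothetical names chosen for illustration.

```python
import base64
import hashlib
import hmac
import json
import time

# Hypothetical server-side secret; every backend would need the same value.
SECRET_KEY = b"replace-with-a-real-secret"

def encode_session(data, now=None):
    """Serialize session data with an issue timestamp and an HMAC signature."""
    issued = int(time.time() if now is None else now)
    payload = json.dumps({"data": data, "issued": issued}, sort_keys=True)
    body = base64.urlsafe_b64encode(payload.encode("utf-8")).decode("ascii")
    signature = hmac.new(SECRET_KEY, body.encode("ascii"), hashlib.sha256).hexdigest()
    return body + "." + signature

def decode_session(cookie, max_age_seconds=1800, now=None):
    """Return the session data, or None if the cookie was altered or has expired."""
    body, _, signature = cookie.rpartition(".")
    expected = hmac.new(SECRET_KEY, body.encode("ascii"), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(signature, expected):
        return None
    record = json.loads(base64.urlsafe_b64decode(body.encode("ascii")))
    current = time.time() if now is None else now
    if current - record["issued"] > max_age_seconds:
        return None
    return record["data"]
```

Because any backend holding the key can validate the cookie, the load balancer stays free to send each request to any server. The sketch does nothing to stop a copied cookie from being replayed, which is why HTTPS still matters.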

Load Balancing Algorithms

A variety of scheduling algorithms are used by load balancers to determine which backend server to send a request to. Simple algorithms include random choice or round robin. More sophisticated load balancers may take into account additional factors, such as a server's reported load, recent response times, up/down status (determined by a monitoring poll of some kind), number of active connections, geographic location, capabilities, or how much traffic it has recently been assigned. High-performance systems may use multiple layers of load balancing.

Load balancing of servers by an IP sprayer can be implemented in different ways. These methods of load balancing can be set up in the load balancer based on available load balancing types. There are various algorithms used to distribute the load among the available servers.

Random Allocation

In a random allocation, the HTTP requests are assigned to any server picked randomly among the group of servers. In such a case, one of the servers may be assigned many more requests to process, while the other servers are sitting idle. However, on average, each server gets its share of the load due to the random selection.
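
A short simulation illustrates the point about averages. This is only a sketch; the function name and the server labels are made up for the example.

```python
import random
from collections import Counter

def random_allocation(servers, num_requests, seed=None):
    """Assign each request to a server chosen uniformly at random and count the results."""
    rng = random.Random(seed)
    return Counter(rng.choice(servers) for _ in range(num_requests))

if __name__ == "__main__":
    # Over a few requests the split can be lopsided; over many it evens out.
    print(random_allocation(["web1", "web2", "web3"], 10))
    print(random_allocation(["web1", "web2", "web3"], 30000))
```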

Round-Robin Allocation

In a round-robin algorithm, the IP sprayer assigns the requests to a list of the servers on a rotating basis. The first request is allocated to a server picked randomly from the group, so that if more than one IP sprayer is involved, not all the first requests go to the same server. For the subsequent requests, the IP sprayer follows the circular order to redirect the request. Once a server is assigned a request, the server is moved to the end of the list. This keeps the servers equally assigned.
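
The rotation described above can be sketched in a few lines of Python. The round_robin function and the server names are illustrative assumptions, not taken from any particular product.

```python
import random
from itertools import cycle, islice

def round_robin(servers, seed=None):
    """Return an iterator over servers in circular order, starting at a random position."""
    rng = random.Random(seed)
    start = rng.randrange(len(servers))
    # Rotate the list so the first request goes to a randomly chosen server.
    rotated = servers[start:] + servers[:start]
    return cycle(rotated)

if __name__ == "__main__":
    picker = round_robin(["web1", "web2", "web3"])
    print(list(islice(picker, 6)))
```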

Weighted Round-Robin Allocation

Weighted Round-Robin is an advanced version of the round-robin that eliminates the deficiencies of the plain round robin algorithm. In case of a weighted round-robin, one can assign a weight to each server in the group so that if one server is capable of handling twice as much load as the other, the powerful server gets a weight of 2. In such cases, the IP sprayer will assign two requests to the powerful server for each request assigned to the weaker one.
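
One simple way to sketch weighting is to repeat each server in the rotation according to its weight. The function and server names here are hypothetical, and the weights are assumed to be positive integers.

```python
from itertools import cycle, islice

def weighted_round_robin(weights):
    """Return an iterator in which each server appears `weight` times per cycle."""
    schedule = []
    for server, weight in weights.items():
        schedule.extend([server] * weight)
    return cycle(schedule)

if __name__ == "__main__":
    picker = weighted_round_robin({"powerful": 2, "weaker": 1})
    print(list(islice(picker, 6)))
```

This version sends consecutive requests to the heavier server before moving on. Some implementations interleave the picks instead, so that a heavily weighted server does not receive its whole share in one burst.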

Server Health Checking

Server health checking is the ability of the load balancer to run a test against the servers to determine if they are providing service.

Ping

This is the simplest method. However, it is not very reliable, because the server can be up while the web service is down. ICMP pings are also often blocked by firewalls.

TCP Connect

This is a more sophisticated method that checks whether a service is up and running on a given port, such as port 80 for web traffic, by trying to open a TCP connection to that port on the real server (the equivalent of connecting to it with telnet).
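
A TCP connect check might look roughly like this in Python. The tcp_check name, the example address and the two-second timeout are assumptions made for the sketch.

```python
import socket

def tcp_check(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False

if __name__ == "__main__":
    print(tcp_check("192.0.2.10", 80))
```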

HTTP GET Header

This makes an HTTP GET request to the web server and typically checks the response status line for a code such as 200 OK.

HTTP GET Contents (negotiate or regex)

This makes an HTTP GET request and checks the actual content body for a correct response. It can be useful for checking a dynamic web page that returns 'OK' only if some application health checks succeed, for example if a backend database query validates.
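
The two HTTP checks can be sketched together. The function names, the /health URL and the 'OK' marker below are hypothetical; the pass/fail decision is kept in a separate function so it can be exercised without a network.

```python
import urllib.request

def response_is_healthy(status, body, expected_text=None):
    """Apply the status check and, if expected_text is given, the content check."""
    if status != 200:
        return False
    return expected_text is None or expected_text in body

def http_check(url, expected_text=None, timeout=2.0):
    """GET the URL and report whether the server looks healthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return response_is_healthy(response.status, body, expected_text)
    except OSError:
        # urllib raises URLError/HTTPError (both OSError subclasses) for
        # connection failures and for 4xx/5xx responses.
        return False

if __name__ == "__main__":
    print(http_check("http://192.0.2.10/health", expected_text="OK"))
```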

What is Web Caching?

A Web cache is a temporary storage place for files requested from the Internet. After an original request for data has been successfully fulfilled, and that data has been stored in the cache, further requests for those files (a Web page complete with images, for example) result in the information being returned from the cache rather than the original location.

Caching is useful for any library. Faster response to users' requests and saved bandwidth are never a bad thing. Caching really makes sense for libraries that feel they must purchase more bandwidth to keep up with increased usage. In such cases a cache server or cache appliance could very likely lower the demand on the existing bandwidth, thus making a costly bandwidth upgrade unnecessary. Suppose an upgrade from a 256K data circuit to a full T1 will increase your monthly Internet bill by $500. A $3,000 investment in caching (and caching solutions often can be implemented for much less than that) will start to show a return after six months.

Doesn't cached Web content get stale?

Web sites are continually updating their content. News headlines change, stock quotes change, weather changes. It may seem that caching is not worthwhile if it is returning dated material. A traffic report that is two hours old doesn't do you much good. Fortunately there are checks and balances in place to ensure that the content you are viewing is current.

So how does your cache know what to hang on to and what to let go? That depends on choices made by the Web developer as well as the way the user configures the cache. As mentioned above, Web sites are made up of individual pieces. Each one of these pieces is encoded with specific information that will tell your cache how to handle it. This information may specify, “Don't cache this item,” in which case the cache will ignore it. The item may have a “max age” specified. This tells the cache that after a set amount of time the cache must check in with the Web site for newer versions of that object. The “expires” field serves roughly the same purpose. The item might also have a “last modified” field. Last modified is another way for your cache to ask the Web server if the object has been modified since your last visit. If it has, the cache gets a new copy; if not, the cache just hangs on to the copy it already has. The Web site administrator controls each of these items. There are many cache products available. Each has lots of different configuration options to help ensure that your data is current. Caches can be configured to accept all, some, or none of the priorities that the Web site administrator sets.
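
The freshness decision can be sketched as below. This is a simplified illustration, not a complete implementation of the HTTP caching rules: header names are assumed to be lowercase dictionary keys, times are epoch seconds, and the function names are made up for the example.

```python
from email.utils import parsedate_to_datetime

def parse_cache_control(value):
    """Split a Cache-Control header value into a dict of directives."""
    directives = {}
    for part in value.split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.lower()] = arg.strip('"')
    return directives

def is_fresh(headers, stored_at, now):
    """Return True if the cached copy may be served without asking the origin server."""
    cc = parse_cache_control(headers.get("cache-control", ""))
    if "no-store" in cc or "no-cache" in cc:
        return False
    if "max-age" in cc:
        return now - stored_at < int(cc["max-age"])
    if "expires" in headers:
        return now < parsedate_to_datetime(headers["expires"]).timestamp()
    # No freshness information: check with the origin, for example by
    # sending If-Modified-Since with the stored "last modified" date.
    return False
```

When is_fresh returns False, a cache would typically revalidate with the Web server and keep its copy if the server reports that the object has not been modified.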

What is a Cache Appliance/Cache Server?

A cache appliance is a hardware and software caching solution all in one unit. A cache server is a software-only solution that is installed on an existing server. Unlike a browser cache, which only benefits one user, cache servers or appliances are shared and benefit every user in the network. The cache server/appliance sits on the Local Area Network. A reverse proxy cache usually combines load balancing with a web cache for incoming requests.

Types of Web Caches

Forward/transparent proxy servers, reverse proxy servers (which are actually what the cache appliances are running internally) and web servers mostly have web caches. The caches in web servers are RAM caches, as they already have the resources served locally. The caches on proxy servers can be RAM or disk, and are usually both. It is highly recommended to install a 15k RPM hard disk or an SSD as the proxy server disk cache.

SSL Offloading

Secure Sockets Layer (SSL) certificates provide authentication between a server and a client computer in a Web application. Companies or businesses with a dedicated SSL certificate must host that certificate on a Web server. Heavy use of the certificate can put a strain on the machine and slow down the application.

SSL offloading takes all the processing of SSL encryption and decryption off the main Web server and moves it to a separate device designed specifically for the task. This frees the main Web server to perform better, while the dedicated device handles the SSL work efficiently.

SSL offloading increases the effectiveness of the security offered by the certificates because the designated device can devote more processing time to warding off attacks. It increases the Website and application speed and prevents companies from needing to add more Web servers to keep up with the demands of a frequently used SSL certificate.

SSL termination performs decryption on the designated device, then sends the unencrypted data to the main Web server. This data passes through extra security measures such as an intrusion detection system and a firewall to protect the transmission of unencrypted data. SSL bridging decrypts the data on the designated device and checks it for malicious code before it reaches the server. The device then re-encrypts the data before passing it on, and processes the traffic again when the server sends it back through the device. The extra step slows down the process.

HTTP Compression

There's a finite amount of bandwidth on most Internet connections, and anything administrators can do to speed up the process is worthwhile. One way to do this is via HTTP compression, a capability built into both browsers and servers that can dramatically improve site performance by reducing the amount of time required to transfer data between the server and the client. The principles are nothing new — the data is simply compressed. What is unique is that compression is done on the fly, straight from the server to the client, and often without users knowing.

HTTP compression is easy to enable and requires no client-side configuration to obtain benefits, making it a very easy way to get extra performance. The sections below discuss how it works, its advantages, and its limitations.

Why Compress?

Most users' knowledge of compression comes from compressing a group of files that they download, extract, and open. But compression can also be used passively to compress documents as they are being transferred to a client's browser. Because it's a passive process, the server can reduce the size of the pages sent, therefore reducing the download time for users and their bandwidth usage.

Working the numbers helps clarify the gains. You can typically reduce an HTML document to less than half of its original size. This, in turn, halves the amount of time the client needs to download the page as well as the amount of bandwidth required. All of this is achieved without actually changing the way the site works, its page layout, or the content. The only thing that changes is the way the information is transferred.

Unfortunately, there are limitations.

Suitable File Types

Not all files are suitable for compression. For obvious reasons, files that are already compressed, such as JPEGs, GIFs, PNGs, movies, and bundled content (e.g., Zip, Gzip, and bzip2 files), are not going to compress appreciably further with a simple HTTP compression filter. Therefore, you are not going to get much benefit from compressing these files or a site that relies heavily on them.

However, sites that have a lot of plain text content, including the main HTML files, XML, CSS, and RSS, may benefit from the compression. It will still depend largely on the content of the file; most standard HTML text files will compress by about a half, sometimes more. Heavily formatted pages, for example those that make heavy use of tables (and therefore repetitive formatting content), may compress even further, sometimes to as little as one-third of the original size.
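
The difference between file types can be tried out with Python's gzip module. In this sketch the sample HTML is artificially repetitive, so it compresses far better than a typical page would, and random bytes stand in for content that is already compressed; both samples are assumptions made for illustration.

```python
import gzip
import os

def compression_ratio(data):
    """Return compressed size divided by original size for gzip at its default level."""
    return len(gzip.compress(data)) / len(data)

# Repetitive table markup, similar in spirit to a heavily formatted page.
html = b"<table>" + b"<tr><td class='cell'>value</td></tr>" * 200 + b"</table>"

# Random bytes approximate data that has already been compressed.
random_bytes = os.urandom(len(html))

if __name__ == "__main__":
    print("html ratio:  ", round(compression_ratio(html), 3))
    print("random ratio:", round(compression_ratio(random_bytes), 3))
```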

A 2009 article by Google engineers Arvind Jain and Jason Glasgow states that more than 99 person-years are wasted daily due to page load time increases when users do not receive compressed content. This occurs where anti-virus software interferes with connections to force them to be uncompressed, where proxies are used (with overcautious web browsers), where servers are misconfigured, and where browser bugs stop compression being used. Internet Explorer 6, which drops to HTTP 1.0 (without features like compression or pipelining) when behind a proxy, a common configuration in corporate environments, was the mainstream browser most prone to falling back to uncompressed HTTP.

URL Translation & URL Rewrite

URL translation is the mapping of externally known URLs to their internal locations.

URL Rewriting is a server-side technique for mapping URL requests to request handlers.

Typically there is a direct mapping between request URL and the handler for that request. All requests that end in .php will be handled by a PHP script with the given name. Similarly, request paths that end in .html will typically be handled by a static file handler. The mapping between URL and handler is typically static, and depends solely on the "extension" of the URL Request.

URL Rewriting allows administrators to more flexibly map between the incoming requests and the actual resource that handles the request on the server. For example, using URL Rewriting, requests that have a .html extension could be served by ASP.NET, or requests that have no extension could be served by a PHP script.

Most URL Rewriters match the incoming URL against a set of patterns, and rewrite the URL according to which patterns match. The language used in the most powerful and flexible rewriters to describe the patterns is known as Regular Expressions. Some rewriters also allow rewriting based on other factors, including the request headers, Server variables, and even the state of the server filesystem.
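
Pattern-based rewriting can be sketched with Python's re module. The rules and paths below are invented for the example and do not reflect the syntax of any particular rewriter.

```python
import re

# Hypothetical rules: each pairs a pattern with a replacement template.
RULES = [
    (re.compile(r"^/products/(\d+)$"), r"/product.php?id=\1"),
    (re.compile(r"^/(about|contact)$"), r"/pages/\1.html"),
]

def rewrite(path, rules=RULES):
    """Return the internal path for the first matching rule, or the path unchanged."""
    for pattern, replacement in rules:
        if pattern.match(path):
            return pattern.sub(replacement, path)
    return path

if __name__ == "__main__":
    for path in ["/products/42", "/about", "/other"]:
        print(path, "->", rewrite(path))
```

Rules are tried in order and the first match wins, so more specific patterns would normally be listed before more general ones.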

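Many URL Rewriters can also redirect requests, which has led to confusion between the terms Rewrite and Redirect. A rewrite changes the URL internally on the server, so the client never sees the new URL; a redirect sends the client a response telling it to request a different URL. IIRF, for example, can perform URL redirects as well as URL rewrites.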

Why Rewrite URLs?

There are many reasons to rewrite URLs:
  • Search Engine Optimization (SEO)
    SEO is a broad topic, but the main goal is to assist search engines in finding content on a web site. One aspect of that is optimizing the URLs themselves.

  • Making user-friendly URLs
    Similar in effect to SEO, this allows the use of friendly public URLs where they are observed by users in links and browser bars. Elements within URLs that are meaningful only to server-side technology, including the extension of the server-side script or web app platform, can be obscured from the public.

  • Faking people out
    In some cases the web administrator would like to conceal the server-side technology that is being used. URL Rewriting allows, for example, a public URL that ends in .jsp to be handled by a .php script.

  • Routing requests
    You can force certain requests to use a secure connection (https), or a particular server.

  • Server-side technology migrations
    When migrating from one technology to another in stages, URL rewriting can be used to keep the URL space stable while things change on the server back-end. URL Rewriting can also be used to support migration of "old" or stale URLs to the new URL namespace, when those changes occur.

  • Injecting custom processing
    In some cases, a server administrator may wish to inject new, additional, server-side processing for well-known existing URLs. One example here is inserting special image handling logic behind a .jpg URL. You may wish to block access to image URLs from outside referrers, to limit bandwidth leeching.

  • Filtering URL requests
    An administrator may want to restrict access to certain URLs based on the referer, the requesting IP address, and so on.
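
As an illustration of the last two reasons, a referer check for image requests might be sketched as follows. The allowed host names, the extension list and the allow_request function are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical list of sites that are allowed to embed our images.
ALLOWED_REFERER_HOSTS = {"www.example.com", "example.com"}
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif")

def allow_request(path, referer):
    """Return False for image requests whose Referer header points at another site."""
    if not path.lower().endswith(IMAGE_EXTENSIONS):
        return True
    if not referer:
        # Many clients omit the header; this sketch lets those requests through.
        return True
    return urlparse(referer).hostname in ALLOWED_REFERER_HOSTS
```

The Referer header is supplied by the client and can be forged, so a check like this limits casual hotlinking rather than providing real access control.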
