Tuesday, July 21, 2015

Load Balancing with Apache: The Mighty mod_proxy_balancer

As your super-hot website becomes crowded with more visitors each day, you may want to consider installing multiple web servers to serve it more efficiently and with lower latency. In such cases, distributing the HTTP traffic among the servers becomes a crucial task. Fortunately, the de facto standard Apache HTTP server has built-in support for load balancing, from very simple set-ups to extremely complex ones. Even if you don't have a super-hot website, you may want to set up a load balancer on your local server for statistical purposes, e.g. measuring the number of pending requests (requests in flight) during a load test.

Apache's load balancing goes hand in hand with virtual hosts and proxies. A virtual host (vhost) allows Apache to simulate a virtual domain inside the web server's context. For example, you can set up multiple vhosts to host several websites on your system; this is commonly used by hosting providers to host websites from multiple users on a limited number of systems, as it provides an easy way of practically isolating the sites from one another. In our context, a proxy mediates by routing traffic to a destination different from its initial target, which in our case would be one worker from a pool of load balancing servers (balancer members).

Reminder: If you plan to follow this guide incrementally, don't forget to restart Apache after each config change or a2enmod command!

You need to enable a few modules to get proxies working under Apache: mod_proxy (for proxying) and mod_proxy_http (for proxying HTTP traffic). The required command, a2enmod (shipped with Debian-based distributions), must be run with root privileges, so if you are not root but are allowed to use sudo, prefix it with sudo.

Syntax:
a2enmod <space-separated list of modules to enable, without the mod_ prefix>

e.g. for our case:
a2enmod proxy proxy_http

To configure a vhost, you have to add a <VirtualHost> entry to Apache's config. This usually goes into a separate proxy.conf file under the mods-enabled subdirectory of the Apache config directory (e.g. /etc/apache2 or /etc/httpd). This file usually appears after you enable mod_proxy as described earlier.

<IfModule mod_proxy.c>
	Listen 8080
	<VirtualHost 127.0.0.1:8080>
		ErrorLog /var/log/apache2/proxy.log
		DocumentRoot /var/www/html/my-subdomain
	</VirtualHost>
</IfModule>

The above example simply instructs Apache to listen for connections to port 8080 on localhost (127.0.0.1), and to serve those requests from content found under the directory /var/www/html/my-subdomain, logging any errors to /var/log/apache2/proxy.log.

To add a load balancer, we remove the DocumentRoot directive and define a proxy under the vhost:

		<Proxy balancer://myproxy>
			BalancerMember http://127.0.0.1:8081
			BalancerMember http://127.0.0.1:8082
		</Proxy>

and route traffic for the desired domain to that proxy:

		ProxyPass / balancer://myproxy/ lbmethod=bybusyness

This balances requests arriving at 127.0.0.1:8080 (directed at the root, /) across two proxy endpoints (BalancerMembers) at 127.0.0.1:8081 and 127.0.0.1:8082, based on their current busyness levels (the number of requests each member is currently handling).
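
To make the by-busyness idea concrete, here is a minimal Python sketch: each member tracks how many requests it is currently handling, and a new request goes to the member with the fewest. The names Member and pick_least_busy are hypothetical, and this is only a simplified model, not the actual mod_lbmethod_bybusyness code (which, per the Apache documentation, breaks ties using its request-counting statistics rather than list order).

```python
from dataclasses import dataclass


@dataclass
class Member:
    """Hypothetical stand-in for one BalancerMember."""
    url: str
    busy: int = 0  # requests currently in flight on this member


def pick_least_busy(members):
    """Return the member with the fewest in-flight requests.

    Ties go to the member listed first (min() keeps the first minimum),
    which is a simplification made for this sketch.
    """
    return min(members, key=lambda m: m.busy)


members = [Member("http://127.0.0.1:8081"), Member("http://127.0.0.1:8082")]

# Three requests arrive back to back and none of them has finished yet.
chosen = []
for _ in range(3):
    member = pick_least_busy(members)
    member.busy += 1  # the request is now in flight on this member
    chosen.append(member.url)
```

After these three requests, 8081 holds two in-flight requests and 8082 holds one, so a fourth request would go to 8082.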

Don't forget to enable the mod_proxy_balancer and mod_lbmethod_bybusyness modules. The first provides the actual load balancing feature while the other enables the by-busyness traffic routing policy. Other policies like byrequests and bytraffic are also supported, and you'll have to edit the ProxyPass directive and enable the relevant modules accordingly in order to use them.
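
For comparison, the byrequests policy can be sketched in a few lines of Python. The sketch follows the algorithm as described in the mod_lbmethod_byrequests documentation (every worker has an lbfactor, set via the loadfactor parameter of BalancerMember, and a running lbstatus); the Worker class, the pick_by_requests function and the load factor of 3 are hypothetical choices for illustration, not part of the config in this post.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    """Hypothetical stand-in for one BalancerMember."""
    url: str
    lbfactor: int = 1  # share of the work this worker should receive
    lbstatus: int = 0  # how urgently this worker is owed a request


def pick_by_requests(workers):
    """One scheduling round: bump every status, pick the highest, charge it."""
    total_factor = 0
    candidate = None
    for worker in workers:
        worker.lbstatus += worker.lbfactor
        total_factor += worker.lbfactor
        if candidate is None or worker.lbstatus > candidate.lbstatus:
            candidate = worker
    candidate.lbstatus -= total_factor
    return candidate


# Assumption for this sketch: the second worker should get three times the load.
workers = [
    Worker("http://127.0.0.1:8081", lbfactor=1),
    Worker("http://127.0.0.1:8082", lbfactor=3),
]
picks = [pick_by_requests(workers).url[-4:] for _ in range(8)]
```

Over these eight rounds the worker on port 8082 is picked six times and the one on 8081 twice, matching the 3:1 ratio of the load factors. With equal factors the two workers simply alternate.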

To view load balancing statistics, we may also define a balancer-manager endpoint (which will be provided by mod_proxy_balancer) under the vhost:

		<Location /balancer-manager>
			SetHandler balancer-manager
			Order deny,allow
			Allow from all
		</Location>

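and add another ProxyPass directive, also under the vhost, that excludes this path from proxying (the ! means "do not proxy") so that Apache serves it itself. It must appear before the catch-all ProxyPass / line: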

		ProxyPass /balancer-manager !

Visiting http://127.0.0.1:8080/balancer-manager would then display overall and per-member statistics, such as the number of requests served, the number still pending, and errors.

The full config now looks like this:

<IfModule mod_proxy.c>
	Listen 8080
	<VirtualHost 127.0.0.1:8080>
		ErrorLog /var/log/apache2/proxy.log

		<Proxy balancer://myproxy>
			BalancerMember http://127.0.0.1:8081
			BalancerMember http://127.0.0.1:8082
		</Proxy>

		<Location /balancer-manager>
			SetHandler balancer-manager
			Order deny,allow
			Allow from all
		</Location>

		ProxyPass /balancer-manager !
		ProxyPass / balancer://myproxy/ lbmethod=bybusyness
	</VirtualHost>
</IfModule>
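
To try this config out, something has to be listening on ports 8081 and 8082. If you have no real backends at hand, the following Python sketch may help: it defines a tiny HTTP backend that answers every GET with its own port number, plus a helper for fetching a page. The names PortEchoHandler, start_backend and fetch are made up for this example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class PortEchoHandler(BaseHTTPRequestHandler):
    """Replies to every GET with the port of the backend that served it."""

    def do_GET(self):
        body = f"served by port {self.server.server_address[1]}\n".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def start_backend(port, host="127.0.0.1"):
    """Start one backend in a daemon thread and return its server object."""
    server = ThreadingHTTPServer((host, port), PortEchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(port, path="/", host="127.0.0.1"):
    """GET a path from host:port and return the response body as text."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        return conn.getresponse().read().decode()
    finally:
        conn.close()
```

In an interactive session (python3 -i), calling start_backend(8081) and start_backend(8082) brings up both members, and repeated calls to fetch(8080) then show which member Apache picked for each request. The backends run in daemon threads, so they stop as soon as the Python session ends.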

Unfortunately, the current set-up does not handle sessions properly. If your site tracks user data with sessions (e.g. $_SESSION in PHP), the session data lives only on the member that served that user's first request, so it will intermittently become unavailable as the user's requests alternate among the proxy endpoints (balancer members).

One solution is to add (under <VirtualHost>) a ROUTEID cookie to the response headers:

		Header add Set-Cookie "ROUTEID=.%{BALANCER_WORKER_ROUTE}e; path=/" env=BALANCER_ROUTE_CHANGED

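and make new requests 'stick to' the initially chosen endpoint via a ProxySet directive (under <Proxy>). For this to work, each BalancerMember also needs a route parameter (e.g. route=1), since that is the value written into the cookie: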

			ProxySet stickysession=ROUTEID

making the complete config look like this:

<IfModule mod_proxy.c>
	Listen 8080
	<VirtualHost 127.0.0.1:8080>
		ErrorLog /var/log/apache2/proxy.log

		Header add Set-Cookie "ROUTEID=.%{BALANCER_WORKER_ROUTE}e; path=/" env=BALANCER_ROUTE_CHANGED
		<Proxy balancer://myproxy>
			BalancerMember http://127.0.0.1:8081 route=1
			BalancerMember http://127.0.0.1:8082 route=2
			ProxySet stickysession=ROUTEID
		</Proxy>

		<Location /balancer-manager>
			SetHandler balancer-manager
			Order deny,allow
			Allow from all
		</Location>

		ProxyPass /balancer-manager !
		ProxyPass / balancer://myproxy/ lbmethod=bybusyness
	</VirtualHost>
</IfModule>

You will also need to enable mod_headers for the Header directive to work.

In this case, the worker (endpoint, say worker1) that handles the first request coming from some user X (strictly speaking, a client; a browser, in most cases) sets the value of the ROUTEID cookie to its own route, and subsequent requests from X are routed exclusively to worker1, based on the value already stored in the ROUTEID cookie.
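
The cookie flow described above can be modelled in a few lines of Python. This is only an illustrative sketch, not Apache's implementation: the route table assumes the two members carry the (hypothetical) routes 1 and 2, route_request is a made-up name, and a plain round-robin stands in for the real load balancing method.

```python
import itertools

# Assumed route -> backend table (members configured with route=1 / route=2).
ROUTES = {"1": "http://127.0.0.1:8081", "2": "http://127.0.0.1:8082"}
_round_robin = itertools.cycle(ROUTES)  # stand-in for the real lbmethod


def route_request(cookies):
    """Return (backend_url, set_cookie_or_None) for one incoming request."""
    # The cookie value looks like ".1"; the route is the part after the dot.
    route = cookies.get("ROUTEID", "").partition(".")[2]
    if route in ROUTES:
        return ROUTES[route], None  # sticky: the route is already known
    route = next(_round_robin)  # first visit, or an unknown route
    return ROUTES[route], f"ROUTEID=.{route}; path=/"


# Client X arrives without a cookie and is assigned a backend.
backend, set_cookie = route_request({})

# The browser stores the cookie and sends it back on later requests.
jar = {"ROUTEID": set_cookie.split(";")[0].split("=", 1)[1]}
later = [route_request(jar)[0] for _ in range(3)]
```

Here every entry in later equals backend, because the stored cookie pins client X to the member that served its first request. A second client arriving without a cookie would be assigned the other member.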

The above was a very brief introduction to load balancing with Apache. Combining this with other advanced features of mod_proxy and mod_proxy_balancer, as well as other modules of Apache, you will be able to set up a sophisticated site config on Apache quite easily.
