(Last Updated On: August 12, 2017)

In this getting started with secure HAProxy on Linux blog post, I’ll walk you through important concepts you need to start working with HAProxy.

Load Balancing is a common argot for Webmasters and System Administrators managing huge-traffic websites. Since HAProxy is the de-facto standard open-source load balancer suited for such use cases, it’s the first Load balancer you’ll give a thought when in need of one. From the official site, HAProxy is claimed to be powering some of the world’s most visited websites.

In this blog post, I’ll take you through simple HAProxy setup on Docker to give you a clue on how HAProxy is deployed. My Lab is based on below diagram:

getting started with secure HAProxy on LinuxHAProxy Setup Diagram on Docker

Base OS used is CentOS 7.3.

I’ll configure HAProxy so that connections coming to it are forwarded to the backend servers. But before i go into configuration section, it’s good to cover few load balancing basics to refresh your mind.

What is Load Balancing?

Load balancing can be defined as a way of aggregating multiple components in order to achieve a total processing capacity above each component’s individual capacity, without any intervention from the end user and in a scalable way.

With Load balancing, more operations can be performed simultaneously. A layer 7 load balancer may perform a number of complex operations on the traffic such as:

  • decrypt SSL/TLS
  • reencrypt
  • parse
  • modify
  • match cookies
  • decide what server to send to

A load balancer may work at:

  • Link Level: This is link load balancing choosing what network link to send a packet to.
  • Network Level: This is called network load balancing. It consists in choosing route followed by a series of packets.
  • Server Level: This is called server load balancing. It consists in deciding what server will process a connection or request.

What is HAProxy

HAProxy is a TCP/HTTP reverse proxy which is particularly suited for high availability environments. From the official documentation, following statements are highlighted to define HAProxy:

  • TCP Proxy: Can accept a TCP connection from listening socket, connect to a server and attach these sockets together allowing traffic to flow in both directions
  • HTTP reverse-proxy: It presents itself as a server, receives HTTP requests over connections accepted on a listening TCP socket, and passes the requests from these connections to servers using different connections.
  • Server load balancing: Can load balance TCP connections and HTTP requests with decisions taken per request for HTTP mode and taken for the whole connection in case of TCP mode.
  • SSL terminator/initiator/offloader: Connections coming from client and connections going to the server may use SSL/TLS
  • TCP normalizer: since connections are locally terminated by the operating system, there is no relation between both sides, so abnormal traffic such as invalid packets, flag combinations, window advertisements, sequence numbers, incomplete connections (SYN floods), or so will not be passed to the other side. This protects fragile TCP stacks from protocol attacks, and also allows to optimize the connection parameters with the client without having to modify the servers’ TCP stack settings.
  • HTTP normalizer : when configured to process HTTP traffic, only valid complete requests are passed. This protects against a lot of protocol-based attacks. Additionally, protocol deviations for which there is a tolerance in the specification are fixed so that they don’t cause problem on the servers (eg: multiple-line headers).
  • HTTP fixing tool : it can modify / fix / add / remove / rewrite the URL or any request or response header. This helps fixing interoperability issues in complex environments.
  • Content-based switch : it can consider any element from the request to decide what server to pass the request or connection to. Thus it is possible to handle multiple protocols over a same port (eg: http, https, ssh).
  • Protection against DDoS and service abuse : it can maintain a wide number of statistics per IP address, URL, cookie, etc and detect when an abuse is happening, then take action (slow down the offenders, block them, send them to outdated contents, etc).
  • Observation point for network troubleshooting : due to the precision of the information reported in logs, it is often used to narrow down some network-related issues.
  • HTTP compression offloader : it can compress responses which were not compressed by the server, thus reducing the page load time for clients with poor connectivity or using high-latency, mobile networks.

HAProxy is not :

  • an explicit HTTP proxy, ie, the proxy that browsers use to reach the internet. There are excellent open-source software dedicated for this task, such as Squid. However HAProxy can be installed in front of such a proxy to provide load balancing and high availability.
  • a caching proxy : it will return as-is the contents its received from the server and will not interfere with any caching policy. There are excellent open-source software for this task such as Varnish. HAProxy can be installed in front of such a cache to provide SSL offloading, and scalability through smart load balancing.
  • a data scrubber : it will not modify the body of requests nor responses.
  • a web server : during startup, it isolates itself inside a chroot jail and drops its privileges, so that it will not perform any single file-system access once started. As such it cannot be turned into a web server. There are excellent open-source software for this such as Apache or Nginx, and HAProxy can be installed in front of them to provide load balancing and high availability.
  • a packet-based load balancer : it will not see IP packets nor UDP datagrams, will not perform NAT or even less DSR. These are tasks for lower layers. Some kernel-based components such as IPVS (Linux Virtual Server) already do this pretty well and complement perfectly with HAProxy.

Configuring HAProxy- Frontend server

Install HAProxy

# yum -y install haproxy

After successful installation of HAProxy, it’s time to configure. I’m providing haproxy.cfg pre-configured for reference on how it should generally look like. The configuration file for HAProxy should be placed in etc/haproxy/haproxy.cfg. You can do a backup of the default configuration file, this will allow for easy restore if anything goes wrong.

# mv /etc/haproxy/haproxy.cfg /etc/haproxy/haproxy.bak

Then modify below blocks of configurations to your liking and put them in /etc/haproxy/haproxy.cfg:

global
	user haproxy                      # User to run haproxy
	group haproxy                     # haproxy default group
	log         127.0.0.1 local2 info # Logs level 
	chroot      /var/lib/haproxy      # Chroot home for haproxy user
	pidfile     /var/run/haproxy.pid  # PID file
	maxconn  300                      # Max number of conncections per process
	daemon                            # Run the process in the backgound

# Default settings used by 'listen' and 'backend' sections if not defined in their section
defaults
	mode	http        # haproxy running mode
	timeout	connect 20s # Timeout if no reply from backend servers
	timeout client  30s # Timeout on the client side
	timeout server  30s # Timeout on the server side
	timeout queue   1m  # Timeout for a queue
	log     global      # Use global setting for logs
	option  httplog     # get HTTP request log
	retries 3           # Allow max of three retries


# Frontend server settings
frontend input-traffic
	bind                *:80                              # Define port to listen on for incoming traffic
	option              forwardfor  except 127.0.0.0/8    # Send X-Forwarded-For header
	default_backend     backend_servers                   # Define default backend servers 

# Define backend servers
backend backend_servers
	balance	roundrobin                # Use roundrobin for load balancing 
	server	webserver1  192.168.2.11  # Web server 1
	server	webserver2  192.168.2.12  # Web server 2
	server	webserver3  192.168.2.13  # Web server 3
	http-request set-header X-Forwarded-Port %[dst_port]
	http-request add-header X-Forwarded-Proto https if { ssl_fc }

Start and enable haproxy

# systemctl start haproxy 
# systemctl enable haproxy 

Testing the setup

To test our setup configurations, we’ll use the three backend servers.

On web server 1. create sample web page:

# echo "This is web server 1" > /var/www/html/index.html

On web server 2. create sample web page:

# echo "This is web server 2" > /var/www/html/index.html

On web server 3. create sample web page:

# echo "This is web server 3" > /var/www/html/index.html

Access the frontend server from a client machine with http like below:

http://192.168.2.3

Frontend server

If you keep reloading the page, you should realize that the content you get keep changing in roundrobin manner.

Securing HAProxy with SSL

First install openssl package if you don’t have it already in your system:

# yum -y install openssl*

There are two ways to get SSL certificate. For non production use, you can sign certificate yourself like below:

Generating self-signed certificate

mkdir /etc/ssl/haproxy
cd /etc/ssl/haproxy
openssl req -x509 -nodes -newkey rsa:4096 -keyout haproxy.pem -out haproxy.pem -days 365 
chmod 600 haproxy.pem

Private key called haproxy.pem will be generated

Creating CSR

Keep in mind that for a production SSL, it’s advisable to just create a Certificate Signing Request (csr) and pass that to whomever you purchase a certificate from. Below commands are for generating csr.

mkdir /etc/ssl/haproxy
cd /etc/ssl/haproxy
openssl req -x509 -nodes -newkey rsa:4096 -keyout haproxy.pem -out haproxy.pem -days 365 
openssl req -new -key haproxy.pem -out haproxy.csr

Now configure HAProxy for SSL:

# vim /etc/haproxy/haproxy.cfg

Add below lines inside the global section

maxsslconn     300               # Max number of SSL connections per process
tune.ssl.default-dh-param 4096   # set 4096 bits for Diffie-Hellman key

In the frontend section, add:

bind *:443 ssl crt /etc/ssl/haproxy/haproxy.pem # Specify port to listen at and cert key.

Notice that the backend doesn’t really need to be configured in any particular way since SSL connections are terminated at the Load Balancer.

Using SSL only connections

If you’d like the site to be SSL, you can add redirect directive to the frontend section like below:

    redirect scheme https if !{ ssl_fc }

Logging in HAProxy

In this getting started with secure HAProxy on Linux, let’s look at Logging. Logging is an extremely important aspect of layer 7 load balancing . This is because once a trouble is reported, it is important to figure if the load balancer took took a wrong decision.