Readers Ask About ... Reverse Proxy Servers
Below is the 5th of 6 questions from a reader. I definitely don’t have all the answers, but I’m not short on opinions. ;)
Previous articles:
Part 1 – LVM+JFS+RAID | Part 2 – Service Virtualization |
Part 3 – Virtualizing Media Storage | Part 4 – Hosting Email
duijf asks:
Q5: Do I need a reverse proxy if I ? I’ve read about proxy servers on TheFu’s blog that filter internal traffic (if you read this, in the end I liked the idea a lot more than at first). Is this even the same thing? If this is to happen, is it correct that I’d need two NICs and bridge the connection from the router to the internal network? If so, can I get rid of the router? We do use it for telephone access too.
What is a Proxy Server?
A proxy server is a middleman. It can be an outbound proxy, like the ones many businesses deploy to protect their internal systems from nefarious downloads or to block pages with undesirable content. Proxy servers are usually put in place to filter content in some way. A parent may install a proxy filter on their home network to protect their children from bad content, whatever that means.
When a proxy server is deployed, usually all traffic of a specific type needs to be blocked or redirected to that proxy server, or there isn’t much reason for end users to bother. So if you deploy a web proxy server in a home or work network, then all HTTP/HTTPS traffic trying to get to the internet needs to be forced to that proxy server. 100% of it. Squid is a commonly deployed proxy server for home and business users.
Proxy servers have another purpose. They often provide a local cache of content that multiple users are viewing. In a company, this may be extremely important. Suppose the company President sends an email to all 2,000 employees with a link to an external website that contains photos or video. If there is a proxy server, then the web page will be pulled into the proxy cache once and served to end users from that internal cache, not using internet bandwidth each time. Most companies do not have huge internet connections, but they do have pretty good internal LAN bandwidth capacity, so minimizing this external traffic is very important. A proxy server will make the end users happier, filter content that the company believes is undesirable (objectionable pages or viruses), reduce the internet bandwidth required, and protect users from certain types of malware.
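To make the caching idea concrete, here is a minimal Python sketch. It is not how Squid works internally; the class name, the fake_fetch helper and the example URL are all made up for illustration, and a real cache would also honor expiry headers and size limits.

```python
from typing import Callable, Dict


class CachingProxy:
    """Toy cache: fetch each URL upstream once, then answer from memory."""

    def __init__(self, fetch_upstream: Callable[[str], bytes]) -> None:
        # fetch_upstream is a hypothetical "go out to the internet" call.
        self._fetch_upstream = fetch_upstream
        self._cache: Dict[str, bytes] = {}
        self.upstream_requests = 0

    def get(self, url: str) -> bytes:
        if url not in self._cache:
            # Cache miss: this is the only time internet bandwidth is used.
            self.upstream_requests += 1
            self._cache[url] = self._fetch_upstream(url)
        return self._cache[url]


def fake_fetch(url: str) -> bytes:
    # Stand-in for a real HTTP request, so the sketch runs offline.
    return ("content of " + url).encode()


proxy = CachingProxy(fake_fetch)
for _ in range(2000):  # 2,000 employees click the same link
    proxy.get("http://example.com/photos")
print(proxy.upstream_requests)  # prints 1
```

Two thousand requests on the LAN side turn into a single request on the internet side, which is the whole point.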
In general, I like proxy servers. You can learn much more all over the web. Wikipedia usually does a good job explaining this stuff.
What is a Reverse Proxy?
A reverse proxy is useful when you run a server that people on the internet will access. It does all the things that a normal web proxy does, just for inbound requests from outside people. A pure reverse web proxy server would look at all inbound web requests and validate them for reasonableness. My reverse proxy settings dump bad requests (you would be amazed at the crap requests hitting an internet web server) and requests for internal-only web pages that should never be available from outside the internal LAN. If a bad request never gets to the web or application server, then it can’t be used to hack that server as easily.
To further enhance my security, the actual web servers used for public content don’t run on the normal ports. Port 80 is the normal, default port for HTTP traffic. My router forwards all port 80 traffic to a reverse proxy also listening on port 80. Then the reverse proxy looks at the requested page and decides which back-end server should handle the request. I have organized my internal services by both domain name and subdirectory. So
- pda.jdpfu.com
- blog.jdpfu.com and
- www.jdpfu.com
are directed to different back-end servers.
Some internal-only services are redirected based on the top level directory specified.
- zyz.jdpfu.com/wiki
- zyz.jdpfu.com/crm
- zyz.jdpfu.com/email
- zyz.jdpfu.com/pm
- zyz.jdpfu.com/sip
each go to different services running on internal systems.
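The filtering and routing decisions described above can be sketched in a few lines of Python. This is only an illustration of the logic, not my nginx configuration: the back-end addresses and ports are invented, the from_lan flag is a hypothetical input, and the "bad request" check is deliberately simplistic.

```python
from typing import Optional

# Hypothetical back-end addresses on non-standard ports (not the real ones).
HOST_BACKENDS = {
    "pda.jdpfu.com": "http://10.0.0.11:8081",
    "blog.jdpfu.com": "http://10.0.0.12:8082",
    "www.jdpfu.com": "http://10.0.0.13:8083",
}

# Internal-only services, chosen by the top-level directory.
INTERNAL_HOST = "zyz.jdpfu.com"
PATH_BACKENDS = {
    "wiki": "http://10.0.0.21:8090",
    "crm": "http://10.0.0.22:8091",
    "email": "http://10.0.0.23:8092",
    "pm": "http://10.0.0.24:8093",
    "sip": "http://10.0.0.25:8094",
}


def choose_backend(host: str, path: str, from_lan: bool) -> Optional[str]:
    """Return the back-end URL for a request, or None to drop it."""
    host = host.lower().split(":")[0]  # ignore any ":80" suffix
    # Crude sanity check: drop malformed paths and directory traversal.
    if not path.startswith("/") or ".." in path:
        return None
    if host in HOST_BACKENDS:
        return HOST_BACKENDS[host]
    # Internal-only services are never served to outside requests.
    if host == INTERNAL_HOST and from_lan:
        top_level = path.split("/")[1]
        return PATH_BACKENDS.get(top_level)
    return None


print(choose_backend("blog.jdpfu.com", "/", from_lan=False))
print(choose_backend("zyz.jdpfu.com", "/wiki/Home", from_lan=False))
```

The first call returns the blog back-end; the second returns None because the wiki is internal-only and the request came from outside.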
There are lots of different ways to accomplish this, usually by opening different ports on the firewall and forwarding those to the different back end servers. Doing that means touching the router and it puts each of those servers directly on the internet, which may not be desired.
I’ve used pound, mod_proxy (apache) and nginx as reverse proxies. Today, I’m using nginx here.
- Pound was selected because it is extremely lightweight, efficient, written in C and had a pretty good reputation at some very large websites, like slashdot.org which sees millions of users daily. Pound is missing some features that became more important to me. I did use it to filter inbound requests, drop clearly bad requests and as a load balancer when I needed multiple back-end servers, but didn’t want users to have to deal with that on any level.
- mod_proxy from Apache is the default proxy simply because most people already have apache running as their web server. Apache brings everything to a web or app server, hence it can be pretty heavy. In my mind, the more features a tool has, the more likely that complexity eventually becomes detrimental to security.
- Nginx fills a similar role to pound and is written in C for high performance and light weight. I saw nginx many years ago when I elected to use pound instead. Part of the reason I didn’t choose nginx was due to the main developer being in Russia. That may not be fair, but it was part of my decision. These days, enough reputable, very large web sites use nginx that I’m comfortable trusting this particular development team. It brings some pretty nifty features that pound was lacking, like built-in compression for web traffic. Using compression usually lets a given pipe/connection support 2 to 3 times more traffic for zero additional cost. How can you turn that down?
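If you want a feel for what compression buys you, this small Python sketch gzips a made-up HTML page and reports the ratio. It uses Python's standard gzip module rather than nginx itself, and the sample page is invented and very repetitive, so expect the ratio it prints to be much more flattering than what real pages achieve.

```python
import gzip

# Made-up, repetitive HTML standing in for a typical text-heavy page.
page = (
    "<html><body>\n"
    + "<p>Reverse proxies sit in front of web servers.</p>\n" * 200
    + "</body></html>\n"
).encode()

compressed = gzip.compress(page)
ratio = len(page) / len(compressed)

print(f"{len(page)} bytes -> {len(compressed)} bytes ({ratio:.1f}x smaller)")

# The browser gets back exactly the same bytes after decompressing.
assert gzip.decompress(compressed) == page
```

Already-compressed content such as JPEG photos or video gains almost nothing, so the benefit applies mainly to HTML, CSS, JavaScript and other text.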
Load Balancing
There are many different ways to support lots and lots of different user requests. Getting a bigger, faster machine isn’t always viable, so throwing more smaller machines at the problem is often the best solution. That includes throwing virtual servers at the problem. Now you need a way for an end user to access different back-end servers without knowing anything about it.
If you are a small web content company, then putting in a load balancer is usually the easiest answer. You can have 1 load balancer and 1 or 100 or 10,000 back-end servers pushing content. To the end-user, it appears as 1 web server. That’s what you want.
Load balancers can be network devices from Cisco, Nortel, Juniper, F5 or they can be software from many different providers; many are F/LOSS. Pound and nginx are the easiest, but you can make your own if you are knowledgeable about Linux IP stack commands. I use nginx.
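The simplest balancing strategy is round robin: hand each new request to the next back-end in the list and wrap around at the end. Here is a minimal Python sketch of that idea, with invented back-end names. Real load balancers add health checks, weights and session handling on top of this.

```python
from itertools import cycle
from typing import Iterator, List


class RoundRobinBalancer:
    """Hand out back-ends in a fixed rotation."""

    def __init__(self, backends: List[str]) -> None:
        if not backends:
            raise ValueError("need at least one back-end server")
        self._rotation: Iterator[str] = cycle(backends)

    def pick(self) -> str:
        # Each call returns the next back-end, wrapping around at the end.
        return next(self._rotation)


# Hypothetical back-end names; to the end user they all look like 1 server.
balancer = RoundRobinBalancer(["app1:8080", "app2:8080", "app3:8080"])
picks = [balancer.pick() for _ in range(4)]
print(picks)  # ['app1:8080', 'app2:8080', 'app3:8080', 'app1:8080']
```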
SSL Connection Server
SSL connections are almost always tied to the IP address on a machine. If you are small, you probably want to limit the expenses involved in running web properties. That includes buying multiple SSL certificates every year. With a reverse proxy, you can install a single SSL certificate for a specific web service and proxy those requests to many different back-end servers. If you are small, you may only have 1 public-facing IP address too. This is another reason to install your SSL cert on the reverse proxy and not on the different back-end servers on your internal network. Obviously, once the traffic gets to the reverse proxy server, it is decrypted and travels unencrypted on your internal network. That can be problematic if you need to deal with financial transactions or sensitive data and don’t have physical control over the network, which none of us do at our web hosting providers.
Physical Setup for a Proxy
Do you need 2 physical network cards to have a proxy server? No. Most of the time, the internal users will make their proxy settings either manually or from a .PAC file that you deploy for them. Then the router will block outbound connections except those from the proxy server IP. This prevents end user IPs from getting onto the internet without going through the proxy.
For an inbound proxy server, a single network card can be deployed as well. You set up the router to forward the specific ports from outside to the specific reverse proxy server. On the internal servers, you probably want to do two things:
- set up firewall rules to block all non-local LAN connections
- proxy to non-standard ports from the reverse proxy to the back-end server(s).
Doing these things, you further ensure that the only way an external request gets to the back-end server is through your reverse proxy server.
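The "block all non-local LAN connections" rule boils down to a subnet membership test. In practice you would do this in the back-end server's firewall, not in application code, but a Python sketch shows the decision being made. The subnet and the proxy address below are assumptions for illustration only.

```python
import ipaddress

# Assumed values; substitute your own LAN subnet and reverse proxy address.
LAN = ipaddress.ip_network("192.168.1.0/24")
REVERSE_PROXY = ipaddress.ip_address("192.168.1.10")


def backend_accepts(source_ip: str, proxy_only: bool = False) -> bool:
    """Decide whether a back-end server should accept this connection."""
    source = ipaddress.ip_address(source_ip)
    if proxy_only:
        # Stricter variant: only the reverse proxy itself may connect.
        return source == REVERSE_PROXY
    # Looser variant: anything on the local LAN may connect.
    return source in LAN


print(backend_accepts("192.168.1.25"))                    # True, a LAN machine
print(backend_accepts("203.0.113.7"))                     # False, an outside address
print(backend_accepts("192.168.1.25", proxy_only=True))   # False, not the proxy
```

The stricter proxy_only variant goes a step beyond what I described above; whether it is worth the extra hassle depends on how much you trust the other machines on your LAN.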
As you can see, there’s no need for multiple network interfaces on proxy servers. If you did put one in and connected it to 2 different networks, then your proxy server would be called a ... router. I can’t recommend setting up a proxy server, then adding routing to it. Routers are the main place for security on most networks. When it comes to security, I prefer 1 job for 1 device. Keep It Simple, KISS. OTOH, I might consider adding a proxy to an existing router/firewall.
Do you use it for telephone access? Probably not. I’m assuming when you say telephone that you really mean SIP. SIP is extremely sensitive to jitter and delays. Sure, it may work, but I wouldn’t put a SIP device behind any proxy if I could avoid it. Your specific SIP devices and servers and service provider may support this. I dunno.
Complex Stuff
I hope my explanations were clear. This is complex stuff if you’ve never heard about it previously. Ask questions in the comments if I wasn’t clear.