Configuring LINUX and Squid as a Web Proxy
by David Del Elson
last updated May 30, 2001
A web proxy server is a useful service to have on your network, or between your network and the Internet, as it provides an extra security layer that insulates your users from the Internet. A proxy server can also act as a cache, allowing users to share downloads transparently and speeding up Internet access, especially for frequently-used files. Squid is a high-performance and relatively secure web proxy server that includes good caching facilities. It is one of the most commonly used proxy servers on the Internet. More information about Squid can be obtained at http://www.squid-cache.org/. This article will give a general overview of setting up Linux and Squid as a web proxy server.
The best way to install Linux is to use one of the many freely available distributions. Red Hat Linux and Debian are two of the more popular distributions of Linux. Each distribution of Linux will come with its own installation instructions, usually packaged with the distribution or available on the Internet. For example, the Red Hat Linux installation instructions for version 7.1 are available online. You may also want to look at a previous SecurityFocus article, Installing Linux, by Peter Merrick, which includes some recommendations on system hardening that you may want to think about before installing Squid.
Installing From Package
Note that your Linux distribution will usually come bundled with a packaged version of Squid; however, it may not have been installed when the distribution itself was installed. For example, after installing Red Hat Linux 7.1, you will find that the Squid package is not installed. Squid is located on the Red Hat installation CD #2, in the RedHat/RPMS directory. To install it from there, make sure that you are logged in as root, and use the rpm command as follows:
mount /mnt/cdrom
rpm -Uhv /mnt/cdrom/RedHat/RPMS/squid-*.rpm
umount /mnt/cdrom
During the installation process, you should see a row of hashes (#) to indicate the progress of the installation.
On a Debian Linux system, you can use the apt-get program to automatically download and install squid from the Internet, as follows:
apt-get install squid
Note that if you are not connected to the Internet, the above command will fail. You may instead want to install squid from a Debian CD-ROM. Installation instructions from CD-ROM may vary, and so you should check with the person who supplied your CD-ROM.
Installing From Source
If you prefer to install Squid from the source files, then you can do this on just about any Unix system. First, you will need to obtain the latest source code from the Squid web site, at http://www.squid-cache.org/.
The Squid source code comes in a compressed tar file, so you will need to uncompress it as follows:
zcat squid-2.3.STABLE4-src.tar.gz | tar xf -
(Note: to do this, I obtained the 2.3.STABLE4 release of Squid from the Squid web site. You may have a different release of Squid, and so may need to adjust the above command.)
Once you have uncompressed the tar file, you will need to configure, make, and install Squid as follows:
cd squid-2.3.STABLE4
./configure
make all
make install
For further information on installing squid, read the INSTALL file which is provided with the Squid source code. You may wish to provide some options to the above ./configure command to specify the location of the squid programs, configuration files, etc.
Everything in Squid is configured using a single configuration file, called squid.conf. Depending on your Linux distribution, the file may be in /etc/squid.conf or in /etc/squid/squid.conf. Before proceeding any further, you should locate this file on your system, for example by searching for it with find or locate.
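A minimal sketch of a search for the file, assuming it lives somewhere under /etc. The function name find_squid_conf is made up for illustration, and a source install may well place the file elsewhere:

```shell
# Print the path of every file named squid.conf under the given
# directory (default: /etc). Errors from unreadable directories are hidden.
find_squid_conf() {
    find "${1:-/etc}" -type f -name squid.conf 2>/dev/null
}

# Example: search /etc first, then the whole filesystem if nothing turns up.
#   find_squid_conf
#   find_squid_conf /
```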
There are a number of methods of configuring squid using a web based or other GUI. These GUIs have the ability to read, understand, and write back the configuration file to the correct place.
Instead of focusing on one of these GUIs, I will show you some examples of configuring Squid manually. For this purpose, you will need a text editor such as vi or emacs (or even a GUI based editor such as kedit if you prefer), and you will also need to be logged in on your server as root so that you have write access to the Squid configuration file.
The Squid configuration file contains many, many options. I will not cover all of these options (there are comments throughout the file as to what these options mean), but I will focus on getting some of the most common options correct.
By default, Squid comes with a configuration file that is mostly correct and almost usable. It contains default settings for many of the options that require a setting, and should, by itself, allow access to your Squid proxy in a fairly secure manner from the local server only.
In order to allow Squid to be used as a proxy server for your entire network, there are a number of things that you will want to configure before you begin using Squid.
Starting Point (Basic) Configuration
When I began using Squid, I found that most of the comments in the squid.conf file were useful and informative. These days, however, I have developed a bit of a habit of deleting all of them (including the blank lines) before I begin. This reduces Squid's 76K default configuration file as supplied on Red Hat 7.1 to 688 bytes! I find that I only use a few of the configuration items in this file, and the smaller file is much easier to work with in an editor.
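The stripping step can be done with grep rather than by hand. The following is only a sketch: the function name strip_comments and the backup file name in the example are assumptions, and it is worth keeping the commented original around for reference.

```shell
# Print a configuration file with comment lines and blank lines removed.
# A line counts as a comment if its first non-blank character is '#'.
strip_comments() {
    grep -Ev '^[[:space:]]*(#|$)' "$1"
}

# Example (run as root; paths are assumptions):
#   cp -p /etc/squid/squid.conf /etc/squid/squid.conf.orig
#   strip_comments /etc/squid/squid.conf.orig > /etc/squid/squid.conf
```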
To the basic (as-supplied) squid configuration file, I add the following options:
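acl privatenet src 192.168.0.0/255.255.0.0
http_access allow privatenet
cache_effective_user squid
cache_effective_group squid
There are a few things to make note of regarding these options: the acl line gives the name privatenet to the source addresses 192.168.0.0 through 192.168.255.255, the http_access line allows requests from those addresses, and the cache_effective_user and cache_effective_group lines name the user and group that Squid runs as when it is started by root. Adjust the network address and the user and group names to suit your own system.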
Note that the above configuration file entries only provide a small part of what you may want to do with your Squid proxy. Some other examples are noted in this section.
By default, Squid stores some information in a few log files. I prefer to specify the log files that I expect Squid to use directly in the squid.conf file, as follows:
cache_access_log /var/log/squid/access.log
cache_log /var/log/squid/cache.log
cache_store_log none
With the above lines, Squid will store error messages in the file /var/log/squid/cache.log (this should be checked periodically), and access messages in the file /var/log/squid/access.log. There are a number of useful programs that can analyse the access log file, including SARG (formerly known as sqmgrlog).
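Before reaching for a full log analyser, a quick summary can be pulled from the access log with standard tools. This sketch assumes Squid's native access log format, in which the third whitespace-separated field is the client address. The function name top_clients is made up for illustration.

```shell
# Print the ten busiest client addresses in a Squid access log,
# each preceded by its request count, busiest first.
top_clients() {
    awk '{ print $3 }' "$1" | sort | uniq -c | sort -rn | head -n 10
}

# Example (path taken from the cache_access_log line above):
#   top_clients /var/log/squid/access.log
```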
I have never found anything useful in squid's cache_store_log file, so this can be disabled safely by using the line above.
You may want to allow access to your cache from a number of networks. This is accomplished by using various acl and http_access lines.
Note that an acl line defines a named network or other access criterion, whereas an http_access line, of the form http_access (allow/deny) (acl), grants or denies access to the acl that you have defined. Therefore, you should put your acl lines before the http_access lines in your configuration file.
I have given one example of allowing access to a private network above. Note that you should refrain from using a catch-all line like http_access allow all unless you really want the entire Internet using your Squid server as their web cache!
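A simple check for that catch-all line can be scripted. The sketch below assumes the conventional acl name all (as defined in the stock configuration file) and only looks for the literal rule, so it will not notice an equally permissive acl under a different name. The function name has_allow_all is hypothetical.

```shell
# Exit 0 and print the offending lines (with line numbers) if the file
# contains an active "http_access allow all" rule; exit non-zero otherwise.
# Commented-out rules are ignored.
has_allow_all() {
    grep -En '^[[:space:]]*http_access[[:space:]]+allow[[:space:]]+all([[:space:]]|$)' "$1"
}

# Example:
#   has_allow_all /etc/squid/squid.conf && echo "WARNING: open proxy rule found"
```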
Talking to an External (Upstream) Proxy
It may be advantageous to use an upstream proxy for Squid. This can speed up Internet access noticeably, for example when your ISP also has a Squid cache that many users access. The ISP's cache can, over time, build up a large store of objects from many different sites, allowing faster access to those sites from your network.
For inter-cache communication, Squid supports a protocol known as 'ICP'. ICP allows caches to communicate to each other using fast UDP packets, sending copies of small cached files to each other within a single UDP packet if they are available. Many other cache products also support ICP, and if you are going to network caches together then you should ensure that they all support ICP or a similar protocol.
To use an upstream proxy effectively, you should first determine its address (e.g. proxy.yourisp.com), and what cache and ICP port (if any) it uses. Most ISPs will be happy to provide you with that information from their web sites or over the phone.
Using an upstream proxy that supports ICP is simple, using a line like this one:
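cache_peer proxy.yourisp.com parent 3128 3130
prefer_direct off
The cache_peer line specifies the host name, the cache type ("parent"), the proxy port (3128) and the ICP port (in this case, the default, which is 3130). The prefer_direct off setting tells Squid to try the parent cache first rather than fetching directly from the origin server.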
If your parent cache does not support ICP then you could try the following combination instead:
cache_peer proxy.yourisp.com parent 3128 7 no-query default
prefer_direct off
Sibling Proxies and Sharing Caches
Note that in a high-volume situation, or a company with several connections to the Internet, Squid supports a multi-parent, multi-sibling hierarchy of caches, provided that all of the caches support ICP. For example, your company may operate two caches, each with their own Internet connection but sharing a common network backbone. Each cache could have a cache_peer line in the configuration file such as:
cache_peer theotherproxy.yournetwork.com sibling 3128 3130
Note that the peer specification has changed to sibling, which means that we will fetch files from the other cache if they are present there, otherwise we will use our own Internet connection.
Denying Bad Files
There are a number of files that I don't allow my users to fetch, including the notorious WIN-BUGSFIX.EXE file that was downloaded by the ILOVEYOU worm. A simple ACL line to stop this file from being downloaded is as follows:
acl nastyfile urlpath_regex -i WIN.*BUG.*EXE
http_access deny nastyfile
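It is worth trying a pattern against a few sample URLs before putting it into squid.conf. The sketch below uses grep -Ei as a stand-in for Squid's case-insensitive extended regular expressions, which is an assumption that should hold for simple patterns like this one. Keep in mind that [.*] is a character class matching a single '.' or '*', whereas .* matches any run of characters. The function name url_matches and the example.com URLs are made up for illustration.

```shell
# Exit 0 if the URL ($2) matches the extended regular expression ($1),
# ignoring case; exit non-zero otherwise.
url_matches() {
    printf '%s\n' "$2" | grep -Eiq -- "$1"
}

# Example:
#   url_matches 'WIN.*BUG.*EXE' 'http://example.com/files/win-bugfix.exe' && echo blocked
#   url_matches 'WIN[.*]BUG[.*]EXE' 'http://example.com/WIN-BUGFIX.EXE' || echo "not matched"
```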
There are a number of other tricks that you can do with your Squid proxy. These include things like authentication, transparent proxying, denying access to certain files (e.g. MP3 files) during business hours, etc. One word of warning: the Squid configuration file is fragile, and easily broken. If you break the configuration file then Squid will refuse to work, and may give you an error message that is not sufficiently understandable for you to figure out what you broke. For that reason it is advisable to keep a backup copy of a working configuration file and to make changes one at a time, checking after each change that Squid still starts.
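One way to make editing safer is to copy the configuration file before every change. The sketch below keeps timestamped copies next to the original. The function name backup_conf and the paths in the example are assumptions.

```shell
# Copy a configuration file to a timestamped backup alongside it and
# print the backup's name, so a known-good version is always at hand.
backup_conf() {
    backup="$1.$(date +%Y%m%d-%H%M%S)"
    cp -p "$1" "$backup" && printf '%s\n' "$backup"
}

# Example (run as root):
#   backup_conf /etc/squid/squid.conf
#   vi /etc/squid/squid.conf
#   squid -k reconfigure             # ask the running Squid to reload its configuration
#   tail /var/log/squid/cache.log    # look for complaints about the new configuration
```

If Squid refuses to start after a change, copying the most recent backup over squid.conf restores the last working setup.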
Authenticating users to squid is one of the most common tasks that is required of administrators, for example, where your company grants or denies internet access by user.
Setting up an acl to allow or deny user access can be done with the following configuration file lines:
authenticate_program /your/authentication/program
acl validusers proxy_auth REQUIRED
http_access allow validusers
The only thing remaining is to find a suitable proxy authentication program. Note that Squid does not provide any internal authentication; you have to point the authenticate_program line at an external authentication program of some kind.
Squid (as supplied on Red Hat 7.1) comes with a number of authentication programs, stored in /usr/lib/squid. These include smb_auth (for authenticating to an NT domain), squid_ldap_auth (for authenticating to an LDAP directory), and my preferred candidate which is pam_auth, which uses the system PAM libraries to authenticate users. The advantage of using pam_auth is that you can configure PAM to authenticate users through a variety of methods, and have the entire system and all programs on it (including the login program, XDM, Squid, Apache, and others) all using the same authentication configuration information and server.
To configure pam_auth, you will need to point the authenticate_program line at the pam_auth program and provide a PAM service configuration for Squid (note that this is for Red Hat Linux; instructions may vary for Debian).
Transparent proxying is a method whereby you can put a proxy server between your network and the Internet, and have all WWW accesses directed to the proxy server automatically (note that this works for WWW but not for FTP). The user must be aware that transparent proxying and authentication are incompatible. They cannot both be done on the same server. If you were to try it, it might look like it is working but it is not. If you must use authentication, then don't try transparent proxying.
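To set up transparent proxying, you need two things: a firewall rule that redirects outgoing web traffic on port 80 to the Squid port, and Squid configuration lines that let it handle the redirected requests.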
To set up your firewall rule, you will need a rule such as the following (which is for ipchains):
/sbin/ipchains -A input -p tcp -s 0/0 -d 0/0 80 -j REDIRECT 3128
For iptables (Linux Kernel 2.4 and later) users, you may like to set up an iptables-based firewall on your Squid server. As part of the firewall, you will need to create a NAT rule (REDIRECT when Squid runs on the firewall machine itself, or DNAT when it runs on another host) mapping outgoing traffic on port 80 to port 3128 of the proxy server. Some programs that provide a graphical interface to iptables are discussed in A Comparison of iptables Automation Tools by Anton Chuvakin, and there is also a netfilter home page where you can find some documentation and a HOWTO with some more detailed instructions on setting up NAT rules.
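As a rough sketch of such a rule for the case where Squid runs on the firewall machine itself, the function below adds a REDIRECT rule to the PREROUTING chain of the nat table. The interface name eth0, the function name redirect_rule, and the IPTABLES variable (there so the command can be swapped out for a dry run) are assumptions made for illustration. Port 3128 matches the ipchains example above.

```shell
# Settings; override them in the environment to suit your network.
LAN_IF="${LAN_IF:-eth0}"                 # interface facing the internal network (assumed)
SQUID_PORT="${SQUID_PORT:-3128}"         # port Squid listens on
IPTABLES="${IPTABLES:-/sbin/iptables}"   # command used to install the rule

# Redirect web traffic arriving from the internal network to local Squid.
redirect_rule() {
    "$IPTABLES" -t nat -A PREROUTING -i "$LAN_IF" -p tcp --dport 80 \
        -j REDIRECT --to-ports "$SQUID_PORT"
}

# Example (run as root):
#   redirect_rule
```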
The required Squid configuration lines to allow Squid to act as a transparent proxy are as follows:
httpd_accel_host virtual
httpd_accel_port 80
httpd_accel_with_proxy on
httpd_accel_uses_host_header on
Before Exposing Your Server to the Internet
Before exposing your server to the Internet, you should ensure that all unwanted services are turned off or disabled, that a secure firewall is in place, and that some level of monitoring is in place to detect and prevent intrusion. Previous SecurityFocus articles, such as Securing Linux part 1 and part 2, provide more information on this topic.
As with any server that is connected to the Internet, you may wish to have some kind of ongoing monitoring performed. A couple of useful programs for doing this include logcheck (mentioned in Securing Linux part 2, listed above), and AIDE, which is covered in Securing Linux with AIDE by Kristy Westphal.
This article originally appeared on SecurityFocus.com -- reproduction in whole or in part is not allowed without expressed written consent.