Compiling Squid

Compiling Squid is quite easy: you need the right tools to do the job, though. First, let's go through getting the tools, then you can extract the source code package, include optional Squid components (using the configure command) and then actually compile the distributed code into a binary format.

A word of warning, though: this is the stage where most people run into problems. If you haven't compiled source before, try and follow the next section in order - it shouldn't be too bad. If you don't manage to get Squid running, at least you have gained experience.

Compilation Tools

All GNU utilities mentioned below are avaliable via FTP from the official GNU ftp site or one of it's mirrors. A list of mirrors is available at http://www.gnu.org/, or download them directly from ftp://ftp.gnu.org/.

The GNU compiler is only distributed as source (creating a chicken-and-egg problem if you do not have a compiler) you may have to do an Internet search (using one of the standard search engines) to try and find a binary copy of the GNU compiler for your system. The Squid source is distributed in compressed form. First a standard tar file is created. This file is then compressed with the GNU gzip program. To decompress this file you need a copy of gzip. GCC (The Gnu C Compiler) is the recommended compiler: the developers wrote Squid with it, and it is available for almost all systems.

You will also need the make program, of which there is also a GNU version easily available.

If possible, install a C debugger: the GNU debugger (GDB) is available for most platforms. Though a debugger is not necessary for installation, but is very useful in the case of software bugs (as discussed in chapter 13).

Unpacking the Source Archive

Earlier we looked at the tree structure of the /usr/local/squid directory. I suggest extracting the Squid source to the /usr/local/squid/src directory. So, create the directory and copy the downloaded Squid tar.gz file into it.

First let's decompress the file. Some versions of tar can decompress the file in one step, but for compatability's sake we are going to do it in two steps. Decompress the tar file by running gzip -dv squid-version.tar.gz. If all has gone well you should have a file called squid-version.tar in the current directory. To get the files out of the "tarball", run tar xvf squid-version.tar.

Tar automatically puts the files into a subdirectory: something like squid-2.1.PRE2. Change into the extracted directory, and we can start configuring the Squid source.

Compilation options

Squid features are enabled (or disabled) with the configure shell script. Some Squid features have to be specifically enabled when Squid is compiled, which can mean that you have to recompile at a later stage. There are two reasons that a feature can be disabled by default:

Operating system Compatibility. Although Squid is written in as generic a way possible, certain functions (such as async-io, transparency and ARP-based access control lists) are not available on all operating systems. When many operating systems cannot use a feature, it is included as a compile time option.

Efficiency. On a very lightly loaded cache, async-io can actually slow down requests minutely. Some system administrators may wish to disable certain features to speed up their caches.

You may be wondering why there simply aren't config file options for these less used features. For most of the features there really isn't a reason other than (?minimalisim?). Why have code sitting in the executable that isn't actually used? You can include the features that you might use at some time in the future without detrimental effects (other than a slightly larger binary), so as to avoid having to recompile the Squid source later on.

The configure program also has a second function: with some source code you have to edit a header file which tell the compiler which function calls to use on the system. This very often makes source compilation difficult. With Squid, however, the GNU configure script checks what programs, libraries and function calls are available on your system. This simplifies setup dramatically.

To make configure as generic as possible, it's actually a Bourne Shell /bin/sh script. If you have replaced your /bin/sh shell with a less Posix-capable shell (like ash) you may not be able to run configure. If this is the case you will have to change the first line of the configure script to run the full shell.

all source inclusion options are set with the command './configure option'. On most systems root doesn't have a '.' in their search path for security reasons, so you have to fully specify the path to the binary (hence the '/').

To turn more than one configuration option on at once you simply append each option to the end of the command line. You can, for example, change the prefix install directory and turn Async-IO on with a command like the following (more on what each of these options is for shortly).

 ./configure --prefix=/usr/people/staff/oskar/squid --enable-async-io

Note that only the commonly used configuration options are included here. To get a complete list of options you can run './configure --help'. Many of the resulting options are standard to the GNU configure script that Squid uses, and are used for some things like cross compilation.

If you wish to find out about some of the more obscure options you may have to ask someone on one of the relevant mailing lists, or even read the source code!

Reducing output of configure

When you run configure you normally get a fairly verbose output as to what is being checked for. Most people don't need all this information, so there is an option to stop configure printing the messages that aren't important. To reduce the amount of printed output, use the --quiet option. This way you will only see error messages, not debug information.

The first time you run configure you should run it in verbose mode. The configure process can take a while on slower machines, so you should get an idea as to how long it will take to run. Should you need to submit a bug report, you should always include as much information as possible, and should include the full configure output.

Destination directory

Some system administrators would prefer to dispense with the /usr/local/squid directory described earlier. On some systems you may even be installing Squid on a machine where you do not have root access (and can thus not create the /usr/local/squid directory). In either of these cases you will need to change your destination path.

Throughout this book I assume that you have installed Squid in the default directory. Using the default destination will make it easier for you to follow the examples in this book.

Changing the destination directory is done with the --prefix configure option. Here are some examples where we use this option.

Installing Squid in your home directory:

 ./configure --prefix=/usr/people/staff/oskar/squid

If you are installing Squid on a dedicated cache machine you may wish to place all Squid-related files in the /usr/local directory. Config files (for example) will thus live in /usr/local/etc.

 ./configure --prefix=/usr/local/

Using the DL-Malloc Library

The memory allocation routines included with many operating systems aren't very good for the way that Squid allocates and uses memory. Squid uses the memory subsystem more intensively than most programs, since it's a single process which runs for an extended period of time and continuously allocates and frees small sections of memory. On some systems the Squid process size increases at a rapid rate. When it eventually consumes all the memory on the system, it crashes.

This option enables a different system memory allocator: DL-Malloc, by Doug Lea, which is known to be efficient for Squid's allocation patterns.

Squid will increase in size as objects are added to the disk cache, as discussed in the Hardware Requirements section. The index of objects in the disk cache takes up RAM, so make sure that you have sufficient memory in your system before deciding that the memory allocation system is at fault.

If a recently started copy of Squid uses substantially less memory than one that has been running for a few days (with the same size cache store), you might want to configure Squid to use DL-Malloc.

The included DL-Malloc memory allocation routines are not thread-safe, so you may not be able to use this option in conjunction with Async-IO. (? need to check details ?)

To use DL-Malloc, simply use the --enable-dl-malloc option:

 ./configure --enable-dl-malloc

Regular expression routines

Regular expressions allow you to do complex string matching, and are used for various things in the Squid config files (most notably in the rules that control how long objects stay in the cache).

On some systems you may wish to replace the default regular-expression routines with the GNU routines. This may be because the default operating system ones are incompatible with Squid or do not function correctly. If your system doesn't have regular expression libraries, Squid will automatically use the GNU library, so the GNU regular expression routines are included in the default Squid source code tree, and don't have to be downloaded seperately.

To enable use of the GNU libraries, simply use the --enable-gnuregex configure option.

Asynchronous IO

Squid 2.0 includes a major performance increase in the form of Async-IO.

It's important to remember that Squid is one processes. In many Internet daemons, more than one copy runs at a time, so if one process is by a system call, it does not effect the other running copies.

Squid is only one process. If the main loop stops running for some reason, all connections are slowed. In all versions of Squid, the main loop uses the select and poll system calls to decide which connections to service. As Squid receives data from the server, it writes the data to disk and to the client.

To write data to disk, a file has to be opened on the cache drive. When lots of clients are opening and closing connections to a busy cache, the main loop has to make lots of calls to open and close network and disk filehandles (note that the word filehandle can refer to both a network connection and an on-disk file). These two functions block the flow of all data through the cache. While waiting for open to return, Squid cannot perform any other functions.

When you enable Async-IO, Squid 2.0 uses threads to open and close filedescriptors. A thread is part of the main Squid program in most ways, except that if it makes use of a blocking system call (such as open), only the thread stops, not the main loop or other threads. Note that there is not one thread per connection.

Using threads to make calls to blocking function calls reduces the latency that a cache adds to each request. (People sometimes worry about the latency that caches add, but if you have a fast enough cache the latency is not an issue - the client sees no noticeable overhead. Network overhead normally outweighs Squid overhead). Async-IO drastically reduces cache overhead when you have a loaded cache.

Unfortunately Posix threads aren't available on all operating systems. This ties your hardware choice into your choice of operating system, since if your operating system does not support threads there may be no choice but to use a faster system, or even to split the load between multiple machines. (? need a table of machines that work ?)

You should probably try and run Squid with Async-IO enabled if you have a few thousand requests per hour. Some systems only support threads properly with a fair amount of initial setup. If your load is low and Async-IO doesn't work straight away you can leave Squid in the default configuration.

Use the --enable-async-io configure option to include the async-io code into Squid.

User Agent logging

Most modern browsers include a header with each outgoing request that includes some basic information about the user's browser and operating system. This header is called a 'user-agent' header, since it describes the agent program (the browser) used. An automated agent includes different user-agent headers, so logging user-agent headers allow you to see if someone using an automated web fetcher program (commonly referred to as a spider) to fetch pages on their behalf. It can also be used to find statistics as to the most commonly used browsers. The captured information is written to a log file specified in the configuration file. To include the code responsible for logging this information into the Squid binary, use the --enable-useragent-log option to configure.

Simple Network Monitoring Protocol (SNMP)

Enabling the Simple Network Monitoring Protocol (SNMP) allows you to query your cache machine with one of the many SNMP tools available. If you have an existing SNMP monitoring system, you should be able to use your existing software to monitor Squid performance and retrieve usage information. This is discussed in detail in Chapter 6.

Some tools will read the Squid MIB (? what does this stand for ?) included with Squid (as /usr/local/squid/etc/mib.txt, once Squid is installed). Some tools, on the other hand, will have to be patched to understand the MIB that Squid uses. Since most SNMP products are written with a router in mind, they may not talk to an application like Squid, since the Squid MIB is quite different from a router MIB. (For more information on Squid and SNMP, see chapter 11)

Use the --enable-snmp configure option to enable the Squid SNMP code.

Killing the parent process on exit

Since Squid will be a very important part of your network when it is installed, you will probably have a program which simply restarts Squid if the running process exits. The RunCache program included with Squid does just this.

If you are doing maintenance on the cache system and actually wanted to kill the Squid process, having it automatically restarted as you work can be irritating, or even cause real problems.

This option puts code into Squid that kills the parent process if Squid is shutdown cleanly. If Squid crashes it leaves the parent process alone, and will this be automatically restarted.

Use the --enable-kill-parent-hack to enable killing of the parent process on exit.

If you don't use this option, the correct procedure is to kill the parent with the kill command, and to then use the shutdown command described in the Running Squid section to shutdown Squid. Do not use the 'kill' command if you can avoid it: Squid needs time to shut down cleanly, since it writes a complete list of objects to disk).

Reducing time system-calls

When writing logs of cache events and client accesses, Squid calls the gettimeofday() operating system call to determine the accurate time.

This system call can take a short while to return, leaving Squid doing nothing while while it could be reading and writing data for something that doesn't require logging. The amount of time that Squid takes to make the system call is negligible on most machines, but under very high load the huge number of calls can impact overall performance. Enabling the 'time-hack' option makes Squid update the clock only once per second, reducing the overhead dramatically on such caches. This does means that your log messages are less accurate. The log accuracy is important to some people, though. When you have accurate time stamps of how long transfers take, you can create graphs of response time, and use them to decide when you need to upgrade your machine. (More on this in chapter 11: Cache analysis).

Most people do not need to use the --enable-time-hack option. It's useful mainly on very slow machines, or on operating systems where the gettimeofday call is very slow.

ARP-based Access Control Lists

All ethernet cards have a (supposedly) unique identifier which is used as an address for all network traffic destined for that card. This number is referred to as a MAC address. If the card didn't have this address the operating system would have to check every packet on the network and decide if the packet was destined for it's IP address. With ethernet, however, the card's internal optimized hardware can check all the packets and decide if the packet needs to be passed up to the operating system. The network protocol that associates MAC addresses with IP numbers is known as ARP (Address Resolution Protocol).

If you want to control cache access by MAC address, you can enable ARP access control lists.

This option is only available on certain operating systems, since there is no standard method of finding the ARP address of a host when you are connected at the TCP level. As of this writing, ARP acl lists only work on Linux. If you are an operating system that can return this information to a user-level process, use the --enable-arp-acl option to use MAC acls.

Inter-cache Communication

Squid includes multiple Inter-Cache communication protocols. By default, the original Inter-Cache protocol (ICP) is included in the source code. If you wish to include some of the less used protocols, you will need to include them at compile time. Inter-cache communication is covered in depth in chapter 8. For the initial install you should probably not enable these protocols, since they may not be used.

If you are planning on joining an existing hierarchy you should ask the hierarchy administrators as to what protocols are supported or needed. If you are setting up a new hierarchy then you should only enable these after reading the above referenced chapter.

You cna enable the cache-digests with the --enable-cache-digests option, and the Hyper Text Caching Protocol (HTCP) with --enable-htcp.

Keeping track of origin request hosts

(? I have never used this function. I think that it may be used mainly by the NLANR caches. I need to find out exactly what this is used for. This is my 'best guess' in the meantime. ?)

When Squid caches forward requests on to a destination server (or, in fact, to a parent cache) it adds headers to the request indicating both the origin IP of the requester and the IP address of the cache that is doing the forwarding (it's own IP). Squid can be configured to keep track of both of these headers for access logs of incoming requests. If you have caches beneath yours, this logs the headers the client caches add.

This feature is only really useful if you are at the top of a hierarchy and want to see who the biggest users of lower caches are. Currently, you can only access the data stored in this way with the cachemgr.cgi cgi program. (? not sure ?).

You probably don't want to enable this option, but if you do, use the --enable-forw-via-db option.

Language selection

When Squid is unable to fulfill a request, an error page is returned to the user with information on what went wrong. This page can be in the language of your choice. Squid already includes error pages in quite a number of languages: for list of included languages, check the contents of the directory errors/ in the extracted source directory.

cache:~/src/squid-2.0.RELEASE> ls errors/
Bulgarian       Estonian        Italian         Russian-1251    list
Czech           French          Makefile.in     Russian-koi8-r
Dutch           German          Polish          Spanish
English         Hungarian       Portuguese      Turkish

The file 'list' contains a list of files to edit, when creating your own language error files.

Unfortunately there are not versions of the config file in different languages - only the error messages returned to users have been translated. The language defaults to English if you do not specify a language.

To use a specific language, replace language-name in the below text with something like Bulgarian. enable-err-language=language-name

Running configure

Now that you have decided which options to use, it's time to run configure. Here's an example:

 ./configure --enable-err-language=Bulgarian --prefix=/usr/local

Running ./configure with the options that you have chosen should go smoothly. In the unlikely event that configure returns with an error message, here are some suggestions that may help.

Broken compilers

The most common problem for new installers is that there is a problem with the installed compiler (or the headers) for the system.

To test this theory simply try and run configure with no options at all. If you still get an error message it is almost certainly a compiler or header file problem.

To make sure try and compile a program that uses some of the less used system calls and see if this compiles.

If your compiler doesn't compile files correctly, you might want to check if he header files exist, and if they do, permissions on the directory and the include files themselves.

If you have installed GCC in a non-standard directory, or if you are cross compiling, you may need configure to append options to the GCC command it uses during it's tests. You can get configure to append options to the GCC command line by setting the 'CFLAGS' shell variable prior to running configure. If, for example, you compiler only works when you you modify the default include directory, you can get configure to append that option to the default command line with a (Bourne Shell) command like:

CFLAGS=-I/usr/people/staff/oskar/gcc/include
export CFLAGS

Incompatible Options

Some configuration options exclude the use of others. This is another common cause of problems. To test this you should just try and run configure without any options at all, and see if the problem disappears. If so, you can try and work out which option is causing the conflict by adding each option to the configure command line one-by-one. You may find that you have to choose between two options (for example Async-IO and the DL-Malloc routines). In this case you may have to decide which of the options is the most important in your setup.

Compiling the Squid Source

Now that you have configured Squid, you need to make the Squid binaries. You should simply have to run make in the extracted source directory, and a binary will be created as src/squid.

cache:/ # cd /usr/local/squid/src/squid-2.2.RELEASE
cache:/usr/local/squid/src/squid-2.2.RELEASE # make

If the compilation fails, it may be because of conflicting configure options as described in the configure section. Follow the same instructions described there to find the offending option. (You should run make clean between configure runs, to ensure that old binaries are removed) As a start, try running configure without any options at all and then see if make completes. If this works, try additional configure options one at a time to see which one causes the problem.

Installing the Squid binary

The make command creates the binary, but doesn't install it.

Running make install creates the /usr/local/squid/bin and /usr/local/squid/etc subdirectories, and copies the binaries and default config files in the appropriate directories. Permissions may not be set correctly, but we will work through all created directories and set them up correctly shortly.

This command also copies the relevant config files into the default directories. The standard config file included with the source is placed in the etc subdirectory, as are the mime.types file and the default Squid MIB file (squid.mib).

If you are upgrading (or reinstalling), make install will overwrite binary files in the bin directory, but will not overwrite your painfully manipulated configuration files. If the destination configuration file exists, make install will instead create a file called filename.default. This allows you to check if useful options have been added by comparing config files.

If all has gone well you should have a fully installed (but unconfigured) Squid system setup.

Congratulations!