Setting Squid's HTTP Port

The first option in the squid.conf file sets the HTTP port(s) that Squid will listen to for incoming requests.

Network services listen on particular ports. Ports below 1024 can only be used by the system administrator, and are used by programs that provide basic Internet services: SMTP, POP, DNS and HTTP (web). Ports above 1024 are used for untrusted services (where a service does not run as administrator), and for transient connections, such as outgoing data requests.

Typically, web servers listen for incoming web requests (using the HyperText Transfer Protocol - HTTP) on port 80.

Squid's default HTTP port is 3129. Many people run their cache servers on a port which is easier to remember: something like 80 or 8080). If you choose a low-numbered port, you will have to start Squid as root (otherwise you are considered untrusted, and you will not be able to start Squid. Many ISPs use port 8080, making it an accepted pseudo-standard.

If you wish, you can use multiple ports appending a second port number to the http_port variable. Here is an example:

http_port 3128 8080

It is very important to refer to your cache server with a generic DNS name. Simply because you only have one server now does not mean that you should not plan for the future. It is a good idea to setup a DNS hostname for your proxy server. Do this right away! A simple DNS entry can save many hours further down the line. Configuring client machines to access the cache server by IP address is asking for a long, painful transition down the road. Generally people add a hostname like cache.mydomain.com to the DNS. Other people prefer the name proxy, and create a name like proxy.mydomain.com.

Using Port 80

HTTP defines the format of both the request for information and the format of the server response. The basic aspects of the protocol are quite straight forward: a client (such as your browser) connects to port 80 and asks for the file by supplying the full path and filename that it wishes to download. The client also specifies the version of the HTTP protocol it wishes to use for the retrieval.

With a proxy request the format is only a little different. The client specifies the whole URL instead of just the path to the file. The proxy server then connects to the web server specified in the URL, and sends a normal HTTP request for the page. (? The format of HTTP requests is described in more detail in chapter 4, where you type in an HTTP request, just as a browser would send it to test that the cache is responding to requests - may use the 'client' program instead.?)

Since the format of proxy requests is so similar to a normal HTTP request, it is not especially surprising that many web servers can function as proxy servers too. Changing a web server program to function as a proxy normally involves comparatively small changes to the code, especially if the code is written in a modular manner - as is the Apache web server. In many cases the resulting server is not as fast, or as configurable, as a dedicated cache server can be.

The CERN web server httpd was the first widely available web proxy server. The whole WWW system was initially created to give people easy access to CERN data, and CERN HTTPD was thus the de-facto test-bed for new additions to the initial informal HTTP specification. Most (and certainly at one stage all) of the early web sites ran the CERN server. Many system administrators who wanted a proxy server simply used their standard CERN web server (listening on port 80) as their proxy server, since it could function as one. It is easy for the web server to distinguish a web site request from a normal web page request, since it simply has to check if the full URL is given instead of simply a path name. Given the choice (even today) many system administrators would choose port 80 as their proxy server port simply as 'port 80 is the standard port for web requests'.

There are, however, good reasons for you to choose a port other than 80.

Running both services on the same port meant that if the system administrator wanted to install a different web server package (for extra features available in the new software) they would be limited to software that could perform both as a web server and as a proxy. Similarly, if the same sysadmin found that their web server's low-end proxy module could not handle the load of their ever-expanding local client base, they would be restricted to a proxy server that could function as a web server. The only other alternative is to re-configure all the clients, which normally involves spending a few days apologizing to users and helping them through the steps involved in changing over.

Microsoft use the Microsoft web server (IIS) as a basis for their proxy server component, and Microsoft proxy thus only (? tried once - let's see if it's changed since ?) accepts incoming proxy request on port 80. If you are installing a Squid system to replace either CERN, Apache or IIS running in both web-server and cache-server modes on the same port, you will have to set http_port to 80. Squid is written only as a high-performance proxy server, so there is no way for it to function as a web server, since Squid has no support for reading files from a local disk, running CGI scripts and so forth. There is, however, a workaround.

If you have both services running on the same port, and you cannot change your client PC's, do not despair. Squid can accept requests in web-server format and forward them to another server. If you have only one machine, and you can get your web server software to accept incoming requests on a non-default port (for example 81), Squid can be configured to forward incoming web requests to that port. This is called accelerator mode (since it's initial purpose was to speed up very slow web servers). Squid effectively does some translation on the original request, and then simply acts as if the request were a proxy request and connects to the host: the fact that it's not a remote host is irrelevant. Accelerator mode is discussed in more detail in chapter 9. Until then, get Squid installed and running on another port, and work your way through the first couple of chapters of this book, until you have a working pilot-phase system. Once Squid is stable and tested you can move on to changing web server settings. If you feel adventurous, however, you can skip there shortly!

Where to Store Cached Data

Cached Data has to be kept somewhere. In the section on hardware sizing, we discussed the size and number of drives to use for caching. Squid cannot autodetect where to store this data, though, so you need to let Squid know which directories it can use for data storage.

The cache_dir operator in the squid.conf file is used to configure specific storage areas. If you use more than one disk for cached data, you may need more than one mount point (for example /usr/local/squid/cache1 for the first disk, /usr/local/squid/cache2 for the second). Squid allows you to have more than one cache_dir option in your config file.

Let's consider only one cache_dir entry in the meantime. Here I am using the default values from the standard squid.conf.

cache_dir /usr/local/squid/cache/ 100 16 256 

The first option to the cache_dir tag sets the directory where data will be stored. The prefix value simply has /cache/ tagged onto the end and it's used as the default directory. This directory is also made by the make install command that we used earlier.

The next option to cache_dir is straight forward: it's a size value. Squid will store up to that amount of data in that directory. The value is in megabytes, so of the cache store. The default is 100 megabytes.

The other two options are more complex: they set the number of subdirectories (first and second tier) to create in this directory. Squid makes lots of directories and stores a few files in each of them in an attempt to speed up disk access (finding the correct entry in a directory with one million files in it is not efficient: it's better to split the files up into lots of smaller sets of files... don't worry too much about this for the moment). I suggest that you use the default values for these options in the mean time: if you have a very large cache store you may want to increase these values, but this is covered in the section on