Peer Selection

Let's say that you have only one parent cache server: the server at your ISP. In Chapter 3, we configured Squid so that the parent cache server would not be queried for internal hosts, so requests for internal machines went directly to them instead of adding needless load to your parent cache (and to the line between you). Squid can also decide which cache to talk to using access-control lists, rather than just the destination domain. With access lists, you can use different caches depending on the source IP, the destination domain, text in the URL and more. The advantages of this flexibility are not immediately obvious (even to me), but some examples are given in the remainder of this chapter. First, however, let's cover selection by destination domain.

Selecting by Destination Domain

The cache_peer_domain tag lets you communicate with different caches depending on the domain that the request is destined for. To ensure that you don't query another cache server for your local domain, you can use a config line like the following. The ! negates the match, and the leading dot makes it cover every host within the domain.

Example 8-2. The cache_peer_domain tag

cache_peer_domain peer-cache.otherdomain.example !.mydomain.example
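The tag also accepts a positive list of domains, and each peer can have its own line. The sketch below assumes two parents, a hypothetical cache.isp.example and the peer-cache.otherdomain.example from Example 8-2, both declared with cache_peer lines first: the ISP's cache is used for everything outside your own domain, while peer-cache.otherdomain.example is only consulted for hosts under .otherdomain.example.

cache_peer cache.isp.example parent 3128 3130
cache_peer peer-cache.otherdomain.example parent 3128 3130
# hypothetical: never send requests for our own domain to the ISP's cache
cache_peer_domain cache.isp.example !.mydomain.example
# only ask peer-cache.otherdomain.example about hosts in its own domain
cache_peer_domain peer-cache.otherdomain.example .otherdomain.example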

Selecting with Acls

Squid can also make peer selections based on the results of acl rules, using the cache_peer_access tag discussed in the previous chapter. This is useful if you want all requests from a specific IP address range to go to a specific cache server (for accounting purposes, for example). In the following example, all requests from the 10.0.1.* range are passed to cache.domain.example, while all other requests are handled directly.

Example 8-3. Using acls to select peers

acl myNet src 10.0.0.0/255.255.255.0
acl custNet src 10.0.1.0/255.255.255.0
acl all src 0.0.0.0/0.0.0.0
cache_peer cache.domain.example parent 3128 3130
cache_peer_access cache.domain.example allow custNet
cache_peer_access cache.domain.example deny all
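The same pattern extends to several customer ranges, each billed through its own parent. A minimal sketch, assuming a hypothetical second range (custNet2, 10.0.2.*) and a hypothetical second parent, cache2.domain.example:

acl custNet src 10.0.1.0/255.255.255.0
# hypothetical second customer range and parent
acl custNet2 src 10.0.2.0/255.255.255.0
acl all src 0.0.0.0/0.0.0.0
cache_peer cache.domain.example parent 3128 3130
cache_peer cache2.domain.example parent 3128 3130
# 10.0.1.* goes only to cache.domain.example
cache_peer_access cache.domain.example allow custNet
cache_peer_access cache.domain.example deny all
# 10.0.2.* goes only to cache2.domain.example
cache_peer_access cache2.domain.example allow custNet2
cache_peer_access cache2.domain.example deny all

Requests from any other address match neither peer's allow line, so they are handled directly, just as in Example 8-3.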

Querying an Adult-Site Filtering-cache for Specific URLs

Let's say that you have access to a separate adult-site filtering cache, which filters out unwanted URLs. The company that maintains the filter list charges by number of queries, so it's in your interest to bypass their cache for URLs that you know are fine. Their documentation says that you should set their machine up as your default parent. Instead, you create a list of suspect words and set your cache up to forward only requests for URLs that contain one of these words to the filtering cache server. Because most requests bypass the filtering server, you will end up missing a fairly large number of sites that it would have blocked. At the same time, however, valid sites that happen to contain a suspect word in the URL are not blocked outright, since the filtering cache, not your word list, makes the final decision.

Example 8-4. Passing suspect urls to a filtering cache

acl suspect_url url_regex "/usr/local/squid/etc/suspect-url-list"
acl all src 0.0.0.0/0.0.0.0
cache_peer filtercache.domain.example parent 3128 3130
cache_peer_access filtercache.domain.example allow suspect_url
# all other requests go direct
cache_peer_access filtercache.domain.example deny all
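The word list is an ordinary url_regex file: Squid reads one regular expression per line and matches each one anywhere in the URL. A minimal sketch, assuming the -i flag to make the match case-insensitive and using hypothetical placeholder words, would change the acl line to:

acl suspect_url url_regex -i "/usr/local/squid/etc/suspect-url-list"

with /usr/local/squid/etc/suspect-url-list containing, for example:

suspectword1
suspectword2

Keep the expressions short and specific; each extra line is checked against every request.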

Filtering with Cache Hierarchies

ISPs in outlying regions quite often peer with large hierarchies in the USA, to reduce the latency of fetching objects hosted there. Since it's almost certainly faster to get any local data directly from the source, they configure their caches to retrieve data for their local top-level domain (.za, for South Africa, in the example below) directly, rather than via a USA cache.

Example 8-5. Ignoring Hierarchy Caches for a Local Top-Level Domain

acl local-tld dstdomain .za
cache_peer cache1.domain.example.us parent 3128 3130
cache_peer_access cache1.domain.example.us deny local-tld
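The same effect can be sketched with the cache_peer_domain tag from the start of this chapter, which needs no acl at all. This assumes the same cache1.domain.example.us parent:

cache_peer cache1.domain.example.us parent 3128 3130
# ask the USA parent about everything except hosts under .za
cache_peer_domain cache1.domain.example.us !.za

The acl form is more flexible when you later want to combine the domain with other conditions, such as the source address.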

The always_direct and never_direct tags

Squid checks all always_direct tags before it checks any never_direct tags. If a matching always_direct tag is found, Squid does not check the never_direct tags, but goes directly to the origin server immediately. This behavior is demonstrated by the following example: here, Squid will go directly to intranet.mydomain.example, even though the request is also matched by the all acl on the never_direct line.

Example 8-6. Bypassing a parent for a local machine

cache_peer cache.otherdomain.example parent 3128 3130
acl all src 0.0.0.0/0.0.0.0
acl localmachines dstdomain intranet.mydomain.example
never_direct allow all
always_direct allow localmachines

Let's work through the logic that Squid uses in the above example, so that you can work out which cache Squid is going to talk to when you construct your own rules.

First, let's consider a request destined for the web server intranet.mydomain.example. Squid first works through all the always_direct lines; the request is matched by the first (and only) line. The never_direct and always_direct tags are acl-operators, which means that the first matching line wins. In this illustration, the matching line instructs Squid to go directly when the acl matches, so all neighboring peers are ignored for this request. If the line used the deny keyword instead of allow, Squid would simply have skipped on to checking the never_direct lines.

Now, the second case: a request arrives for an external host. Squid works through the always_direct lines, and finds that none of them match. The never_direct lines are then checked. The all acl matches the connection, so Squid marks the connection as never to be forwarded directly to the origin server. Squid then works through its list of peers, trying to find the cache that the request is best forwarded to (servers that have the object are more likely to get the request, as are servers that respond quickly). The algorithm that Squid uses to decide which of its peers to use is discussed shortly.
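The deny keyword mentioned above is useful for carving exceptions out of a direct rule. A minimal sketch, assuming a hypothetical internal host www.mydomain.example that you want fetched through the parent while the rest of .mydomain.example still goes direct:

cache_peer cache.otherdomain.example parent 3128 3130
acl all src 0.0.0.0/0.0.0.0
acl localmachines dstdomain .mydomain.example
# hypothetical host that should still go through the parent
acl proxied-intranet dstdomain www.mydomain.example
# first match wins: this deny stops the allow below from applying
always_direct deny proxied-intranet
always_direct allow localmachines
never_direct allow all

A request for www.mydomain.example matches the deny line, so Squid moves on to the never_direct lines, where all matches and the parent is used. A request for intranet.mydomain.example skips the deny line, matches the allow line, and goes direct.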

hierarchy_stoplist

Squid can be configured to avoid querying neighbor caches when the requested URL contains any word from a list; such requests are handled by your cache directly. The hierarchy_stoplist tag normally contains words that occur when the remote page is dynamically generated, such as cgi-bin, asp, or the ? that marks a query string.
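A short example, assuming you also want to keep ASP pages away from your neighbors. The words are matched anywhere in the URL, so the bare ? catches every URL with a query string, and .asp is written with its dot so that it does not match unrelated words that merely contain "asp":

hierarchy_stoplist cgi-bin ? .asp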

neighbor_type_domain

You can blur the distinction between parents and siblings with this tag. Let's say that you work for a very large organization, with many regions, some in different countries.

These organizations generally have their own network infrastructure: you will install a link to a local regional office, and they will run links to a core backbone. Let's assume that you work for the regional office, and you have an Internet line that your various divisions share. You also have a link to your head-office, where they have a large cache, and their own Internet link. You peer with their cache (with them set up as a sibling), and you also peer with your local ISP's server.

When you request pages from the outside world, you treat your ISP's cache server as a parent, but when you query web servers in your own domain you want the requests to go to your head-office's cache, so that any web sites within your organization are cached. By using the neighbor_type_domain option, you can specify that your head-office's cache is treated as a parent for requests to your own domain. For all other requests it remains a sibling, so misses are fetched through your ISP's cache.

Example 8-7. Changing the Cache Type by Destination Domain

cache_peer core-cache.mydomain.example sibling 3128 3130
cache_peer cache.isp.example parent 3128 3130
neighbor_type_domain core-cache.mydomain.example parent .mydomain.example

Other Peering Options

Several other options let you tune how your cache interacts with hierarchies. These options affect all peer caches, rather than individual machines.

miss_access

The miss_access tag is an acl-operator. It has already been covered in the acls chapter (Chapter 6), but is repeated here for completeness. The miss_access tag allows you to create a list of caches that are only allowed to retrieve hits from your cache. If one of them requests an object that results in a miss, Squid returns an error page denying it access. If the example below is not immediately clear, please refer to Chapter 6 for more information.

Example 8-8. Restricting a peer to cache hits

acl all src 0.0.0.0/0.0.0.0
acl friendly_company src 10.2.0.3/255.255.255.255
http_access allow friendly_company
icp_access allow friendly_company
# This line stops the machine 10.2.0.3 from retrieving misses
# through our cache; it can still get hits
miss_access deny friendly_company
miss_access allow all
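On the other side of this relationship, the friendly company should configure your cache as a sibling rather than a parent, so that it only sends you requests for objects your ICP replies report as hits. A sketch of their line, assuming a hypothetical hostname for your cache:

cache_peer our-cache.mydomain.example sibling 3128 3130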

dead_peer_timeout

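If a peer cache has not sent any ICP replies for dead_peer_timeout seconds, Squid marks it as dead and stops waiting for its replies, so objects are retrieved from somewhere else (probably directly from the source). Squid keeps sending ICP queries to the dead peer, and marks it alive again as soon as a reply arrives.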
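The default is 10 seconds. If your peers sit at the far end of a slow or congested link, you might raise it; the value below is only an illustrative assumption, not a recommendation:

dead_peer_timeout 30 seconds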

icp_hit_stale

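When this option is on, Squid returns ICP_HIT replies for stale objects as well as fresh ones. If you have sibling relationships with caches in other administrative domains, leave it off; turning it on is probably only safe when all of your siblings are under your control.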
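A sketch of that controlled case, assuming a hypothetical cache stale-hits.mydomain.example that has the option turned on. Its own configuration contains:

icp_hit_stale on

and each of your siblings that peers with it adds the allow-miss option, so that their requests for those stale objects are not refused as misses:

cache_peer stale-hits.mydomain.example sibling 3128 3130 allow-miss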