Multicast Cache Communication

Cache digests are in some ways a replacement for multicast cache peering. Cache digests have some advantages: they are handled at the Squid level (so you don't have to fiddle with kernel multicast settings and so forth), and they add significantly less latency (finding out whether a cache has an object simply involves checking an in-memory bit array, which is much faster than asking across the network).

First, though, let's cover some terminology. Most people are familiar with the term broadcast, where data is sent from one host to all hosts on the local network. Broadcasts are normally used to discover things, not for general inter-machine transfer: a machine will send out a broadcast ARP request to try to find the hardware address that belongs to a specific IP address. You can also send ping packets to the broadcast address, and find machines on the local network when they respond. Broadcasts only reach a single physical segment (or a bridged/switched network), so an ARP request doesn't go to every machine on the Internet.

A unicast packet is the complete opposite: one machine is talking to only one other machine. All TCP connections are unicast, since they can only have one destination host for each source host. UDP packets are almost always unicast too, though in some cases they are sent to the broadcast address so that they reach every machine on the local network.

A multicast packet travels from one machine to one or more others. The difference between a multicast packet and a broadcast packet is that hosts receiving multicast packets can be on different LANs, and that each multicast data stream is only transmitted between networks once, not once per machine on the remote network. Take streaming video as an example: rather than each machine connecting to the video server separately, the multicast data is streamed once per network, and multiple machines just listen in on the multicast data once it's on the network.

This efficient use of bandwidth is perfect for large groups of caches. If you have more than one server (for load distribution, say), and someone wants to peer with you, they will have to configure their server to send one ICP packet to each of your caches. When Squid gets an ICP request from somewhere, it doesn't check with all of its peers to see if they have the object: this "check with my peers" behavior only happens when an HTTP request arrives. If you have 5 caches, anyone who wants to find out if your hierarchy has an object will have to send 5 ICP requests (or treat you as a parent, so that your caches check with one another). This is a real waste of bandwidth. With a multicast network, though, the remote cache would only send one ICP request, destined for your multicast address. Routers between you would only transfer one packet (instead of 5), saving the duplication of requests. Once on your network, each machine would pick up the packet and reply with its own answer.
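The saving can be put in numbers. The following Python sketch only restates the arithmetic above; the function name and its parameters are invented for illustration and are not part of Squid.

```python
from typing import Tuple


def icp_packets_between_sites(remote_caches: int, multicast: bool) -> Tuple[int, int]:
    """Return (queries, replies) that cross the link for one object lookup.

    remote_caches is the number of caches at the site being queried.
    """
    if remote_caches < 0:
        raise ValueError("remote_caches must not be negative")
    if remote_caches == 0:
        return (0, 0)
    # Unicast needs one query per cache; multicast needs a single query
    # addressed to the group.
    queries = 1 if multicast else remote_caches
    # Every cache still answers individually, from its own address.
    replies = remote_caches
    return (queries, replies)
```

For the 5-cache site described above, the unicast case gives 5 queries and the multicast case gives 1. The number of replies is the same in both cases.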

Multicast packets are also useful on local networks, if you have the right network cards. If you have a large group of caches on the same network, you can end up with a lot of local traffic. Each request that a cache receives prompts one ICP request to each of the other local caches, swamping the local network with small packets (and their replies). A multicast packet, on the other hand, is a kind of broadcast to the machines on the local network. They will each receive a copy of the packet, although only one went out onto the wire. If you have a good Ethernet card, the card will handle a fair amount of the filtering. Some cards may have to be put into promiscuous mode to pick up all the packets, which can cause load on the machine, so make sure that the card you buy supports hardware multicast filters. This solution is still not linearly scalable, however, since the reply packets can easily become the bottleneck by themselves.
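To see why the replies remain the bottleneck, consider a rough count of the ICP packets on a local network. This is a sketch under simplifying assumptions that are not from Squid itself: every cache receives the same number of requests, every request triggers a lookup, and every peer answers. The function name is hypothetical.

```python
from typing import Tuple


def local_icp_traffic(num_caches: int, requests_per_cache: int,
                      multicast: bool) -> Tuple[int, int]:
    """Return (query_packets, reply_packets) sent on the local network."""
    if num_caches < 1 or requests_per_cache < 0:
        raise ValueError("need at least one cache and a non-negative request count")
    others = num_caches - 1
    if others == 0:
        return (0, 0)  # a lone cache has nobody to ask
    lookups = num_caches * requests_per_cache
    # Unicast: one query per other cache. Multicast: one query per lookup.
    queries = lookups if multicast else lookups * others
    # Each of the other caches answers every lookup in both cases.
    replies = lookups * others
    return (queries, replies)
```

With 10 caches handling 100 requests each, multicast cuts the queries from 9000 to 1000, but the 9000 replies are unchanged.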

Getting your machine ready for Multicast

The kernel's IP stack (the piece of kernel code that handles IP networking) needs to look out for multicast packets, otherwise they will be discarded (either by the network card or by the lower levels of the IP stack). Your kernel may already have multicast support; if not, you will have to turn it on. Doing this is, unfortunately, beyond the scope of this book, and you may have to root around for a howto guide somewhere.

Once your machine is set up to receive multicast packets, you need your machines to talk to one another. You can either join the mbone (a virtual multicast backbone), or set up an internal multicast network. Joining the mbone could be a good thing anyway, since you get access to other services. Be sure not to pick multicast IP addresses at random, since they may belong to someone else. You can get your own IP range from the people at the mbone.
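Before putting an address into a config file, it is easy to check that it at least falls inside the multicast range. The sketch below uses Python's standard ipaddress module; the helper name is made up for this example. Note that it only checks the range: it cannot tell you whether the address has been allocated to someone else.

```python
import ipaddress


def is_multicast_address(address: str) -> bool:
    """Return True if the address lies in the IP multicast range."""
    # ip_address() raises ValueError if the string is not a valid IP address.
    return ipaddress.ip_address(address).is_multicast
```

For example, is_multicast_address("224.0.1.20") is true, while an ordinary host address such as "10.11.12.1" gives false.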

An outgoing multicast packet has a ttl (Time To Live) value, which is used to ensure that loops are not created. Each time a packet passes through a router, the router decrements this ttl value, and the value is then checked. Once the value reaches zero, the packet is dropped. If you want multicast packets to stay on your local network, you would set the ttl value to 1. The first router to see the packet would decrement the ttl, discover that it was now zero and discard the packet. This value gives you a level of control over how many multicast routers will see the packet. You should set this value carefully, so that you limit packets to your local network or immediate multicast peers. Larger multicast groups are seldom of any use: they generate too many responses and, when geographically dispersed, may simply add latency. You also don't want crackers picking up all your ICP requests by joining the appropriate multicast group.
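The decrement-then-check rule can be traced with a few lines of Python. This is only a simulation of the rule as described above, with a hypothetical function name; it does not send any packets.

```python
def routers_crossed(ttl: int, routers_on_path: int) -> int:
    """Count how many routers forward a packet sent with the given ttl."""
    crossed = 0
    for _ in range(routers_on_path):
        ttl -= 1       # each router first decrements the ttl ...
        if ttl <= 0:   # ... then checks it, and discards the packet at zero
            break
        crossed += 1
    return crossed
```

A packet sent with a ttl of 1 is forwarded by no routers at all, so it stays on the local network. In this model, a ttl of 5 lets the packet cross at most 4 routers.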

Various multicast debugging tools are available. One of the most useful is mtrace, which is effectively a traceroute program for multicast connections. This program should help you choose the right ttl value.

Querying a Multicast Cache

The cache_peer option traditionally can have two types of cache: a parent and a sibling. If you are querying a set of multicast caches, you need to use a different tag, the multicast cache type. When you send a multicast request to a group, each of the servers in the group will send you a response packet (from its real IP address). Squid discards unexpected ICP responses by default, and since it can't determine automatically which ICP replies are valid, you will have to add lines to the Squid config file that stop it rejecting packets from hosts in the multicast group.

In the following example, the multicast group 224.0.1.20 consists of three hosts, at IP addresses 10.11.12.1, 10.11.13.1 and 10.11.14.1. These hosts are quite close to your cache, so the ttl value is set to 5.

Example 8-9. Sending Queries to a Multicast Server

cache_peer 224.0.1.20 multicast 3128 3130 ttl=5
# these servers belong to the 224.0.1.20 multicast group
cache_peer 10.11.12.1 sibling 3128 3130 multicast-responder
cache_peer 10.11.13.1 sibling 3128 3130 multicast-responder
cache_peer 10.11.14.1 sibling 3128 3130 multicast-responder
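The reason for listing each responder is that replies arrive from the hosts' real addresses, never from the group address itself. The following Python sketch illustrates that filtering idea using the addresses from the example; it is not Squid's actual code, and the names are invented.

```python
# Hosts listed with the multicast-responder option in the example above.
RESPONDERS = {"10.11.12.1", "10.11.13.1", "10.11.14.1"}


def accept_icp_reply(source_ip: str, responders: set) -> bool:
    """Accept an ICP reply only if it comes from a listed responder."""
    return source_ip in responders
```

A reply from 10.11.13.1 is accepted, while a reply from an unlisted host such as 10.11.15.1 is discarded as unexpected.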

Accepting Multicast Queries: The mcast_groups Option

As a multicast server, Squid needs to listen out for the right packets. Since you can have more than one multicast group on a network, you need to configure Squid to listen to the right multicast group (the IP address that you have allocated to Squid). The following (very simple) example is from the config of the server machine 10.11.12.1 in the example above.

Example 8-10. Listening for Multicast Queries

mcast_groups 224.0.1.20
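At the socket level, listening to a group means asking the kernel to join it. The Python sketch below shows roughly what any application has to do. It is not taken from Squid, the function names are hypothetical, and the port in the usage note is simply the ICP port from the earlier example.

```python
import socket
import struct


def membership_request(group: str, interface: str = "0.0.0.0") -> bytes:
    """Build the 8-byte structure passed to IP_ADD_MEMBERSHIP.

    It holds the group address followed by the local interface address;
    0.0.0.0 lets the kernel pick the interface.
    """
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(interface))


def open_multicast_listener(group: str, port: int) -> socket.socket:
    """Open a UDP socket on the given port and join the multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # This call is the actual "join"; it raises OSError if the kernel refuses.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    return sock
```

Calling open_multicast_listener("224.0.1.20", 3130) and then recvfrom() on the returned socket would receive packets sent to the group. If the join fails with an OSError, that may be a sign that the kernel or the interface lacks multicast support.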

Other Multicast Cache Options

The mcast_icp_query_timeout Option

As you may recall, Squid will wait for up to dead_peer_timeout seconds after sending out an ICP request before deciding to ignore a peer. With a multicast group, peers can leave and join at will, and it should make no difference to a client. This presents a problem for Squid: it can't wait for a number of seconds each time. If the caches are on the same network and responses come back in milliseconds, the waiting just adds latency. Squid gets around this problem by sending ICP probes to the multicast address occasionally. Each host in the group responds to the probe, so Squid knows how many machines are currently in the group. When sending a real request, Squid will wait until it gets at least as many responses as were returned in the last probe: if more arrive, great. If fewer arrive, though, Squid will wait until the dead_peer_timeout value is reached. If there is still no reply, Squid marks that peer as down, so that all connections are not held up by one peer.
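The waiting rule can be summarized in a short Python sketch. It models only the behavior described in the paragraph above, not Squid's implementation; the function and parameter names are invented, and all times are in seconds after the query was sent.

```python
from typing import Iterable


def reply_wait_time(reply_delays: Iterable[float], expected_replies: int,
                    timeout: float) -> float:
    """Return how long the sender waits before it stops collecting replies.

    expected_replies is the number of hosts that answered the last probe.
    """
    if expected_replies <= 0:
        return 0.0  # the last probe found nobody, so there is nothing to wait for
    # Replies that arrive after the timeout are never seen.
    arrived = sorted(d for d in reply_delays if d <= timeout)
    if len(arrived) >= expected_replies:
        # Stop as soon as the expected number of replies is in.
        return arrived[expected_replies - 1]
    # Fewer replies than expected: wait out the full timeout.
    return timeout
```

With three expected replies arriving after 2, 3 and 4 milliseconds, the wait ends after 4 milliseconds. If one of the three hosts stays silent, the wait lasts for the whole timeout.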

When Squid sends out a multicast query, it will wait at most mcast_icp_query_timeout milliseconds for replies (the default is 2000, or two seconds). It's perfectly possible that one day a peer will be on the moon, though it would probably be a bad idea to peer seriously with that cache, unless it was a parent for the Mars top-level domain. It's unlikely that you will want to increase this value, but you may wish to drop it, so that only reasonably speedy replies are considered.