Inter-Cache Communication Protocols

Squid gives you the ability to share data between caches, but why should you?

Just as there is a benefit to connecting individual PCs to a network, and that network to the Internet, so there is an advantage to linking your cache to other people's networks of caches.

User Base. The following is not a complete discussion of how the size of your user base influences your hit rate; Chapter 2 discusses this topic in more depth. In short: the larger your user base, the more objects are requested, and the higher the chance that an object is requested twice. To increase your hit rate, add more clients.
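The effect of user base on hit rate can be illustrated with a little arithmetic. The sketch below is not taken from Squid and is not a measurement: it assumes that object popularity follows a Zipf-like distribution, that requests are independent of each other, and that the cache never evicts anything. The function name and its parameters are invented for this example.

```python
def expected_hit_rate(num_requests, num_objects=10_000, skew=1.0):
    """Expected hit rate of a never-evicting cache under a Zipf-like model.

    Assumptions (illustrative only): object number `rank` is requested with
    probability proportional to 1 / rank**skew, and requests are independent.
    """
    if num_requests < 1:
        raise ValueError("num_requests must be at least 1")

    # Relative popularity of each object, most popular first.
    weights = [1.0 / (rank ** skew) for rank in range(1, num_objects + 1)]
    total = sum(weights)

    # An object is fetched from the origin once if it is requested at all,
    # so the expected number of misses equals the expected number of
    # distinct objects that appear in the request stream.
    expected_distinct = sum(
        1.0 - (1.0 - weight / total) ** num_requests for weight in weights
    )
    return 1.0 - expected_distinct / num_requests


if __name__ == "__main__":
    # More requests (a larger user base) should give a higher hit rate.
    for requests in (200, 2_000, 20_000):
        print(requests, round(expected_hit_rate(requests), 3))
```

Under these assumptions the expected hit rate rises as the number of requests grows, which is the behaviour described above. Real traffic will differ, since real caches expire and evict objects and not every object is cacheable.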

However, in many cases the size of your user base is finite: it is limited by the number of staff members or customers. Co-operative peering with other caches effectively enlarges your user base, and so increases your hit rate. If you peer with a large cache, you will find that a percentage of the objects your users request are already available there. Many people can increase their hit rate by about 5% by peering with other caches.
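The idea behind peering can be sketched as a toy model. The code below is not how Squid is implemented and does not speak any real inter-cache protocol: the ToyCache class, its methods, and the example host names are all hypothetical. It only shows the order of decisions: look in the local store, then ask the peers whether they hold the object, and go to the origin server only if nobody does.

```python
class ToyCache:
    """A toy cache that can ask sibling caches before going to the origin."""

    def __init__(self, name):
        self.name = name
        self.store = {}      # url -> object body
        self.siblings = []   # other ToyCache instances we peer with

    def has(self, url):
        """Answer a sibling's query: True for a hit, False for a miss."""
        return url in self.store

    def fetch(self, url, origin):
        """Return (body, source) for url, preferring the cheapest source."""
        if url in self.store:
            return self.store[url], "local hit"

        for sibling in self.siblings:
            if sibling.has(url):
                # Design choice for this sketch: do not keep a second copy
                # of an object that a sibling already holds.
                return sibling.store[url], f"sibling hit ({sibling.name})"

        # Nobody has it: fetch from the origin server and keep a copy.
        body = origin(url)
        self.store[url] = body
        return body, "miss"


def origin(url):
    """Stand-in for a real origin server (hypothetical)."""
    return f"<content of {url}>"


ours = ToyCache("ours")
big_peer = ToyCache("big-peer")
ours.siblings.append(big_peer)

# A user of the large peer has already requested this page.
big_peer.fetch("http://www.example.com/", origin)

# Our own user now asks for the same page: it comes from the peer,
# not from the origin server.
body, source = ours.fetch("http://www.example.com/", origin)
```

In this model every request that a sibling can answer is a request that does not cross your link to the origin server, which is where the extra hit rate from peering comes from.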

Reduced Load. If you have a large network, one cache may not be able to handle all incoming requests. Rather than continuously upgrading one machine, it makes sense to split the load between multiple servers. This reduces the load on each server, while increasing the overall number of queries your cache system can handle.

Squid implements inter-cache protocols very efficiently. ICP multicast queries and Cache Digests make large networks of caches (hierarchies) practical. With these features, a large hierarchy adds very little latency, allowing you to scale your cache infrastructure as you grow.

Disk Space. If you load balance between multiple caches, it is best to avoid duplication of data. Every duplicated object takes up space that could hold a different object, so duplication reduces the number of distinct objects in the overall store, which in turn reduces your chances of a hit. Using the Cache Array Routing Protocol (CARP) or other inter-cache communication protocols reduces duplication.
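The way hash-based routing avoids duplication can be sketched as follows. This is a simplified illustration of the general idea only: it uses SHA-256 from the Python standard library rather than the hash function that CARP itself defines, it ignores per-cache load factors, and the function name and host names are invented for the example. Each URL is scored against every cache, and the request goes to the cache with the highest score, so a given URL always lands on the same machine.

```python
import hashlib


def pick_cache(url, caches):
    """Choose one cache for url by scoring every (cache, url) pair.

    Simplified sketch: SHA-256 stands in for CARP's own hash function.
    """
    def score(cache):
        digest = hashlib.sha256(f"{cache}|{url}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    return max(caches, key=score)


# Hypothetical cache array and request stream.
caches = ["cache-a.example.com", "cache-b.example.com", "cache-c.example.com"]
urls = [f"http://www.example.com/page{i}.html" for i in range(1000)]

# Every URL is assigned to exactly one cache, so no object is stored twice.
placement = {url: pick_cache(url, caches) for url in urls}

# If one cache is removed, only the URLs it was holding move elsewhere.
survivors = caches[:2]
moved = [url for url in urls if pick_cache(url, survivors) != placement[url]]
```

Because every client computes the same answer for the same URL, the caches in the array do not need to query each other to find out who holds an object, and the combined disk space behaves like one large store.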

Raw bandwidth is not the only thing that determines whether your cache system is efficient and fast. The right hardware and software matter just as much, and choosing them is a difficult task.