The Transparent Caching Process

Let's look at what happens when you use transparency. First, though, you need to know something of what happens to IP packets at the Ethernet level.

Some Routing Basics

An Ethernet IP packet contains four addresses: the source and destination MAC (Ethernet) addresses, and the source and destination IP addresses.

When a host wants to communicate with a machine that isn't on the local network, it uses a router to find the path to that network. When the client wants to send a packet through a router, the client sets the destination MAC address of the packet to that of the router's interface, and sets the destination IP address to that of the required end host. It's important to know that the destination IP address of the packet isn't set to the router's IP address; only the MAC address points at the router. When a router accepts a packet, it decides which host to forward it to, based on its routing tables. The router then sets the destination MAC address of the packet to the next-hop router's Ethernet address, and sends the packet to that machine. The receiving machine then repeats this process: if it's the destination machine, it uses the packet, but if it's another router, it will try to move the packet closer to its final destination.
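The forwarding step described above can be sketched in a few lines of Python. This is only an illustrative model, not real packet handling: the Frame class, the forward function and all MAC addresses are hypothetical names invented for the example, and 192.0.2.80 stands in for some remote web server. The model also rewrites the source MAC address at each hop, as real routers do.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Frame:
    """The four addresses carried by an IP packet on Ethernet."""
    src_mac: str
    dst_mac: str
    src_ip: str
    dst_ip: str


def forward(frame: Frame, out_mac: str, next_hop_mac: str) -> Frame:
    """Re-address a frame the way a router does when forwarding it.

    Only the MAC addresses change; both IP addresses are left alone.
    The source MAC becomes the router's outgoing interface and the
    destination MAC becomes the next hop.
    """
    return replace(frame, src_mac=out_mac, dst_mac=next_hop_mac)


# Hypothetical addresses, for illustration only.
CLIENT_MAC = "02:00:00:00:00:50"
ROUTER_IN_MAC = "02:00:00:00:00:01"
ROUTER_OUT_MAC = "02:00:00:00:01:01"
NEXT_HOP_MAC = "02:00:00:00:01:02"

# The client addresses the frame to the router's MAC, but to the end host's IP.
sent = Frame(CLIENT_MAC, ROUTER_IN_MAC, "10.0.0.50", "192.0.2.80")
forwarded = forward(sent, ROUTER_OUT_MAC, NEXT_HOP_MAC)

print(sent)
print(forwarded)
```

Comparing the two printed frames shows the point of the paragraph: the MAC addresses differ at each hop, while the source and destination IP addresses stay the same from end to end.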

Packet Flow with Transparent Caches

Transparent caches essentially look out for TCP connections destined for port 80. The cache server will intercept these packets, convert them to a standard TCP stream and pass them to Squid. When Squid sends reply data to the client, the operating system fakes the source address of the packets, so that the client believes it is connected to the server that it originally sent the request to.

You can't simply plug a transparent cache into the network and get it to transparently cache pages. The cache server needs to be in a position where it can fake the reply packets (without the real server interrupting the conversation and confusing things). In other words, the cache server needs to be the gateway to the outside world.

Let's look at the simplest transparent cache setup. The client machine (10.0.0.50) treats the cache server's internal interface (10.0.0.1) as its default gateway. This way, all packets arrive on the cache server before they reach the rest of the Internet. The filter looks for port 80 packets and passes them to Squid, but allows all other packets to be passed to the routing layer, which passes the packets to the router's IP (172.31.0.2).
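The filter's decision can be modelled as a single function. This is a rough sketch of the logic only, not the configuration of any real packet filter: classify and the action names "intercept" and "route" are hypothetical, while the port number and the router address come from the example setup.

```python
HTTP_PORT = 80
NEXT_HOP = "172.31.0.2"  # the router's IP in the example setup


def classify(protocol: str, dst_port: int) -> tuple[str, str]:
    """Decide what the cache server does with an arriving packet.

    TCP packets for port 80 are handed to the local Squid process;
    everything else goes to the routing layer and on to the router.
    """
    if protocol == "tcp" and dst_port == HTTP_PORT:
        return ("intercept", "squid")
    return ("route", NEXT_HOP)


print(classify("tcp", 80))   # web traffic is intercepted
print(classify("tcp", 25))   # other TCP traffic is routed normally
print(classify("udp", 53))   # non-TCP traffic is routed normally
```

Note that the decision depends only on the protocol and destination port, not on the destination IP address, which is why the client needs no proxy configuration.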

Once the connection is established, Squid needs to communicate with the client. Squid doesn't do any strange packet assembly: that's left to the transparency layer. When Squid sends reply data to the client, the kernel automatically changes the packet's source address, so it appears to the client that the cache server is just routing the packets from the outside world. When Squid connects to the remote server, however, the connection comes from the external interface of the cache server (172.31.0.1 in the example). This is where IP-based authentication breaks: the remote server sees the request coming from the cache's address rather than from the client's real address (10.0.0.50).
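To make the two views of the conversation concrete, here is a small model of which source address each side sees. It is a simplified sketch under stated assumptions: the function names are hypothetical, 192.0.2.80 stands in for the remote server, and the remote server's access check is reduced to a plain allow-list of IP addresses.

```python
CLIENT_IP = "10.0.0.50"
CACHE_EXTERNAL_IP = "172.31.0.1"
ORIGIN_IP = "192.0.2.80"  # hypothetical remote web server


def source_seen_by_client(origin_ip: str) -> str:
    """Replies from Squid carry the origin server's address as their source."""
    return origin_ip


def source_seen_by_origin(client_ip: str, intercepted: bool) -> str:
    """Intercepted requests reach the origin from the cache's external interface."""
    return CACHE_EXTERNAL_IP if intercepted else client_ip


def origin_allows(source_ip: str, allowed_ips: set[str]) -> bool:
    """A stand-in for IP-based authentication on the remote server."""
    return source_ip in allowed_ips


# The client believes it is talking to the origin server directly.
print(source_seen_by_client(ORIGIN_IP))

# The origin server, however, sees the cache, so a rule that trusts
# only the client's own address no longer matches.
seen = source_seen_by_origin(CLIENT_IP, intercepted=True)
print(seen, origin_allows(seen, {CLIENT_IP}))
```

The last line shows the failure mode: a server that grants access to 10.0.0.50 refuses the intercepted request, because it arrives from 172.31.0.1.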

Effectively, we need to get four things right to get transparency right: