Sharing Knowledge: January 2020

What needs to be done to cache HTTPS content?

To cache files served over HTTPS, you first have to configure Squid to proxy SSL sites, as described in https://elatov.github.io/2019/01/using-squid-to-proxy-ssl-sites/.

The steps given below apply to files served over either HTTP or HTTPS.

How do we know whether content is cacheable?

For a file to be cached, check the response headers of its URL. They should contain an "Expires" header, which tells Squid how long the content stays fresh (a "Cache-Control" header with max-age serves the same purpose). To cache files even if the response doesn't contain an "Expires" header, refresh_pattern (http://www.squid-cache.org/Doc/config/refresh_pattern/) has to be tweaked as described in (https://www.linux.com/news/speed-your-internet-access-using-squids-refresh-patterns/).
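The header check above can be scripted. Below is a minimal sketch, not a full cacheability test: it only lists the headers a cache commonly uses to work out freshness. The function names fetch_headers and freshness_hints are made up for this example, and the sample headers are invented for illustration. An empty result does not prove that Squid will refuse to cache the object, since refresh_pattern can still apply.

```python
from urllib.request import Request, urlopen

# Headers that commonly carry freshness information for HTTP caches.
FRESHNESS_HEADERS = ("Expires", "Cache-Control", "Last-Modified")


def fetch_headers(url):
    """Send a HEAD request and return the response headers as a dict."""
    with urlopen(Request(url, method="HEAD"), timeout=10) as response:
        return dict(response.headers.items())


def freshness_hints(headers):
    """Pick out the freshness-related headers, ignoring header-name case."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        name: lowered[name.lower()]
        for name in FRESHNESS_HEADERS
        if name.lower() in lowered
    }


# Made-up sample headers for illustration; not taken from a real response.
sample = {"Content-Type": "text/html", "expires": "Thu, 01 Jan 2026 00:00:00 GMT"}
print(freshness_hints(sample))
```

For a live URL, the two helpers can be chained, for example freshness_hints(fetch_headers("http://www.squid-cache.org/")).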

Now, on to the main topic.

By default, Squid caches files in memory only. This means that if the service is restarted, the cached files are gone. We can tell whether Squid served a file from memory or from the disk cache by looking at the entries in the Squid access log, which is located at /var/log/squid/access.log.

... TCP_MEM_HIT/200 8745 GET http://www.squid-cache.org/ - HIER_NONE/- text/html

TCP_MEM_HIT means that the content was served from memory. If we restart the Squid service and fetch the same content again, we get TCP_MISS the first time. From the second request onwards, we get TCP_MEM_HIT again.

To cache the files on disk, we need to add the following to squid.conf.

cache_dir ufs /var/cache/squid 10000 16 256

(http://www.squid-cache.org/Doc/config/cache_dir/)

The page linked above states that the default value of cache_dir is "No disk cache". So, unless we configure this directive, only the in-memory cache is used.

A brief explanation of the above cache_dir configuration:
ufs is the storage type. /var/cache/squid is the location of the Squid disk cache. 10000 is the amount of disk space, in megabytes, to be used for Squid caching. 16 and 256 are the numbers of first-level and second-level subdirectories created under that location. If the subdirectories do not exist yet, they can be created with squid -z before starting Squid.
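To make the meaning of each field concrete, here is a small sketch that splits a ufs cache_dir line into its parts and works out how many second-level directories the layout implies. The function name parse_ufs_cache_dir and the dictionary keys are made up for this example. It only handles the simple form used in this post, with no extra options.

```python
def parse_ufs_cache_dir(line):
    """Split a 'cache_dir ufs <path> <Mbytes> <L1> <L2>' line into named fields."""
    keyword, store_type, path, mbytes, l1, l2 = line.split()[:6]
    if keyword != "cache_dir" or store_type != "ufs":
        raise ValueError("expected a 'cache_dir ufs ...' line")
    return {
        "path": path,
        "size_mb": int(mbytes),            # disk space in megabytes
        "l1": int(l1),                     # first-level subdirectories
        "l2": int(l2),                     # second-level subdirectories per first-level one
        "leaf_dirs": int(l1) * int(l2),    # total second-level directories
    }


print(parse_ufs_cache_dir("cache_dir ufs /var/cache/squid 10000 16 256"))
```

With the values from this post, 16 first-level directories times 256 second-level directories gives 4096 directories in total.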

Next, raise the "maximum_object_size" setting to a larger value. For example,

maximum_object_size 1000 MB

According to http://www.squid-cache.org/Doc/config/maximum_object_size/, the default maximum object size is 4 MB. This means that, by default, Squid caches an object only if its size is less than or equal to 4 MB. If we raise the setting, Squid can cache any file whose size is less than or equal to the new limit.

After these changes, if we download the content, we get TCP_HIT from the second request onwards. The line below is copied from /var/log/squid/access.log.
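The size limit can be checked with a little arithmetic. The sketch below assumes that Squid treats KB, MB and GB as powers of 1024, which is an assumption worth confirming against your Squid version. The helper names size_to_bytes and fits are made up for this example. The object size used is the reply size of the 100MB.bin download as logged in this post, which includes the response headers.

```python
# Assumed unit sizes (powers of 1024); verify against your Squid version.
UNITS = {"KB": 1 << 10, "MB": 1 << 20, "GB": 1 << 30}


def size_to_bytes(text):
    """Convert a value such as '4 MB' or '1000 MB' into bytes."""
    number, unit = text.split()
    return int(number) * UNITS[unit.upper()]


def fits(object_bytes, limit):
    """Return True if an object of this size is within the given limit."""
    return object_bytes <= size_to_bytes(limit)


logged_size = 104858037  # reply size from the access.log line in this post

print(fits(logged_size, "4 MB"))     # compared against the default limit
print(fits(logged_size, "1000 MB"))  # compared against the raised limit
```

Under these assumptions, the 100 MB test file is well over the 4 MB default, which is why the limit has to be raised before the file can land in the disk cache.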

... TCP_HIT/200 104858037 GET https://speed.hetzner.de/100MB.bin - HIER_NONE/- application/octet-stream
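Reading these codes by eye gets tedious on a busy proxy, so here is a minimal sketch that pulls the result code out of an access.log line and maps it to where the reply came from. The mapping follows the reading of the codes used in this post (TCP_MEM_HIT for memory, TCP_HIT for the disk cache, TCP_MISS for a fetch from the origin), and every other code is simply reported as "other". The names cache_result and served_from are made up for this example.

```python
import re

# Matches the "<result code>/<HTTP status>" field, e.g. "TCP_MEM_HIT/200".
RESULT_FIELD = re.compile(r"\b(TCP_[A-Z_]+)/(\d{3})\b")

# Interpretation of the codes as used in this post.
SOURCES = {"TCP_MEM_HIT": "memory", "TCP_HIT": "disk", "TCP_MISS": "origin"}


def cache_result(line):
    """Return the Squid result code found in a log line, or None."""
    match = RESULT_FIELD.search(line)
    return match.group(1) if match else None


def served_from(line):
    """Map a log line to 'memory', 'disk', 'origin', 'other' or 'unknown'."""
    code = cache_result(line)
    if code is None:
        return "unknown"
    return SOURCES.get(code, "other")


# The two log lines quoted in this post, with the leading fields elided as above.
lines = [
    "... TCP_MEM_HIT/200 8745 GET http://www.squid-cache.org/ - HIER_NONE/- text/html",
    "... TCP_HIT/200 104858037 GET https://speed.hetzner.de/100MB.bin - HIER_NONE/- application/octet-stream",
]
for line in lines:
    print(cache_result(line), served_from(line))
```

To run it against the real log, the same function can be applied to each line read from /var/log/squid/access.log.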