Tuesday, 21 July 2020

Linux file system goes into read-only mode

While I was working on a SLES 12 SP4 machine (the /etc/SuSE-release file has the OS information), I wanted to edit a service file to make it run in debug mode. But I couldn't edit the file: editors like vi/gedit complained that the file system was read-only. I had edited such files successfully in the past, so I was wondering why I couldn't do it now.

I was able to find the fix from this Ask Ubuntu post:

https://askubuntu.com/questions/197459/how-to-fix-sudo-unable-to-open-read-only-file-system

I ran "sudo fsck.ext4 -f ..." and gave yes to all the questions. Then a reboot was asked. Once the machine got rebooted, I was able to successfully edit the file.

To find the file system type (ext3/ext4/...), you can use the "df -T" command. Since the file I was editing is under /usr/lib/systemd, I had to look up the file system type of / in my case.
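For reference, both of the following are standard df invocations; the "Type" column in the output shows the file system type:

# file system type and backing device of /
df -T /
# the same check for the file system that holds a specific path
df -T /usr/lib/systemd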

Wednesday, 19 February 2020

RateLimiter from Guava library - My understanding


To get an understanding of the Guava library's RateLimiter, I was going through its documentation: https://guava.dev/releases/22.0/api/docs/index.html?com/google/common/util/concurrent/RateLimiter.html
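The piece of code in question is the data-stream example from that page; from memory, it is essentially the following (networkService stands for whatever component actually sends the packet):

final RateLimiter rateLimiter = RateLimiter.create(5000.0); // rate = 5000 permits per second

// acquire as many permits as the packet has bytes, then send it
void submitPacket(byte[] packet) {
  rateLimiter.acquire(packet.length);
  networkService.send(packet);
}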




After looking at the above piece of code, I had a doubt about how the acquire method works: what happens if the packet length is more than 5k? To find out, I wrote this program.
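The program is essentially the following sketch (the class name is arbitrary; the acquire sizes match the discussion further below):

import com.google.common.util.concurrent.RateLimiter;
import java.time.Instant;

public class RateLimiterTest {
    public static void main(String[] args) {
        System.out.println(Instant.now() + " - Time before our actual logic gets executed");

        RateLimiter rateLimiter = RateLimiter.create(2); // 2 permits per second

        rateLimiter.acquire(10); // expensive request against an idle limiter
        System.out.println(Instant.now() + " - Time after acquire(10)");

        rateLimiter.acquire(3);  // pays for the 10 permits taken above
        System.out.println(Instant.now() + " - Time after acquire(3)");

        rateLimiter.acquire(1);  // pays for the 3 permits taken above
        System.out.println(Instant.now() + " - Time after acquire(1)");

        rateLimiter.acquire(1);  // pays for the previous single permit
        System.out.println(Instant.now() + " - Time after 2nd acquire(1)");

        rateLimiter.acquire(1);
        System.out.println(Instant.now() + " - Time after 3rd acquire(1)");
    }
}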





The output is this:


2020-02-19T10:43:40.249Z - Time before our actual logic gets executed
2020-02-19T10:43:40.446Z - Time after acquire(10)
2020-02-19T10:43:45.448Z - Time after acquire(3)
2020-02-19T10:43:46.946Z - Time after acquire(1)
2020-02-19T10:43:47.447Z - Time after 2nd acquire(1)
2020-02-19T10:43:47.946Z - Time after 3rd acquire(1)


As mentioned in https://guava.dev/releases/22.0/api/docs/index.html?com/google/common/util/concurrent/RateLimiter.html, if an expensive task arrives at an idle RateLimiter, it is granted immediately, but it is the next request that experiences extra throttling, thus paying for the cost of the expensive task.

There is no waiting at acquire(10); the ~200 ms before it is presumably just startup overhead rather than throttling. But the next call, acquire(3), had to wait for 5 seconds. The reason is that we created the RateLimiter with "2 permits per second", so 5 seconds are needed to generate the 10 permits consumed by the first call. Once those 5 seconds have elapsed, acquire(3) is granted, but the acquire after it had to wait for 1.5 seconds.

After that 1.5-second wait, the next acquire(1) is granted. But each subsequent acquire(1) has to wait 0.5 seconds, because the RateLimiter allows 2 permits per second, which means 1 permit every 0.5 seconds.

Coming back to the initial example from the official Guava documentation, if the packet length is 10k, the first call returns immediately, but the library keeps track of when the next permit becomes available and ensures that the next acquire waits for 2 seconds (10k permits / 5k permits per second = 2 seconds).
(Copied from official doc: It is important to note that the number of permits requested never affects the throttling of the request itself (an invocation to acquire(1) and an invocation to acquire(1000) will result in exactly the same throttling, if any), but it affects the throttling of the next request. I.e., if an expensive task arrives at an idle RateLimiter, it will be granted immediately, but it is the next request that will experience extra throttling, thus paying for the cost of the expensive task.)

RateLimiter uses a token bucket algorithm, which is explained in https://dzone.com/articles/detailed-explanation-of-guava-ratelimiters-throttl
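Just to illustrate the basic idea (this is not how Guava implements it internally; it is only a bare-bones sketch of a token bucket, where tokens accumulate at a fixed rate up to a maximum and a request succeeds only if enough tokens are available):

// A minimal, non-thread-safe token bucket, only to illustrate the idea.
public class TokenBucket {
    private final double capacity;        // maximum number of tokens the bucket can hold
    private final double refillPerSecond; // rate at which tokens are added
    private double tokens;                // tokens currently available
    private long lastRefillNanos;

    public TokenBucket(double refillPerSecond, double capacity) {
        this.refillPerSecond = refillPerSecond;
        this.capacity = capacity;
        this.tokens = capacity;
        this.lastRefillNanos = System.nanoTime();
    }

    // Add the tokens earned since the last refill, capped at the bucket capacity.
    private void refill() {
        long now = System.nanoTime();
        tokens = Math.min(capacity, tokens + (now - lastRefillNanos) / 1e9 * refillPerSecond);
        lastRefillNanos = now;
    }

    // Take the requested tokens if available; otherwise do nothing and return false.
    public boolean tryAcquire(int permits) {
        refill();
        if (tokens >= permits) {
            tokens -= permits;
            return true;
        }
        return false;
    }
}

Guava's acquire() goes a step further: it blocks instead of failing and effectively lets a large request borrow future permits, which is why the cost shows up on the next call, as seen in the output above.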

Wednesday, 12 February 2020

Git - Fixing accidental commit to master

I frequently run into a situation where I commit something to master by mistake that should have gone into a brand new branch. How do I fix this?

I found the commands below at https://ohshitgit.com/#accidental-commit-master, and they helped me.

# create a new branch from the current state of master
git branch some-new-branch-name
# remove the last commit from the master branch
git reset HEAD~ --hard
git checkout some-new-branch-name
# your commit lives in this branch now :)

Tuesday, 7 January 2020

How to configure squid to cache big files

What needs to be done to cache HTTPS content?


To cache files served over HTTPS, you first have to configure squid as described in https://elatov.github.io/2019/01/using-squid-to-proxy-ssl-sites/.

The steps given below are common to caching files served over HTTP or HTTPS.

Prerequisite: how to know whether content is cacheable or not


To check whether a file can be cached, look at the response headers of its URL. The response should contain an "Expires" header; squid uses this to determine that the content is cacheable. To cache files even if they don't have an "Expires" header, refresh_pattern (http://www.squid-cache.org/Doc/config/refresh_pattern/) has to be tweaked as described in https://www.linux.com/news/speed-your-internet-access-using-squids-refresh-patterns/.
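For example, the response headers can be inspected with curl before downloading (using the same test URL as in the log line further below; any URL works):

# fetch only the response headers and check the caching-related ones
curl -sI https://speed.hetzner.de/100MB.bin | grep -iE 'expires|cache-control|last-modified'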

Now coming to the topic.

By default, squid caches files in memory. This means that if the service gets restarted, the cached files are gone. We can tell whether squid served a file from the in-memory cache or from the disk cache by looking at the entries in the squid access log, which is located at /var/log/squid/access.log.

... TCP_MEM_HIT/200 8745 GET http://www.squid-cache.org/ - HIER_NONE/- text/html

TCP_MEM_HIT means that the content was served from memory. If we restart the squid service and fetch the same content again, we will get TCP_MISS the first time; from the second time onwards, we will get TCP_MEM_HIT.

To cache the files on disk, we need to configure the following in squid.conf.

cache_dir ufs /var/cache/squid 10000 16 256

(http://www.squid-cache.org/Doc/config/cache_dir/)

The above link mentions that the default value of cache_dir is "No disk cache". So, unless we configure this, only the in-memory cache will be used.

To explain the above cache_dir configuration briefly: /var/cache/squid is the location of the squid disk cache, 10000 is the amount of disk space (in MB) to be used for caching, and 16 and 256 are the numbers of first-level and second-level subdirectories.

Next, change the "maximum_object_size" directive to a large value. For example,

maximum_object_size 1000 MB

We can see from http://www.squid-cache.org/Doc/config/maximum_object_size/ that the default maximum object size is 4 MB. This means that, by default, Squid will cache objects only if they are 4 MB or smaller. If we raise the setting, files up to that size can be cached.
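Putting the two changes together, the relevant squid.conf lines from above are just these:

# disk cache: 10000 MB under /var/cache/squid, with 16 first-level and 256 second-level directories
cache_dir ufs /var/cache/squid 10000 16 256
# allow objects up to 1000 MB to be cached on disk
maximum_object_size 1000 MB

Depending on the squid version, the cache directories may need to be created once with "squid -z" before restarting the service.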

With these configurations in place, if we download the content, we get TCP_HIT from the second download onwards. The line below is copied from /var/log/squid/access.log.

... TCP_HIT/200 104858037 GET https://speed.hetzner.de/100MB.bin - HIER_NONE/- application/octet-stream
