Paper Reading: Large-scale cluster management at Google with Borg

Link: https://ai.google/research/pubs/pub43438

About: Borg is Google’s large cluster workload scheduling and management system, which handles Google’s most service and batch job workloads on a cluster on scale of thousands of machines. It hides users from burdens of management of cluster, and provides high-availability features that handles failures.

The now very famous and popular open-source docker orchestration tool Kubernetes, is an open source successor to Borg, and keeps borrowing ideas from Borg (see kubernetes).

Read More

Debugging An Interesting Deadlock in Golang

This week I’ve been chasing a deadlock issue in a Golang server application, which will essentially render the server unresponsive to client requests indefinitely and cannot recover in anyway without restarting. I’ve trying all ways days and nights, even ended up re-writing a small portion of the application to clean up all the locks - no luck.

Read More