Paper Reading: Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center

Link to paper:


Mesos is a cluster resource management software from UC Berkeley. Unlike many other frameworks already existed, Mesos is designed to support heterogeneous frameworks (Hadoop, MPI, etc) in the same cluster and share resources between them, by providing a thin layer that making resource offers to the framework schedulers, and delegate the scheduling decision to the frameworks themselves.

With this design, Mesos can achieve pretty good elasticity between frameworks, and letting frameworks choose their own resources results in better data locality.

Read More

Paper Reading: Understanding Real-World Concurrency Bugs in Go


A team from Penn State University and Purdue published their latest study on concurrency bugs found in Golang projects, namely large projects from Github: Docker and Kubernetes, two datacenter container systems, etcd, a distributedkey-value store system, gRPC, an RPC library, and CockroachDB and BoltDB. The authors searched commit histories of each repository to understand concurrency bug fixes for categorization and study.


  • Go’s message-passing concurrency mechanism, something Go is proud of, isn’t as easy to use as it’s generally perceived. It creates just as many bugs, if not more, than shared-memory concurrency model.
  • Shared memory synchronization is still used more in Go projects.
  • Go’s built-in race and deadlock bug detection library still cannot catch all the bugs. There’s room for more improvements.

Read More

Paper Reading: Large-scale cluster management at Google with Borg


About: Borg is Google’s large cluster workload scheduling and management system, which handles Google’s most service and batch job workloads on a cluster on scale of thousands of machines. It hides users from burdens of management of cluster, and provides high-availability features that handles failures.

The now very famous and popular open-source docker orchestration tool Kubernetes, is an open source successor to Borg, and keeps borrowing ideas from Borg (see kubernetes).

Read More

Reading-Summary 2019-01

Becoming a magician

If you want to become a ‘magician’, the ones that with intricate moves and skills to amaze the audience, you’ll need to adopt a growing mindset:

you cannot become a ‘magician’ with the same progress rate, or by simply imagining a better self: sometimes the way to changes involves a fundamental shift in how you see the world. And to achieve that you’ll need to observe fellow ‘magicians’, learn the difference, and make non-linear progresses.

Read More

Paper Reading 10-22: Dapper, a Large-Scale Distributed Systems Tracing Infrastructure


This is a 2010 paper that presents Dapper, a tracing infrastructure from Google, to solve problems at Google scale, in its massive scale distributed systems, where a service could invoke very deep RPC calls across different nodes in the cluster, which makes tracing quite challenging.

Read More

Paper Reading 10-14: A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services

This is one of the series of papers from Microsoft’s Project Catapult, which studies leveraging reconfigurable devices (FPGA, etc.) to accelerate data center, from very specific accelerating algorithms like page ranking for Bing search engine, to more sophisticated machine learning frameworks like DNN.

This is one of their early publications, which introduces the basic design and implementation of the FPGA accelerated datacenter. It covers the very fundamental details of all aspects of server design, from hardware, network topology, FPGA core design, fault-tolerant cluster management software design, workload scheduling algorithm, and etc…

Read More

Reading-Summary 2018-10-14

Posts I find interesting around the web:

Miscalleneous Posts

Augumenting Long-term Memory

A very interesting posts on augumenting long-term memory, based on Ebbinghaus’ forgetting curve theory: use flashcards to memorize everything you’ve learned, and even trivias like your friends’ birthday, etc… It uses Anki flashcard software to go through the list of stuff.

Author also reasoned about the benefits of memorizing all the details, concepts, and “everything”: the details are the building blocks of a field of knowledge, and memorizing them dramatically helps the understanding this field.

It’s a long read but a deep discussion, and I find it a joyful read.

How To Get Rich

An interesting talk from Jared Diamond, the author of Guns, Germs, and Steel. Despite the kind of misleading title, it’s an interesting take on history and the progress of human civilizations, and how competitions between civilizations influence their prosperity.

Systems Design and Distributed Systems

SoftwareArch: You are going to need it — Using Interfaces and Dependency Injection to future proof your designs

An introduction to interfaces in Golang, and how dependency injection can help you design large projects.

System Design Primer

The basic concepts of system design, web design, basic principals and distributed systems design. A collaborated effort on Github.

Distributed Periodic Scheduling with Cron

A chapter from Google’s new Site Reliability Engineering book, on how to design a distributed cron job daemon, and handle problems including fault-tolerance, repeatedly scheduled jobs, overloading the cluster, etc… The whole book is a very valuable summary of experience of automation and distributed systems design at Google, and at Google scale. Definitely will read through other chapters.

Go hits the concurrency nail right on the head

Eli Bendersky’s blog post on why Golang gracefully handles the problems of concurrency at language level, that other major languages handles rather awkwardly.

  • Use goroutine to unify the interface to coroutines and thread.
  • Use channels to enforce the ‘share memory by communicating’ pattern.

Which greatly reduces the programmer’s mental burden of design highly concurrent systems.

Getting started with Python in HPC

An introduction to learning Python in HPC, from introduction to Python language, to distributed HPC frameworks for Python.

A Whirlwind Tour of Distributed Systems

A list of concepts, papers, and interesting blog posts on distributed systems design.

Paper Reading 09-09: C++ and the Perils of Double-Checked Locking

An interesting paper on the perils of C++, design pattern and multi-threading when they’re mixed together:

C++ and the Perils of Double-Checked Locking

The DCLP(Double-Checked Locking Pattern) is often-used in singleton design pattern: you’d like to initialize a shared object for singleton pattern, you follow the steps:

  • check lock if the resource is already initialized
  • if no, lock the mutex
  • check again if the resource is locked inside the mutex-protected area.
  • and again if no, initialize the object

See C++ example:

Singleton* Singleton::instance() {
if (pInstance == 0) { // 1st check, to avoid locking every time
Lock lock;

if (pInstance == 0) { // 2nd check, a safe check to guarantee correctness
pInstance = new Singleton;

return pInstance

This pattern however, introduces subtle bugs when described in C++ with multi-threading.

The issue is with this statement:

pInstance = new Singleton;

The following steps happen:

  1. Allocate memory for the object
  2. Construct an object in the allocated memory.
  3. Assign pInstance to the allocated memory.

But C++ specification don’t enforce the steps happen in order, and compilers are therefore not constrained to reorder them for sake of optimization. As long as the observable outcome of the instructions are correct, compilers are free to place instructions in an order so that CPUs are most utilized. Consider the following case with DCLP:

  • Thread A execute the DCLP piece of code for the first time, performs the 1st check, lock the mutex, performs the 2nd check, allocates memory for Singleton object, points the pInstance to the allocated memory. But before the Singleton object is constructed, thread A is suspended or another thread is scheduled at the same time.
  • Thread B enters DCLP area, determines that pInstance is non-null, and start using the object even before it’s fully constructed, and start accessing the Singleton object.

Oops. This is a very subtle bug, and hard to detect issue when we’re trying to initialize a shared resource once.

The paper digs into details on how compiler can leverage all sorts of different optimizations to spoil you effort to correct the DCLP code, and how to actually implement it correctly with volatile keyword.

It’s a very interesting paper on algorithm, C++, and programming, It makes you stand in awe of the difficulty and intricacies of C++ and multi-threaded programming.