This 2010 paper presents Dapper, Google’s tracing infrastructure for its large-scale distributed systems. At that scale a single service request can fan out into a deep chain of RPC calls across many nodes in the cluster, which makes tracing quite challenging.
This is one of a series of papers from Microsoft’s Project Catapult, which studies how reconfigurable devices (FPGAs, etc.) can accelerate data centers, from very specific algorithms like page ranking for the Bing search engine to more sophisticated machine learning workloads like deep neural networks (DNNs).
This is one of their early publications, and it introduces the basic design and implementation of the FPGA-accelerated datacenter. It covers the fundamentals of every aspect of the server design: hardware, network topology, FPGA core design, fault-tolerant cluster management software, workload scheduling algorithms, and so on.
Posts I find interesting around the web:
A very interesting post on augmenting long-term memory, based on Ebbinghaus’ forgetting curve theory: use flashcards to memorize everything you’ve learned, even trivia like your friends’ birthdays. It uses the Anki flashcard software to review the list of items.
The author also reasons about the benefits of memorizing all the details, concepts, and “everything”: the details are the building blocks of a field of knowledge, and memorizing them dramatically helps in understanding that field.
It’s a long read but a deep discussion, and I find it a joyful read.
An interesting talk from Jared Diamond, the author of Guns, Germs, and Steel. Despite the somewhat misleading title, it’s an interesting take on history and the progress of human civilizations, and on how competition between civilizations influences their prosperity.
Systems Design and Distributed Systems
SoftwareArch: You are going to need it — Using Interfaces and Dependency Injection to future proof your designs
An introduction to interfaces in Golang, and how dependency injection can help you design large projects.
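To make the idea concrete, here is a minimal sketch of constructor-based dependency injection in Go. It is not taken from the linked article: the `UserStore` interface, the `memoryStore` implementation, and the `Greeter` type are all hypothetical names invented for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// UserStore is a hypothetical interface: the only thing Greeter needs from storage.
type UserStore interface {
	Lookup(id int) (string, error)
}

// memoryStore is an in-memory implementation, handy for tests.
// A database-backed store could satisfy the same interface.
type memoryStore struct {
	users map[int]string
}

func (m memoryStore) Lookup(id int) (string, error) {
	name, ok := m.users[id]
	if !ok {
		return "", errors.New("user not found")
	}
	return name, nil
}

// Greeter depends on the interface, not on a concrete store.
type Greeter struct {
	store UserStore
}

// NewGreeter injects the dependency through the constructor.
func NewGreeter(store UserStore) *Greeter {
	return &Greeter{store: store}
}

// Greet falls back to a generic greeting when the lookup fails.
func (g *Greeter) Greet(id int) string {
	name, err := g.store.Lookup(id)
	if err != nil {
		return "hello, stranger"
	}
	return "hello, " + name
}

func main() {
	g := NewGreeter(memoryStore{users: map[int]string{1: "gopher"}})
	fmt.Println(g.Greet(1))
	fmt.Println(g.Greet(2))
}
```

Because `Greeter` only knows about `UserStore`, swapping the in-memory store for a real backend later should not require touching `Greeter` itself, which is roughly the "future proofing" the title refers to.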
The basic concepts of system design and web design, plus basic principles of distributed systems design. A collaborative effort on GitHub.
A chapter from Google’s new Site Reliability Engineering book on how to design a distributed cron service and handle problems such as fault tolerance, repeatedly scheduled jobs, and overloading the cluster. The whole book is a very valuable summary of Google’s experience with automation and distributed systems design at Google scale. I will definitely read through the other chapters.
Eli Bendersky’s blog post on why Golang gracefully handles, at the language level, the concurrency problems that other major languages handle rather awkwardly:
- Goroutines unify the interface to coroutines and threads.
- Channels enforce the ‘share memory by communicating’ pattern.
Together these greatly reduce the programmer’s mental burden when designing highly concurrent systems.
An introduction to Python for HPC, from the basics of the Python language to distributed HPC frameworks for Python.
A list of concepts, papers, and interesting blog posts on distributed systems design.