Link: https://ai.google/research/pubs/pub36356
This is a 2010 paper that presents Dapper, a tracing infrastructure from Google, to solve problems at Google scale, in its massive scale distributed systems, where a service could invoke very deep RPC calls across different nodes in the cluster, which makes tracing quite challenging.