When logic and proportion
Have fallen sloppy dead
And the White Knight is talking backwards
And the Red Queen’s off with her head
Remember what the dormouse said
Feed your head
Feed your head
“Data and Goliath” is an excellent book a friend recommended. It surveys the dangerous and negative ways data and “Big Data” technology can shape our societies. The author, Bruce Schneier, is a prominent cryptography expert who has published influential work on cryptography and privacy. He also sits on the board of directors of the Electronic Frontier Foundation.
- Renowned Security Expert Bruce Schneier Joins EFF Board of Directors
How to model timeseries data with Cassandra.
The best way to understand something is to build it yourself. This tutorial covers basic network programming in Go, struct design and the usage of
A great experience-sharing post on how to debug a performance issue in their services. Using profiling and analysis tools, the Uber team traced the issue to their worker pool and goroutine stack allocation, then forked the Go compiler to prove it was a regression in the compiler itself. A very nice read with a clear analysis process.
A programming book on topics in distributed computation, drawn from the experience of teaching a distributed systems course at Northeastern University.
A very nice engineering blog post from 2014. An excellent overview of Spotify's culture, and an introduction to building “agile” teams.
The NYTimes has released its in-house course for teaching journalists data science. Journalism can also benefit from some coding and data analytics skills.
If Go is one of your favorite languages as well, this is a must-read: it introduces the basic tooling that comes with Go’s ecosystem, which could save you a lot of time.
A thread from HackerNews, discussing the importance of formal verification for distributed systems.
TLA+ and formal verification are notorious for their complexity and steep learning curve. Learning them might be one of my future goals.
What it takes to be a software architect, a great blog post from InfoQ.
TIL that it is possible to convert assembly compiled from C/C++ into Go assembly and call it from Go code. InfluxData uses this tooling to bring AVX/SSE instructions into Go assembly, which boosts the Go code’s performance, sometimes by orders of magnitude.
More information on this tool, c2goasm, from Minio.
I think so, too. But it’ll require a community and proper tooling to see it really prosper. Hope to see that some day.
A great piece from Ray Dalio, founder of the investment firm Bridgewater and a seasoned investor. In this recent long post he discusses why American capitalism is failing at distributing resources, especially educational resources, and why it needs to be reformed to stay healthy.
Kafka is a message queue, a pub-sub system, an event sourcing tool, and a stream processing infrastructure, and it is a key part of many distributed systems that work with streaming data. Its underlying idea is to aggregate data from distributed sources into a single, unified linear log.
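To make the log idea concrete, here is a minimal in-memory sketch of an append-only log with offsets. It is only an illustration of the abstraction: the `Log` type and its `Append` and `ReadFrom` methods are hypothetical names chosen for this example, not Kafka's API, and a real system would add persistence, partitioning, and replication.

```go
package main

import (
	"fmt"
	"sync"
)

// Log is a hypothetical in-memory, append-only log.
// It is an illustration only and does not mirror Kafka's API.
type Log struct {
	mu      sync.RWMutex
	records []string
}

// Append adds a record to the end of the log and returns its offset.
func (l *Log) Append(record string) int {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.records = append(l.records, record)
	return len(l.records) - 1
}

// ReadFrom returns a copy of every record at or after the given offset.
// It returns nil when the offset is out of range.
func (l *Log) ReadFrom(offset int) []string {
	l.mu.RLock()
	defer l.mu.RUnlock()
	if offset < 0 || offset >= len(l.records) {
		return nil
	}
	out := make([]string, len(l.records)-offset)
	copy(out, l.records[offset:])
	return out
}

func main() {
	l := &Log{}

	// Several independent sources append into one ordered log.
	l.Append("web: user signed up")
	l.Append("db: row inserted")
	last := l.Append("web: user logged in")

	// A consumer keeps its own offset and replays from there.
	for i, r := range l.ReadFrom(0) {
		fmt.Println(i, r)
	}
	fmt.Println("last offset:", last)
}
```

Because records are never modified in place, every consumer that reads from the same offset sees the same sequence, which is what makes the log useful as a shared source of truth.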
The blog post is from Kafka’s creator Jay Kreps, written when he was at LinkedIn, and it contemplates the log abstraction as a key part of any distributed system. It is not Kafka’s design paper, an implementation guide, or a tutorial, but rather an account of how the idea that led to Kafka took shape, and I found it equally interesting. The following are my notes.
Take a look at what Mr. Gates thinks are the greatest technology breakthroughs right now. The list might surprise you.
How Netflix leverages AWS technologies to build a world-scale, highly available, fault-tolerant distributed video streaming system.
Lyft architecture evolution on AWS.
From Farnam Street – an interesting blog site I found recently.
Also on Farnam Street and its “mental models”: The Mental Model Fallacy. TL;DR: the so-called “mental models” from Farnam Street are not of much value when they come from non-practitioners. To learn business, like basketball or swimming, you need to actually practice in order to pick up the intricate knowledge that is not easily translated into writing.
Unfortunately I didn’t have time to finish reading this paper. But it’s good to learn the concept of branchless algorithms, which avoid conditional jumps to keep the CPU pipeline full and achieve amazing performance.
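As a small illustration of the branchless idea (not taken from the paper), here is a sketch of a minimum function that uses arithmetic and a sign mask instead of an `if`. The function name `branchlessMin` is made up for this example, and it assumes `a - b` does not overflow `int64`. Whether it actually beats the plain version depends on the compiler and the CPU, so treat it as a demonstration of the technique rather than a tuning recommendation.

```go
package main

import "fmt"

// branchlessMin returns the smaller of a and b without a conditional jump
// in the source. Assumption: a-b does not overflow int64.
func branchlessMin(a, b int64) int64 {
	diff := a - b
	// For signed integers Go's >> is an arithmetic shift, so mask is
	// all ones (-1) when diff is negative and 0 otherwise.
	mask := diff >> 63
	// If a < b, this is b + diff = a; otherwise it is b + 0 = b.
	return b + (diff & mask)
}

// branchyMin is the conventional version, shown for comparison.
func branchyMin(a, b int64) int64 {
	if a < b {
		return a
	}
	return b
}

func main() {
	pairs := [][2]int64{{3, 5}, {5, 3}, {-2, 7}, {4, 4}}
	for _, p := range pairs {
		fmt.Println(p, branchlessMin(p[0], p[1]), branchyMin(p[0], p[1]))
	}
}
```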
Link to paper: https://people.eecs.berkeley.edu/~alig/papers/mesos.pdf
Mesos is cluster resource management software from UC Berkeley. Unlike many existing frameworks, Mesos is designed to support heterogeneous frameworks (Hadoop, MPI, etc.) in the same cluster and share resources between them. It does this by providing a thin layer that makes resource offers to the framework schedulers and delegates the scheduling decisions to the frameworks themselves.
With this design, Mesos can achieve pretty good elasticity between frameworks, and letting frameworks choose their own resources results in better data locality.
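The offer mechanism can be sketched in a few lines. This is a toy model under my own assumptions, not the Mesos API: `Offer`, `Framework`, `localityFramework`, and `allocate` are hypothetical names, and the real system also handles offer rejection timeouts, fairness policies, and task launch. The point is only that the allocator proposes resources while each framework decides whether to take them.

```go
package main

import "fmt"

// Offer describes free resources on one node (hypothetical type).
type Offer struct {
	Node string
	CPUs int
}

// Framework is the scheduler side of the model: it alone decides
// whether an offer is worth accepting.
type Framework interface {
	Name() string
	Accept(o Offer) bool
}

// localityFramework accepts only offers on nodes that hold its data
// and that have enough CPUs.
type localityFramework struct {
	name      string
	dataNodes map[string]bool
	needCPUs  int
}

func (f *localityFramework) Name() string { return f.name }

func (f *localityFramework) Accept(o Offer) bool {
	return f.dataNodes[o.Node] && o.CPUs >= f.needCPUs
}

// allocate presents each offer to the frameworks in order and assigns the
// node to the first framework that accepts. Declined offers stay unassigned.
func allocate(offers []Offer, frameworks []Framework) map[string]string {
	assigned := make(map[string]string)
	for _, o := range offers {
		for _, f := range frameworks {
			if f.Accept(o) {
				assigned[o.Node] = f.Name()
				break
			}
		}
	}
	return assigned
}

func main() {
	frameworks := []Framework{
		&localityFramework{name: "hadoop", dataNodes: map[string]bool{"node-a": true}, needCPUs: 2},
		&localityFramework{name: "mpi", dataNodes: map[string]bool{"node-b": true}, needCPUs: 1},
	}
	offers := []Offer{
		{Node: "node-a", CPUs: 4},
		{Node: "node-b", CPUs: 1},
		{Node: "node-c", CPUs: 8},
	}
	fmt.Println(allocate(offers, frameworks))
}
```

Each framework here rejects nodes that do not hold its data, which is how delegating the decision can lead to better data locality without the allocator knowing anything about the framework's needs.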
A team from Penn State University and Purdue published their latest study on concurrency bugs found in Go projects, specifically large projects on GitHub: Docker and Kubernetes, two datacenter container systems; etcd, a distributed key-value store; gRPC, an RPC library; and CockroachDB and BoltDB. The authors searched the commit history of each repository for concurrency bug fixes, then categorized and studied them.
- Go’s message-passing concurrency mechanism, something Go is proud of, isn’t as easy to use as it’s generally perceived. It creates just as many bugs as the shared-memory concurrency model, if not more.
- Shared memory synchronization is still used more in Go projects.
- Go’s built-in race and deadlock detection still cannot catch all the bugs. There’s room for improvement.
About: Borg is Google’s large-scale cluster workload scheduling and management system. It handles most of Google’s service and batch job workloads on clusters on the scale of thousands of machines. It shields users from the burden of cluster management and provides high-availability features that handle failures.
The now very famous and popular open-source container orchestration tool Kubernetes is an open-source successor to Borg, and it keeps borrowing ideas from Borg (see kubernetes).