Reading Summary: 07/20/2020

Social

A Sino-American bond, forged by Chinese students, is in peril $

How the Chinese-American relationship is affecting the lives of the many people “stuck in between.”

How social media took us from Tahrir Square to Donald Trump

The author foresaw the dangerous impact social media could have on society, and he was right.

He also argues that the cure cannot be purely technological: it requires fixing the vulnerabilities in our economic, political, and social systems.

Technology

Testifying at the Senate about A.I.‑Selected Content on the Internet, from Stephen Wolfram

Stephen Wolfram’s Senate testimony on A.I.-selected content: why algorithmic bias is dangerous, and how we can address it with proper regulation, transparency, and user choice.

He essentially proposes that users should know which algorithm is feeding them content and be able to choose among algorithms. This requires open benchmarks for recommendation algorithms and frameworks that let users choose.

Programming

The Rise of Embarrassingly Parallel Serverless Compute

What serverless computing is, why it is on the rise, and why it is useful for parallel workloads (data processing, CI/CD, compilation, ML, visualization, …, you name it).
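To make "embarrassingly parallel" concrete, here is a minimal sketch in plain Python. It assumes a local thread pool as a stand-in for a fleet of serverless function invocations; `word_count` and `parallel_word_count` are hypothetical names of mine, not anything from the article or from a serverless platform's API.

```python
from concurrent.futures import ThreadPoolExecutor


def word_count(chunk: str) -> int:
    """Stand-in for one function invocation: it sees only its own chunk."""
    return len(chunk.split())


def parallel_word_count(chunks: list[str], max_workers: int = 4) -> int:
    # Each chunk is processed independently; no task waits on another's result.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partial_counts = list(pool.map(word_count, chunks))
    # The only coordination is this final, cheap aggregation step.
    return sum(partial_counts)


if __name__ == "__main__":
    chunks = [
        "serverless functions scale out",
        "each chunk is independent",
        "results are merged",
    ]
    print(parallel_word_count(chunks))
```

The point of the sketch is the shape of the work: independent tasks plus one small merge at the end, which is what lets a platform fan the tasks out to as many workers as it likes.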

NoSQL Data Modeling Techniques

A detailed guide to modeling your NoSQL data schemas.

Paper Reading: Aurora: Distributed Relational Database

The following is my overly simplified summary of the paper.

Aurora is a geo-distributed SQL database that supports replication, high availability, and transactions; its distributed design is built around replicating the database’s write-ahead log (WAL).
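As a rough illustration of the general idea of replicating a log instead of the data itself, here is a toy sketch in Python. It is not Aurora's actual protocol: it leaves out quorums, durability, and failure handling, and the names `LogRecord`, `Replica`, and `Primary` are my own assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    lsn: int    # log sequence number: the record's position in the log
    key: str
    value: str


class Replica:
    """Rebuilds its state purely by applying log records in order."""

    def __init__(self) -> None:
        self.state: dict[str, str] = {}
        self.applied_lsn = 0

    def apply(self, record: LogRecord) -> None:
        # Skip records that were already applied, so redelivery is harmless.
        if record.lsn <= self.applied_lsn:
            return
        # Refuse to apply out of order; a gap would corrupt the state.
        if record.lsn != self.applied_lsn + 1:
            raise ValueError(f"gap in log: expected {self.applied_lsn + 1}, got {record.lsn}")
        self.state[record.key] = record.value
        self.applied_lsn = record.lsn


class Primary:
    """Appends each write to its log and ships the record to every replica."""

    def __init__(self, replicas: list[Replica]) -> None:
        self.log: list[LogRecord] = []
        self.replicas = replicas

    def write(self, key: str, value: str) -> LogRecord:
        record = LogRecord(lsn=len(self.log) + 1, key=key, value=value)
        self.log.append(record)
        for replica in self.replicas:
            replica.apply(record)
        return record


if __name__ == "__main__":
    replicas = [Replica(), Replica()]
    primary = Primary(replicas)
    primary.write("a", "1")
    primary.write("a", "2")
    print(replicas[0].state == replicas[1].state)
```

Because every replica applies the same records in the same order, they all converge to the same state, and a new replica can catch up just by replaying the log.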


Reading: Cassandra Data Modeling

Reading from the Cassandra official website: https://www.datastax.com/sites/default/files/content/whitepaper/files/2019-10/CM2019236%20-%20Data%20Modeling%20in%20Apache%20Cassandra%20%E2%84%A2%20White%20Paper-4.pdf

Cassandra is an exemplary implementation of a NoSQL database and has gained popularity in various web, big data, and ML applications. Recently I stumbled upon a good summary of the Cassandra handbook, which includes a decent introduction to its data modeling techniques; these can in turn be applied to other NoSQL databases.

Here are my notes and summaries:

Data Modeling Concepts

Cassandra differs from a traditional RDBMS in a great many ways: it is a wide-column database with BASE (eventual consistency) guarantees and looser relationships between tables. Therefore, one needs to model data very differently than in a traditional RDBMS for the application to run efficiently.

Namely, NoSQL data modeling has the following differences:

  • No Joins: tables have loose relationships with each other, with no database-level joins.
  • No Referential Integrity: an RDBMS enforces that a foreign key refers to a primary key in another table. NoSQL doesn’t enforce this.
  • Denormalization: contrary to RDBMS normalization techniques, denormalization is a first-class citizen in NoSQL. Many NoSQL databases support aggregating fields in the same table to achieve row-level atomicity.
  • Query First: SQL data modeling starts with entities and relations, while NoSQL data modeling starts with application queries.
  • Sorting: sort order is an important design decision for Cassandra and many NoSQL databases.
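The query-first, denormalization, and sorting points can be sketched together with a small in-memory model. This is plain Python, not CQL or a Cassandra driver; the `PostsByUser` "table", its columns, and the "latest N posts for a user" query are hypothetical examples of mine rather than anything taken from the white paper.

```python
from bisect import insort
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(order=True)
class PostRow:
    # Only sort_key takes part in ordering (a stand-in for a clustering column).
    # It stores the negated timestamp so the newest row sorts first.
    sort_key: int
    post_id: str = field(compare=False)
    author_name: str = field(compare=False)  # denormalized copy: no join at read time
    title: str = field(compare=False)


class PostsByUser:
    """A table designed around one query: 'latest N posts for a user'."""

    def __init__(self) -> None:
        # partition key (user_id) -> rows kept in sorted order at write time
        self._partitions: dict[str, list[PostRow]] = defaultdict(list)

    def insert(self, user_id: str, author_name: str, post_id: str,
               title: str, created_at: int) -> None:
        row = PostRow(-created_at, post_id, author_name, title)
        insort(self._partitions[user_id], row)

    def latest(self, user_id: str, limit: int) -> list[PostRow]:
        # The read is a prefix scan of one partition: no join, no sort.
        return self._partitions.get(user_id, [])[:limit]


if __name__ == "__main__":
    table = PostsByUser()
    table.insert("u1", "Ann", "p1", "first", created_at=100)
    table.insert("u1", "Ann", "p2", "second", created_at=300)
    table.insert("u1", "Ann", "p3", "third", created_at=200)
    print([row.title for row in table.latest("u1", limit=2)])
```

The trade-off is that the author's name is copied into every row and the sort order is fixed when the table is designed, so a different query (say, posts by title) would need its own table.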
