Reading from the DataStax website: https://www.datastax.com/sites/default/files/content/whitepaper/files/2019-10/CM2019236 - Data Modeling in Apache Cassandra ™ White Paper-4.pdf
Cassandra is an exemplary implementation of a NoSQL database, and it has gained popularity in web, big data, and ML applications. I recently stumbled upon a good summary handbook on Cassandra, which includes a decent introduction to its data modeling techniques. These techniques can in turn be applied to other NoSQL databases.
Here are my notes and summaries:
Data Modeling Concepts
Cassandra differs from a traditional RDBMS in many ways: it is a wide-column database, it offers BASE (eventual consistency) guarantees, and its tables have looser relationships with each other. As a result, data has to be modeled very differently than in a traditional RDBMS for the application to run efficiently.
Specifically, NoSQL differs in the following ways:
- No Joins: tables are only loosely related to each other, and there is no database-level joining.
- No Referential Integrity: in an RDBMS, a foreign key must refer to a primary key in another table. NoSQL doesn’t enforce this.
- Denormalization: contrary to RDBMS normalization techniques, denormalization is a first-class citizen in NoSQL. Many NoSQL databases support aggregating fields in the same table to achieve row-level atomicity.
- Query First: SQL data modeling starts with entities and relations, while NoSQL data modeling starts with application queries.
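- Sorting: sort order is an important design decision for Cassandra and many NoSQL databases, because it is fixed by the table design (in Cassandra, by the clustering columns) rather than chosen freely at query time.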
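To make the list above more concrete for myself, here is a minimal in-memory sketch in plain Python (it does not talk to Cassandra at all). It assumes a hypothetical query, "latest posts by a given user, newest first", and a hypothetical table named PostsByUser; none of these names come from the white paper. The table is designed around that one query, the user's name is copied into every row so no join is needed, and rows inside a partition are kept sorted on write.

```python
from bisect import insort
from collections import defaultdict


class PostsByUser:
    """Toy stand-in for a query-first, denormalized table (illustrative only).

    Partition key: user_id. Rows in a partition are kept ordered by
    created_at, which plays the role of a clustering column.
    """

    def __init__(self):
        # user_id -> list of (created_at, user_name, title), sorted ascending
        self._partitions = defaultdict(list)

    def insert(self, user_id, created_at, user_name, title):
        # user_name is duplicated into every row (denormalization),
        # so reads never need to look it up in a separate "users" table.
        insort(self._partitions[user_id], (created_at, user_name, title))

    def latest(self, user_id, limit):
        # The only supported query: one partition, newest rows first.
        rows = self._partitions.get(user_id, [])
        return rows[::-1][:limit]


table = PostsByUser()
table.insert("u1", 3, "Ann", "Third post")
table.insert("u1", 1, "Ann", "First post")
table.insert("u1", 2, "Ann", "Second post")
table.insert("u2", 5, "Bob", "Hello")

print(table.latest("u1", limit=2))
```

The trade-off this sketch is meant to show: writes do a bit more work (duplicating user_name, keeping rows sorted), and in exchange the read is a single ordered lookup in one partition. A different query, such as "posts by title", would need its own table under this approach.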