Designing Data-Intensive Applications
What’s this article about
These are my notes on a book called Designing Data-Intensive Applications
.
SSTables and LSM Trees(P.76)
SSTables (Sorted String Tables) are files that contain a set of arbitrary and sorted key-value pairs.
https://www.igvita.com/2012/02/06/sstable-and-log-structured-storage-leveldb/
SSTables and Log-Structured Merge Trees are still used in some database.
As a rule of thumb, LSM Trees are faster for writes but slower for reads than B-Tree indexes.
B-Trees
The B-Tree is the most widely used indexing structure.
It stores data in a balanced tree, like the image below. Thus, searching data can be done in O(logN).
Each B-Tree parent page has references to child pages, and the number of references that a parent page can have is called the branching factor
, which is 6 in the case above but typically several hundred.
Index
A Primary index
is an index that resolves references to a primary key that can identify each record.
A Secondary index
, on the other hand, does not necessarily reference unique keys.
In-Memory Databases
In-memory databases achieve durability by writing data to disks.
Redis has weak durability
as it writes to disks asynchronously.
The reason why in-memory databases have advantages in performance is not because they don’t need to read from disks, but because they don’t need to encode data in a form that can be written to disks.
OLTP and OLAP
OLTP
stands for online transaction processing, and OLAP
stands for online analytic processing.
The difference is not clear-cut, but here are some main characteristics of these two.
OLTP
- Main Read Pattern: Small number of records per query, fetched by key
- Main Write Pattern: Random-Access, low-latency writes from user input
- Primary used by: End user
- What data represents: Latest state of data
- Dataset Size: Gigabytes to Terabytes
OLAP
- Main Read Pattern: Aggregate over a large number of records
- Main Write Pattern: Bulk import or event stream
- Primary used by: Internal analyst
- What data represents: History of events
- Dataset Size: Terabytes to Petabytes
4. Encoding and Evaluation. Page 111
Formats for Encoding data
Encoding
, which refers to the conversion of in-memory data
to a byte sequence
, can also be called marshalling, serialization, or serialization. Decoding
, on the other hand, is the reverse process, also known as parsing, unmarshalling, or deserialization.
gRPC(compatibility)
Regarding compatibility in gRPC, if new code reads data from an old protobuf that includes new fields, the new fields are simply ignored, thus maintaining forward compatibility.
However, adding a new required field would prevent old code from reading the old protobuf, so required fields cannot be added afterward.
Message Broker and Actor model(P.137)
Message Broker
When a message is sent to a named queue or topic, the broker ensures that it is delivered to at least one consumer or subscriber to that topic or queue.
Many producers and consumers can exist on the same topic.
Producer and Consumer Pattern Ref.(Japanese) https://www.hyuki.com/dp/dpinfo.html#ProducerConsumer
English https://cloud.google.com/pubsub/docs/overview
Actor Model
Location transparency is the ability to access objects without the knowledge of their location.
Actor model is a programming model for concurrency in a single process.
Each actor typically represents one entity or client and usually has local state.
It communicates with other actors by sending and receiving asynchronous messages.
Ref. https://www.youtube.com/watch?v=ELwEdb_pD0k
Chapter 5 Replication
Failover
When the leader(master or primary)
node failed, one of the followers(read replicas, slaves, secondaries, or hot standbys)
is promoted to be a new leader, and this process is called failover
.
Failover is fraught with things that can go wrong.
Consistency with systems outside the database(github incident)
Discarding failed writes of the leader is dangerous especially when it has to coordinate with other storage systems outside the database.
There is an incident because of the leader crushing in Github.
Due to a leader crash in Github, an outdated follower became the new leader, resulting in a primary key conflict. The database employed an auto-incrementing primary key, which the new leader used despite it already being in use by the old reader. This caused inconsistencies between the data stored in Redis and the database.
Split brain
When two nodes in a distributed system both consider themselves to be the leader, it is referred to as a “split brain” scenario.
This situation can cause data corruption and loss.
Chapter 6 Partitioning(P.199)
A partition with a disproportionately high load is commonly referred to as a hot spot.
Hash partitioning is a popular technique used to distribute data evenly among partitions, but it can make range queries less efficient because data will be distributed regardless their similarity in contents.
Cassandra achieves a compromise of the above trade-off.
This can be achieved using compound primary keys, where only the first part of the key is used to determine a partition, and the other columns are used as a concatenated index for sorting the data.
-
Key range partitioning
involves sorting keys and assigning partitions to them. It has the advantage of sorting but carries a risk of creating hot spots. -
Hash partitioning
involves applying a hash function to each key. It has the advantage of evenly distributing data into partitions but can lead to inefficient range queries.