do you commit directly to the db? nah bro!
even i thought i know how databases work, but i was wrong until i read about the wal.
thought, this is how it works:
( user/api request ) -> ( operation (insert, update, delete) ) -> ( db (write to disk) ) -> ( respond 200 to user )
learnt, it works like this:
( user/api request ) -> ( operation (insert, update, delete) ) -> ( wal (write to buffer in-memory) ) -> ( wal (flush to disk - fsync) ) + (starts async process) -> ( respond 200 to user )
async process:
( flush dirty pages in buffer to disk (db) ) -> ( checkpoint (wal record) ) -> ( older wal are archived )
wal -> write ahead log, is an append-only log of all changes made to the db, written before applying them to the actual db files.
pros of wal setup in dbs:
- durability (acid) -> the db changes are durable, becuase the changes are written to the wal before being applied to the db.
- recovery -> the db can be recovered from the wal, if the db is corrupted or crashed.
- atomicity -> the db changes are atomic, because the changes are written to the wal before being applied to the db, so if the db crashes, the changes are rolled back.
- batching -> the writes can be batched, so the db can write to the disk more efficiently, in a single write operation.
the wals are core part of the relational dbs like postgresql, mysql, etc.
i wont deep dive into components of wal, but i will share “how wals are used in multiple usecases, one of them is -> turbopuffer.com”, a vector store built on top of object-storage, in another post.