do you commit directly to the db? nah bro!

even i thought i know how databases work, but i was wrong until i read about the wal.

thought, this is how it works:

( user/api request ) -> ( operation (insert, update, delete) ) -> ( db (write to disk) ) -> ( respond 200 to user )

learnt, it works like this:

( user/api request ) -> ( operation (insert, update, delete) ) -> ( wal (write to buffer in-memory) ) -> ( wal (flush to disk - fsync) ) + (starts async process) -> ( respond 200 to user )

async process:

( flush dirty pages in buffer to disk (db) ) -> ( checkpoint (wal record) ) -> ( older wal are archived )

wal -> write ahead log, is an append-only log of all changes made to the db, written before applying them to the actual db files.

pros of wal setup in dbs:

  • durability (acid) -> the db changes are durable, becuase the changes are written to the wal before being applied to the db.
  • recovery -> the db can be recovered from the wal, if the db is corrupted or crashed.
  • atomicity -> the db changes are atomic, because the changes are written to the wal before being applied to the db, so if the db crashes, the changes are rolled back.
  • batching -> the writes can be batched, so the db can write to the disk more efficiently, in a single write operation.

the wals are core part of the relational dbs like postgresql, mysql, etc.

i wont deep dive into components of wal, but i will share “how wals are used in multiple usecases, one of them is -> turbopuffer.com, a vector store built on top of object-storage, in another post.