Introduction
This article walks through implementing the Raft consensus algorithm from scratch. We'll build a working implementation with leader election, log replication, and safety guarantees. This is based on the Raft paper by Ongaro and Ousterhout and my experience implementing it as part of CMU's distributed systems coursework.
The Basic Structure
Every Raft node maintains several pieces of persistent state:
class RaftNode:
def __init__(self, node_id, peers):
# Persistent state
self.current_term = 0
self.voted_for = None
self.log = [] # list of (term, command)
# Volatile state
self.commit_index = 0
self.last_applied = 0
self.state = "follower" # follower | candidate | leader
# Leader-only volatile state
self.next_index = {} # peer -> next log index to send
self.match_index = {} # peer -> highest replicated index
self.node_id = node_id
self.peers = peers
self.election_timer = self.reset_election_timer()Implementing Leader Election
The election mechanism is driven by a randomized timer. When the timer fires, the node transitions from follower to candidate:
def start_election(self):
self.state = "candidate"
self.current_term += 1
self.voted_for = self.node_id
votes_received = 1 # vote for self
for peer in self.peers:
response = self.send_request_vote(
peer,
term=self.current_term,
candidate_id=self.node_id,
last_log_index=len(self.log) - 1,
last_log_term=self.log[-1].term if self.log else 0,
)
if response.vote_granted:
votes_received += 1
if votes_received > (len(self.peers) + 1) / 2:
self.become_leader()The RequestVote handler on each node checks two things: is the candidate's term at least as large as the receiver's? And is the candidate's log at least as up-to-date? A node grants its vote only if both conditions are met and it hasn't already voted in this term.
Implementing Log Replication
Once a leader is elected, it begins sending AppendEntries RPCs to all followers. These serve as both log replication and heartbeats:
def replicate_log(self):
for peer in self.peers:
prev_index = self.next_index[peer] - 1
entries = self.log[self.next_index[peer]:]
response = self.send_append_entries(
peer,
term=self.current_term,
leader_id=self.node_id,
prev_log_index=prev_index,
prev_log_term=self.log[prev_index].term if prev_index >= 0 else 0,
entries=entries,
leader_commit=self.commit_index,
)
if response.success:
self.next_index[peer] = len(self.log)
self.match_index[peer] = len(self.log) - 1
else:
self.next_index[peer] -= 1 # back up and retryEnsuring Safety
Raft's safety property guarantees that if a log entry is committed (applied to a state machine), no other server will ever apply a different value at that same index. This is ensured by two mechanisms:
- Election restriction: A candidate must have all committed entries in its log to win an election. The RequestVote RPC enforces this by comparing log lengths and terms.
- Commit rule: A leader only commits entries from its current term by counting replicas. Entries from previous terms are committed indirectly when a current-term entry is committed.
def update_commit_index(self):
# Find the highest index replicated on a majority
for n in range(len(self.log) - 1, self.commit_index, -1):
if self.log[n].term == self.current_term:
replicas = 1 # count self
for peer in self.peers:
if self.match_index[peer] >= n:
replicas += 1
if replicas > (len(self.peers) + 1) / 2:
self.commit_index = n
breakConclusion
Implementing Raft from scratch is one of the best ways to deeply understand consensus in distributed systems. The three-part decomposition — leader election, log replication, and safety — makes each piece manageable on its own. While a production implementation would need persistence, snapshotting, and membership changes, this foundation captures the core algorithm. The code patterns here map directly to the Raft paper's Figure 2, making it straightforward to verify correctness.