Back to articles

How to Implement Raft from Scratch

July 2024·RaftImplementationDistributed Systems

Introduction

This article walks through implementing the Raft consensus algorithm from scratch. We'll build a working implementation with leader election, log replication, and safety guarantees. This is based on the Raft paper by Ongaro and Ousterhout and my experience implementing it as part of CMU's distributed systems coursework.

The Basic Structure

Every Raft node maintains several pieces of persistent state:

class RaftNode:
    def __init__(self, node_id, peers):
        # Persistent state
        self.current_term = 0
        self.voted_for = None
        self.log = []  # list of (term, command)

        # Volatile state
        self.commit_index = 0
        self.last_applied = 0
        self.state = "follower"  # follower | candidate | leader

        # Leader-only volatile state
        self.next_index = {}   # peer -> next log index to send
        self.match_index = {}  # peer -> highest replicated index

        self.node_id = node_id
        self.peers = peers
        self.election_timer = self.reset_election_timer()

Implementing Leader Election

The election mechanism is driven by a randomized timer. When the timer fires, the node transitions from follower to candidate:

def start_election(self):
    self.state = "candidate"
    self.current_term += 1
    self.voted_for = self.node_id
    votes_received = 1  # vote for self

    for peer in self.peers:
        response = self.send_request_vote(
            peer,
            term=self.current_term,
            candidate_id=self.node_id,
            last_log_index=len(self.log) - 1,
            last_log_term=self.log[-1].term if self.log else 0,
        )
        if response.vote_granted:
            votes_received += 1

    if votes_received > (len(self.peers) + 1) / 2:
        self.become_leader()

The RequestVote handler on each node checks two things: is the candidate's term at least as large as the receiver's? And is the candidate's log at least as up-to-date? A node grants its vote only if both conditions are met and it hasn't already voted in this term.

Implementing Log Replication

Once a leader is elected, it begins sending AppendEntries RPCs to all followers. These serve as both log replication and heartbeats:

def replicate_log(self):
    for peer in self.peers:
        prev_index = self.next_index[peer] - 1
        entries = self.log[self.next_index[peer]:]

        response = self.send_append_entries(
            peer,
            term=self.current_term,
            leader_id=self.node_id,
            prev_log_index=prev_index,
            prev_log_term=self.log[prev_index].term if prev_index >= 0 else 0,
            entries=entries,
            leader_commit=self.commit_index,
        )

        if response.success:
            self.next_index[peer] = len(self.log)
            self.match_index[peer] = len(self.log) - 1
        else:
            self.next_index[peer] -= 1  # back up and retry

Ensuring Safety

Raft's safety property guarantees that if a log entry is committed (applied to a state machine), no other server will ever apply a different value at that same index. This is ensured by two mechanisms:

  • Election restriction: A candidate must have all committed entries in its log to win an election. The RequestVote RPC enforces this by comparing log lengths and terms.
  • Commit rule: A leader only commits entries from its current term by counting replicas. Entries from previous terms are committed indirectly when a current-term entry is committed.
def update_commit_index(self):
    # Find the highest index replicated on a majority
    for n in range(len(self.log) - 1, self.commit_index, -1):
        if self.log[n].term == self.current_term:
            replicas = 1  # count self
            for peer in self.peers:
                if self.match_index[peer] >= n:
                    replicas += 1
            if replicas > (len(self.peers) + 1) / 2:
                self.commit_index = n
                break

Conclusion

Implementing Raft from scratch is one of the best ways to deeply understand consensus in distributed systems. The three-part decomposition — leader election, log replication, and safety — makes each piece manageable on its own. While a production implementation would need persistence, snapshotting, and membership changes, this foundation captures the core algorithm. The code patterns here map directly to the Raft paper's Figure 2, making it straightforward to verify correctness.