Leader election recipe for distributed consensus
§Leader Election for FoundationDB
A distributed leader election recipe using FoundationDB as coordination backend. Similar to Apache Curator’s LeaderLatch for ZooKeeper, but leveraging FDB’s serializable transactions for stronger guarantees.
§When to Use This
Good use cases:
- Singleton services (only one instance should be active)
- Job schedulers (one coordinator assigns work)
- Primary/backup failover
- Exclusive access to external resources
Consider alternatives if:
- You need mutex/lock semantics for short critical sections (use FoundationDB transactions directly)
- You need fair queuing (this uses priority-based preemption)
§API Overview
The main entry point is LeaderElection. Typical usage follows this pattern:
| Step | Method | Frequency |
|---|---|---|
| 1. Setup | new | Once per process |
| 2. Initialize | initialize | Once globally (idempotent) |
| 3. Register | register_candidate | Once per process |
| 4. Election loop | run_election_cycle | Every heartbeat interval |
| 5. Shutdown | resign_leadership + unregister_candidate | On graceful exit |
For advanced use cases, lower-level methods are available:
- try_claim_leadership - Attempt to become leader
- refresh_lease - Extend leadership lease
- get_leader - Query current leader
- is_leader - Check if this process is leader
§Key Concepts
§Ballots
Ballot numbers work like Raft’s term: a monotonically increasing counter that establishes ordering. The higher ballot always wins. Each leadership claim or lease refresh increments the ballot. This prevents split-brain scenarios after network partitions heal.
The ballot is returned in LeaderState::ballot and can be used as a
fencing token when accessing external resources.
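A minimal sketch of the fencing idea, under stated assumptions: FencedStore and StaleBallot are hypothetical names for an external resource and its error, not part of this crate, and the ballot is modeled as a plain u64. The resource remembers the highest ballot it has seen and rejects anything lower.

```rust
/// Hypothetical external resource that rejects writes from stale leaders.
/// Not part of this crate; the ballot is modeled as a plain u64 (assumption).
#[derive(Debug, Default)]
pub struct FencedStore {
    highest_ballot: u64,
    value: Option<String>,
}

/// Error returned when a write carries a ballot older than one already seen.
#[derive(Debug, PartialEq, Eq)]
pub struct StaleBallot {
    pub presented: u64,
    pub highest_seen: u64,
}

impl FencedStore {
    /// Accept the write only if its ballot is at least as high as any seen so far.
    pub fn write(&mut self, ballot: u64, value: &str) -> Result<(), StaleBallot> {
        if ballot < self.highest_ballot {
            return Err(StaleBallot {
                presented: ballot,
                highest_seen: self.highest_ballot,
            });
        }
        self.highest_ballot = ballot;
        self.value = Some(value.to_string());
        Ok(())
    }

    /// Last accepted value, if any.
    pub fn value(&self) -> Option<&str> {
        self.value.as_deref()
    }
}
```

In this sketch a former leader that wakes up after a pause and writes with its old ballot is rejected, because a newer leader has already presented a higher one.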
§Leases
Leaders hold time-bounded leases configured via lease_duration.
A leader must call run_election_cycle (or refresh_lease)
before the lease expires to maintain leadership.
If a leader fails to refresh (crash, network partition), other candidates can claim leadership after the lease expires.
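The lease rule can be modeled roughly as follows. This is an illustrative sketch, not the crate's implementation: Lease is a hypothetical type, timestamps are Durations since an arbitrary epoch, and treating the exact expiry instant as already expired is an assumption.

```rust
use std::time::Duration;

/// Hypothetical model of a leadership lease (not the crate's type).
/// Timestamps are durations since an arbitrary epoch.
#[derive(Debug, Clone, Copy)]
pub struct Lease {
    pub expires_at: Duration,
}

impl Lease {
    /// Grant a lease that is valid for `lease_duration` starting at `now`.
    pub fn granted(now: Duration, lease_duration: Duration) -> Self {
        Lease { expires_at: now + lease_duration }
    }

    /// Assumption: the lease counts as expired at the expiry instant itself.
    pub fn is_expired(&self, now: Duration) -> bool {
        now >= self.expires_at
    }

    /// Extend the lease if it is still valid; return false if it already lapsed,
    /// in which case another candidate may have claimed leadership.
    pub fn refresh(&mut self, now: Duration, lease_duration: Duration) -> bool {
        if self.is_expired(now) {
            return false;
        }
        self.expires_at = now + lease_duration;
        true
    }
}
```

A refresh that arrives after expiry fails in this model, which mirrors the point above: once the lease lapses, the old leader can no longer assume it still holds leadership.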
§Preemption
When allow_preemption is true, higher-priority candidates
can preempt lower-priority leaders. Priority is set via the priority parameter
in register_candidate. This enables graceful leadership migration
to new machines during rolling deployments or infrastructure upgrades.
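The claim decision described above might be sketched like this. LeaderSnapshot and may_claim are hypothetical names, the priority is modeled as an i32, and "larger number means higher priority" is an assumption; the crate's actual types and comparison may differ.

```rust
/// Hypothetical snapshot of the current leader, for illustration only.
#[derive(Debug, Clone, Copy)]
pub struct LeaderSnapshot {
    pub priority: i32,
    pub lease_expired: bool,
}

/// Decide whether a candidate may claim leadership.
/// Assumption: a larger `priority` value means higher priority.
pub fn may_claim(
    current: Option<LeaderSnapshot>,
    candidate_priority: i32,
    allow_preemption: bool,
) -> bool {
    match current {
        // No leader yet: anyone may claim.
        None => true,
        // The leader's lease lapsed: anyone may claim.
        Some(leader) if leader.lease_expired => true,
        // A live leader can only be displaced by a strictly higher priority,
        // and only when preemption is enabled.
        Some(leader) => allow_preemption && candidate_priority > leader.priority,
    }
}
```

With preemption disabled, a live leader keeps its role regardless of priority; priority then only matters once the lease has expired.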
§Configuration
Configure via ElectionConfig passed to initialize_with_config:
| Field | Default | Description |
|---|---|---|
| lease_duration | 10s | How long leadership is valid without refresh |
| heartbeat_interval | 3s | Recommended interval for calling run_election_cycle |
| candidate_timeout | 15s | When to consider candidates dead |
| election_enabled | true | Enable/disable elections globally |
| allow_preemption | true | Allow priority-based preemption |
Rule of thumb: heartbeat_interval should be less than lease_duration / 3
to allow retries before lease expires.
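The rule of thumb is easy to check mechanically. The sketch below uses ConfigSketch, a hypothetical mirror of the documented fields and defaults rather than the crate's own ElectionConfig, whose field types are not shown here; representing the time fields as Duration is an assumption.

```rust
use std::time::Duration;

/// Hypothetical mirror of the documented fields; not the crate's ElectionConfig.
#[derive(Debug, Clone)]
pub struct ConfigSketch {
    pub lease_duration: Duration,
    pub heartbeat_interval: Duration,
    pub candidate_timeout: Duration,
    pub election_enabled: bool,
    pub allow_preemption: bool,
}

impl Default for ConfigSketch {
    /// Defaults copied from the configuration table.
    fn default() -> Self {
        ConfigSketch {
            lease_duration: Duration::from_secs(10),
            heartbeat_interval: Duration::from_secs(3),
            candidate_timeout: Duration::from_secs(15),
            election_enabled: true,
            allow_preemption: true,
        }
    }
}

/// Rule of thumb: heartbeat_interval < lease_duration / 3.
pub fn heartbeat_leaves_room_for_retries(cfg: &ConfigSketch) -> bool {
    cfg.heartbeat_interval < cfg.lease_duration / 3
}
```

With the documented defaults the check passes, since 3s is below 10s / 3 (about 3.33s); raising the heartbeat to 5s with the same lease would fail it.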
§Return Types
- ElectionResult - Returned by run_election_cycle, indicates whether this process is the leader or a follower
- LeaderState - Information about a leader: process ID, ballot, lease expiry
- CandidateInfo - Information about a registered candidate
§Safety Properties
- Mutual Exclusion: At most one leader at any time (guaranteed by FDB serializable transactions)
- Liveness: A correct process eventually becomes leader
- Consistency: Ballot numbers provide total ordering of leadership changes
§Simulation Testing
This implementation is validated through FoundationDB’s deterministic simulation framework under extreme conditions including network partitions, process failures, and clock skew up to ±2 seconds.
Key invariants verified:
- No overlapping leadership (mutual exclusion)
- Ballot monotonicity (ballots never regress)
- Fencing token validity (each claim increments ballot)
See the foundationdb-recipes-simulation crate for test configurations.
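To make the first two invariants concrete, here is a sketch of how they could be checked over a recorded history. It is not taken from the simulation crate: Term, ballots_monotonic and no_overlap are hypothetical names, and the history is assumed to be sorted by start time with half-open intervals [start_ms, end_ms).

```rust
/// Hypothetical record of one period of leadership, for illustration only.
#[derive(Debug, Clone, Copy)]
pub struct Term {
    pub ballot: u64,
    pub start_ms: u64,
    pub end_ms: u64,
}

/// Ballot monotonicity: each later term carries a strictly higher ballot.
pub fn ballots_monotonic(history: &[Term]) -> bool {
    history.windows(2).all(|w| w[0].ballot < w[1].ballot)
}

/// Mutual exclusion: a term must end before the next one starts.
/// Assumes `history` is sorted by `start_ms`.
pub fn no_overlap(history: &[Term]) -> bool {
    history.windows(2).all(|w| w[0].end_ms <= w[1].start_ms)
}
```

Both checks compare only adjacent entries, which is sufficient once the history is sorted by start time.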
Structs§
- CandidateInfo - Information about a registered candidate
- ElectionConfig - Global configuration for the leader election system
- LeaderElection - Coordinator for distributed leader election.
- LeaderState - The core leader state - stored at a single key
Enums§
- ElectionResult - Result of an election cycle
- LeaderElectionError - Leader election specific errors
Type Aliases§
- Result - Result type for leader election operations