Replication vs. Erasure Coding
In eEKAS, Cluster Drives use Ceph’s data redundancy mechanisms to protect against hardware failures and ensure data availability. The two primary methods are Replication and Erasure Coding (EC). Both serve the same purpose, preventing data loss, but they achieve it in very different ways, each with its own strengths, trade-offs, and ideal use cases.
Ceph Replication
How Ceph Replication Works
Replication stores multiple identical copies of each piece of data across different drives and nodes. If one copy is lost due to a drive or node failure, the system immediately serves the data from another copy.
Example configurations:
- 2× Replication – Two copies of each object are stored. Can tolerate the loss of one drive/node.
- 3× Replication – Three copies of each object are stored. Can tolerate the loss of two drives/nodes.
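The "can tolerate the loss of N−1 drives/nodes" rule follows from placing each copy on a different node. The sketch below checks that by brute force over every combination of node failures. It is a minimal illustration under stated assumptions: the node names are hypothetical, and `place_replicas` is a simplified stand-in that only guarantees distinct nodes; it does not reproduce Ceph's CRUSH placement algorithm.

```python
import zlib
from itertools import combinations


def place_replicas(obj_name, nodes, size):
    # Hypothetical, simplified placement: start at a hash-derived node and take
    # the next `size` nodes, so every replica lands on a different node.
    # Real Ceph uses CRUSH, which this does not reproduce.
    if size > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    start = zlib.crc32(obj_name.encode()) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(size)]


def survives(replica_nodes, failed_nodes):
    # The object is still readable if at least one replica is on a healthy node.
    return any(node not in failed_nodes for node in replica_nodes)


def worst_case_tolerance(obj_name, nodes, size):
    # Largest f such that the object survives *every* combination of f node failures.
    replicas = place_replicas(obj_name, nodes, size)
    tolerated = 0
    for f in range(1, len(nodes) + 1):
        if all(survives(replicas, set(failed)) for failed in combinations(nodes, f)):
            tolerated = f
        else:
            break
    return tolerated


nodes = ["node1", "node2", "node3", "node4", "node5"]  # hypothetical cluster
for size in (2, 3):
    print(f"{size}x replication tolerates "
          f"{worst_case_tolerance('obj-42', nodes, size)} node failure(s)")
```

The worst case is always losing exactly the nodes that hold the copies, which is why adding nodes to the cluster does not raise tolerance; only adding copies does.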
Advantages:
- Fast failover – No reconstruction is needed; reads are served immediately from another copy.
- Low CPU overhead – Data is copied rather than encoded, so little computation is required.
- Best performance – Particularly for small-block, high-IOPS workloads.
Trade-offs:
- Higher storage usage – 3× replication uses 3 TB of raw storage for 1 TB of usable capacity.
Typical use cases:
- High-performance block storage (iSCSI, NVMe-oF)
- Virtual machine storage requiring low latency
- Frequently updated databases
Erasure Coding
How Ceph Erasure Coding Works
Erasure Coding splits each object into a fixed number of data chunks (k) and computes a number of parity chunks (m), storing each chunk on a different drive or node. If chunks are lost, the system reconstructs the missing data from the surviving data and parity chunks; up to m chunks can be lost without losing data.
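To make the split-and-reconstruct idea concrete, here is a toy erasure code with k data chunks and a single XOR parity chunk (m = 1), the same idea as RAID 5. This is a deliberately simplified substitute: Ceph's erasure-code plugins (such as jerasure) use Reed-Solomon-style codes that support several parity chunks, and nothing here mirrors their on-disk format. The function names `encode` and `decode` are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    # Byte-wise XOR of two equal-length chunks.
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list:
    # Split data into k equal data chunks (zero-padded) plus one XOR parity chunk.
    chunk_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(chunk_len * k, b"\0")
    chunks = [padded[i * chunk_len:(i + 1) * chunk_len] for i in range(k)]
    parity = reduce(xor_bytes, chunks)
    return chunks + [parity]


def decode(chunks: list, k: int, original_len: int) -> bytes:
    # Rebuild the data when at most one chunk (data or parity) is missing (None).
    missing = [i for i, c in enumerate(chunks) if c is None]
    if len(missing) > 1:
        raise ValueError("a single-parity code can only recover one lost chunk")
    if missing:
        survivors = [c for c in chunks if c is not None]
        chunks = list(chunks)
        # XOR of all surviving chunks equals the one that was lost.
        chunks[missing[0]] = reduce(xor_bytes, survivors)
    return b"".join(chunks[:k])[:original_len]


data = b"Cluster Drives store objects"
stored = encode(data, k=4)   # 4 data chunks + 1 parity chunk on 5 drives
stored[2] = None             # simulate losing the drive holding data chunk 2
print(decode(stored, k=4, original_len=len(data)))
```

Losing a second chunk would make `decode` raise, which is exactly the limit m = 1 implies; profiles such as 4+2 raise that limit by storing more parity.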
Example configurations:
- 4+2 EC – Data is split into 4 data chunks plus 2 parity chunks. Can tolerate the loss of 2 drives/nodes.
- 5+2 EC – Data is split into 5 data chunks plus 2 parity chunks. Can tolerate the loss of 2 drives/nodes.
- 8+3 EC – Data is split into 8 data chunks plus 3 parity chunks. Can tolerate the loss of 3 drives/nodes.
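Each "k+m" profile above, where k is the number of data chunks and m the number of parity chunks, implies two things worth checking before choosing it: it survives up to m lost chunks, and it needs at least k+m separate failure domains (drives or nodes) so that one failure cannot take out several chunks of the same object. The sketch below models that; `ECProfile` and the profile names are hypothetical. The command string follows the documented `ceph osd erasure-code-profile set` syntax for a plain Ceph cluster; whether and how eEKAS exposes this setting is an assumption not covered here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ECProfile:
    """Hypothetical model of a k+m erasure-coding profile (illustrative only)."""
    k: int  # number of data chunks
    m: int  # number of parity (coding) chunks

    def tolerated_failures(self) -> int:
        # With an MDS code such as Reed-Solomon, any k of the k+m chunks are
        # enough to rebuild the object, so up to m chunks may be lost.
        return self.m

    def min_failure_domains(self) -> int:
        # Every chunk needs its own drive/node for the tolerance above to hold.
        return self.k + self.m

    def profile_command(self, name: str, failure_domain: str = "host") -> str:
        # Builds a plain-Ceph CLI command line; `name` is a hypothetical profile name.
        return (f"ceph osd erasure-code-profile set {name} "
                f"k={self.k} m={self.m} crush-failure-domain={failure_domain}")


for k, m in [(4, 2), (5, 2), (8, 3)]:
    profile = ECProfile(k, m)
    print(f"{k}+{m} EC: tolerates {profile.tolerated_failures()} failures, "
          f"needs at least {profile.min_failure_domains()} failure domains")
    print("   ", profile.profile_command(f"ec-{k}-{m}"))
```

In practice this means an 8+3 profile with a per-node failure domain needs at least 11 nodes, while 4+2 can run on 6.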
Advantages:
- High storage efficiency – 5+2 EC uses 7 TB of raw storage for 5 TB of usable capacity.
- Flexible redundancy levels – Can optimize for the desired balance between efficiency and fault tolerance.
Trade-offs:
- Higher CPU and network overhead – Requires computation to encode/decode data.
- Higher latency – Particularly for small writes and partial overwrites, which require reading back and re-encoding the affected data.
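Why small writes are the expensive case can be shown with a read-modify-write on a stripe protected by a single XOR parity chunk: changing one data chunk means reading the old chunk and the old parity, recomputing the parity, and writing both back. This is a simplified RAID-5-style model, assumed for illustration only; Ceph's actual EC overwrite path is more involved, and `XorStripe` is a hypothetical name.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    # Byte-wise XOR of two equal-length chunks.
    return bytes(x ^ y for x, y in zip(a, b))


class XorStripe:
    """Hypothetical stripe: k data chunks plus one XOR parity chunk, with I/O counters."""

    def __init__(self, chunks):
        self.chunks = list(chunks)
        self.parity = reduce(xor_bytes, self.chunks)
        self.reads = 0
        self.writes = 0

    def overwrite(self, index: int, new_chunk: bytes) -> None:
        if len(new_chunk) != len(self.chunks[index]):
            raise ValueError("new chunk must match the stripe's chunk size")
        # Read-modify-write: the old data chunk and old parity are read back first.
        old_chunk = self.chunks[index]
        old_parity = self.parity
        self.reads += 2
        # XOR out the old chunk's contribution to the parity and XOR in the new one.
        self.parity = xor_bytes(xor_bytes(old_parity, old_chunk), new_chunk)
        # Both the new data chunk and the new parity chunk must be written.
        self.chunks[index] = new_chunk
        self.writes += 2


stripe = XorStripe([b"AAAA", b"BBBB", b"CCCC", b"DDDD"])
stripe.overwrite(1, b"bbbb")
# The incrementally updated parity matches a full re-encode of the stripe.
assert stripe.parity == reduce(xor_bytes, stripe.chunks)
print(f"small write cost: {stripe.reads} reads, {stripe.writes} writes")
```

By contrast, the same overwrite under 3× replication is three writes with no reads and no parity computation, which is why replication is favored for small, frequent updates.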
Typical use cases:
- Object storage (S3) for large, infrequently modified files
- Backup archives
- Media repositories
Choosing the Right Method
| Criteria | Replication | Erasure Coding |
|---|---|---|
| Performance | Highest | Moderate (depends on EC profile) |
| Storage Efficiency | Low | High |
| Recovery | Fast; data is copied from a surviving replica | Slower; missing chunks must be reconstructed |
| Best for | Databases, VMs, low-latency workloads | Object storage, archives, large datasets |
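The Storage Efficiency row reduces to simple arithmetic: replication keeps 1/size of raw capacity usable, and a k+m EC profile keeps k/(k+m). The sketch below computes this for every example configuration in this article and reproduces the 3 TB to 1 TB and 7 TB to 5 TB figures given earlier. Function names are hypothetical, and it assumes a simplified model that ignores the free-space headroom Ceph needs for rebalancing.

```python
from fractions import Fraction


def replication_efficiency(size: int) -> Fraction:
    # Every object is stored `size` times, so 1/size of raw capacity is usable.
    return Fraction(1, size)


def ec_efficiency(k: int, m: int) -> Fraction:
    # Only the k data chunks out of k+m stored chunks hold user data.
    return Fraction(k, k + m)


SCHEMES = {
    "2x replication": replication_efficiency(2),
    "3x replication": replication_efficiency(3),
    "4+2 EC": ec_efficiency(4, 2),
    "5+2 EC": ec_efficiency(5, 2),
    "8+3 EC": ec_efficiency(8, 3),
}


def usable_capacity(raw_tb, efficiency: Fraction) -> Fraction:
    # Simplifying assumption: ignores rebalancing headroom and metadata overhead.
    return raw_tb * efficiency


for name, eff in SCHEMES.items():
    usable = usable_capacity(100, eff)
    print(f"{name:<15} {float(eff):6.1%} usable: {float(usable):5.1f} TB per 100 TB raw")
```

Exact fractions are used so that, for example, 7 TB raw under 5+2 EC yields exactly 5 TB rather than a rounded float.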
Rule of thumb for Ceph
Use Replication for performance-critical workloads where speed and instant failover are more important than raw capacity efficiency.
Use Erasure Coding for large datasets where storage cost efficiency is important and access patterns are less latency-sensitive.