Snapshot Overview
Snapshots provide a point-in-time copy of data that can be used for recovery, rollback, or testing. Rather than taking full copies, they record only what has changed since the snapshot was created, making them efficient in both time and storage.
In practical terms, a snapshot gives you a known good state that you can return to if something goes wrong. Whether that’s accidental deletion, a bad change, or something not behaving as expected, you are not rebuilding anything; you are simply reverting the system to an earlier state.
Snapshots are typically used as a safety net during day-to-day operations. Before making changes, applying updates, or testing new configurations, taking a snapshot provides a straightforward way to roll back if required. In environments where data changes frequently, they also allow multiple recovery points to be captured throughout the day without adding complexity.
It is important to understand that snapshots are not backups. They exist on the same storage as the data they protect, so if the underlying storage is lost or corrupted, the snapshots are lost with it. They should always form part of a wider backup and replication strategy, not replace one.
How snapshots behave depends on the underlying storage configuration. The process of creating and managing them is consistent, but the way data is handled has a direct impact on performance, storage usage, and scalability.
In a ZFS-based configuration, snapshots are built directly into the file system and use a copy-on-write model. When a snapshot is taken, existing data blocks are preserved and only changes are written going forward. This makes snapshot creation effectively instantaneous with minimal performance impact, with storage consumption increasing only as data changes.
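The copy-on-write behaviour can be illustrated with a small sketch. This is a deliberately simplified model of the idea, not ZFS internals: the class, block map, and method names are all hypothetical.

```python
# Simplified copy-on-write snapshot model (illustrative only, not ZFS internals).
# Live data is a map of block number -> contents; a snapshot just pins the
# current block references. A snapshot therefore costs nothing at creation
# and consumes space only as the live data diverges from it.

class CowVolume:
    def __init__(self, blocks):
        self.live = dict(blocks)      # live block map
        self.snapshots = {}           # snapshot name -> frozen block map

    def snapshot(self, name):
        # Effectively instantaneous: copies references, not data.
        self.snapshots[name] = dict(self.live)

    def write(self, block, data):
        # New data goes to a new location; the snapshot keeps the old block.
        self.live[block] = data

    def snapshot_unique_space(self, name):
        # Blocks held only by the snapshot, i.e. its storage cost (in blocks).
        snap = self.snapshots[name]
        return sum(1 for b, d in snap.items() if self.live.get(b) != d)

vol = CowVolume({0: "a", 1: "b", 2: "c"})
vol.snapshot("before-change")
print(vol.snapshot_unique_space("before-change"))  # 0: the snapshot is free
vol.write(1, "B")
print(vol.snapshot_unique_space("before-change"))  # 1: only the change costs space
```

The key property is visible at the end: taking the snapshot consumed nothing, and storage use grew only once a block was overwritten.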
In non-ZFS configurations such as XFS, snapshots are handled at the volume level. This also applies to storage presented over iSCSI and NVMe over Fabrics (NVMe-oF), as these are block access methods built on top of the same underlying storage. For example, shares and targets might all be carved from a single drive such as Drive1 (1 TB), all drawing on the same underlying capacity.
This means space must be reserved in advance to store the differences between live data and the snapshot. As changes occur, that reserved space is consumed. If it fills up, the snapshot becomes invalid and can no longer be used for recovery.
Because of this, snapshot configuration requires a bit more thought. You need to size reservations correctly, understand how quickly your data changes, and monitor usage over time. Change rates will vary depending on workload and user activity, so this is not something you set once and forget.
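As a rough sizing exercise, the reservation can be estimated from the observed change rate and how long each snapshot is expected to live. The figures and the headroom factor below are illustrative assumptions, not recommendations; measure your own change rate before sizing anything.

```python
# Rough reservation sizing for volume-level snapshots (illustrative only).
# A snapshot must hold every block that changes while it exists, so the
# reserve needs to cover (change rate x snapshot lifetime) plus headroom
# for bursts, since running out of space invalidates the snapshot.

def reservation_gb(change_rate_gb_per_hour, lifetime_hours, headroom=0.5):
    """Estimated space to reserve, with a safety margin (default 50%)."""
    base = change_rate_gb_per_hour * lifetime_hours
    return base * (1 + headroom)

# Example: data changes at ~2 GB/hour and each snapshot is kept for 24 hours.
print(reservation_gb(2, 24))  # 72.0 GB reserved
```

Because change rates vary with workload, the input to this calculation should come from monitoring actual usage over time rather than a one-off estimate.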
This approach provides the same core capability as ZFS, but with additional considerations. If space is under-allocated or change rates are underestimated, snapshots may not be available when needed, which limits their usefulness as a recovery option.
When working with snapshots, the key is understanding how they fit into your overall data protection strategy. They are there for fast recovery and operational flexibility, not long-term retention or disaster recovery.
It is also important to consider how frequently snapshots are taken and how long they are retained. Too few and recovery options are limited. Too many, particularly in non-ZFS environments, and you begin to consume unnecessary storage or impact performance. The balance should reflect the workload and the importance of the data.
Finally, be clear on what happens during a rollback. Reverting to a snapshot returns the data to the exact state it was in at that point in time. Any changes made after that snapshot will be lost. This is expected behaviour, but it needs to be understood before carrying it out, particularly in live environments.
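The rollback behaviour can be shown with a minimal sketch. This is a hypothetical dictionary model, not a real storage API: reverting replaces the live state wholesale, and anything written after the snapshot disappears.

```python
# Minimal rollback model (hypothetical, not a real storage API).
# Rolling back replaces the live state with the snapshot's frozen state;
# everything written after the snapshot was taken is discarded.

live = {"report.txt": "v1", "config.ini": "original"}
snapshot = dict(live)                 # point-in-time copy of the state

live["report.txt"] = "v2"             # change made after the snapshot
live["notes.md"] = "new file"         # new data created after the snapshot

live = dict(snapshot)                 # rollback: revert to the snapshot
print(live)  # {'report.txt': 'v1', 'config.ini': 'original'}
```

Note that both the edit to `report.txt` and the entirely new `notes.md` are gone after the rollback, which is exactly the behaviour to confirm before reverting a live system.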
Snapshots provide fast recovery and flexibility, but should be used alongside a broader data protection strategy.