Data Retention

Data retention refers to the length of time that backups and archived data must be kept on storage. A data retention policy should use a tiered storage architecture, which implies the existence of both short-term and long-term storage.

The short-term storage serves as a disk staging area (a cache). Its main purpose is to provide fast restores that support the company's daily activities. If a remote location exists, data on the staging disk can be replicated to remote disk storage for business continuity (BC) and disaster recovery (DR) purposes.

If doing more with less is the norm and there is no remote location, data can be pushed to tape directly from the staging disk. Recovery time will increase, but a smaller organization using, for example, LTO-4 technology can still meet its Service Level Agreement (SLA) requirements.
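As a rough sanity check of that claim, restore time from tape can be estimated from the amount of data and the drive throughput. The sketch below is only illustrative: 120 MB/s is the native (uncompressed) transfer rate of LTO-4, while the function name, the efficiency factor, and the fixed mount-and-positioning overhead are assumptions that should be replaced with measured values.

```python
def estimate_restore_hours(data_gb, drive_mb_per_s=120.0,
                           efficiency=0.7, overhead_minutes=5.0):
    """Estimate the hours needed to restore data_gb from a single tape drive.

    drive_mb_per_s   -- native drive rate (120 MB/s for LTO-4)
    efficiency       -- assumed fraction of the native rate actually sustained
    overhead_minutes -- assumed fixed cost for tape mount and positioning
    """
    if data_gb < 0:
        raise ValueError("data_gb must not be negative")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")

    effective_rate = drive_mb_per_s * efficiency       # MB/s
    transfer_seconds = (data_gb * 1000.0) / effective_rate
    return (transfer_seconds + overhead_minutes * 60.0) / 3600.0


if __name__ == "__main__":
    hours = estimate_restore_hours(500)
    print(f"Restoring 500 GB takes roughly {hours:.1f} hours")
```

Comparing the result with the recovery time promised in the SLA shows whether restoring directly from tape is acceptable for a given data set.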

Data retention periods on the short-term storage vary according to internal policies. Keeping backups for one month is a minimum, and the period can be extended to two or even six months. Careful planning is needed to ensure the staging disk area can hold that much data.
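The planning step can be approximated with a simple sizing calculation. The sketch below assumes a schedule of periodic full backups with daily incrementals in between, no deduplication or compression, and a safety headroom; the function name, parameters, and default values are all hypothetical and should be adapted to the actual backup scheme.

```python
import math


def staging_capacity_gb(full_backup_gb, daily_change_rate, retention_days,
                        full_interval_days=7, headroom=0.2):
    """Rough staging disk size needed to hold retention_days of backups.

    Assumes one full backup every full_interval_days and an incremental
    on every other day, each incremental being daily_change_rate times
    the size of a full backup. Deduplication and compression are ignored.
    """
    if retention_days < 1 or full_interval_days < 1:
        raise ValueError("retention_days and full_interval_days must be >= 1")
    if not 0 <= daily_change_rate <= 1:
        raise ValueError("daily_change_rate must be in the range [0, 1]")

    fulls = math.ceil(retention_days / full_interval_days)
    incrementals = retention_days - fulls
    incremental_gb = full_backup_gb * daily_change_rate

    raw_gb = fulls * full_backup_gb + incrementals * incremental_gb
    return raw_gb * (1 + headroom)


if __name__ == "__main__":
    for days in (30, 60, 180):
        need = staging_capacity_gb(1000, 0.05, days)
        print(f"{days:3d} days of retention: about {need:,.0f} GB")
```

Running the calculation for one, two, and six months makes the cost of extending the retention period visible before the staging area is purchased.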

The long-term storage is used mainly for archiving both compliance and non-compliance data. An example of non-compliance data is point-in-time copies of older data, which can be used as a reference for research or for establishing trends.

The long-term storage must be secure, with emphasis on capacity first and performance second. As a result, it is lower-cost storage. Data placed there must remain available, but a lower performance level than that of the short-term storage is expected, within acceptable limits. With Sarbanes-Oxley and many other regulations, long-term storage is becoming more and more important.

Now, let's consider an example of a multi-level data retention policy in a disk-to-disk-to-tape (D2D2T) architecture with no remote location.

Example of Data Retention Levels

The following levels of data retention are implemented:

  • Short-term storage for backups is on disk staging.
  • Long-term backups can be pushed to tape.
  • Archived data can be divided into two tiers with different retention scopes:
      - More recent data, which is more likely to be requested and used, can be placed on cheaper, high-capacity SATA disks.
      - Older archive data can be pushed by the backup software to tapes in a tape library, so the long-term tape storage is kept near-line. Users can still search the archive and retrieve items from it, subject to the delay caused by sequential access on tape.
  • The last data retention level covers tapes containing very old data, which should be stored off-site.
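The archive levels above can be expressed as a small age-based rule. The sketch below is a minimal illustration: the age thresholds (one year on SATA disk, seven years near-line) and all names are hypothetical, since the actual limits depend on internal policy and on the regulations that apply.

```python
# Ordered archive tiers: (maximum age in days, tier name).
# The thresholds are illustrative assumptions, not recommendations.
ARCHIVE_TIERS = [
    (365, "SATA archive disk"),
    (7 * 365, "near-line tape library"),
]
OFFSITE_TIER = "off-site tape"


def archive_tier(age_days):
    """Return the archive tier that should hold data of the given age."""
    if age_days < 0:
        raise ValueError("age_days must not be negative")
    for max_age_days, tier in ARCHIVE_TIERS:
        if age_days <= max_age_days:
            return tier
    # Anything older than the last threshold is stored off-site.
    return OFFSITE_TIER


if __name__ == "__main__":
    for age in (90, 800, 4000):
        print(f"{age:5d} days old -> {archive_tier(age)}")
```

Keeping the thresholds in one table makes the policy easy to review and to change when retention requirements are updated.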