Deduplication

Copyright © SEP AG 1999-2017. All rights reserved.

Any form of reproduction of the contents or parts of this manual is allowed only with the express written permission from SEP AG. When compiling and designing user documentation SEP AG uses great diligence and attempts to deliver accurate and correct information. However, SEP AG cannot issue a guarantee for the contents of this manual.

Welcome to the latest SEP sesam documentation version 4.4.3. For previous documentation versions, check the documentation archive.


Overview

When similar systems are backed up to the same data storage device, the backed-up data is likely to contain redundant copies. However, a data repository needs to store only one copy of a file to be able to restore it.

Deduplication is a data compression technique that eliminates redundant blocks, so duplicate data is not backed up again and less storage space is required. SEP sesam applies deduplication at block level and offers both target-based (Si3T) and source-based (Si3S) deduplication to provide efficient backup scenarios for various environments. Both methods require a configured Si3 deduplication store, for which a special license is needed. See List of Licenses for details.

As of v. 4.4.3 Tigon, the deduplication store provides Si3 encryption. For details, see Encrypting Si3 Deduplication Store.

Si3 target deduplication

Si3T is an inline, block-level data deduplication solution that writes data directly from the SEP sesam Server or Remote Device Server to the backup media. Backups are deduplicated on the fly as the data is written to the storage target. Because the data is transferred across the network unreduced, including its redundancies, and is deduplicated only at the target, the network load is higher than with source-side deduplication, but the storage savings can be substantial.
SEP sesam analyses blocks of data and determines whether each block is unique or has already been copied to the Si3 repository. Only a single instance of each unique block is sent to the repository, and each deduplicated file is replaced with a stub file. This stub file points to the repository and is used to retrieve the stored data. See Configuring an Si3 Deduplication Store for details.
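
The idea of inline, block-level deduplication with stubs can be illustrated with a minimal Python sketch. It is not SEP sesam's implementation: the fixed 4 KiB block size, the SHA-256 hash, and the names `DedupStore`, `write` and `read` are assumptions chosen for illustration only.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size, for illustration only


class DedupStore:
    """Toy in-memory repository that keeps one copy of each unique block."""

    def __init__(self):
        self.blocks = {}  # block digest -> block bytes

    def write(self, data: bytes) -> list:
        """Deduplicate data as it is written; return a stub (list of digests)."""
        stub = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            # Store the block only if the repository does not hold it yet.
            self.blocks.setdefault(digest, block)
            stub.append(digest)
        return stub

    def read(self, stub: list) -> bytes:
        """Rebuild the original data by following the stub into the repository."""
        return b"".join(self.blocks[digest] for digest in stub)


store = DedupStore()
backup_a = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE
backup_b = b"A" * BLOCK_SIZE * 2 + b"C" * BLOCK_SIZE
stub_a = store.write(backup_a)
stub_b = store.write(backup_b)
```

In this sketch the two backups consist of seven blocks in total, but the store holds only three unique blocks, and `store.read(stub_a)` returns the original data of the first backup.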

Si3 source-side deduplication

Si3S is available as of SEP sesam version 4.4.3. Source-side deduplication means that data is deduplicated before it is sent across the network, which makes the backup very bandwidth-efficient. During backup, SEP sesam calculates hashes of the data to be backed up on the client and then transfers to the backup server only those blocks that have changed or are not yet known to the target Si3 deduplication store. See Configuring Source-side Deduplication for details.
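
The exchange between client and server can be sketched as follows. This is a simplified model, not the actual SEP sesam protocol: the names `DedupServer`, `unknown`, `receive` and `client_backup`, the 4 KiB block size and the SHA-256 hash are all hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size, for illustration only


def split_blocks(data: bytes) -> list:
    """Cut the data into fixed-size blocks."""
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]


class DedupServer:
    """Hypothetical stand-in for the server side holding the dedup store."""

    def __init__(self):
        self.blocks = {}  # block digest -> block bytes

    def unknown(self, digests) -> set:
        """Return the digests the store has not seen yet."""
        return {d for d in digests if d not in self.blocks}

    def receive(self, blocks: dict) -> None:
        self.blocks.update(blocks)


def client_backup(data: bytes, server: DedupServer) -> int:
    """Hash blocks on the client, send only unknown ones; return bytes sent."""
    hashed = [(hashlib.sha256(b).hexdigest(), b) for b in split_blocks(data)]
    missing = server.unknown(d for d, _ in hashed)
    to_send = {d: b for d, b in hashed if d in missing}
    server.receive(to_send)
    return sum(len(b) for b in to_send.values())


server = DedupServer()
first = b"A" * BLOCK_SIZE * 2 + b"B" * BLOCK_SIZE
second = first + b"C" * BLOCK_SIZE
sent_first = client_backup(first, server)
sent_second = client_backup(second, server)
```

Under these assumptions, the first backup sends 8192 bytes (two unique blocks out of three), and the second backup sends only the 4096 bytes of the one new block, even though it covers 16384 bytes of data. Only hashes travel across the network for blocks the store already knows.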

What works best?

Note
When choosing your deduplication method to eliminate redundant backup data, carefully analyze your existing infrastructure, network constraints and the type of data you want to protect.
  • Typically, source-side deduplication is well suited for environments with low LAN/WAN bandwidth and smaller amounts of data. Another typical use case is remote office/branch office (ROBO) backup, which protects and stores the data created by remote and branch offices.
  • On the other hand, target deduplication might be more suitable for large data sets in a fast network, such as structured databases, where data volumes need to be reduced significantly, or for data located on clients whose CPU load you do not want to increase.
  • You should be aware of the limitations of deduplication before configuring it. For example, it does not make sense to deduplicate data that is already compressed, such as media files and archives: their content is effectively unique, so very few duplicate blocks can be found. Examples are MP3, MP4, JPEG, PNG and zipped files.
  • Different data types call for different deduplication policies and methods.
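
To see why already compressed data deduplicates poorly, the following sketch estimates a deduplication ratio as the number of blocks divided by the number of unique blocks. The function name `dedup_ratio`, the 4 KiB block size and the sample data are assumptions; pseudo-random bytes stand in for compressed media, which is only an approximation of real files.

```python
import hashlib
import random

BLOCK_SIZE = 4096  # assumed fixed block size, for illustration only


def dedup_ratio(data: bytes) -> float:
    """Return total blocks / unique blocks (1.0 means nothing to save)."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    if not blocks:
        return 1.0
    unique = {hashlib.sha256(block).digest() for block in blocks}
    return len(blocks) / len(unique)


rng = random.Random(0)  # fixed seed so the sample data is reproducible

# 16 identical blocks, standing in for highly redundant data.
repetitive = (b"config=default\n" * 1000)[:BLOCK_SIZE] * 16

# 16 blocks of pseudo-random bytes, standing in for compressed media.
high_entropy = bytes(rng.getrandbits(8) for _ in range(BLOCK_SIZE * 16))

ratio_repetitive = dedup_ratio(repetitive)
ratio_high_entropy = dedup_ratio(high_entropy)
```

With this sample data the repetitive input reaches a ratio of 16.0, while the high-entropy input stays at 1.0, so hashing it costs CPU time without saving any storage. For real data, a tool such as SEP Tachometer gives a far more reliable estimate than this toy calculation.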

Si3 deduplication can be used together with Si3 replication to provide backup redundancy for disaster recovery and to reduce the amount of data transferred over the network. As of v. 4.4.3 Tigon, it is possible to use an initial seed when setting up a new Si3 deduplication store for replication. For details, see Seeding Si3 Deduplication Store.

Tip
You can download SEP Tachometer to analyse the structure of your data and calculate the potential savings with SEP sesam Si3 deduplication. For details, see SEP Tachometer.

What is next?

  • Si3 Deduplication Hardware Requirements
  • Configuring an Si3 Deduplication Store
  • Configuring Source-side Deduplication
  • Replication

See also

  • SEP Tachometer
  • List of Licenses
  • Seeding Si3 Deduplication Store
  • Encrypting Si3 Deduplication Store