Source Side Dedup

Copyright © SEP AG 1999-2022. All rights reserved.

Any form of reproduction of the contents or parts of this manual is allowed only with the express written permission from SEP AG. When compiling and designing user documentation SEP AG uses great diligence and attempts to deliver accurate and correct information. However, SEP AG cannot issue a guarantee for the contents of this manual.

Welcome to the latest SEP sesam documentation version 4.4.3 Beefalo/5.0.0 Jaglion. For previous documentation versions, check the documentation archive.


Overview

SEP sesam Si3 applies deduplication at the block level: data is divided into blocks, each block is checked against the blocks already stored, and duplicate blocks are skipped. Only unique blocks are sent to storage. Because identical data is stored only once, the backed-up data takes up less storage space, and network load is reduced because duplicates are not transferred over the network.
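The following minimal Python sketch illustrates the general idea of block-level deduplication. It is not SEP sesam code: the fixed 4 KiB block size, the SHA-256 block hash and the in-memory dictionary used as a "store" are assumptions chosen for illustration, and all function names are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096  # assumption: fixed 4 KiB blocks (not the documented Si3 block size)


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Divide the data into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def deduplicate(data: bytes, store: dict[str, bytes]) -> list[str]:
    """Store each unique block once and return the block hashes in order."""
    recipe = []
    for block in split_blocks(data):
        digest = hashlib.sha256(block).hexdigest()  # assumption: SHA-256 as block hash
        if digest not in store:
            store[digest] = block  # unique block: written to the store
        recipe.append(digest)      # duplicate or not, only a reference is recorded
    return recipe


def restore(recipe: list[str], store: dict[str, bytes]) -> bytes:
    """Rebuild the original data from the recorded block references."""
    return b"".join(store[digest] for digest in recipe)


if __name__ == "__main__":
    store: dict[str, bytes] = {}
    data = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE  # 4 blocks, 2 of them unique
    recipe = deduplicate(data, store)
    stored = sum(len(block) for block in store.values())
    print(f"{len(recipe)} blocks, {len(store)} unique, {stored} of {len(data)} bytes stored")
    # prints: 4 blocks, 2 unique, 8192 of 16384 bytes stored
    assert restore(recipe, store) == data
```

With the sample data, four blocks are written but only two are unique, so half of the data is stored. The ordered list of hashes is what allows the original data to be rebuilt from the shared blocks.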

SEP sesam offers a hybrid of both target-side and source-side deduplication to enable the best possible scenarios for efficient data backup in different environments. Both methods use a configured Si3 deduplication store, which requires a special license. See Licensing for details.

Deduplication store types

SEP sesam v. 5.0.0 Jaglion has introduced a new generation Si3 deduplication store: Si3 NG. It offers significantly higher performance for backup, restore and migration, as well as direct backup to S3, resulting in improved performance, scaling and resource savings. For more details, see Configuring Si3 NG Deduplication Store (and target dedupe) and Si3 NG Direct to S3.

Note that the instructions for source-side deduplication are the same for both types of deduplication store. Si3 NG is therefore not mentioned explicitly; the term Si3 store is used for both types.

What is Si3 source deduplication (Si3S)

Si3 source deduplication means that data is deduplicated on the client before it is sent over the network, which makes the backup extremely bandwidth-efficient. During the backup, SEP sesam calculates the hash value of each block to be backed up on the client and queries the target Si3 dedup store to determine whether a block with that hash value is already stored there. If it is, SEP sesam sends only the hash value; if not, it sends the new or changed block to the backup server.
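The exchange described above can be sketched as follows. This is a simplified model, not the actual Si3S protocol: DedupStoreSim is a hypothetical in-memory stand-in for the Si3 store, and the fixed 4 KiB blocks and SHA-256 hashes are assumptions. The client hashes every block, asks the store which hashes are unknown, and transmits only those blocks.

```python
import hashlib


class DedupStoreSim:
    """Hypothetical in-memory stand-in for an Si3 store (block hash -> block)."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}

    def unknown(self, hashes: list[str]) -> set[str]:
        """Answer the client's query: which of these hashes are not stored yet?"""
        return {h for h in hashes if h not in self.blocks}

    def put(self, blocks: dict[str, bytes]) -> None:
        """Receive the blocks the client decided to send."""
        self.blocks.update(blocks)


def source_side_backup(data: bytes, store: DedupStoreSim, block_size: int = 4096) -> int:
    """Hash blocks on the client, query the store, send only unknown blocks.

    Returns the number of block bytes sent over the (simulated) network;
    the hashes exchanged in the query are not counted.
    """
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    hashes = [hashlib.sha256(b).hexdigest() for b in blocks]  # calculated on the client
    missing = store.unknown(hashes)                           # ask the store
    to_send = {h: b for h, b in zip(hashes, blocks) if h in missing}
    store.put(to_send)                                        # transfer new blocks only
    return sum(len(b) for b in to_send.values())


if __name__ == "__main__":
    store = DedupStoreSim()
    # 16 KiB of sample data made of 4 distinct 4 KiB blocks
    data = b"".join(i.to_bytes(4, "big") for i in range(4096))
    print(source_side_backup(data, store))  # 16384: first backup, all blocks are new
    print(source_side_backup(data, store))  # 0: unchanged data, only hashes are exchanged
```

A second backup of unchanged data transfers no block data at all, and changing a single byte causes only the affected block to be sent again.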

The advantage of Si3S is that only new or changed data is transferred to the backup server during the backup. This optimizes bandwidth usage and requires less storage capacity. Si3S can be used to minimize the data transferred during backup in situations where bandwidth is a problem and SEP sesam RDS cannot be used. See Deduplication for more details on the recommended use of the deduplication methods.

Not all data is suitable for deduplication: encrypted files, disk blocks with a non-standard size, etc. cannot be deduplicated. See Data Deduplication Use Cases for more information.

Note
Using source-side deduplication does not necessarily reduce the backup window; this depends on your data structure. Hashing chunks of data is very CPU-intensive, so such backups may even take longer. Consider which clients can handle this additional load. In general, source-side deduplication can be an excellent solution for environments with a low daily data change rate and low bandwidth between the backup server and the backed-up client.
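The trade-off described in this note can be estimated with simple arithmetic. The sketch below is a rough, hypothetical model, not a SEP sesam sizing formula: it assumes that hashing and network transfer run one after the other (in practice they may overlap), a 4 KiB block size and a 32-byte hash sent per block.

```python
def estimate_backup_seconds(
    total_bytes: float,
    change_rate: float,      # fraction of the data that is new or changed (0..1)
    bandwidth_bps: float,    # network throughput, bytes per second
    hash_rate_bps: float,    # client hashing throughput, bytes per second
    block_size: int = 4096,  # assumption
    hash_size: int = 32,     # assumption: 32-byte hash sent per block
) -> tuple[float, float]:
    """Return (plain_backup_seconds, source_dedup_seconds) for a simple sequential model."""
    plain = total_bytes / bandwidth_bps
    hash_traffic = (total_bytes / block_size) * hash_size
    sent = total_bytes * change_rate + hash_traffic
    dedup = total_bytes / hash_rate_bps + sent / bandwidth_bps  # hash everything, then transfer
    return plain, dedup


if __name__ == "__main__":
    data = 100e9  # 100 GB
    for link in (12.5e6, 1.25e9):  # 100 Mbit/s and 10 Gbit/s
        plain, dedup = estimate_backup_seconds(data, 0.02, link, 200e6)
        print(f"link {link:.3g} B/s: plain {plain:.1f} s, source dedup {dedup:.1f} s")
```

For 100 GB of data with a 2% change rate and 200 MB/s hashing throughput, the model gives 8000 s for a plain backup versus about 722.5 s with source-side deduplication over a 12.5 MB/s (100 Mbit/s) link, but 80 s versus about 502 s over a 1.25 GB/s (10 Gbit/s) link, where hashing becomes the bottleneck.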

Key features

Source-side deduplication is easy to configure and has the following advantages:

  • Data is deduplicated directly at the source, so only new and unique data is backed up.
  • As less data is sent over the network, bandwidth usage is reduced.
  • Less storage capacity is required.

Source-side deduplication can have the following disadvantages:

  • The backup client can become overloaded, and the backup window may lengthen.
  • When used in virtual data centers where resources are shared between virtual machines, it can affect production workloads.

See Data Deduplication Use Cases for more information.

Prerequisites

Make sure that the following conditions are met before using deduplication:

  • Check that the required license is installed.
  • Si3S is supported on all available Linux and Windows operating systems. Si3S is included in the SEP sesam Windows client package, but not in the Linux client package: to use it on Linux, you need to install SEP sesam RDS/Server on the Linux backup client. For details on the supported OS, see SEP sesam OS and Database Support Matrix.
  • At least one Si3 deduplication store has to be configured on either a SEP sesam Server or SEP sesam Remote Device Server. For setup details, see Configuring Si3 Deduplication Store.
  • Si3S increases the CPU overhead in the production environment because hashes have to be calculated. The minimum requirements for the system to be backed up are:
    • Minimum of 2 CPU cores
    • 2 GB RAM
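As a quick pre-check of the minimum requirements listed above, the following sketch compares a machine's resources against 2 CPU cores and 2 GB RAM. It is not a SEP sesam tool: the function names are hypothetical, os.cpu_count() counts logical CPUs rather than physical cores, and reading physical memory via os.sysconf works on Linux but not on Windows.

```python
import os

MIN_CORES = 2
MIN_RAM_BYTES = 2 * 1000**3  # "2 GB" read as decimal gigabytes (assumption)


def meets_si3s_minimum(cores: int, ram_bytes: int) -> bool:
    """True if the machine meets the documented minimum of 2 CPU cores and 2 GB RAM."""
    return cores >= MIN_CORES and ram_bytes >= MIN_RAM_BYTES


def local_resources() -> tuple[int, int]:
    """Return (logical CPUs, physical RAM in bytes) of this machine.

    os.sysconf is available on Linux, but not on Windows.
    """
    cpus = os.cpu_count() or 1
    ram = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    return cpus, ram


if __name__ == "__main__":
    cpus, ram = local_resources()
    print(f"{cpus} CPUs, {ram / 1000**3:.1f} GB RAM, meets minimum: {meets_si3s_minimum(cpus, ram)}")
```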
Limitations
  • Currently, external backup jobs such as Oracle, SAP or DB2 cannot use source-side deduplication.
  • If source-side deduplication is set up for a group backup, it is performed only on clients running a supported version. On clients that do not support source-side deduplication, a regular backup is started instead.
  • Source-side deduplication will not work if the STPD service TCP port on the client side (in sm.ini and/or stpd.ini) is changed from the default port. Make sure you use the default STPD TCP port on the client side to be able to perform Si3S backups.
Note
In v. ≥ 5.0.0 Jaglion, you can avoid this issue by setting the STPD service TCP port in the client properties (Options tab -> Listen port) to the changed TCP port.

Configuring source-side deduplication

Configuring Si3S consists of 3 main steps:

  1. Create the required backup environment with a deduplication store. Check the Si3 Deduplication Hardware Requirements and follow the step-by-step procedure described in Configuring Si3 Deduplication Store or, in v. ≥ 5.0.0 Jaglion, Configuring Si3 NG Deduplication Store, depending on which deduplication store you are using.
  2. Once the Si3 deduplication store is created, configure the media pools.
  3. Set up your backup strategy by following the standard backup procedure: First create a backup task by selecting the data to be backed up, then determine when you want to back up your data by creating a backup schedule, and then create a backup event. In this step, you also activate SEP Si3 source-side deduplication (see below).
Tip
You can use the Immediate Start button to enable Si3S and start your backup immediately.

Creating a backup event with enabled Si3S

When you create a backup event, you also activate source-side deduplication.

  1. From Main Selection -> Scheduling -> Schedules, right-click the schedule for which you want to create a new event, then click New Backup Event.
  2. Under Sequence control, you can set the Priority of your backup event. For details, see Setting Event Priorities.
  3. Under Object, select the task or task group to which you want to link this event.
  4. Under Parameter, specify the Backup level.
  5. From the Media pool drop-down list, select the target media pool to which the data will be backed up. Note that you must select a media pool that uses an Si3 deduplication store as its backend.
  6. Select the SEP Si3 Source Side Deduplication check box.
  7. Click OK to save the event.

Enabling and starting Si3S instantly

  1. From the menu bar, select Activities -> Immediate Start -> Backup.
  2. In the Immediate Start: Backup dialog, select a deduplication media pool as the backup target.
  3. The check box SEP Si3 Source Side Deduplication is shown: select it and click Start.


Verifying if Si3S is used

You can verify whether source-side deduplication is being applied by selecting Job State -> Backups in the Main Selection window. The job state overview provides detailed information on the backup status and shows a ticked check box in the Source Side Deduplication column for each job to which source-side deduplication is applied. The Si3S status overview also provides information on the job status, deduplication rate, Si3S start and stop time, data size and throughput, assigned media pool, etc.

Tip
You can check the details of your backups online as well as start your backups immediately, restart failed backups, restore backups online and more by using Web UI. For details, see SEP sesam Web UI.

Which network port is used for backups?

The client connects to the RDS or backup server using the destination port 11701 + the number of the first dedup drive. For example, if the first dedup drive is 9, the client uses port 11710. Make sure that this port is open in the firewall on the RDS or SEP sesam Server; you may need to determine the port and open it manually. The source port is chosen randomly.
For more information, see also List of ports used by SEP sesam.
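The port calculation can be expressed as follows, together with a basic TCP reachability check that can help when verifying firewall rules from the backup client. This is a hypothetical helper, not part of SEP sesam; a successful connection only shows that something accepts connections on that port, not that it is the SEP sesam service.

```python
import socket

SI3S_BASE_PORT = 11701  # documented base port for Si3S connections


def si3s_port(first_dedup_drive: int) -> int:
    """Destination port on the RDS/server: 11701 + number of the first dedup drive."""
    return SI3S_BASE_PORT + first_dedup_drive


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Attempt a plain TCP connection, e.g. to test a firewall rule from the client."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unresolvable host, ...
        return False


if __name__ == "__main__":
    print(si3s_port(9))  # 11710, matching the example above
```

For example, port_reachable("backup-server", si3s_port(9)) tries to connect to port 11710 on a host named backup-server (a placeholder for your RDS or SEP sesam Server).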