Source Side Dedup

Copyright © SEP AG 1999-2017. All rights reserved.

Any form of reproduction of the contents or parts of this manual is allowed only with the express written permission from SEP AG. When compiling and designing user documentation SEP AG uses great diligence and attempts to deliver accurate and correct information. However, SEP AG cannot issue a guarantee for the contents of this manual.

Welcome to the latest SEP sesam documentation version 4.4.3. For previous documentation versions, check the documentation archive.


Note
The components used are still in the development stage. To get the required components, send a request to support@sep.de.

Overview

SEP sesam applies deduplication at the block level and offers a hybrid of both target-based (Si3T) and source-based deduplication (Si3S, introduced in SEP sesam v. 4.4.3). Both methods require a configured Si3 deduplication store, for which a special license is needed.

Source-side deduplication means that during backup only changed blocks are transferred to the backup server. On the client itself, the backup process calculates hashes of the data to be backed up, and only blocks that are changed or unknown to the target Si3 deduplication store are sent to the backup server. It can be used to minimize the data transferred during backup in situations where bandwidth is a problem and SEP sesam RDS cannot be used. See Deduplication for more details on the recommended use of dedupe methods.
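
The idea of hashing blocks on the client and sending only those the store does not yet know can be illustrated with a minimal sketch. This is not the Si3S implementation: the fixed 4 KiB block size, the SHA-256 hash, the in-memory set standing in for the deduplication store, and the function names are all assumptions chosen for illustration.

```python
import hashlib

# Assumption: fixed-size 4 KiB blocks. The block size and chunking
# method that Si3S really uses are not stated in this document.
BLOCK_SIZE = 4096


def split_blocks(data, block_size=BLOCK_SIZE):
    """Split a byte string into consecutive fixed-size blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def blocks_to_send(data, known_hashes, block_size=BLOCK_SIZE):
    """Return (hash, block) pairs for blocks the store does not know yet.

    known_hashes is a hypothetical stand-in for the hash index of the
    target deduplication store; it is updated as new blocks are found.
    """
    to_send = []
    for block in split_blocks(data, block_size):
        digest = hashlib.sha256(block).hexdigest()
        if digest not in known_hashes:
            known_hashes.add(digest)
            to_send.append((digest, block))
    return to_send


store_index = set()

# First backup: both blocks are unknown, so both are transferred.
first_backup = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE
print(len(blocks_to_send(first_backup, store_index)))   # 2

# Second backup: only the changed second block is transferred.
second_backup = b"A" * BLOCK_SIZE + b"C" * BLOCK_SIZE
print(len(blocks_to_send(second_backup, store_index)))  # 1
```

Every block is hashed on the client even when nothing is sent, which is why the saved bandwidth is paid for with client CPU time.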

Note
Using source-side deduplication does not necessarily reduce the backup window. This depends on your data structure. Hashing chunks of data is very CPU intensive, so such backups might take even longer. Consider which clients could be overloaded in this way. Typically, source-based deduplication is a good fit for environments with a low daily data change rate and low bandwidth between the backup server and the backed-up client.


Key features

Source-side deduplication is easily configured and has the following advantages:

  • Only new and unique data is backed up, and it is identified directly at the source.
  • Because less data is sent over the network, bandwidth usage is reduced.
  • Less storage space is required.

Source-side deduplication may have the following disadvantages:

  • The backup client might get overloaded, which lengthens the backup window.
  • If it is used for virtual data centers where resources are shared among the virtual machines, it can impact production workloads.

Prerequisites

Make sure that the following conditions are met before using deduplication:

  • Check that the required license is installed.
  • Si3S is supported on all available Linux and Windows operating systems. However, on Linux clients it is only available if SEP sesam RDS is installed on the backup client. (Si3S is part of the SEP sesam Windows client package, but is not included in the Linux client package, only in the Linux Server or RDS package.) For details on supported OS, see SEP sesam OS and Database Support Matrix.
  • At least one Si3 deduplication store has to be configured on either a SEP sesam Server or a SEP sesam Remote Device Server. For details on how to set it up, see Configuring an Si3 Deduplication Store.
  • Si3S increases CPU overhead in the production environment to calculate hashes. Therefore, the minimum requirements for the system that is going to be backed up are:
    • Minimum of 2 CPU cores
    • 2 GB RAM
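
The two hardware minimums above can be checked on a client before enabling Si3S. The following is a rough sketch, not a SEP sesam tool: the function names are hypothetical, "2 GB" is interpreted as 2 GiB (an assumption), and memory detection relies on `os.sysconf`, which is not available on Windows.

```python
import os

MIN_CPU_CORES = 2
# Assumption: "2 GB RAM" is read as 2 GiB.
MIN_RAM_BYTES = 2 * 1024 ** 3


def meets_si3s_minimum(cpu_cores, ram_bytes):
    """Return True if the given resources meet the documented minimums."""
    return cpu_cores >= MIN_CPU_CORES and ram_bytes >= MIN_RAM_BYTES


def detect_resources():
    """Return (cpu_cores, ram_bytes); ram_bytes is None if it cannot be read."""
    cores = os.cpu_count() or 0
    ram = None
    if hasattr(os, "sysconf"):  # not present on Windows
        try:
            ram = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
        except (ValueError, OSError):
            ram = None
    return cores, ram


if __name__ == "__main__":
    cores, ram = detect_resources()
    if ram is None:
        print(f"{cores} CPU cores detected; RAM could not be determined")
    else:
        print(f"{cores} CPU cores, {ram} bytes RAM:",
              "OK" if meets_si3s_minimum(cores, ram) else "below minimum")
```

The operating system usually reports slightly less memory than is physically installed, so a result just under the threshold is a reason to check manually rather than a definite failure.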

Limitations

  • Currently external backup jobs such as Oracle, SAP or DB2 cannot use source-side deduplication.
  • If source-side deduplication is set up for a group backup, it will perform a source-side dedupe on the clients with the supported version. If source-side dedupe is not supported, a regular backup is started instead.

Configuring source-side deduplication

Configuring Si3S consists of 3 main steps:

  1. Creating a required backup environment with a deduplication store. Check the Si3 Deduplication Hardware Requirements and follow the step-by-step procedure as described in Configuring an Si3 Deduplication Store.
  2. Once the Si3 deduplication store is created, configure the media pools.
  3. Set up your backup strategy by following the standard backup procedure: First, you will create a backup task by selecting the data to be backed up, then you will specify when you want to back up your data by creating a backup schedule, and then you will create a backup event. In this step you will also enable the source-side deduplication (see below).

Tip
You can also use the Immediate start button to enable Si3S and start your backup instantly.

Creating a backup event with enabled Si3S

When creating a backup event, you can also enable source-side deduplication.

  1. From Main Selection -> Scheduling -> Schedules, right-click the schedule for which you want to create a new event, then click New backup event.
  2. Under Sequence control, you can set up the Priority of your backup event. For details, see Setting Event Priorities.
  3. Under Parameter, specify the Backup type.
  4. From the Media pool drop-down list, select the target media pool to which the data will be backed up. Note that you have to select the media pool which is combined with an Si3 deduplication store backend.
  5. Select the check box Source Side Deduplication.
  6. Click OK to save the event.

Enabling and starting Si3S instantly

  1. From the menu bar, select Activities -> Immediate start -> Backup.
  2. In the Immediate start: Backup dialog, select a deduplication media pool as your backup target.
  3. The check box Source Side Deduplication is shown: select it and click Start.

Verifying if Si3S is used

You can verify whether source-side deduplication was successful by selecting Job State -> Backup in the Main Selection window. The job state overview provides detailed information on backup status and shows source-side deduplication tasks in the first column. In our example, the source-side deduplication task is named SSDD. The Si3S status overview also provides information on the job status, deduplication ratio, start and stop time of the Si3S, data size and throughput, assigned media pool, etc.

What network port is used for backup?

The client connects to the RDS or backup server on the following destination port: 11701 + the number of the first dedup drive. For example, when the first dedup drive is drive 9, the client uses port 11710. Make sure that the respective port is open in the firewall on the RDS or SEP sesam Server. You may need to determine and open the relevant port manually. The source port is random.
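
The port rule can be written as a small helper, together with a basic TCP reachability test to run from the client. This is a sketch under assumptions: the function names and the host name in the usage note are hypothetical, and a successful TCP connection only shows that the port is open, not that the deduplication service is working correctly.

```python
import socket

SI3S_BASE_PORT = 11701  # base port stated in this section


def si3s_destination_port(first_dedup_drive):
    """Return the destination port for the given first dedup drive number."""
    if first_dedup_drive < 0:
        raise ValueError("drive number must not be negative")
    return SI3S_BASE_PORT + first_dedup_drive


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print(si3s_destination_port(9))  # 11710
```

For example, `port_is_reachable("rds.example.com", si3s_destination_port(9))` would test port 11710 on a placeholder host. A result of `False` can mean a closed firewall port, but also a wrong host name or a stopped service.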

See also

  • Si3 Deduplication Hardware Requirements
  • Configuring an Si3 Deduplication Store
  • Deduplication
  • Replication