5.1.0: Source Side Deduplication


Welcome to the latest SEP sesam documentation version 5.1.0 Apollon. For previous documentation version(s), check documentation archive.


Overview


SEP sesam Si3 applies deduplication at the block level. In this deduplication technique, data is divided into blocks, which are then checked and duplicates are skipped. Only unique blocks are sent to storage. By eliminating redundant blocks, the size of the backed up data is reduced as no duplicate data is backed up. Storing the identical data only once results in reduced storage space requirements and network load as no duplicates are transferred over the network.

To enable efficient data backup in different environments, SEP sesam offers a hybrid of both deduplication methods: target-side deduplication and source-side deduplication (Si3S).

Both methods use a configured Si3 deduplication data store that requires a special license. See Licensing for details.

Deduplication store types

Deprecated Si3 V1 deduplication store
As of SEP sesam v. 5.0.0 Jaglion, two Si3 deduplication store types are available. It is strongly recommended to use the new type SEP Si3 deduplication store as the old generation Si3 V1 deduplication store is deprecated. This means that the old generation Si3 V1 is no longer being enhanced, but is still supported until further notice.
Use the new Si3 deduplication store if the data is to be stored in the S3 cloud.
  • If you are using an old generation Si3 V1 deduplication store with S3, you cannot restore from S3 using the GUI! See Enable Si3 setup on the same host to learn how to configure a new Si3 and an old Si3 V1 on the same backup server or RDS to make the upgrade from Si3 V1 to Si3 smoother.
Advantages of the new generation Si3 data store
Si3 is advantageous over the old Si3 V1 store type as it offers better performance and resource savings. You can back up your data directly to S3 cloud storage and Azure storage and restore the items you want directly from there. It also provides a new immutable storage feature – SiS. For more details, see Configuring Si3 Deduplication Store.

Note that the instructions for source-side deduplication are the same for both types of deduplication store. The store type is therefore not mentioned explicitly, and the term Si3 store refers to both types.

What is Si3 source deduplication (Si3S)

Si3 source deduplication means that data is deduplicated before it is sent over the network, which makes the backup very bandwidth efficient. During the backup, SEP sesam calculates the hash values of the data blocks on the client and queries the storage to determine whether the hash value of each block is already stored there. If it is, SEP sesam sends only the hash value; if not, it sends the block itself. As a result, only blocks that are changed or unknown to the target Si3 dedup store are transferred to the backup server.
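
The exchange described above can be illustrated with a minimal Python sketch. It is not SEP sesam code: the `DedupStore` class, the `backup_from_client` function, the fixed 4096-byte block size, and the use of SHA-256 are all assumptions chosen for illustration only.

```python
import hashlib


class DedupStore:
    """Hypothetical stand-in for the target Si3 store on the server side."""

    def __init__(self):
        self.blocks = {}  # hash value -> block content

    def has(self, digest):
        return digest in self.blocks

    def put(self, digest, block):
        self.blocks[digest] = block


def backup_from_client(data, store, block_size=4096):
    """Hash each block on the client and send a block only if the store
    does not know its hash. Returns (list of hashes, bytes sent as blocks)."""
    hashes, sent = [], 0
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        if not store.has(digest):   # unknown block: transfer the content
            store.put(digest, block)
            sent += len(block)
        hashes.append(digest)       # known block: the hash alone is enough
    return hashes, sent


store = DedupStore()
first_run = b"x" * 4096 + b"y" * 4096
_, sent_first = backup_from_client(first_run, store)

second_run = b"x" * 4096 + b"z" * 4096  # only the second block changed
_, sent_second = backup_from_client(second_run, store)

print(sent_first, sent_second)
```

In this sketch the first run transfers both blocks, while the second run transfers only the changed block, because the hash of the unchanged block is already known to the store.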

The advantage of Si3S deduplication is that only new or changed data is transferred to the backup server during the backup. This optimizes bandwidth usage and requires less storage capacity. It can be used to minimize the data transferred during backup in situations where bandwidth is a problem and SEP sesam RDS cannot be used. See Deduplication for more details on the recommended use of the deduplication methods.

Not all data is suitable for deduplication: encrypted files, disk blocks with a non-standard size, etc. cannot be deduplicated. See Data Deduplication Use Cases for more information.

Note
Using source-side deduplication does not necessarily mean that the backup windows will be reduced. This actually depends on your data structure – note that hashing chunks of data is very CPU intensive and such backups might take even longer. You should consider which clients can be overloaded in this way. In general, source-based deduplication can be an excellent solution for environments with a low daily data change rate and low bandwidth between the backup server and the backed up client.
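
The trade-off in the note above can be made concrete with a rough estimate. The sketch below is a simplified model, not a SEP sesam sizing tool: it assumes that a regular backup is limited only by bandwidth, and that an Si3S backup first hashes all data on the client and then transfers only the changed fraction. The function name and all input figures are hypothetical.

```python
def estimate_backup_seconds(data_gb, change_rate, bandwidth_mb_s, hash_mb_s):
    """Return (regular_seconds, si3s_seconds) under a simple sequential model.

    Assumptions (illustrative only): a regular backup transfers all data,
    an Si3S backup hashes all data and transfers only the changed fraction.
    """
    data_mb = data_gb * 1024
    regular = data_mb / bandwidth_mb_s
    si3s = data_mb / hash_mb_s + (data_mb * change_rate) / bandwidth_mb_s
    return regular, si3s


# Slow link and low change rate: hashing pays off.
slow_regular, slow_si3s = estimate_backup_seconds(100, 0.05, 10, 200)

# Fast link and high change rate: hashing becomes the bottleneck.
fast_regular, fast_si3s = estimate_backup_seconds(100, 0.80, 1000, 200)

print(slow_regular, slow_si3s)
print(fast_regular, fast_si3s)
```

Under these assumed figures, Si3S is much faster on the slow link and slower on the fast link, which matches the guidance that source-side deduplication suits low change rates and low bandwidth.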

Key features

Source-side deduplication is easy to configure and has the following advantages:

  • Only new and unique data is backed up directly at the source.
  • As less data is sent over the network, bandwidth usage is reduced.
  • Less storage space is required.

Source-side deduplication can have the following disadvantages:

  • The backup client can become overloaded, which lengthens the backup window.
  • When used for virtual data centers where resources are shared between virtual machines, it can affect production workloads.

See Data Deduplication Use Cases for more information.

Prerequisites

Make sure that the following conditions are met before using deduplication:

  • Check that the required license is installed.
  • Si3S is supported on all available Windows and Linux operating systems (on Linux, an additional RDS installation is required). Si3S is already part of the SEP sesam Windows client package, but is not included in the Linux client package. To use it on Linux, you need to install SEP sesam RDS/Server on the Linux backup client. For details on the supported OS, see SEP sesam OS and Database Support Matrix.
  • At least one Si3 deduplication store has to be configured on either a SEP sesam Server or SEP sesam Remote Device Server. For setup details, see Configuring Si3 Deduplication Store.
  • Si3S increases the CPU overhead in the production environment because the hashes are calculated on the client. The minimum requirements for the system that is going to be backed up are:
    • Minimum of 2 CPU cores
    • 2 GB RAM
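
The minimum CPU and RAM requirements listed above could be checked on a client with a short script such as the following. This is only a sketch: the helper names are hypothetical, RAM detection relies on `os.sysconf`, which is not available on Windows, and the reported physical memory can be slightly below the nominal size of the installed RAM.

```python
import os

MIN_CPU_CORES = 2  # from the prerequisites above
MIN_RAM_GB = 2     # from the prerequisites above


def meets_minimum(cpu_cores, ram_gb):
    """Return True if both minimum requirements are satisfied."""
    return cpu_cores >= MIN_CPU_CORES and ram_gb >= MIN_RAM_GB


def detect_ram_gb():
    """Best-effort physical RAM detection in GiB.
    Returns None where os.sysconf is unavailable (for example on Windows)."""
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        pages = os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None
    return page_size * pages / 1024 ** 3


cores = os.cpu_count() or 0
ram = detect_ram_gb()
if ram is None:
    print(f"{cores} CPU cores detected; check the RAM size manually")
else:
    print(f"{cores} CPU cores, {ram:.1f} GiB RAM, "
          f"meets minimum: {meets_minimum(cores, ram)}")
```
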
Limitations
  • If source-side deduplication is set up for a group backup, it will be performed on the clients with the supported version. If source-side deduplication is not supported, a regular backup is started instead.
  • Source-side deduplication will not work if the STPD service TCP port on the client side (in sm.ini and/or stpd.ini) is changed from the default port. Make sure you use the default STPD TCP port on the client side to be able to perform Si3S backups.
Note
In v. ≥ 5.0.0 Jaglion, you can avoid this issue by setting the STPD service TCP port on the client (client properties -> Options tab -> Listen port) to the new TCP port.

Configuring source-side deduplication

Configuring Si3S consists of 3 main steps:

  1. Creating the required backup environment with a deduplication store. Check the Si3 Deduplication Hardware Requirements and follow the step-by-step procedure as described in Configuring Si3 Deduplication Store in v. ≥ 5.0.0 Jaglion. For the older, deprecated version, see Configuring Si3 V1 Deduplication Store.
  2. Once the Si3 deduplication store is created, configure the media pools.
  3. Set up your backup strategy by following the standard backup procedure: First create a backup task by selecting the data to be backed up, then determine when you want to back up your data and create a backup schedule, and then create a backup event. In this step, you also activate SEP Si3 source-side deduplication (see below).
Tip
You can use the Immediate Start button to enable Si3S and start your backup immediately.

Creating a backup event with enabled Si3S

When you create a backup event, you also activate source-side deduplication.

  1. From Main Selection -> Scheduling -> Schedules, right-click the schedule for which you want to create a new event, then click New Backup Event.
  2. Under Sequence control, you can set the Priority of your backup event. For details, see Setting Event Priorities.
  3. Under Object, select the task or task group to which you want to link this event.
  4. Under Parameter, specify the Backup level.
  5. From the Media pool drop-down list, select the target media pool to which the data will be backed up. Note that you have to select the media pool that is combined with an Si3 deduplication store backend.
  6. Select the SEP Si3 Source Side Deduplication check box.

  7. Click OK to save the event.

Enabling and starting Si3S instantly

  1. From the menu bar, select Activities -> Immediate Start -> Backup.
  2. In the Immediate Start: Backup dialog, select a deduplication media pool as the backup target.
  3. The check box SEP Si3 Source Side Deduplication is shown: select it and click Start.

Verifying if Si3S is used

You can verify if source-side deduplication is being applied by selecting Job State -> Backups in the Main Selection window. The job state overview provides detailed information on the backup status and shows a ticked check box in the column Source Side Deduplication if source-side deduplication is being applied to a job. The Si3S status overview also provides information on the job status, deduplication rate, Si3S start and stop time, data size and throughput, assigned media pool, etc.

Tip
You can check the details of your backups online as well as start your backups immediately, restart failed backups, restore backups online and more by using Web UI. For details, see SEP sesam Web UI.

Which network port is used for backups?

The client connects to the RDS or backup server using the following destination port: 11701 plus the number of the first dedup drive. For example, if the first dedup drive is 9, the client uses port 11710. Make sure the respective port is open in the firewall on the RDS or SEP sesam Server. You may need to manually detect and open the corresponding port. The source port is chosen randomly.
For more information, see also List of ports used by SEP sesam.
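
The port rule above is simple arithmetic, which the following Python sketch reproduces. The function names are hypothetical, and the reachability check is a generic TCP connection test from the standard library, not a SEP sesam tool.

```python
import socket

BASE_PORT = 11701  # base value stated above


def si3s_destination_port(first_dedup_drive):
    """Return the destination port for a given first dedup drive number."""
    if first_dedup_drive < 0:
        raise ValueError("drive number must not be negative")
    return BASE_PORT + first_dedup_drive


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print(si3s_destination_port(9))  # 11710, as in the example above
```

To test the firewall from a client, you could call `port_is_reachable` with the name of your RDS or backup server (for example, a placeholder such as `backup-server.example.com`) and the computed port.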


See also

  • Si3 Deduplication Hardware Requirements
  • Configuring Si3 Deduplication Store
  • Deduplication
  • Replication
  • List of Ports Used by SEP sesam
  • Licensing

Copyright © SEP AG 1999-2024. All rights reserved.
Any form of reproduction of the contents or parts of this manual is allowed only with the express written permission from SEP AG. When compiling and designing user documentation SEP AG uses great diligence and attempts to deliver accurate and correct information. However, SEP AG cannot issue a guarantee for the contents of this manual.