Configuring Si3 NG Deduplication Store

Copyright © SEP AG 1999-2022. All rights reserved.

Any form of reproduction of the contents or parts of this manual is allowed only with the express written permission from SEP AG. When compiling and designing user documentation SEP AG uses great diligence and attempts to deliver accurate and correct information. However, SEP AG cannot issue a guarantee for the contents of this manual.

WORK IN PROGRESS
This page is a draft. Treat the information on this page with caution as it may be incomplete.


Welcome to the latest SEP sesam documentation version 5.0.0 Jaglion. For previous documentation version(s), check documentation archive.


Overview

SEP sesam v. 5.0.0 Jaglion has introduced a new generation Si3 data store: Si3 NG. It offers significantly increased performance for backup, restore and migration, as well as direct backup to S3, resulting in improved performance, scaling and resource savings.

  • The new Si3 NG can detect duplicate data fragments, optimizing the recovery process.
  • When configuring deduplication, you should consider the performance factors of deduplication. These include infrastructure (storage types), network speed, storage disk set up, achievable deduplication ratio, etc. For details, see Deduplication.
  • The new immutable storage feature (introduced in Jaglion V2) is also based on Si3 NG store (set up on a dedicated Linux server). SiS is SEP Immutable Storage, based on the File Protection Service (FPS), which scans the file system and sets the immutable bit for all new objects. This means that all data stored in SiS is marked immutable at the time of storage. Even with full admin access to the SEP sesam backup server, attackers cannot delete, modify, or encrypt data stored on SiS. For details, see SEP Immutable Storage – SiS.

Seeding Si3 deduplication store is currently not supported (see the Si3 and Si3 NG comparison section below).

How to upgrade from the old Si3 to the new Si3 NG?

SEP sesam does not support a direct upgrade from the old Si3 to Si3 NG. However, to use the new Si3 NG you can:

  • Back up all data again to the newly configured Si3 NG deduplication store.
  • Create a replication job to replicate from the Si3 store to the Si3 NG store. Replication reads all data from the source-side store on the source-side RDS and sends it to the target store using the source-side deduplication function. For details, see the section Replicating from Si3 to Si3 NG.
Tip
You can also configure a new Si3 NG and an old Si3 in parallel on the same host by enabling the key enable_gui_allow_multi_dedup.

Deduplication types

SEP sesam provides target-based (Si3T) and source-based deduplication (Si3S). For details on the deduplication concept and recommendations, see Deduplication.

  • Both Si3T and Si3S require a configured Si3 deduplication store.
  • In general, only one Si3 or Si3 NG deduplication store can be configured on a server. There is only one exception to this rule: You can use the enable_gui_allow_multi_dedup key to configure both Si3 deduplication store types on the same backup server or RDS to perform a smooth upgrade from Si3 to Si3 NG.
  • A valid licence is required for each Si3 NG deduplication store.
  • You can also configure an Si3 NG deduplication store via the command line. For details, see Configuring and Administering Si3 Deduplication Store with CLI.

SEP sesam support for S3-compatible cloud and Blob storage

With SEP sesam Si3 NG, you can back up your data directly to the S3 cloud and to Azure Blob storage (as of Jaglion V2). Because S3 is an open API standard, with AWS Simple Storage Service as a sample implementation, SEP sesam Si3 NG can also be used with other S3-compatible cloud implementations. Configuring and managing Si3 NG in an S3-compatible cloud implementation is similar to the example shown in Backup to S3 Cloud Storage and must follow the same process and rules that apply to using Si3 NG with S3. For the list of supported object storage, see the support matrix.

Updating Si3 NG on S3 from 5.0.0.4 to the new version

If you use Si3 NG on S3 and update from 5.0.0.4 to the new version, the structure of the existing stores will change as the structure of Si3 NG on S3 is automatically recreated (this includes recreating the index after the renaming). Example:

  • The S3 bucket is called seps3, the Si3 NG deduplication store name is newNG. The S3 structure with version 5.0.0.4 of NG is: seps3/pages; seps3/pages-trash; seps3/objects-trash.
  • When updating to the next version of NG, the structure changes to: seps3/newNG/pages; seps3/newNG/pages-trash; seps3/newNG/objects-trash. During this renaming, the Si3 NG service is not available.
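The renaming in the example above can be sketched in a few lines of Python. This only illustrates how the paths change; it is not how SEP sesam performs the migration, and the function name migrated_path is made up for this example.

```python
def migrated_path(bucket: str, store: str, old_path: str) -> str:
    """Insert the store name between the bucket and the rest of the path.

    Hypothetical helper: it only mirrors the path change described above.
    """
    prefix = bucket + "/"
    if not old_path.startswith(prefix):
        raise ValueError(f"{old_path!r} is not inside bucket {bucket!r}")
    return f"{bucket}/{store}/{old_path[len(prefix):]}"


# Layout from the example: bucket seps3, store newNG, version 5.0.0.4 paths.
old_layout = ["seps3/pages", "seps3/pages-trash", "seps3/objects-trash"]
new_layout = [migrated_path("seps3", "newNG", path) for path in old_layout]
print(new_layout)
```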

Prerequisites

  • For the minimum Si3 hardware requirements that apply to SEP sesam Si3 deduplication server, see Hardware requirements.
  • For details on the required Java version, see Java Compatibility Matrix. Si3 NG is not mandatory, so there is no dependency rule for it in the RPM/DEB packages.
  • When estimating the maximum size of a deduplication store, you have to ensure that there is enough space available for dedup trash, otherwise the deduplication store will run out of space. You should calculate the required disk space based on a representative sample of your full backup and add the additional storage space equal to approximately 50% of the representative full backup.
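The sizing rule from the last item can be written as a small calculation. This is a rough sketch, assuming the base requirement has already been estimated from a representative full backup; the function and parameter names are illustrative and the 50% figure is the approximate value given above.

```python
def required_store_space_tb(base_requirement_tb: float,
                            representative_full_tb: float,
                            trash_fraction: float = 0.5) -> float:
    """Base space requirement plus room for dedup trash.

    trash_fraction = 0.5 reflects the "approximately 50% of the
    representative full backup" rule; both inputs are your own estimates.
    """
    if base_requirement_tb < 0 or representative_full_tb < 0:
        raise ValueError("sizes must not be negative")
    return base_requirement_tb + trash_fraction * representative_full_tb


# Example: 12 TB estimated for backups, representative full backup of 8 TB.
print(required_store_space_tb(12, 8))
```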

Required additional amount of RAM

The following table shows the required additional amount of RAM for the Si3-NG data store. The TB value corresponds to the capacity of the Si3-NG data store.

Note
These requirements relate solely to the need for deduplication. In addition to these requirements, the amount of memory for the operating system and other services should be taken into account.
Si3-NG data store capacity (check initial size limit)    RAM
<20 TB                                                   16 GiB
20-40 TB                                                 32 GiB

You can use the following command (from the admin command line) to find out how much RAM is needed for a given Si3 NG capacity. Note that you need to set the sesam profile before running the command:

    sm_dedup_interface -T dedup2 propose jvmconfig <Si3-CAPACITY>
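For quick planning, the RAM table above can also be read as a simple lookup. The sketch below only encodes the two rows of the table; it does not replace the sm_dedup_interface proposal, and the function name is made up for this example.

```python
def additional_ram_gib(capacity_tb: float) -> int:
    """Additional RAM (GiB) for an Si3 NG store, taken from the table above."""
    if capacity_tb <= 0:
        raise ValueError("capacity must be positive")
    if capacity_tb < 20:
        return 16
    if capacity_tb <= 40:
        return 32
    # The table ends at 40 TB, so nothing is assumed beyond that.
    raise ValueError("capacities above 40 TB are not covered by the table")


print(additional_ram_gib(15), additional_ram_gib(20))
```

Remember that this value comes on top of the memory needed by the operating system and other services.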

Required additional amount of CPU cores

The following table shows the number of CPU cores required for a Si3 NG data store. Note that the TB value is the amount of data backed up (before deduplication), not the capacity of the data store.

Backed up data (before dedup)    CPU cores
10 TB                            4
20 TB                            4
40 TB                            8
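The CPU table can be sketched as a lookup in the same way. The table lists only three data points, so this example assumes that amounts between two rows round up to the next listed row; that rounding rule is an assumption, not something stated in the table.

```python
# (backed-up data in TB before deduplication, CPU cores) from the table above
CPU_TABLE = [(10, 4), (20, 4), (40, 8)]


def required_cpu_cores(backed_up_tb: float) -> int:
    """Return the cores of the first table row that covers the given amount.

    Assumption: values between rows are rounded up to the next row.
    """
    if backed_up_tb <= 0:
        raise ValueError("amount of backed-up data must be positive")
    for limit_tb, cores in CPU_TABLE:
        if backed_up_tb <= limit_tb:
            return cores
    raise ValueError("amounts above 40 TB are not covered by the table")


print(required_cpu_cores(30))
```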

Performance tip

Applies to Windows only: SEP AG recommends using the High performance power plan to increase backup performance. Windows sets all computers to the Balanced power plan by default, so you must switch to the High performance power plan manually. Your Windows computer will then use more power, but systems with Si3 NG will always operate at the highest performance level.

  • From the Start menu, go to Control Panel -> System and Security -> Power Options and change the setting to High performance.

Restrictions

  • Si3 NG deduplication store is not supported for NSS and MooseFS volumes.
  • To avoid problems resulting from the combination of excessively large Si3 deduplication stores and inefficient hardware, the maximum initial Si3/Si3-NG deduplication store size is currently limited to 40 TB. Please contact SEP sesam support if your specific requirements are different.
  • This limitation applies to the creation of a new Si3 NG deduplication store in the GUI.
Note
It is recommended to run Si3 deduplication (SEP sesam Server or RDS) on the physical host. It is also possible to run it on a virtual machine. In this case, take into account that deduplication consumes a lot of server resources for reading, processing and writing the deduplicated data, as well as for some other deduplication tasks such as housekeeping and various checks. These tasks require a large amount of IO and a large amount of memory. Si3 performance can be affected by other VMs running on the same host. Therefore, if you are running Si3 on a VM, you should be aware of possible bottlenecks and shortcomings.

Configuration procedure

The SEP sesam data store is disk-based storage that allows savesets (backed-up data) to be written directly to configured storage locations, including S3 cloud storage and Azure. Note that the configuration procedure for the latter two differs from the one described below. For details, see Backup to S3 Cloud Storage and Backup to Azure Storage.

Enable Si3 NG setup on the same host

To make the upgrade from Si3 to Si3 NG smoother, you can configure a new Si3 NG and an old Si3 on the same backup server or RDS by using the enable_gui_allow_multi_dedup key.

  1. Open the global settings in the GUI: In the menu bar, click Configuration -> Defaults -> Settings.
  2. Set the key value of enable_gui_allow_multi_dedup to 1.

Configure Si3 NG

SEP Si3 target deduplication is easy to configure and ready to use by selecting the Si3 NG deduplication data store type. Note that Si3 NG deduplication store is not supported for NSS and MooseFS volumes. For other limitations, see Restrictions.

Tip
Si3 NG store can also be used to back up your data directly to S3 cloud or Azure. In this case, the configuration is slightly different depending on the type of storage cloud. For more information, see Backup to S3 Cloud Storage and Backup to Azure Storage.
  1. In the Main selection -> Components, click Data Stores to display the data store contents frame.
  2. From the Data Stores menu, select New Data Store. A New Data Store dialog appears.
  3. Under Data store properties, enter a meaningful name for the Si3 NG deduplication store in the Name field. Entering the name also creates the name of the drive group for your Si3 deduplication store in the Create new drive group field.
  4. From the Store type drop-down list, select SEP Si3 NG Deduplication Store.
  5. Ensure that the Create drive option is enabled under the Drive parameter properties. The predefined value for the drive is automatically entered in the Drive number field.
  6. It is recommended to also activate the option Create second drive. Without this option, SEP sesam can only assign one drive for either reading or writing, with one job on the same drive at a time. If you use the additional dedicated drive for restore, you can perform a backup on the first drive and restore your data from the second drive simultaneously. You can also add a third drive for migration. (See section Drive access mode.)
  7. The name in the Create new drive group field is already filled in. You can change it by entering a new name.
  8. The predefined number of channels is already available in the Max. channels drop-down list. The number of available channels depends on your SEP sesam Server package. For details on licensing, see Licensing.
  9. From the Device server drop-down list, select the device server for your data store.
  10. In the Path field, enter the location of your data store or use the Browse button to select it. Click OK.
    If you use the Browse button, the New Data Store information window appears with predefined recommended values for the size of your Si3 NG deduplication store. Click OK to confirm the selected location and recommended size values. You can change the size of your Si3 NG deduplication store later under Size properties (see section Size properties).

After configuring the Si3 NG deduplication store, first configure the media pools and then set up your backup strategy. Make sure to test your newly created Si3 NG store by running a test backup on it.

Run a test backup on Si3 NG

  1. Create a new backup task: In the Main Selection -> Tasks -> By clients, select your RDS client and then click New Backup Task. Configure your backup task and save it. For details, see Creating a Backup Task.
  2. Test the backup on the newly created Si3 NG store: From the menu bar, select Activities -> Immediate start -> Backup. In the Immediate start: Backup dialog, select the previously created media pool for Si3 NG as the target media pool for the backup. Click Start and check if your backup was successful by viewing the status of your backup job in the GUI (Monitoring -> Last Backup State or Job State -> Backups) or SEP sesam Web UI – Last backup state.

Now you can create different backup tasks to apply deduplication and enable the best possible scenarios for efficient backup in different environments. For details on how to select your deduplication method, see Deduplication. For details on how to configure a backup job, see Standard Backup Procedure.

Replicating from Si3 to Si3 NG

As SEP sesam does not support a direct upgrade from the old Si3 to the new Si3 NG, you can create a replication task to replicate from Si3 to the Si3 NG store. Replication reads all data from the source-side store on the source-side RDS and sends it to the target store using the source-side deduplication function. Once your new Si3 NG is set up, you should configure regular replication from one NG to another NG.

Configure a replication task

To configure a replication from Si3 to Si3 NG, proceed as follows.

  1. Create a replication task: In the Main selection -> Tasks -> Replication Tasks, click New Replication Task. The New Replication Task window is displayed.
  2. In the Name field, enter a name for the replication task, e.g., Si3-2-Si3NG.
  3. Enter the following information under Parameters:
    • Media pool
      • Pool: Select the name of the source media pool of the Si3 deduplication store from which the data will be replicated.
      • Drive: Select the drive number of the drive to be used to read the data.
      • Interface: Optionally, specify the network interface of the RDS to be used for data transfer.
    • Destination
      • Pool: Select the name of the target media pool you previously created for the new Si3 NG and to which the data will be replicated.
      • Drive: Select the drive number of the drive that will be used to write the data.
      • Interface: Optionally, enter the network interface of the RDS to be used for data transfer, e.g., the name of the RDS.
    • Leave the Relative backup date (From) set to -99,999 and To set to 0.
    • In the drop-down list based on, the Sesam days option is selected by default.
  4. Click Save to save your replication task.

After you have configured a replication task, start replication as follows.

Start replication

Note that any initial replication requires a large amount of CPU, network bandwidth and time to complete successfully.

Start replication manually as follows:

  1. In the GUI menu, select Activities -> Immediate start -> Replication.
  2. In the Immediate Start: Replication window, from the Task name drop-down list select the replication task you created earlier, e.g., Si3-2-Si3NG, and click Start.

To ensure that the replication is successful, check its status:

  • Via the GUI: Go to the Main Selection -> Job state -> Migrations and Replications and look for your replication task in the first column Migration/Replication.
  • Via the Web UI: Open Web UI and from the left menu select Replications. For details, see SEP sesam Web UI.

Checking the properties and modifying your Si3 NG deduplication store

You can view the properties of your Si3 NG deduplication store by double-clicking the store.

Drive options

You can modify existing and set additional drive options by double-clicking the first drive. In the Drive Properties window, you can browse the path for the data store and set the access mode for data store drives.

Drive access mode
  • read/write (default): Allows read operations (e.g., restore or using a drive as the source of a migration) and write operations (e.g., backup or using a drive as the target of a migration). As write operations can occupy the drive for a while, consider using certain drives for write operations only and setting up the other drive(s) for read operations only.
  • read: Only read operations, e.g., for restore or as the source of a migration, are allowed. It is recommended to set up additional drives in read mode to allow uninterrupted processing of tasks, such as restore.
  • write: Only write operations, e.g., for backup or as the target of a migration, are allowed. The use of drives in write mode is recommended when these drives are used in combination with additional drives that are only used in read mode.
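The three access modes can be summarized as a small permission check. This is only a model of the rules listed above, not SEP sesam code; the operation names are labels chosen for this sketch.

```python
from enum import Enum


class AccessMode(Enum):
    READ_WRITE = "read/write"
    READ = "read"
    WRITE = "write"


# Illustrative operation labels (not SEP sesam identifiers).
READ_OPERATIONS = {"restore", "migration_source"}
WRITE_OPERATIONS = {"backup", "migration_target"}


def is_allowed(mode: AccessMode, operation: str) -> bool:
    """Check whether a drive in the given mode may run the operation."""
    if operation in READ_OPERATIONS:
        return mode in (AccessMode.READ_WRITE, AccessMode.READ)
    if operation in WRITE_OPERATIONS:
        return mode in (AccessMode.READ_WRITE, AccessMode.WRITE)
    raise ValueError(f"unknown operation: {operation!r}")


print(is_allowed(AccessMode.READ, "backup"))
```

A store with one write drive and one read drive can therefore run a backup and a restore at the same time, each on its own drive.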

The first drive in the list has an additional OS Access tab where you can specify the credentials (user name and password) required to access the configured drive path. Use DOMAIN\USER format for domain accounts or HOST\USER for local accounts.

Si3 NG data encryption

To configure Si3 NG data encryption, you have to create a security password for deduplication:
Main selection -> Components -> click Data Stores -> select your Si3 NG deduplication store and double-click it, then double-click the first drive of your Si3 NG deduplication store.
In the Encryption password field, specify the encryption password and repeat it.

For details, see Encrypting Si3 NG Deduplication Store.

Si3 NG deduplication store size properties

To change data store size properties, go to Main selection -> Components -> click Data Stores -> select your Si3 NG deduplication store and double-click it. Then under Size properties specify or modify the following:

  • Capacity: Specify the size (in GiB) of the partition for backups.
  • High watermark: Specify the value (in GiB) for the high watermark (HWM). The HWM defines the upper value for the used storage space. When this value is reached, the status of a datastore changes from OK to Warning, but backups continue to be performed. Make sure that you provide enough storage space for your backed up data.
  • Si3 repair area: Specify the value (in GiB) for the Si3 repair area. The Si3 repair area (subdirectory trash) defines the space for Si3 files that were identified by a garbage collection job and are no longer used. These files are still kept in the repair area to allow for a possible repair of Si3 in case of structural problems (which may be caused by a file system error or an operating system crash). The files in the repair area are automatically removed after the specified period of time (SEP sesam default: 4 days) or when the disk usage threshold is reached. The Si3 repair function is disabled when the value is set to 0.
  • Note
    The Si3 repair area for managing the disk space allocated for Si3 files is available only in advanced UI mode (formerly expert GUI mode). To see the Si3 repair area field, make sure your UI mode is set to advanced. For details, see Selecting UI mode.
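The behaviour of the high watermark and the Si3 repair area described above can be modelled roughly as follows. The function names are invented for this sketch, and the logic only restates the rules from the list; the actual implementation in SEP sesam may differ.

```python
def store_status(used_gib: float, high_watermark_gib: float) -> str:
    """Status changes to Warning once the HWM is reached; backups continue."""
    return "Warning" if used_gib >= high_watermark_gib else "OK"


def repair_enabled(repair_area_gib: float) -> bool:
    """The Si3 repair function is disabled when the value is set to 0."""
    return repair_area_gib > 0


def trash_file_removable(age_days: float, threshold_reached: bool,
                         retention_days: float = 4.0) -> bool:
    """Files leave the repair area after the retention period (default
    4 days) or as soon as the disk usage threshold is reached."""
    return threshold_reached or age_days >= retention_days


print(store_status(800, 800), repair_enabled(0), trash_file_removable(1, False))
```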

The Disk space usage properties are used by SEP sesam to report the following:

  • Used: Total used space (in GiB) on the partition.
  • Total: Maximum available space (in GiB) on the partition as reported by the operating system.
  • Free: Available disk space (in GiB) for SEP sesam.
  • Deduplication rate: Deduplication takes place as soon as the backup process has started. SEP sesam analyses blocks of data and determines whether the data is unique or has already been copied to the Si3 NG data store. Only single instances of unique data are sent to the data store; each deduplicated file is replaced with a stub file. The deduplication ratio indicates the extent of data reduction achieved by Si3 deduplication, i.e. the ratio between the protected size of data and the actual physical data size stored. A ratio of 10:1 means that 10 times more data is protected than the physical capacity needed to store it. The deduplication ratio depends greatly on the deduplication method used (Si3T or Si3S), the type of data, the backup level used (the deduplication ratio is higher when there are copy and full backups and when there is a larger amount of data), etc. For details, see Deduplication.
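As a worked example of the ratio defined in the last item, the following sketch divides the protected data size by the physical size stored. The function names are illustrative only.

```python
def dedup_ratio(protected_gib: float, stored_gib: float) -> float:
    """Protected data size divided by the physical size actually stored."""
    if stored_gib <= 0:
        raise ValueError("stored size must be positive")
    return protected_gib / stored_gib


def format_ratio(ratio: float) -> str:
    """Render the ratio in the usual N:1 notation."""
    return f"{ratio:.1f}:1"


# 1000 GiB of protected data stored in 100 GiB of physical capacity.
print(format_ratio(dedup_ratio(1000, 100)))
```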

Monitoring deduplication status

You can view the status of your Si3 deduplication in the GUI (Si3 deduplication store properties -> Si3 State tab) or SEP sesam Web UI. The data store status overview provides detailed information about consistency, utilization, sanity status, size, disk space usage as well as related media pools, media and drives, dependencies, data size before/after deduplication, etc.

Note
If fsck (file system consistency check) detects an irregularity in the Si3 file system, the affected pages and chunks are recorded in the recovery.log. The Si3 deduplication store is marked red in the GUI and Web UI and the Si3 purge is no longer executed. The purge is stopped to prevent the files in the Si3 repair area from being deleted, as they may be required to repair Si3 in case of problems. Once the errors are fixed and the recovery.log is empty, the Si3 NG data store is no longer marked red and the Si3 purge works again.
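The rule in the note (purge runs only while recovery.log is empty) can be sketched as a simple file check. The location of recovery.log is not given here, so the path is passed in by the caller; treating a missing file like an empty one is an assumption of this sketch, and the log line written below is a placeholder, not the real entry format.

```python
import tempfile
from pathlib import Path


def purge_allowed(recovery_log: Path) -> bool:
    """True when recovery.log is empty (or, by assumption, missing)."""
    return not recovery_log.exists() or recovery_log.stat().st_size == 0


# Demonstration with a temporary file standing in for recovery.log.
with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "recovery.log"
    log.write_text("")
    empty_ok = purge_allowed(log)
    log.write_text("placeholder entry\n")  # not the real log format
    damaged_ok = purge_allowed(log)

print(empty_ok, damaged_ok)
```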

Comparison of Si3 and Si3 NG

SEP sesam v. 5.0.0 Jaglion has introduced a new generation Si3 deduplication store: Si3 NG. Si3 NG offers significantly higher performance for backup, restore and migration, as well as backup to S3 cloud, backup to Azure, and the new immutable storage feature SiS, resulting in improved performance, scaling, and resource savings.

Function                                          Si3                 Si3 NG
Si3 backup                                        Yes                 Yes
Si3 deduplication (source-side and target-side)   Yes                 Yes
Si3 replication: local to remote store (Note a)   Yes (Si3 to Si3)    Yes (Si3 to Si3 NG; Si3 NG to Si3 NG)
Si3 replication: to S3 cloud                      Yes                 Yes (provides more powerful features for backing up directly to the cloud, see the next two lines)
Backup to S3 Cloud Storage                        Yes                 Yes
Backup to Azure Storage                           Yes                 Yes (as of Jaglion V2)
SiS (SEP Immutable Storage)                       Yes                 Yes (as of Jaglion V2)
Si3 restore                                       Yes                 Yes
Si3 encryption                                    Yes                 Yes (as of Jaglion V2)
Seeding Si3 deduplication store (Note b)          Yes                 Yes
Usage of tachometer                               Yes                 Yes

Note a

SEP sesam does not support a direct upgrade from the old Si3 to Si3 NG. However, to use the new Si3 NG you can:

  • Back up all data again to the newly configured Si3 NG deduplication store.
  • After configuring a new Si3 NG, you can also create a replication job to replicate from the Si3 to the Si3 NG store. Replication reads all the data from the source-side store on the source-side RDS and sends it to the target store using the source-side deduplication function. For details, see Replicating from Si3 to Si3 NG.
  • You can also configure a new Si3 NG and an old Si3 in parallel on the same host by enabling the key enable_gui_allow_multi_dedup.

Note b

The Initial Seed feature does not work in v. 5.0.0 Jaglion, but you can use it in earlier SEP sesam versions.