Configuring an Si3 Deduplication Store
Overview
SEP sesam provides target-based (Si3T) and source-based (Si3S) deduplication. For details on the deduplication concept and recommendations, see Deduplication.
Both Si3T and Si3S require a configured Si3 deduplication store. Only one Si3 deduplication store can be configured on a server. A valid license is required for each Si3 deduplication store.
You can download SEP Tachometer to analyse the structure of your data and calculate the potential savings with SEP sesam Si3 deduplication. For details, see SEP Tachometer.
Prerequisites
For the minimum Si3 hardware requirements that apply to the SEP sesam Si3 deduplication server, see Hardware requirements. Keep in mind that these requirements cover deduplication only; the memory needed by the operating system and other services must be added on top.
In addition, the following prerequisites must be met to configure an Si3 deduplication store.
Additional RAM / CPU requirements
For details, see Si3 Deduplication Hardware Requirements.
Configuring using GUI
The SEP sesam data store is disk-based storage that enables save sets (backed-up data) to be written directly to the configured storage locations. SEP Si3 target deduplication is easy to configure and ready to use once you select the Si3 deduplication data store type.
- Steps
- In the Main selection -> Components, click Data stores to display the data store contents frame.
- From the Data stores menu, select New data store. A New data store dialog appears.
- Under the Data store properties in the Name field, enter a meaningful name for the data store.
- From the Store type drop-down list, select SEP Si3 deduplication store. Skip the Messages section, which is used by SEP sesam to display the last executed action.
- Make sure that the option Create drive for data store is checked under the Drive parameter properties. The predefined value for the drive is automatically added to the Drive number field.
- From the Device server drop-down list, select the device server for your data store.
- In the Path field, enter the location for your data store or use the Browse button to select the relevant folder. Check the relevant folder and click OK.
When you use the Browse button to select the folder, the New data store information window appears with predefined recommended values for your Si3 deduplication store size. Click OK to confirm the selected location and the recommended size values. You can modify your data store size later under the Size properties (see step 10).
- Under the Drive group properties, select Create new drive group and enter a name for the drive group dedicated to your Si3 deduplication store.
- The predefined number of channels is already displayed in the Max. channels drop-down list. The number of available channels depends on your SEP sesam Server package. The standard license supports 5 concurrent streams, enabling 5 backup processes to run simultaneously. For details on licensing, see SEP sesam license.
- Under the Size properties, specify or modify the following:
- Capacity: Specify the size (in GB) of the partition for backups.
- High watermark: Specify the value (in GB) for the high watermark (HWM). The HWM defines the upper limit for used disk space. When this value is reached, a purge process is triggered for all save sets whose EOL (end of lifetime) has expired, freeing up capacity in the data store.
- Low watermark: Specify the value (in GB) for the low watermark (LWM). The LWM defines how much storage space save sets with an expired EOL may continue to occupy in the data store. If the LWM is set to 0, all save sets with an expired EOL are removed from the data store. The oldest save sets are always deleted first. For the deduplication store, the LWM is set to 0 by default and cannot be edited.
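The following Python sketch models how the two watermarks could interact during a purge. It is a hypothetical illustration, not SEP sesam's implementation: the SaveSet class and the purge_candidates function are invented for this example, and the sketch assumes one reading of the LWM description, namely that purging stops once the expired save sets still in the store occupy no more than the LWM.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class SaveSet:
    ssid: str
    size_gb: float
    created: date  # backup date, used for "oldest first"
    eol: date      # end of lifetime


def purge_candidates(save_sets, used_gb, hwm_gb, lwm_gb, today):
    """Return the save sets a watermark-triggered purge would delete (hypothetical model)."""
    if used_gb < hwm_gb:
        return []  # HWM not reached: no purge is triggered
    expired = sorted(
        (s for s in save_sets if s.eol < today),  # only save sets with an expired EOL
        key=lambda s: s.created,                   # oldest save sets first
    )
    remaining_expired = sum(s.size_gb for s in expired)
    deleted = []
    for s in expired:
        if remaining_expired <= lwm_gb:
            break  # LWM reached; with LWM = 0 every expired save set is deleted
        deleted.append(s)
        remaining_expired -= s.size_gb
    return deleted
```

With the deduplication store's fixed LWM of 0, this model deletes every expired save set once the HWM is reached, oldest first; unexpired save sets are never touched.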
The Disk space usage properties are used by SEP sesam to report the following:
- Used: Total used space (in GB) on the partition.
- Total: Maximum available space (in GB) on the partition as reported by the operating system.
- Free: Available disk space (in GB) for SEP sesam.
- Deduplication rate: Deduplication begins as soon as the backup process starts. SEP sesam analyses blocks of data and determines whether the data is unique or has already been copied to the Si3 data store. Only a single instance of each unique piece of data is sent to the data store, and each deduplicated file is replaced with a stub file. The deduplication rate is higher for full and copy backups and for larger amounts of data. The deduplication ratio is expressed as n:1.
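To illustrate the principle, the sketch below deduplicates data in fixed-size chunks identified by SHA-256 hashes and reports the result as n:1. This is a generic textbook technique chosen for illustration; Si3's actual chunking, hashing and stub format are not described here, and the 4096-byte chunk size and all function names are assumptions.

```python
import hashlib


def deduplicate(data: bytes, chunk_size: int = 4096):
    """Split data into fixed-size chunks and keep one copy of each unique chunk.

    Returns (store, recipe): store maps chunk hash -> chunk bytes,
    recipe is the ordered list of hashes needed to rebuild the data.
    """
    store = {}
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # a unique chunk is stored only once
        recipe.append(digest)            # repeated chunks become references
    return store, recipe


def restore(store, recipe) -> bytes:
    """Rebuild the original data from the stored chunks."""
    return b"".join(store[h] for h in recipe)


def dedup_ratio(store, recipe) -> float:
    """Logical size divided by stored size, i.e. the n in n:1."""
    logical = sum(len(store[h]) for h in recipe)
    stored = sum(len(chunk) for chunk in store.values())
    return logical / stored
```

For example, three identical 4 KiB blocks followed by one different block are stored as two chunks, so dedup_ratio returns 2.0, displayed as 2:1.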
As of v. 4.4.3 Tigon, it is possible to encrypt an Si3 deduplication store. For details, see Encrypting Si3 Deduplication Store.
Tigon also introduced initial seeding, which allows you to seed the Si3 deduplication store for the purpose of replication. For details, see Seeding Si3 Deduplication Store.
Configuring using CLI
The index size (max_pages) and Java's RAM requirements are important parameters for operating an Si3 data store, as both are used when the store is created.
The sm_dedup_interface command is used to propose the hardware (memory) configuration for an Si3 data store server. You have to provide the size of the data store capacity (data store partition) in GB.
- sm_dedup
Command: sm_dedup_interface propose jvmconfig <value in GB>
For example, for a data store partition of 50 TB (50000 GB), run the command:
sm_dedup_interface propose jvmconfig 50000
bigsrv1:~ # sm_dedup_interface propose jvmconfig 50000
JAVA_OPTS=-Xmx1333M -XX:MaxDirectMemorySize=11724M
These Java RAM parameters are automatically copied to the Si3 data store drive configuration file, which resides in the folder <SESAM_VAR>/ini/stpd_conf.
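If you want to reuse the proposed values in scripts, a small parser can extract the sizes from the JAVA_OPTS line. This is a hypothetical helper (parse_jvm_options is not part of SEP sesam) that only understands the -Xms, -Xmx and -XX:MaxDirectMemorySize options with K, M or G suffixes.

```python
import re

# Size suffixes understood by this sketch, converted to MB.
_FACTOR_MB = {"K": 1 / 1024, "M": 1, "G": 1024}
_OPTIONS = {
    "-Xms": "Xms",
    "-Xmx": "Xmx",
    "-XX:MaxDirectMemorySize=": "MaxDirectMemorySize",
}


def parse_jvm_options(opts: str) -> dict:
    """Return the heap and direct-memory sizes (in MB) found in a JVM options string."""
    cleaned = opts.strip()
    if cleaned.startswith("JAVA_OPTS="):
        cleaned = cleaned[len("JAVA_OPTS="):]
    cleaned = cleaned.strip('"')  # the drive .ini quotes the whole value
    sizes = {}
    for token in cleaned.split():
        for prefix, name in _OPTIONS.items():
            if token.startswith(prefix):
                m = re.fullmatch(r"(\d+)([KkMmGg])", token[len(prefix):])
                if m:
                    sizes[name] = int(m.group(1)) * _FACTOR_MB[m.group(2).upper()]
    return sizes
```

For the 50 TB example, this yields 1333 MB of heap and 11724 MB of direct memory. Because the JVM allocates direct buffers outside the Java heap, the Si3 server may need roughly the sum of both (13057 MB) plus JVM overhead.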
- stpd_conf
The Si3 and stpd configuration is saved in an .ini file in the directory gv_rw_ini:stpd_conf.
The file name is derived from hw_drives.device (DS@ds1_2), as for every other DS device. Some information is duplicated because it is used by both the Si3 server and stpd.
bigsrv1:/var/opt/sesam/var/ini/stpd_conf # cat ds1_2.ini
[DEDUP]
Backend=dedup
Hostname=localhost
defaultRepoPath="/datastore/ds1/ds1"
maxPages=481900
port=11703
sds_jvm_options="-Xmx1032M -XX:MaxDirectMemorySize=1355M"
[DISK_STORE]
Storage_Location=/datastore/ds1/ds1
Size=1000GB
backend=dedup
hostname=localhost
port=11703
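Because the drive file is a plain .ini file, it can be read with Python's standard configparser module, for example to check which repository path, port and JVM options a drive uses. This is a minimal sketch: read_si3_ini is a hypothetical helper, and SAMPLE_INI reproduces the ds1_2.ini example from this section.

```python
import configparser

# The ds1_2.ini example from this section.
SAMPLE_INI = """\
[DEDUP]
Backend=dedup
Hostname=localhost
defaultRepoPath="/datastore/ds1/ds1"
maxPages=481900
port=11703
sds_jvm_options="-Xmx1032M -XX:MaxDirectMemorySize=1355M"
[DISK_STORE]
Storage_Location=/datastore/ds1/ds1
Size=1000GB
backend=dedup
hostname=localhost
port=11703
"""


def read_si3_ini(text: str) -> dict:
    """Extract the Si3-relevant settings from a drive .ini file."""
    parser = configparser.ConfigParser(interpolation=None)  # treat '%' literally
    parser.read_string(text)
    dedup = parser["DEDUP"]  # option names are matched case-insensitively
    return {
        "repo_path": dedup.get("defaultRepoPath", "").strip('"'),
        "max_pages": dedup.getint("maxPages"),
        "port": dedup.getint("port"),
        "jvm_options": dedup.get("sds_jvm_options", "").strip('"'),
        "size": parser.get("DISK_STORE", "Size", fallback=None),
    }
```

To read a real drive file instead, pass the contents of a file from the stpd_conf directory to read_si3_ini.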
- sm.ini
The RAM parameters for Java can be manually set in the sm.ini file. They will override the automatically generated parameters from the drive .ini file. The recommended -Xmx value is ¼ (one quarter) of the available RAM. If 16 GB are available, for example, then at least 4 GB (4096 MB) should be configured for the Si3 data store.
To obtain the default parameters used by Java on the target system, run the command java -XX:+PrintFlagsFinal -version and search for MaxHeapSize (-> Xmx) and InitialHeapSize (-> Xms).
Linux and SEP sesam 4.4.2 Windows
[Params]
sds_jvm_options=-Xms1024M -Xmx4096M
SEP sesam 4.4.1 Windows
[DEDUP]
sds_jvm_options=-Xms1024M -Xmx4096M
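The quarter-of-RAM rule can be turned into a small calculation that prints the sm.ini lines. This is a sketch under stated assumptions: the function names are hypothetical, -Xms1024M is simply taken from the examples above, and the section name must match your platform and version as listed above.

```python
def recommended_xmx_mb(ram_gb: float) -> int:
    """Rule of thumb: -Xmx = one quarter of the available RAM, in MB."""
    return int(ram_gb * 1024 // 4)


def sm_ini_fragment(ram_gb: float, xms_mb: int = 1024, section: str = "Params") -> str:
    """Build the sm.ini lines; pass section="DEDUP" for SEP sesam 4.4.1 on Windows."""
    xmx_mb = recommended_xmx_mb(ram_gb)
    xms_mb = min(xms_mb, xmx_mb)  # the JVM refuses to start if -Xms exceeds -Xmx
    return f"[{section}]\nsds_jvm_options=-Xms{xms_mb}M -Xmx{xmx_mb}M"
```

For 16 GB of RAM, sm_ini_fragment(16) reproduces the example above: -Xms1024M -Xmx4096M.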
Note: It is possible to set max_pages of an existing Si3 data store manually, but this is not recommended. Do this only if advised by SEP sesam support. Changing this value causes a complete rebuild of the index, which can take considerable time.
[Params]
sds_jvm_options=-Xmx1333M -XX:MaxDirectMemorySize=11724M
- maxPages
The second parameter (max_pages) is directly related to the Java memory parameter. The RAM is required to hold the entire index (described by max_pages) in memory. The MaxDirectMemorySize depends directly on max_pages.
The max_pages value is stored (divided by 100) in the SEP sesam database in the drive's hw_drives.block_size field and is increased dynamically whenever necessary. The actual value is calculated as hw_drives.block_size * 100 and then copied to the drive configuration .ini file.
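Expressed as arithmetic, the relationship is a single multiplication. In the sketch below, the block_size value 4819 is hypothetical, picked only because it reproduces maxPages=481900 from the ds1_2.ini example; the read-only consistency check is an illustration, and max_pages should not be edited by hand (see the Note above).

```python
def max_pages_from_block_size(block_size: int) -> int:
    """max_pages as derived from hw_drives.block_size (block_size * 100)."""
    return block_size * 100


def ini_in_sync(ini_max_pages: int, block_size: int) -> bool:
    """Hypothetical check: does maxPages in the drive .ini match the database value?"""
    return ini_max_pages == max_pages_from_block_size(block_size)
```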
Advanced CLI Administration
The two main maintenance tasks – garbage collection (gc) and file system check (fsck) – run automatically. They are started by sm_start during SEP sesam newday. You can check their status or start/stop the tasks manually.
sm_dedup_interface
- Valid commands and usage
sm_dedup_interface -d <datastore> <command>
- status
- purge
- objectinfo <remote filename>
- put <input filename> <dest filename>
- get <remote filename> <dest filename> [<bytes skipped then> [<bytes read at beginning>]]
- delete <remote filename> [<filename 2>]*
- getlabel
- getuuid
- list
- fsck [start|stop|autopurge|status|incremental|purge now|dump status into <file>|fsck incr start from <file>]
- gc <start|stop|status|result>
- key <set <key> <value>|get <key>|list>
- log@server <msg>
- propose serverconfig <repository netto GiB>
- propose jvmconfig <repository netto Gib>
- snapshot
- replicate from [-f] <remote hostname> <remote port> <remote filename>
- replicate show
- replicate abort <task id>
Most of these commands are for internal or future use only.
- gc start
- Starts garbage collection.
- Identifies unreferenced chunks and moves them to the trash.
- Is started by SEP sesam with sm_start.
- gc stop
- Stops garbage collection.
- Can be restarted later.
- get
- Reads an object (file, save set) from the deduplication store.
- '-' can be used as the destination filename to write to STDOUT.
- put
- Writes an object (file, save set) to the deduplication store.
- '-' can be used as the input filename to read from STDIN.
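When scripting sm_dedup_interface, it helps to pass each word of a command as a separate argument. The Python sketch below wraps the documented call syntax and streams data through put and get using '-', assuming '-' stands for STDIN when it replaces put's input filename and for STDOUT when it replaces get's destination filename. The helper names are hypothetical; since most commands are for internal use, try this only on a test data store.

```python
import subprocess


def dedup_argv(datastore: str, *command: str) -> list[str]:
    """Argument list for `sm_dedup_interface -d <datastore> <command>`."""
    return ["sm_dedup_interface", "-d", datastore, *command]


def put_bytes(datastore: str, data: bytes, dest: str) -> None:
    """Write data to the store, passing '-' so put reads from STDIN."""
    subprocess.run(dedup_argv(datastore, "put", "-", dest), input=data, check=True)


def get_bytes(datastore: str, remote: str) -> bytes:
    """Read an object, passing '-' so get writes it to STDOUT.

    Assumption: log lines such as 'INFO Successfully initialized ...'
    go to STDERR and are not mixed into the object data on STDOUT.
    """
    result = subprocess.run(dedup_argv(datastore, "get", remote, "-"),
                            capture_output=True, check=True)
    return result.stdout
```

For example, dedup_argv("ds1_2", "gc", "start") builds the same call as typing sm_dedup_interface -d ds1_2 gc start in a shell.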
- status
- Provides information about used space, saved data, label uuid and whether gc or fsck are currently running.
bigsrv1:/var/opt/sesam/var/ini/stpd_conf # sm_dedup_interface -d ds1_2 status
INFO Successfully initialized i2dedup library version v2.1.0-SNAPSHOT5
Server Status:
Repository information:
Version: 2.1.1
UUID: 3b9ec2ae-34e1-11e3-b88b-001b2146
Label: ds1
Max Pages: 481900
Max Pages recommended: 154100 (-Xmx1010M -XX:MaxDirectMemorySize=603M)
GC process status: not running: GC finished.
Fsck process status: not running: Fsck finished. Interrupted: false. Total Runtime: 1296.68s
Bytes in repository: 259.02 GB
Bytes delete pending: 9.18 GB
Object information:
Objects stored: 258
Data before deduplication: 1541.56 GB
Data after deduplication: 58.94 GB
Overall DeDup ratio: 96.18 %
Key-Values:
No keys stored.
The value Overall DeDup ratio shows by what percentage the stored data has been reduced.
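The percentage shown by status and the n:1 ratio shown in the GUI are two views of the same numbers. The sketch below, using hypothetical helper names, reads both sizes from status output and derives both values; the sample contains the relevant lines from the output above.

```python
import re

# Relevant lines of the `status` output shown above.
SAMPLE_STATUS = """\
Data before deduplication: 1541.56 GB
Data after deduplication: 58.94 GB
Overall DeDup ratio: 96.18 %
"""


def data_sizes_gb(status_text: str) -> tuple[float, float]:
    """Return (before, after) deduplication sizes in GB from `status` output."""
    before = re.search(r"Data before deduplication:\s*([\d.]+)\s*GB", status_text)
    after = re.search(r"Data after deduplication:\s*([\d.]+)\s*GB", status_text)
    if not (before and after):
        raise ValueError("deduplication sizes not found in status output")
    return float(before.group(1)), float(after.group(1))


def savings_percent(before_gb: float, after_gb: float) -> float:
    """Reduction in percent, as reported by 'Overall DeDup ratio'."""
    return (1 - after_gb / before_gb) * 100


def ratio_to_one(before_gb: float, after_gb: float) -> float:
    """The same result expressed as n:1."""
    return before_gb / after_gb
```

For the example output, 1541.56 GB reduced to 58.94 GB gives about 96.18 % savings, which corresponds to roughly 26:1.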
- fsck
- Starts a data store check.
- Must be started manually.
- If the parameter autopurge is set, all corrupted objects will be deleted.
- Note: Currently, SEP sesam is not informed about objects deleted this way.
- fsck status
- Displays the current state or the state of the most recent data store check.
si3fix:/var/opt/sesam/var/log/sms # sm_dedup_interface -d Si3_5 fsck status
INFO Successfully initialized i2dedup library version v2.0.0-beta2
Current fsck status:
Message: Logfile check progress:
Bytes: 1270925865083/1512422546049
Throughput: 91.25 MiB/s
Running: yes
Started: 2013-05-29 20:57:17
Ended: -
Bytes Checked: 0
Bytes Lost: 0
Objects checked:
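The progress counter and throughput in the fsck status output are enough to estimate how far a running check has got. This is a rough sketch with hypothetical names; the regular expressions assume the output format shown above, and the remaining time assumes a constant throughput.

```python
import re

# Progress lines from the `fsck status` output shown above.
SAMPLE_FSCK = """\
Message: Logfile check progress:
Bytes: 1270925865083/1512422546049
Throughput: 91.25 MiB/s
Running: yes
"""


def fsck_progress(status_text: str):
    """Return (percent_done, eta_seconds), or None if no progress line is found.

    eta_seconds is None when no positive throughput is reported.
    """
    progress = re.search(r"Bytes:\s*(\d+)/(\d+)", status_text)
    if not progress or int(progress.group(2)) == 0:
        return None
    done, total = int(progress.group(1)), int(progress.group(2))
    percent = 100.0 * done / total
    rate = re.search(r"Throughput:\s*([\d.]+)\s*MiB/s", status_text)
    eta = None
    if rate and float(rate.group(1)) > 0:
        eta = (total - done) / (float(rate.group(1)) * 1024 * 1024)  # MiB/s -> bytes/s
    return percent, eta
```

For the sample above, the check is about 84 % done with roughly 42 minutes left.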
- purge
- Deletes all pages marked as obsolete by the last garbage collection (gc) run (empty trash).
- Is started by sm_start after a SEP sesam day change.
- getlabel and getuuid: the label and UUID are also shown by status, which can be used instead.
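SEP sesam normally starts gc and purge itself via sm_start, so manual runs are only needed in special cases. If you do script one, purge should follow a completed gc run, since purge empties the trash that gc fills. This is a hypothetical sketch: the running-state text matched by gc_running is an assumption, as only the "not running" form appears in the sample status output, and the polling interval is arbitrary.

```python
import re
import subprocess
import time


def gc_running(status_text: str) -> bool:
    """True if the `status` output reports 'GC process status: running'."""
    m = re.search(r"GC process status:\s*(not running|running)", status_text)
    return bool(m) and m.group(1) == "running"


def gc_then_purge(datastore: str, poll_seconds: float = 60.0) -> None:
    """Hypothetical manual run: gc moves unreferenced chunks to the trash, purge empties it."""
    base = ["sm_dedup_interface", "-d", datastore]
    subprocess.run(base + ["gc", "start"], check=True)
    time.sleep(poll_seconds)  # give gc time to start before the first status check
    while True:
        status = subprocess.run(base + ["status"], capture_output=True,
                                text=True, check=True).stdout
        if not gc_running(status):
            break
        time.sleep(poll_seconds)
    subprocess.run(base + ["purge"], check=True)
```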
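The following example shows the output of sm_main reload sds, which shuts down the Si3 (SDS) server, restarts it and reloads its configuration from the drive .ini file:
meteorologix:/var/opt/sesam/var/ini # sm_main reload sds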
2013-05-28 16:42:06: sm_main[5697] started
2013-05-28 16:42:06: Arguments: sm_main reload sds
2013-05-28 16:42:06: SDS Server: "java" -Xmx1700M -XX:MaxDirectMemorySize=900M -classpath "/opt/sesam/bin/sds/i2dedup-server.jar" \
-Dlogback.configurationFile=/va /opt/sesam/var/ini/sm_sdslog.xml -Dgv_rw_stpd=/var/opt/sesam/var/log/sms/ -Ddrive_num=31 \
-Dconfig.inifile="/var/opt/sesam/var/ini/stpd_conf/SI3_31.ini" i2.dedup.streaming.BinaryProtocolServer
Requesting server shut down...
2013-05-28 16:42:07.587 [main] INFO i.d.streaming.BinaryProtocolServer$ - Welcome to SEP DeDup Service. Loading configuration...
2013-05-28 16:42:07.637 [main] DEBUG i2.dedup.streaming.ServerOptions$ - Loaded configuration from ini file: dedup { backend=dedup hostname=localhost defaultRepoPath=/srv/5tb/data/defaultrepo/SI3 maxPages=262143 port=11732 }
Logging
Logging is based on the logback library. For more information, see http://logback.qos.ch/. This information is intended for advanced users only.
- Logging info
- gv_rw_ini:sm_sds.xml (/var/opt/sesam/var/ini/sm_sds.xml)
- /var/opt/sesam/var/log/sms contains two log files (four log files as of version 4.4.1-22):
- sm_dedup_server_info-<drive>.log: INFO level and higher.
- sm_dedup_server-<drive>.log: DEBUG and higher. This file will become quite large.
- sm_dedup_gc-<drive>.log: garbage collection log (version 4.4.1-22 and higher).
- sm_dedup_fsck-<drive>.log: file system check log (version 4.4.1-22 and higher).
- If you updated from a version earlier than 4.4.1-22 and the gc and fsck logs are missing, copy /opt/sesam/skel/templates/sm_sds.xml to /var/opt/sesam/var/ini/.
- Log files are rotated automatically when they reach 100 MB.
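To check quickly whether a drive writes all four logs, you could test for the expected file names. This is a minimal sketch; the helper names are hypothetical, and it assumes that <drive> in the file names is the drive identifier you pass in (for example a drive number).

```python
from pathlib import Path

# Default log directory on Linux, as listed above.
LOG_DIR = Path("/var/opt/sesam/var/log/sms")


def expected_logs(drive: str) -> list[str]:
    """The four Si3 log files written as of version 4.4.1-22."""
    return [
        f"sm_dedup_server_info-{drive}.log",  # INFO and higher
        f"sm_dedup_server-{drive}.log",       # DEBUG and higher (large)
        f"sm_dedup_gc-{drive}.log",           # garbage collection
        f"sm_dedup_fsck-{drive}.log",         # file system check
    ]


def missing_logs(drive: str, log_dir: Path = LOG_DIR) -> list[str]:
    """Expected log files that do not exist in log_dir."""
    return [name for name in expected_logs(drive) if not (log_dir / name).exists()]
```

If only the gc and fsck logs are missing after an update, the sm_sds.xml template step described above may apply; they may also simply not exist yet if gc or fsck has not run since.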
Files and directories
- Objects
For every SEP sesam save set, three objects (files) are stored in the Si3 store:
- <ssid>.data
- <ssid>.info
- <ssid>.info2
The .data and .info files are identical to those of a normal data store. The .info2 file is required for the data to be appended to an Si3 object: all database information that is not available before a backup completes is written to this file.
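Because every save set consists of exactly these three objects, a listing of object names can be checked for incomplete sets. This is a sketch with hypothetical helpers and an invented save set ID; how you obtain the object names (for example from the list command) and their exact format are assumptions.

```python
SUFFIXES = (".data", ".info", ".info2")


def objects_for(ssid: str) -> list[str]:
    """The three Si3 objects stored for one save set."""
    return [ssid + suffix for suffix in SUFFIXES]


def incomplete_save_sets(object_names) -> dict:
    """Map each save set ID to the object suffixes missing from object_names."""
    found: dict = {}
    for name in object_names:
        ssid, dot, ext = name.rpartition(".")
        if dot and "." + ext in SUFFIXES:
            found.setdefault(ssid, set()).add("." + ext)
    return {
        ssid: sorted(set(SUFFIXES) - present)
        for ssid, present in found.items()
        if set(SUFFIXES) - present
    }
```

A missing .info2 does not necessarily indicate a problem: since it holds information that only becomes available when a backup completes, a backup that is still running may show up as incomplete.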
- Directories
The path <repo root path>/Si3-POOL/Si3-POOL00001/ is a legacy SEP sesam data store path and has nothing to do with the Si3 store. It will be removed in a future release.
What is next?
After configuring the Si3 deduplication store, first configure the media pools, then set up your backup strategy.
See also
Encrypting Si3 Deduplication Store – Configuring Source-side Deduplication – Replication – Seeding Si3 Deduplication Store – SEP Tachometer – List of Licenses