Both Si3T and Si3S require a configured Si3 deduplication store. As of SEP sesam v. 5.0.0 Jaglion, a new generation Si3 NG deduplication store can be used. Compared to the "old" Si3 store type, Si3 NG offers significantly higher performance for backup, restore and migration, as well as direct backup to S3 cloud, resulting in better scaling and resource savings.
Typically, only one Si3 deduplication store can be configured per server. Because a direct upgrade from the old Si3 to Si3 NG is not supported, you can replicate the data from Si3 to Si3 NG instead. For this purpose, you can configure a new Si3 NG and an old Si3 in parallel on the same host by enabling the key enable_gui_allow_multi_dedup. For details, see Enabling Si3 NG setup on the same host.
- A valid licence is required for each Si3 deduplication store.
- For a more detailed comparison of Si3 and Si3 NG, see Comparison of Si3 and Si3 NG.
- For information on configuring the Si3 deduplication store via the GUI, see Configuring Si3 NG Deduplication Store.
Deprecated Si3 data store
|The old generation Si3 deduplication store is deprecated. This means that the old generation Si3 is no longer being enhanced, but is still supported until further notice. SEP strongly recommends using the new Si3 NG data store instead, especially if the data is to be stored in S3 cloud storage.|
- If you are using an old generation Si3 deduplication store with S3, you will not be able to restore from S3 via the GUI.
- You can configure a new Si3 NG and an old Si3 in parallel on the same host and replicate from the old Si3 to the Si3 NG store. For details, see Configuring Si3 NG Deduplication Store.
For the minimum hardware requirements that apply to a SEP sesam Si3 deduplication server, see Hardware requirements. Keep in mind that these requirements cover deduplication only; the memory needed by the operating system and other services must be taken into account in addition.
In addition, the following prerequisites must be met to configure a Si3 deduplication store.
Using CLI for Si3 data store configuration
SEP sesam provides command-line utilities for configuring and managing Si3 data stores. The following section provides some examples of commands and their syntax.
|You must have SEP sesam administrator privileges to run SEP sesam CLI commands and use the command prompt as an administrator. All commands are run from the |
The index size (max_pages) and Java's RAM requirements are important parameters for the operation of a Si3 data store, as both parameters are used during its creation.
The sm_dedup_interface command is used to configure the hardware for a Si3 data store server. More details are available below in the section sm_dedup_interface.
The Si3 and stpd configuration is stored in an .ini file in the directory
The file name is derived from the hw_drives.device value (DS@ds1_2 results in ds1_2.ini), as with any other DS device. Some information is duplicated because it is used by both the Si3 server and stpd.
bigsrv1:/var/opt/sesam/var/ini/stpd_conf # cat ds1_2.ini
[DEDUP]
Backend=dedup
Hostname=localhost
defaultRepoPath="/datastore/ds1/ds1"
maxPages=481900
port=11703
sds_jvm_options="-Xmx1032M -XX:MaxDirectMemorySize=1355M"
[DISK_STORE]
Storage_Location=/datastore/ds1/ds1
Size=1000GB
backend=dedup
hostname=localhost
port=11703
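Because the drive .ini file uses a standard INI layout, it can be inspected with ordinary tooling. The following is a minimal Python sketch based on the ds1_2.ini example; the helper names read_drive_ini and jvm_options_mb are hypothetical and not part of SEP sesam. It reads the [DEDUP] section and converts the -Xmx and -XX:MaxDirectMemorySize values in sds_jvm_options to megabytes.

```python
import configparser
import re

# Content of the example ds1_2.ini file.
SAMPLE_INI = """\
[DEDUP]
Backend=dedup
Hostname=localhost
defaultRepoPath="/datastore/ds1/ds1"
maxPages=481900
port=11703
sds_jvm_options="-Xmx1032M -XX:MaxDirectMemorySize=1355M"
[DISK_STORE]
Storage_Location=/datastore/ds1/ds1
Size=1000GB
backend=dedup
hostname=localhost
port=11703
"""

_UNIT_TO_MB = {"K": 1 / 1024, "M": 1, "G": 1024}


def read_drive_ini(text: str) -> configparser.ConfigParser:
    """Parse a drive .ini file (hypothetical helper; option names are case-insensitive)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return parser


def jvm_options_mb(options: str) -> dict:
    """Return the -Xmx and MaxDirectMemorySize values of a JVM option string in MB."""
    patterns = {
        "Xmx": r"-Xmx(\d+)([kKmMgG])",
        "MaxDirectMemorySize": r"-XX:MaxDirectMemorySize=(\d+)([kKmMgG])",
    }
    result = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, options)
        if match:
            result[name] = int(match.group(1)) * _UNIT_TO_MB[match.group(2).upper()]
    return result


ini = read_drive_ini(SAMPLE_INI)
dedup = ini["DEDUP"]
max_pages = dedup.getint("maxPages")
# The value is quoted in the file, so strip the surrounding quotes.
options = dedup["sds_jvm_options"].strip('"')
print(max_pages, jvm_options_mb(options))
```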
The RAM parameters for Java can be manually set in the sm.ini file. They override the automatically generated parameters from the drive .ini file. The recommended -Xmx value is ¼ (one quarter) of the available RAM. For example, if 16 GB is available, at least 4 GB (4096 MB) should be configured for the Si3 data store.
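The quarter-of-RAM rule is simple arithmetic. A minimal Python sketch, where recommended_xmx_mb and xmx_option are hypothetical helper names; it treats 1 GB as 1024 MB to match the 16 GB to 4096 MB example, and the result is a lower bound ("at least"), not a fixed value.

```python
def recommended_xmx_mb(available_ram_gb: float) -> int:
    """Return the recommended -Xmx value in MB: one quarter of the available RAM."""
    # 1 GB is treated as 1024 MB, matching the 16 GB -> 4096 MB example.
    return int(available_ram_gb * 1024 // 4)


def xmx_option(available_ram_gb: float) -> str:
    """Format the recommendation as a JVM option string."""
    return f"-Xmx{recommended_xmx_mb(available_ram_gb)}M"


print(xmx_option(16))  # -Xmx4096M
```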
To obtain the default parameters used by Java on the target system, run the command java -XX:+PrintFlagsFinal and search for MaxHeapSize (-> Xmx) and InitialHeapSize (-> Xms).
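java -XX:+PrintFlagsFinal prints one flag per line (type, name, "=" or ":=", value, origin). A minimal Python sketch that extracts MaxHeapSize and InitialHeapSize from that text and converts bytes to MB; the helper names are hypothetical and the sample lines are illustrative, not output from a real system. The sketch adds -version to the java call, which on common JDKs makes the JVM exit cleanly after printing the flags.

```python
import re
import subprocess

# Matches lines such as "   size_t MaxHeapSize   = 4294967296   {product} {ergonomic}"
# (newer JDKs) or "   uintx MaxHeapSize   := 4294967296   {product}" (older JDKs).
_FLAG_LINE = re.compile(r"^\s*\S+\s+(\w+)\s+:?=\s+(\d+)")


def parse_heap_flags(flags_output: str) -> dict:
    """Return MaxHeapSize (-> Xmx) and InitialHeapSize (-> Xms) in MB."""
    wanted = {"MaxHeapSize": "Xmx", "InitialHeapSize": "Xms"}
    result = {}
    for line in flags_output.splitlines():
        match = _FLAG_LINE.match(line)
        if match and match.group(1) in wanted:
            result[wanted[match.group(1)]] = int(match.group(2)) // (1024 * 1024)
    return result


def read_heap_flags() -> dict:
    """Run the local JVM and parse its final flag values (requires java on PATH)."""
    completed = subprocess.run(
        ["java", "-XX:+PrintFlagsFinal", "-version"],
        capture_output=True, text=True, check=True,
    )
    return parse_heap_flags(completed.stdout)


# Illustrative sample lines, not output from a real system.
SAMPLE = """\
    size_t InitialHeapSize                          = 268435456                                 {product} {ergonomic}
    size_t MaxHeapSize                              = 4294967296                                {product} {ergonomic}
"""
print(parse_heap_flags(SAMPLE))  # {'Xms': 256, 'Xmx': 4096}
```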
The second parameter (max_pages) is directly related to the Java memory parameter. The RAM must hold the entire index (described by max_pages) in memory. The MaxDirectMemorySize depends directly on max_pages.
The max_pages value is stored in the SEP sesam database in the drive's hw_drives.block_size field and is increased dynamically whenever necessary. The parameter is calculated as hw_drives.block_size × 100 and then copied to the drive configuration .ini file. If you have problems with the index, contact SEP sesam support.
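The max_pages calculation described above can be written as a one-line function. A minimal Python sketch with a hypothetical helper name; the block_size value 4819 is an assumption chosen because it would produce the maxPages=481900 seen in the ds1_2.ini example, not a value confirmed from that system's database.

```python
def max_pages_from_block_size(block_size: int) -> int:
    """max_pages as copied to the drive .ini file: hw_drives.block_size * 100."""
    if block_size <= 0:
        raise ValueError("hw_drives.block_size must be positive")
    return block_size * 100


# Hypothetical value: block_size 4819 would yield maxPages=481900.
print(max_pages_from_block_size(4819))  # 481900
```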
Advanced CLI Administration
Si3's main maintenance tasks, garbage collection (gc) and file system check (fsck), run automatically. Garbage collection is started by sm_start during SEP sesam newday. The file system check is carried out automatically at regular intervals and with every backup.
The new generation of Si3 deduplication store, Si3 NG, has two types of file system check (fsck): object check (occk), which checks if the Si3 data part is still readable, and page check (pcck), which checks the physical data on the disk. All processes (gc, occk and pcck) can run simultaneously.
You can check their status or start/stop the tasks manually.
sm_dedup_interface is the main utility for configuring and managing data stores. Below is a list of some commands and their usage.
|Depending on the deduplication store used, Si3 or Si3-NG, some of the commands may be slightly different. When relevant, both command versions are described.|
sm_dedup_interface -d <datastore> <command>
- purge
- objectinfo <remote filename>
- put <input filename> <dest filename>
- get <remote filename> <dest filename> [<bytes skipped then> [<bytes read at beginning>]]
- delete <remote filename> [<filename 2>]*
- getlabel
- getuuid
- list
- fsck [start|stop|autopurge|status|incremental|purge now|dump status into <file>|fsck incr start from <file>]
- gc <start|stop|status|result>
- key <set <key> <value>|get <key>|list>
- log@server <msg>
- propose serverconfig <repository netto GiB>
- propose jvmconfig <repository netto GiB> (for Si3 store; slightly different usage for Si3 NG, see Note)
- snapshot
- replicate from [-f] <remote hostname> <remote port> <remote filename>
- replicate show
- replicate abort <task id>
Depending on the deduplication store used, Si3 or Si3 NG, the command to find out how much RAM is needed for a given Si3/Si3 NG capacity differs slightly:
- Si3 NG: Use the command
sm_dedup_interface -T dedup2 propose jvmconfig <Si3_capacity>
- Si3: Use the command
sm_dedup_interface propose jvmconfig <Si3_capacity>
The output of MaxDirectMemorySize is the required RAM value.
Note, however, that SEP sesam calculates the RAM consumption and runs these commands in the background, so it is usually not necessary to set the values manually. Manual changes are overwritten with the next drive configuration.
The index calculation is also associated with this command. If the index grows and reaches 95% utilization, backups can no longer be performed. Because the RAM must hold the entire index (described by max_pages), MaxDirectMemorySize depends directly on max_pages. To solve problems with a growing index, see Si3 Deduplication Troubleshooting.
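The 95% limit can be watched from the Si3 NG status output, which contains a line such as "Utilization: 57.35% (32890421/57344000)". A minimal Python sketch with hypothetical helper names; it assumes the two numbers in parentheses are the used and total index entries and computes the fraction from them. The Si3 status output shows Max Pages instead, so the sketch applies to Si3 NG only.

```python
import re

INDEX_FULL_THRESHOLD = 0.95  # backups can no longer be performed at 95% utilization

_UTILIZATION = re.compile(r"Utilization:\s*[\d.]+%\s*\((\d+)/(\d+)\)")


def index_utilization(status_text: str) -> float:
    """Return used/total from a Si3 NG status 'Utilization:' line (assumed meaning)."""
    match = _UTILIZATION.search(status_text)
    if match is None:
        raise ValueError("no index utilization line found")
    used, total = int(match.group(1)), int(match.group(2))
    return used / total


def index_is_critical(status_text: str, threshold: float = INDEX_FULL_THRESHOLD) -> bool:
    """True if the index has reached the threshold at which backups stop."""
    return index_utilization(status_text) >= threshold


line = "Utilization: 57.35% (32890421/57344000)"
print(f"{index_utilization(line):.1%}", index_is_critical(line))
```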
Most of the parameters are for internal use only.
The status command provides information about used space, stored data, label, UUID and running processes (gc or fsck).
sm_dedup_interface -d ds1_2 status
INFO Successfully initialized i2dedup library version v2.1.0-SNAPSHOT5
Server Status:
Repository information:
Version: 2.1.1
UUID: 3b9ec2ae-34e1-11e3-b88b-001b2146
Label: ds1
Max Pages: 481900
Max Pages recommended: 154100 (-Xmx1010M -XX:MaxDirectMemorySize=603M)
GC process status: not running: GC finished.
Fsck process status: not running: Fsck finished. Interrupted: false. Total Runtime: 1296.68s
Bytes in repository: 259.02 GB
Bytes delete pending: 9.18 GB
Object information:
Objects stored: 258
Data before deduplication: 1541.56 GB
Data after deduplication: 58.94 GB
Overall DeDup ratio: 96.18 %
Key-Values:
No keys stored.
sm_dedup_interface -d 3 status
Server Status:
Repository information:
2022-03-07 16:01:31
Start time: 2022-02-22 16:32:15
Server: localhost:11704
Path: /srv/single_disk/Si3-NG-b11
Version:
Version: Si3-NG
Branch: 4321a7ba7bafbfb7e9a186a3821b0e0bf08d19bc
Build: 4321a7b
Commit: 2022-02-09 15:37:49
Build date: 2022-02-09 15:41:18
UUID: 5e999930-bd3f-11ea-8471-b79d351122df
Label: Si3-NG-b11
PCCK process status: not running: No items found to process: Stop time: 2022-03-07 16:00:48 (Started: 2022-03-07 16:00:48)
OCCK process status: not running: No items found to process: Stop time: 2022-03-07 16:00:47 (Started: 2022-03-07 16:00:47)
GC process status: not running: Sweep Phase: swept 97124/97124 pages [deleted=2194,rewritten=13611,skipped=79550,locked=1769,missing=0]: Stop time: 2022-03-07 10:00:15 (Started: 2022-03-07 08:10:56)
Bytes in repository: 534.49 GiB
Bytes delete pending: 159.60 GiB
Pages dir size: 534.42 GiB
Object dir size: 0.45 GiB
Trash dirs size: 159.60 GiB
Active tasks: All: 0, Backup: 0, Restore: 0, GC: 0, OCCK: 0, PCCK: 0
Sanity state: OK
JVM arguments: -Xmx3335M, -Dlogback.configurationFile=/var/opt/sesam/var/ini/sm_sdslog2.xml, -Dgv_rw_stpd=/var/opt/sesam/var/log/sms, -Dlogs.dir=/var/opt/sesam/var/log/sms, -Ddrive_num=3, -Dconfig.inifile=/var/opt/sesam/var/ini/stpd_conf/Si3-NG-b11_3.ini
Recommended JVM arguments: -Xmx3312M
Si3-storage:
Bytes All: 1999421108224, Use: 736072216576, Free: 1263348891648, Used: 36%
Index information:
Size: 0.34 GiB
Utilization: 57.35% (32890421/57344000)
Reindex: -
Object information:
Objects stored: 36090
Data before deduplication: 10.66 TiB
Overall DeDup ratio: 1 / 20.32
Saved storage space: 95.08 %
S3 information:
State: OFF
Bucket:
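For Si3, the Overall DeDup ratio value shows the percentage by which the stored data has been reduced. Si3 NG shows the ratio itself (for example, 1 / 20.32) and reports the percentage separately as Saved storage space.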
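Both figures can be recomputed from other values in the status output. A minimal Python sketch with hypothetical helper names; the formula 1 - after/before is an assumption about how SEP sesam derives the value, but it reproduces the numbers in both status examples in this section.

```python
def saved_percent(before: float, after: float) -> float:
    """Percentage by which deduplication reduced the data (Si3 'Overall DeDup ratio')."""
    if before <= 0:
        raise ValueError("data before deduplication must be positive")
    return (1 - after / before) * 100


def saved_percent_from_ratio(ratio: float) -> float:
    """Convert a Si3 NG ratio '1 / N' into 'Saved storage space' in percent."""
    return (1 - 1 / ratio) * 100


# Values taken from the Si3 and Si3 NG status examples.
print(round(saved_percent(1541.56, 58.94), 2))  # 96.18
print(round(saved_percent_from_ratio(20.32), 2))  # 95.08
```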
- gc start
- Starts the garbage collection.
- Identifies unreferenced chunks and moves them to the trash.
- Is started by SEP sesam with sm_start.
- gc stop
- Stops the garbage collection.
- Can be restarted later.
- gc status
- Si3 NG gc status output example
sm_dedup_interface -d 3 gc status
Current gc status:
State: Finished
Started: 2022-03-07 08:10:56
Ended: 2022-03-07 10:00:15
Message: Sweep Phase: swept 97124/97124 pages [deleted=2194,rewritten=13611,skipped=79550,locked=1769,missing=0]
STATUS=SUCCESS
MSG=Sweep Phase: swept 97124/97124 pages [deleted=2194,rewritten=13611,skipped=79550,locked=1769,missing=0]
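The Sweep Phase message packs several page counters into one line. A minimal Python sketch with a hypothetical helper name that extracts them; in this example the five counters add up to the number of swept pages (2194 + 13611 + 79550 + 1769 + 0 = 97124), but whether that holds for every run is an assumption.

```python
import re

_SWEEP = re.compile(r"swept (\d+)/(\d+) pages \[([^\]]*)\]")


def parse_sweep(message: str) -> dict:
    """Extract the page counters from a gc 'Sweep Phase' message."""
    match = _SWEEP.search(message)
    if match is None:
        raise ValueError("no sweep summary found")
    counters = {}
    for item in match.group(3).split(","):
        key, value = item.split("=")
        counters[key.strip()] = int(value)
    counters["swept"] = int(match.group(1))
    counters["total"] = int(match.group(2))
    return counters


msg = ("Sweep Phase: swept 97124/97124 pages "
       "[deleted=2194,rewritten=13611,skipped=79550,locked=1769,missing=0]")
pages = parse_sweep(msg)
categorized = sum(pages[k] for k in ("deleted", "rewritten", "skipped", "locked", "missing"))
print(pages["swept"] == pages["total"], categorized)  # True 97124
```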
- get
- Reads an object (file, saveset) from the deduplication store.
- '-' can be used as the destination filename to write to STDOUT.
- put
- Writes an object (file, saveset) to the deduplication store.
- '-' can be used as the input filename to read from STDIN.
- fsck start
- Starts a data store check.
- Must be started manually.
- If the parameter autopurge is set, all corrupted objects are deleted.
- fsck status
- Displays the current state or the state of the most recent data store check.
The Si3 NG deduplication store has two types of fsck: object check (occk), which checks if the Si3 data part is still readable, and page check (pcck), which checks the physical data on the disk. All processes (gc, occk and pcck) can run simultaneously.
si3fix:/var/opt/sesam/var/log/sms # sm_dedup_interface -d Si3_5 fsck status
INFO Successfully initialized i2dedup library version v2.0.0-beta2
Current fsck status:
Message: Logfile check progress: Bytes: 1270925865083/1512422546049 Throughput: 91.25 MiB/s
Running: yes
Started: 2018-05-29 20:57:17
Ended: -
Bytes Checked: 0
Bytes Lost: 0
Objects checked:
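A running Si3 fsck reports checked bytes, total bytes and throughput, which is enough for a rough progress estimate. A minimal Python sketch with a hypothetical helper name; the remaining-time estimate assumes the reported throughput stays constant, which will not hold exactly in practice.

```python
import re

_PROGRESS = re.compile(r"Bytes:\s*(\d+)/(\d+)\s+Throughput:\s*([\d.]+)\s*MiB/s")


def fsck_progress(message: str) -> tuple:
    """Return (percent done, estimated seconds remaining) for a Si3 fsck message.

    The estimate assumes the reported throughput stays constant.
    """
    match = _PROGRESS.search(message)
    if match is None:
        raise ValueError("no logfile check progress found")
    done, total = int(match.group(1)), int(match.group(2))
    bytes_per_second = float(match.group(3)) * 1024 * 1024  # MiB/s -> bytes/s
    percent = done / total * 100
    remaining_seconds = (total - done) / bytes_per_second
    return percent, remaining_seconds


msg = ("Message: Logfile check progress: Bytes: 1270925865083/1512422546049 "
       "Throughput: 91.25 MiB/s")
percent, eta = fsck_progress(msg)
print(f"{percent:.2f} % done, about {eta / 60:.0f} min left")  # 84.03 % done, about 42 min left
```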
sm_dedup_interface -d 3 fsck status
Current occk status:
Mode: Incremental. Since 2022-03-07 09:18:25
State: Finished
Started: 2022-03-07 16:01:53
Ended: 2022-03-07 16:01:53
Last Full successful: 2022-01-04 10:41:20
Message: No items found to process
Previous error: -
Current pcck status:
Mode: Incremental. Since 2022-03-07 09:58:48
State: Finished
Started: 2022-03-07 16:01:53
Ended: 2022-03-07 16:01:53
Last Full successful: 2022-01-04 11:58:25
Message: No items found to process
Previous error: -
- purge
- Deletes all pages that the last garbage collection (gc) run marked as obsolete (empties the trash).
- Is started by sm_start after a SEP sesam day change.
- getlabel and getuuid can be replaced with status
The logging function is based on the logback library. For more information, see Logback Project. Note that this information is intended for advanced users only.
- Logging info
- gv_rw_ini:sm_sds.xml (/var/opt/sesam/var/ini/sm_sds.xml)
- /var/opt/sesam/var/log/sms contains the following log files:
- sm_dedup_server_info-<drive>.log: INFO level and higher.
- sm_dedup_server-<drive>.log: DEBUG level and higher. This file can become quite large.
- sm_dedup_gc-<drive>.log: garbage collection log.
- sm_dedup_fsck-<drive>.log: file system check log.
- Auto rotation if the log file size reaches 100 MB.
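To locate the log files for a given drive, the name patterns listed above can be expanded programmatically. A minimal Python sketch with hypothetical helper names; it assumes that <drive> is the drive number (as in -Ddrive_num=3 in the Si3 NG status example) and that the 100 MB rotation limit means 100 MiB. It only reads file sizes and does not touch the logback configuration.

```python
from pathlib import Path

LOG_DIR = Path("/var/opt/sesam/var/log/sms")
ROTATION_BYTES = 100 * 1024 * 1024  # assumed: "100 MB" rotation limit read as MiB

# File name patterns as listed above; <drive> is substituted by the caller.
LOG_PATTERNS = (
    "sm_dedup_server_info-{drive}.log",
    "sm_dedup_server-{drive}.log",
    "sm_dedup_gc-{drive}.log",
    "sm_dedup_fsck-{drive}.log",
)


def dedup_log_files(drive: str, log_dir: Path = LOG_DIR) -> list:
    """Return the expected Si3 log file paths for one drive."""
    return [log_dir / pattern.format(drive=drive) for pattern in LOG_PATTERNS]


def near_rotation(drive: str, log_dir: Path = LOG_DIR, ratio: float = 0.9) -> list:
    """List existing log files whose size has reached ratio * rotation size."""
    return [
        path for path in dedup_log_files(drive, log_dir)
        if path.exists() and path.stat().st_size >= ratio * ROTATION_BYTES
    ]


for path in dedup_log_files("3"):
    print(path)
```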
Files and directories
For every SEP sesam saveset, three objects (files) are stored in the Si3 store: a .data, an .info and an .info2 file.
The .data and .info files are identical to those of a normal data store. The .info2 file is required to append data to a Si3 object. All database information that is not available until a backup is completed is written to this file.
The path <repo root path>/Si3-POOL/Si3-POOL00001/ is a legacy SEP sesam data store path and has nothing to do with the Si3 store. It will be removed in the future.