Source:Administering Si3 NG Deduplication Store

From SEPsesam

The Si3 NG data store is set up on a dedicated Linux server with SEP sesam installed. SEP sesam provides command utilities for configuring and managing Si3 NG data stores.

The main maintenance tasks on a Si3 NG data store are garbage collection (gc) and file system check (fsck). Both tasks run automatically. Garbage collection (gc) is started by sm_start during the SEP sesam newday. The file system check (fsck) is performed repeatedly at regular intervals and automatically with every backup.

The Si3 NG has two types of file system check (fsck): object check (occk), which checks if the Si3 NG data part is still readable, and page check (pcck), which checks the physical data on the disk. All processes (gc, occk and pcck) can run simultaneously.

SEP sesam provides the sm_dedup_interface utility for configuring and managing Si3 NG data stores, and for recovering corrupted Si3 NG data stores.

Note
You must have SEP sesam administrator privileges to run SEP sesam CLI commands, and you must run the command prompt as an administrator. All commands are run from the <SESAM_ROOT>/bin/sesam/ directory. To execute SEP sesam commands globally (rather than from that directory), see "What happens when I set a profile?" for how to set the SEP sesam profile.

Administering Si3 NG data store

To perform administrative tasks and manage the Si3 NG data store you can use the sm_dedup_interface utility. Below is a list of some commands and their usage.

The general syntax for the sm_dedup_interface commands is:

sm_dedup_interface -d <datastore> <command>

The following commands are available:

 - purge
 - objectinfo <remote filename>
 - put <input filename> <dest filename>
 - get <remote filename> <dest filename> [<bytes skipped then> [<bytes read at beginning>]]
 - delete <remote filename> [<filename 2>]*
 - getlabel
 - getuuid
 - list
 - fsck [start|stop|autopurge|status|incremental|purge now|dump status into <file>|fsck incr start from <file>]
 - gc <start|stop|status|result>
 - key <set <key> <value>|get <key>|list>
 - log@server <msg>
 - propose serverconfig <repository netto GiB>
 - propose jvmconfig (see Note a)
 - snapshot
 - replicate from [-f] <remote hostname> <remote port> <remote filename>
 - replicate show
 - replicate abort <task id>

Note a

To find out how much RAM is required for a given Si3 NG capacity, use the following command:

sm_dedup_interface -T dedup2 propose jvmconfig <Si3NG_capacity>

The MaxDirectMemorySize value in the output is the required amount of RAM.

Note, however, that SEP sesam calculates the RAM consumption itself and runs these commands in the background, so it is usually not necessary to set the values manually. Any manual changes are overwritten by the next drive configuration.

The index calculation is also tied to this command. If the index grows until it is 95% full, backups can no longer be performed. The RAM must hold the entire index (described by max_pages), so MaxDirectMemorySize depends directly on max_pages. To resolve problems with a growing index, refer to Si3 Deduplication Troubleshooting.
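The 95% limit can be checked against the Utilization line that appears in the status output (see the status example further down this page). The following is a minimal sketch, not part of SEP sesam: the function names and the warning margin are hypothetical, and it assumes that the pair in parentheses is "used/maximum" index entries.

```python
import re

# Threshold taken from the text above: backups stop once the index is 95% full.
INDEX_FULL_THRESHOLD = 0.95


def index_utilization(status_line: str) -> float:
    """Return used/maximum from a line like 'Utilization: 57.35% (32890421/57344000)'."""
    match = re.search(r"\((\d+)/(\d+)\)", status_line)
    if match is None:
        raise ValueError("no '(used/maximum)' pair found in line")
    used, maximum = int(match.group(1)), int(match.group(2))
    return used / maximum


def index_nearly_full(status_line: str, margin: float = 0.10) -> bool:
    """True when utilization is within `margin` of the 95% limit.

    The margin is a hypothetical early-warning value, not a SEP sesam setting.
    """
    return index_utilization(status_line) >= INDEX_FULL_THRESHOLD - margin


# Line copied from the status output example on this page.
line = " Utilization:                57.35% (32890421/57344000)"
print(f"{index_utilization(line):.4f}", index_nearly_full(line))
```

Computing the fraction from the raw counters rather than from the printed percentage avoids depending on how the tool rounds the displayed value.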

Specific options

Most of the parameters are for internal use only.

status
Provides information about used space, stored data, label, UUID, running processes (gc or fsck), etc.

Si3 NG status output example

sm_dedup_interface -d 3 status
Server Status:
 Repository information:      2022-03-07 16:01:31
  Start time:                 2022-02-22 16:32:15
  Server:                     localhost:11704
  Path:                       /srv/single_disk/Si3-NG-b11
  Version:                    Version: Si3-NG Branch: 4321a7ba7bafbfb7e9a186a3821b0e0bf08d19bc Build:   4321a7b Commit: 2022-02-09 15:37:49 Build date: 2022-02-09 15:41:18
  UUID:                       5e999930-bd3f-11ea-8471-b79d351122df
  Label:                      Si3-NG-b11
  PCCK process status:        not running: No items found to process: Stop time: 2022-03-07 16:00:48 (Started: 2022-03-07 16:00:48)
  OCCK process status:        not running: No items found to process: Stop time: 2022-03-07 16:00:47 (Started: 2022-03-07 16:00:47)
  GC process status:          not running: Sweep Phase: swept 97124/97124 pages [deleted=2194,rewritten=13611,skipped=79550,locked=1769,missing=0]: Stop time: 2022-03-07 10:00:15 (Started: 2022-03-07 08:10:56)
  Bytes in repository:            534.49 GiB
  Bytes delete pending:           159.60 GiB
  Pages dir size:                 534.42 GiB
  Object dir size:                  0.45 GiB
  Trash dirs size:                159.60 GiB
  Active tasks:               All: 0, Backup: 0, Restore: 0, GC: 0, OCCK: 0, PCCK: 0
  Sanity state:               OK
  JVM arguments:              -Xmx3335M, -Dlogback.configurationFile=/var/opt/sesam/var/ini/sm_sdslog2.xml, -Dgv_rw_stpd=/var/opt/sesam/var/log/sms, -Dlogs.dir=/var/opt/sesam/var/log/sms, -Ddrive_num=3, -Dconfig.inifile=/var/opt/sesam/var/ini/stpd_conf/Si3-NG-b11_3.ini
  Recommended JVM arguments:  -Xmx3312M
 Si3-storage:                Bytes All: 1999421108224, Use: 736072216576, Free: 1263348891648, Used: 36%
Index information:
 Size:                             0.34 GiB
 Utilization:                57.35% (32890421/57344000)
 Reindex:                             -
Object information:
 Objects stored:                  36090
 Data before deduplication:       10.66 TiB
 Overall DeDup ratio:          1 / 20.32
 Saved storage space:             95.08 %
S3 information:
 State:                      OFF
 Bucket:

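The value Overall DeDup ratio shows the factor by which the stored data has been reduced (1 / 20.32 in the example above). The value Saved storage space expresses the same reduction as a percentage.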
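In the sample output, the ratio 1 / 20.32 and the Saved storage space of 95.08 % appear to be related by saved = 1 - 1/20.32. The following minimal sketch shows that arithmetic. The function name is hypothetical, and the relation is inferred from the sample values rather than taken from SEP sesam documentation.

```python
import re


def saved_percent_from_ratio(ratio_line: str) -> float:
    """Convert a line such as 'Overall DeDup ratio: 1 / 20.32' into percent saved."""
    match = re.search(r"(\d+(?:\.\d+)?)\s*/\s*(\d+(?:\.\d+)?)", ratio_line)
    if match is None:
        raise ValueError("no 'a / b' ratio found in line")
    stored, original = float(match.group(1)), float(match.group(2))
    # Fraction of the original data that no longer needs to be stored.
    return 100.0 * (1.0 - stored / original)


# Line copied from the status output example above.
ratio_line = " Overall DeDup ratio:          1 / 20.32"
print(round(saved_percent_from_ratio(ratio_line), 2))  # prints 95.08
```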

gc start
  • Starts the garbage collection.
  • Identifies unreferenced chunks and moves them to the trash.
  • Is started by SEP sesam with sm_start.
gc stop
  • Stops the garbage collection.
  • Can be restarted later.
gc status
Displays the current state or the state of the most recent garbage collection run.

Si3 NG gc status output example

sm_dedup_interface -d 3 gc status
Current gc status:
 State:                       Finished
 Started:                     2022-03-07 08:10:56
 Ended:                       2022-03-07 10:00:15
 Message:                     Sweep Phase: swept 97124/97124 pages [deleted=2194,rewritten=13611,skipped=79550,locked=1769,missing=0]
STATUS=SUCCESS MSG=Sweep Phase: swept 97124/97124 pages [deleted=2194,rewritten=13611,skipped=79550,locked=1769,missing=0]
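The sweep message in the output above can be broken down into its counters. The sketch below is a hypothetical helper, not part of SEP sesam, and assumes the message keeps the format shown in this example. In this sample the bracketed counters add up to the number of swept pages; whether that always holds is not stated in the source.

```python
import re


def parse_sweep_message(message: str) -> dict:
    """Extract swept/total page counts and the bracketed counters from a sweep message."""
    head = re.search(r"swept (\d+)/(\d+) pages \[([^\]]*)\]", message)
    if head is None:
        raise ValueError("not a sweep-phase message")
    counters = {}
    for item in head.group(3).split(","):
        name, value = item.split("=")
        counters[name.strip()] = int(value)
    return {
        "swept": int(head.group(1)),
        "total": int(head.group(2)),
        "counters": counters,
    }


# Message copied from the gc status output example above.
msg = ("Sweep Phase: swept 97124/97124 pages "
       "[deleted=2194,rewritten=13611,skipped=79550,locked=1769,missing=0]")
result = parse_sweep_message(msg)
print(result["counters"])
print(sum(result["counters"].values()) == result["swept"])
```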
get
  • Reads an object (file, saveset) from the deduplication store.
  • '-' can be used as the destination to specify STDOUT.
put
  • Writes an object (file, saveset) to the deduplication store.
  • '-' can be used as the input to specify STDIN.
fsck
  • Starts a data store check.
  • Must be started manually.
  • If the parameter autopurge is set, all corrupted objects are deleted.
fsck status
Displays the current state or the state of the most recent data store check.

The Si3 NG deduplication store has two types of fsck: object check (occk), which checks if the Si3 data part is still readable, and page check (pcck), which checks the physical data on the disk. All processes (gc, occk and pcck) can run simultaneously.

Si3 NG fsck status output example

sm_dedup_interface -d 3 fsck status
Current occk status:
 Mode:                        Incremental. Since 2022-03-07 09:18:25
 State:                       Finished
 Started:                     2022-03-07 16:01:53
 Ended:                       2022-03-07 16:01:53
 Last Full successful:        2022-01-04 10:41:20
 Message:                     No items found to process
 Previous error:              -
Current pcck status:
 Mode:                        Incremental. Since 2022-03-07 09:58:48
 State:                       Finished
 Started:                     2022-03-07 16:01:53
 Ended:                       2022-03-07 16:01:53
 Last Full successful:        2022-01-04 11:58:25
 Message:                     No items found to process
 Previous error:              -
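For monitoring, the two sections of the output above can be read into a simple structure. This is a minimal sketch with a hypothetical function name. It assumes the "Current <name> status:" headers and "field: value" lines shown in the example, and the sample text is shortened from that example.

```python
def parse_fsck_status(text: str) -> dict:
    """Split 'Current <name> status:' sections into {name: {field: value}}."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("Current ") and line.endswith(" status:"):
            name = line[len("Current "):-len(" status:")]
            current = sections.setdefault(name, {})
        elif current is not None and ":" in line:
            # Split on the first colon only, so timestamps in the value stay intact.
            key, value = line.split(":", 1)
            current[key.strip()] = value.strip()
    return sections


sample = """\
Current occk status:
 Mode:                        Incremental. Since 2022-03-07 09:18:25
 State:                       Finished
 Last Full successful:        2022-01-04 10:41:20
 Previous error:              -
Current pcck status:
 Mode:                        Incremental. Since 2022-03-07 09:58:48
 State:                       Finished
 Last Full successful:        2022-01-04 11:58:25
 Previous error:              -
"""

status = parse_fsck_status(sample)
all_finished = all(s.get("State") == "Finished" for s in status.values())
print(sorted(status), all_finished)
```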
purge
  • Deletes all pages marked as obsolete by the last garbage collection (gc) run, that is, it empties the trash.
  • Is started by sm_start after a SEP sesam day change.

The getlabel and getuuid commands can be replaced with status, which also shows the label and UUID.

Repairing corrupted Si3 NG data store

You can repair the Si3 NG store when pages or objects get corrupted.

  1. First, determine the scope of the corruption:
    • To get the list of corrupted objects, use:
      sm_dedup_interface -d <datastore> corruptedobjects
    • To get the list of corrupted pages, use:
      sm_dedup_interface -d <datastore> corruptedpages
  2. Use the following command to replace a corrupted page in the /pages directory with an older version from the /pages-trash directory:
    sm_dedup_interface -d <datastore> repair pages
    The pages in the trash contain all chunks deleted by previous GC runs. The oldest version of a page takes priority.
  3. Use the following command to search for and recover missing chunks in the /pages-trash directory:
    sm_dedup_interface -d <datastore> repair start
    During the repair process, a new page is created that contains all chunks from the current page (the page affected by the 'missing chunks' issue) and all chunks found in the trash.
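The steps above can be sketched as an ordered list of command lines. The helper names below are hypothetical, and the sketch only builds the argument lists from the syntax documented on this page. It does not execute anything, and the data store value "3" is taken from the output examples.

```python
from typing import List


def dedup_cmd(datastore: str, *command: str) -> List[str]:
    """Build an argv list following 'sm_dedup_interface -d <datastore> <command>'."""
    return ["sm_dedup_interface", "-d", datastore, *command]


def repair_plan(datastore: str) -> List[List[str]]:
    """Return the commands from the repair steps in their documented order."""
    return [
        dedup_cmd(datastore, "corruptedobjects"),  # step 1: list corrupted objects
        dedup_cmd(datastore, "corruptedpages"),    # step 1: list corrupted pages
        dedup_cmd(datastore, "repair", "pages"),   # step 2: restore pages from trash
        dedup_cmd(datastore, "repair", "start"),   # step 3: recover missing chunks
    ]


for argv in repair_plan("3"):
    print(" ".join(argv))
```

Keeping each command as a list of arguments rather than one string means it could later be passed to a process launcher without shell quoting issues. Review the output of the first two commands before running any repair step.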

Cleanup of unrecoverable Si3 NG store

Warning
You should use the commands described in this section only in case the corrupted store cannot be recovered.

If corruptions in the Si3 NG store persist, either the initial page version has already been purged from the trash or fatal errors occurred during backup or restore. In this case, broken pages or missing chunks cannot be recovered.

Cleanup can be performed by deleting unrecoverable objects manually or by using the automatic cleanup function.

Deleting objects

When there are only a few unrecoverable objects, delete each object with the following commands:

sm_dedup_interface -d <datastore> delete corrupted_object_id_1

...

sm_dedup_interface -d <datastore> delete corrupted_object_id_N

If there are many corruptions, you can delete all corrupted objects with the following command:

sm_dedup_interface -d <datastore> fsck purge

Garbage collection

When you have deleted all unrecoverable objects, run garbage collection (gc):

sm_dedup_interface -d <datastore> gc start

Automatic cleanup function

To start the automatic cleanup function, use the following command:

sm_dedup_interface ... fsck purge auto

The automatic cleanup function runs the following sequence of commands: PCCK start -> OCCK start -> Delete all corrupted objects -> GC start.

Logging

Logging is based on the logback library. For more information, see Logback Project. Note that this information is intended for advanced users only.

Logging info
  • gv_rw_ini:sm_sds.xml (/var/opt/sesam/var/ini/sm_sds.xml)
  • /var/opt/sesam/var/log/sms contains the following log files:
    • sm_dedup_server_info-<drive>.log: Log level INFO and higher.
    • sm_dedup_server-<drive>.log: Log level DEBUG and higher. This file can become quite large.
    • sm_dedup_gc-<drive>.log: garbage collection log.
    • sm_dedup_fsck-<drive>.log: file system check log.
  • Log files are rotated automatically when they reach 100 MB.
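To locate the logs for one drive, the file name patterns listed above can be expanded as follows. This is a minimal sketch with hypothetical names. It assumes that <drive> is the drive number (the status example shows -Ddrive_num=3) and it does not check whether the files exist.

```python
from pathlib import PurePosixPath
from typing import List

# Directory and file name patterns as listed in the logging info above.
LOG_DIR = PurePosixPath("/var/opt/sesam/var/log/sms")
LOG_PATTERNS = [
    "sm_dedup_server_info-{drive}.log",  # INFO and higher
    "sm_dedup_server-{drive}.log",       # DEBUG and higher
    "sm_dedup_gc-{drive}.log",           # garbage collection
    "sm_dedup_fsck-{drive}.log",         # file system check
]


def log_paths(drive: str) -> List[str]:
    """Return the expected log file paths for the given drive."""
    return [str(LOG_DIR / pattern.format(drive=drive)) for pattern in LOG_PATTERNS]


for path in log_paths("3"):
    print(path)
```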

Files and directories

Objects

For every SEP sesam saveset, three objects (files) are stored in the Si3 NG store:

  • <ssid>.data
  • <ssid>.info
  • <ssid>.info2

The .data and .info files are identical to those of a normal data store. The .info2 file is required for the data to be appended to a Si3 object. All database information that is not available before a backup is completed is written to this file.

Directories
Copyright © SEP AG 1999-2024. All rights reserved.
Any form of reproduction of the contents or parts of this manual is allowed only with the express written permission from SEP AG. When compiling and designing user documentation SEP AG uses great diligence and attempts to deliver accurate and correct information. However, SEP AG cannot issue a guarantee for the contents of this manual.