Troubleshooting guide: Difference between revisions
Revision as of 01:05, 17 July 2009
Introduction
We have developed this 'Troubleshooting Guide' to help you quickly recognize and resolve errors during setup and installation as well as during normal operations. SEP sesam can often serve as an indicator when something has changed or occurred that impacts overall system performance. Changes in SEP sesam backup performance are often caused by system changes or failures.
SEP sesam offers the following options:
- Several protocols for report generation and notification (e.g. email notification)
- Daily protocol files that a program can translate into German from other languages to simplify the analysis
- A logging mechanism with adjustable report levels
Especially in the scenario where SEP sesam runs for weeks without a problem and with high throughput, and then 'suddenly' the throughput decreases or specific computers are no longer backed up, SEP sesam normally serves as a detector for hardware and/or software defects or for changes in the operator's network (changed addresses, wrongly connected backbone, defective star connector, etc.).
For problems in the functional sequence of SEP sesam, the following analysis sequence is recommended:
- 1. Check with sm_main status whether all processes are running. If necessary, start the missing process again with sm_main reload ... .
- 2. Daily protocol
- 3. Specific protocols for backups and restores
- 4. Logging files
Problems with the interface (GUI)
Problems with the Java security settings are sometimes difficult to recognize because, for example, the GUI client does not start while the configuration error has to be looked for on the GUI server.
Error description | Cause, Solution |
GUI Server not reachable |
|
Connection to database failed |
|
GUI cannot start |
|
Problem upon writing/reading to the working directory: |
|
No Online-guide appears |
|
No Online-Help appears |
|
Tips:
- The access rights for missing computers can be entered in the server-side java-policy-file with the program sm_setup allow_gui {host} {user}.
- It is not necessary to restart the GUI Server after changes to sm_java.policy. The refresh is explicitly allowed in sm_java.policy by the entry permission java.security.SecurityPermission "getPolicy";.
- The syntax for path names in sm_java.policy is platform dependent.
- In order to test if there are problems with the Java Security the line permission java.security.AllPermission; can be un-commented (remove leading '//'). This way the restriction to explicitly specified rights is switched off. To avoid security holes this line should be commented out after the test again (add prefix '//')!
Error Messages of the Sesam Kernel Software
Network Problems
Network problems are among the most frequent causes of errors, i.e. a computer does not run properly and/or the physical connection to it is faulty. The operator must make sure that communication works without problems (Network Check (NW-Check)).
Network Check
With the help of standard communication programs the connections and the address resolution must be checked (ping, nslookup, address resolution etc.). Additionally the connection must be checked with the corresponding SEPsesam access program (CTRL/SSH). The address resolution must be consistent, i.e. if for a TCP/IP name the resolution gives an IP-address, then the resolution for that IP-address must give the same TCP/IP name!
Example
# nslookup decunix
Server: seplinux2.sep.de
Address: 193.28.59.40
Name: decunix.sep.de
Address: 193.28.59.94
# nslookup 193.28.59.94
Server: seplinux2.sep.de
Address: 193.28.59.40
Name: decunix.sep.de
Address: 193.28.59.94
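The consistency rule described above (name to address, then address back to the same name) can also be checked with a small script. The following is only a minimal sketch: the function name check_resolution is hypothetical, and treating a short name such as decunix as equal to its fully qualified form decunix.sep.de is an assumption based on the nslookup example.

```python
import socket
from typing import Callable, Tuple


def _reverse_lookup(ip: str) -> str:
    # socket.gethostbyaddr returns (hostname, aliases, addresses)
    return socket.gethostbyaddr(ip)[0]


def check_resolution(
    name: str,
    forward: Callable[[str], str] = socket.gethostbyname,
    reverse: Callable[[str], str] = _reverse_lookup,
) -> Tuple[str, str, bool]:
    """Resolve name -> IP -> name and report whether both directions agree."""
    ip = forward(name)
    reverse_name = reverse(ip)
    queried = name.lower().rstrip(".")
    returned = reverse_name.lower().rstrip(".")
    # Assumption: a short name such as "decunix" counts as matching
    # its fully qualified form "decunix.sep.de".
    consistent = returned == queried or returned.startswith(queried + ".")
    return ip, reverse_name, consistent
```

Calling check_resolution("decunix") on the backup server uses the system resolver; the forward and reverse parameters exist only so that the logic can be tried out without network access.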
Where only '=> support@sep.de' appears under 'Cause, Solution', the logging files have to be analyzed and must therefore be sent to support. support@sep.de will tell you which files are needed.
Sesam Daily Protocol Error Messages
If an error occurs, the corresponding error message is printed in the Sesam daily protocol. The format of the message line is:
{date} {time} {message code} [{process pid}]: {message}
Example:
2009-06-23 11:26:58 E013-BASICS [ 9804]: SHO_FROM_TO Failure in input from - to: -
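For automated analysis, a message line in this format can be split into its fields. The following is a minimal sketch, not part of SEP sesam: the names DAILY_LINE, DailyMessage and parse_daily_line are hypothetical, and the pattern for the message code (one letter, three digits, a hyphen, an upper-case module name) is an assumption derived from the codes listed in this guide.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# {date} {time} {message code} [{process pid}]: {message}
DAILY_LINE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<code>[A-Z]\d{3}-[A-Z_]+) \[\s*(?P<pid>\d+)\]: (?P<message>.*)$"
)


@dataclass
class DailyMessage:
    timestamp: datetime
    code: str      # e.g. "E013-BASICS"
    module: str    # e.g. "BASICS"
    pid: int
    message: str


def parse_daily_line(line: str) -> Optional[DailyMessage]:
    """Return the parsed message, or None if the line has a different shape."""
    match = DAILY_LINE.match(line.strip())
    if match is None:
        return None
    timestamp = datetime.strptime(
        f"{match['date']} {match['time']}", "%Y-%m-%d %H:%M:%S"
    )
    code = match["code"]
    module = code.split("-", 1)[1]
    return DailyMessage(timestamp, code, module, int(match["pid"]), match["message"])
```

Applied to the example line above, the sketch yields the code E013-BASICS, the module BASICS and the process ID 9804.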
Some of the messages may also appear if a Sesam program is called in 'Command Box' or 'Command Shell'.
Note
In the actual messages, {val} is replaced by the corresponding current values.
BACKUP - Problems with backups
Error message | Cause, Solution |
E001-BACKUP Error during working on PRE interface of backup {val}:{val} |
|
E002-BACKUP Cannot submit SMS_WATCH process to queue {val}. |
|
E003-BACKUP Could not read gv_bck_msg_{val}_{val} |
|
E004-BACKUP {val} {val} |
|
E005-BACKUP Error during working on POST interface of backup {val}: {val} |
|
E006-BACKUP {val} error - please view backup protocol |
|
E007-BACKUP Invalid option {val} ( line {val}) |
|
E008-BACKUP SBC version at client side does not support encryption: {val} |
|
BASICS
Error message | Cause, Solution |
E001-BASICS STR_CHAIN string overflow |
|
E002-BASICS {val} > error opening file {val} mode:{val} |
|
E003-BASICS {val} > {val} not yet existing - please restart SESAM. |
|
E004-BASICS GET_LOCAL_TCPIP_ADRESS could not find a usable WinSock DLL. |
|
E005-BASICS HAL_REDIR string overflow: {val} {val} |
|
E006-BASICS CHECK_SMS autom. SMS restart failed - SMS not running. |
|
E007-BASICS {val} > error deleting file(s) {val} |
|
E008-BASICS Error in command execution {val}: {val} |
|
E009-BASICS Error copying file {val} {val}: {val} |
|
E010-BASICS FILE copy {val} -> {val} could not open source for read |
|
E011-BASICS FILE copy {val} -> {val} could not open target to write |
|
E012-BASICS FILE copy {val} -> {val} copy invalid status {val} |
|
E013-BASICS Failure in input from - to: {val} - {val} |
|
E014-BASICS Could not submit job {val} into queue {val}. |
|
E015-BASICS Missing parameters - more information with {val} |
|
E016-BASICS Sesam daily startup event must not be deleted. |
|
BREAK - Break problem messages
Error message | Cause, Solution |
E001-BREAK Couldn't block queue {val}. |
|
E002-BREAK Couldn't start queue {val}. |
|
E003-BREAK Aborting by {val} at {val} failed ({val}) |
|
E004-BREAK Missing parameters - try sm_break -h |
|
E005-BREAK Wrong input value: {val} |
|
E006-BREAK Interrupt signal SIGINT could not be sent to process {val} |
|
CONFDRI - Configuration Drives
Error message | Cause, Solution |
E001-CONFDRI Submit {val} into queue {val} failed. |
|
E002-CONFDRI Error during activating the SesamMultiplexStream dataserver. |
|
E003-CONFDRI Automatic hardware setup and configuration finished with failure. Please take a look into daily protocol errors. |
|
CONFLOA - Configuration Loader
Error message | Cause, Solution |
E001-CONFLOA Could not find out the status of filling of loader. |
|
E002-CONFLOA Could not find out the drives of loader {val}: {val} |
|
E003-CONFLOA Could not find out the initial contents of loader: {val} |
|
DATABAS - Problems with the database
Error message | Cause, Solution |
E001-DATABAS Failure during access to database: {val} |
|
E002-DATABAS DB_SEL_FIRST {val}: cannot allocate memory |
|
E003-DATABAS DB_SEL_ALLOC {val}: the recordlist is empty, cannot allocate a new element. |
|
E004-DATABAS DB_SEL_ALLOC {val}: cannot allocate memory |
|
E005-DATABAS {val} could not find column {val} within record. |
|
E006-DATABAS {val}: empty recordlist, cannot catch requested data. |
|
E007-DATABAS Unexpected NULL value. |
|
E008-DATABAS CONVERT_DATE_TIME: wrong parameter {val} |
|
E009-DATABAS OA_MAKE_DB_REC_LIS {val}: {val} |
|
E010-DATABAS OA_MAKE_DB_REC_LIS {val}: too many columns ( max {val} ) |
|
E011-DATABAS {val} {val}: cannot allocate memory ( {val} Bytes) |
|
E012-DATABAS {val} SQL pipe doesn't return the expected characters |
|
E013-DATABAS {val} {val}: error during opening a pipe. |
|
E014-DATABAS OA_DO_SQL {val}: error from SQLcommand: {val} |
|
E015-DATABAS OA_DO_SQL {val}: error from DAMISQL: {val} |
|
E016-DATABAS {val} PSQL doesn't create output. |
|
E017-DATABAS {val} {val}: too many columns. |
|
E018-DATABAS {val} with unexpected result: {val}. |
|
E019-DATABAS Invalid date in table results: {val}. |
|
DRIVES - Problems with drives
Error message | Cause, Solution |
E001-DRIVES Wrong media {val} during attempt to mount on drive {val} |
|
E002-DRIVES Could not read label from media in drive {val}. |
|
E003-DRIVES Drive {val} currently not online( {val} ) |
|
E004-DRIVES gv_ro_sms not yet defined (please check INI and restart ) |
|
E005-DRIVES SMS could not find label on media. |
|
E006-DRIVES SMS in not allowed state. |
|
E007-DRIVES The pipes of SMS aren't running - please restart SMS. |
|
E008-DRIVES Error from login to SMS |
|
E009-DRIVES message: {val} |
|
E010-DRIVES Error from initialising the media in drive {val}: {val}.{val}. |
|
E001-GETVOL Request for media {val} completed with error: {val} |
|
HOSTS - Problems with computers
Error message | Cause, Solution |
E001-HOSTS Configuration of {val} backup client {val} completed with errors. |
|
E002-HOSTS There is no access to computer {val} - please configure it now for {val} |
|
E003-HOSTS Error message:{val} |
|
E004-HOSTS Error during access to database ({val}) |
|
E005-HOSTS cannot open local file {val} to insert SESAM server computer. |
|
E006-HOSTS Missing parameters for configuration of a client. |
|
E007-HOSTS There is still no SESAM client software installed on computer {val} - please do it now. |
|
E008-HOSTS SESAM computer is still not configured - please do it now first. |
|
E009-HOSTS Couldn't find computer "{val}" within net ( perhaps name typing error? test with 'ping') |
|
E010-HOSTS SESAM client on {val} not yet started - for what to do see Day Protocol. |
|
E011-HOSTS Missing permit - please insert [{val}] into {val}:../var/ini/sm_ctrld.auth |
|
E012-HOSTS Backup from {val} not possible. |
|
E013-HOSTS Backup from {val} cannot return to {val} - please check ( ping, DNS, local STPD ). |
|
E013-HOSTS Could not find computer {val} in database: {val} |
|
E014-HOSTS RemoteCmd {val} > error: {val} |
|
E015-HOSTS RemoteCopy error reading from {val} ( num.{val}): {val} |
|
E016-HOSTS RemoteCopy {val} {val} {val} -> {val}: error at {val} |
|
E017-HOSTS RemoteCopy {val} {val} {val} -> {val}: {val} |
|
E018-HOSTS RemoteCopy {val}: invalid mode: {val} |
|
E019-HOSTS RemoteCmd {val} > invalid accessmode - {val} |
|
E020-HOSTS Computer {val} is currently not reachable |
|
LOADERS - Problems with loaders
Error message | Cause, Solution |
E001-LOADERS Action {val} of loader {val} failed |
|
E002-LOADERS Action {val} of loader {val} failed: {val} |
|
E003-LOADERS The specified destination slot {val} of loader {val} is occupied |
|
E004-LOADERS Attention: Loader {val} is auto unloading! Auto unload will now be activated! |
|
E005-LOADERS Attention: Intern drive number of drive {val} ({val}) not found in loader {val}! Please check your drive configuration. |
|
E006-LOADERS Attention: {val} {val} not found in loader {val}! Please check your configuration. |
|
E007-LOADERS Invalid Control Software {val} - only DIR_SLU permitted. |
|
E008-LOADERS Error when creating the SHOW LIST (sm_slu) |
|
E009-LOADERS Invalid loader function {val} - only E,I,L,U,S available. |
|
E010-LOADERS Invalid slot number {val} - there are max. {val}. |
|
MEDIA - Problems with media
Error message | Cause, Solution |
E001-MEDIA {val} error: {val} |
|
E002-MEDIA Error from submitting {val} into queue {val} |
|
E003-MEDIA Error when loading media from slot {val}. |
|
E004-MEDIA Label including wildcards [{val}] could not be converted to a valid Sesam label |
|
E005-MEDIA Archive action {val} completed with error: {val} |
|
E006-MEDIA Archive action {val} completed with error - unknown EXIT code: {val} |
|
E007-MEDIA No media of pool {val} available |
|
E008-MEDIA All media of pool {val} with EOL restriction |
|
E009-MEDIA All EOL free media of pool {val} are loaded in other drives |
|
E010-MEDIA Loaded media {val} is not identical to requested media {val}. |
|
E011-MEDIA Media action 'closetape' failed: {val}. |
|
RECOVER - Disaster Recovery problems
Error message | Cause, Solution |
E001-RECOVER Couldn't start {val} : {val} |
|
E002-RECOVER Loading of media {val} failed ({val}) - signal restore to break |
|
E003-RECOVER Error during copying files from save set {val} on media {val} to {val} (see {val}) |
|
E004-RECOVER Error during creating listing of files in save set {val} on media {val} |
|
E005-RECOVER Error during inserting media with label {val} into archive |
|
E006-RECOVER Error during inserting media-pool {val} into database |
|
E007-RECOVER Missing label in EOM-signal ({val}) |
|
RESTORE - Restore problems
Error message | Cause, Solution |
E001-RESTORE LIS_DB > {val} |
|
E002-RESTORE Restore task {val} not yet defined - please do it now |
|
E003-RESTORE Restore {val} completed with errors: {val} |
|
E005-RESTORE Selective generations restore cannot allocate memory ( {val} ). |
|
E006-RESTORE SEARCH_LIS {val} is an invalid name of task. |
|
E007-RESTORE Error in {val}-activities: {val} |
|
SBC_COM - Sesam Communication with External Backups
Error message | Cause, Solution |
E000-SBC_COM Wrong parameter |
|
E001-SBC_COM Wrong number of parameters |
|
E002-SBC_COM Missing or wrong mandatory task |
|
E003-SBC_COM Failure selecting from database table |
|
E004-SBC_COM Missing mandatory label, media pool or drive number |
|
E005-SBC_COM FIND_DRIVES_OF_POOL no drives for pool {val} configured |
|
E006-SBC_COM A session {val} was not connected before |
|
E007-SBC_COM Missing mandatory save set identifier |
|
E008-SBC_COM GET_SEGM_AND_OFFSETS could not find selected file {val} in save set {val} |
|
E009-SBC_COM GET_SEGM_AND_OFFSETS wrong format in line [{val}] |
|
E010-SBC_COM The entered tapeserver {val} is invalid for drive {val} (see drive properties). |
|
E011-SBC_COM Wrong parameter -s {val}: we need savesetname@starting-time as returned from Open Restore |
|
E012-SBC_COM Couldn't get node of drive {val} - please check drive's properties. |
|
E013-SBC_COM Restore not possible, bcs. backup {val} was not successful |
|
E014-SBC_COM INQUIRE_INFO wrong type |
|
E015-SBC_COM INQUIRE_INFO save set {val} doesn't exist |
|
E016-SBC_COM INQUIRE_INFO cannot open listing file {val} of save set {val} to read |
|
E017-SBC_COM Submit sm_sbc_com_ext failed, Saveset |
|
E018-SBC_COM Failure updating database table |
|
E019-SBC_COM CONNECT_BACKUP loading media {val} into drive {val} failed |
|
E020-SBC_COM LOTUS_SAVESET there's no save set containing file {val} |
|
E021-SBC_COM Task {val} has Backup Type {val} but requested Type is {val}. |
|
E022-SBC_COM Medium {val} is currently not available. |
|
E023-SBC_COM Task {val} not yet configured. |
|
E024-SBC_COM Pool {val} does not exist |
|
E030-SBC_COM Failure in file copy {val} {val} |
|
SEPULER - Messages of the SEP sesam Scheduler
Error message | Cause, Solution |
E001-SEPULER Error from initialisation of queue {val} (type {val}). |
|
E002-SEPULER Error from submitting {val} into queue {val} (type {val}). |
|
E003-SEPULER Duplication of primary key in DB:results. |
|
E005-SEPULER invalid command: {val} |
|
E006-SEPULER Restoretask {val} not yet defined - please do it now. |
|
E007-SEPULER READ_INI sent error {val} |
|
E007-SEPULER There is no drive group attached to media pool {val} |
|
E008-SEPULER There are no drives attached to drive group {val} |
|
E009-SEPULER String overflow: {val} --> {val} |
|
E010-SEPULER Wrong type of schedule cycle: {val} |
|
E011-SEPULER Error happend during calculation of user defined list for schedule {val} |
|
E012-SEPULER Calculating next execution for schedule {val} returns always the same time |
|
OTHERS
Error message | Cause, Solution |
E001-START Drive is not available ({val}) |
|
E002-START Drive {val} does not exist ({val}) |
|
E001-STARTAL Date unknown, please restart NEWDAY |
|
E001-STARTUP Directory {val} not found! |
|
E001-WATCH Received performance data with invalid format: {val} |
|
E002-WATCH EOM completed with error: {val} |
|
E001-GETVOL Request for media {val} completed with error: {val} |
|
E006-COPY Copy of save set {val} failed! |
|
E008-COPY Copy of save sets {val} failed: {val} |
|
Error Messages of the Backup Modules
The Sesam Backup Modules produce their own messages. The protocol files are scanned after a backup or restore, and in case of a warning or error state the first identified message is printed in a summary at the end of the protocol.
A list of all SBC messages (C header file) is given under SBC Messages.
Functional Stack of Backup Modules
Every backup module uses the X/Open Systems Management: Backup Services API (XBSA) standard (see http://www.opengroup.org for further details). SEP's XBSA is based on an FTP implementation (see RFC 959). The backup module connects to SEP's Sesam Transfer Protocol Daemon (STPD), an FTP daemon implementation, to send data to or retrieve data from SEP's Sesam Multiplex Server (SMS) daemon.
An error message is composed from the triggering layer up to the upper layers. If the Operating System returns an error the error code and the operating system message will be added to the message.
As a result, an error message may contain information from up to 5 layers: SBC - XBSA - FTP - SMS - Operating System
Example of Backup Module Protocol
The following example shows a typical backup protocol.
2009-06-26 10:28:16: sbc-3036: Info: # SESAM BACKUP CLIENT FOR Windows NT FILE SYSTEMS, VERSION: 3.2A17 Build Revision: 1.257 (x64), Released: Jun 25 2009 #
2009-06-26 10:28:16: sbc-3063: Info: -------------------- Operation Parameters --------------------
2009-06-26 10:28:16: sbc-3019: Info: OS info: Microsoft Windows Server 2008, Build: 6001 Service Pack 1 (x64)
2009-06-26 10:28:16: sbc-3100: Info: Program PID: 42900
2009-06-26 10:28:16: sbc-3030: Info: Operation: BACKUP, Level: COPY
2009-06-26 10:28:16: sbc-3031: Info: Storage Host: qsbox3:11001,0-0:SESAM_SECURE_AUTHENTICATION:****
2009-06-26 10:28:16: sbc-3032: Info: Control Host: qsbox3:11001:SESAM_SECURE_AUTHENTICATION:*
2009-06-26 10:28:16: sbc-3040: Info: Device: SMS:disk1:SHARE:64
2009-06-26 10:28:16: sbc-3064: Info: --------------------- Operation Messages ---------------------
2009-06-26 10:28:16: sbc-3002: Info: Building file list from: [C:\SEPsesam\var\ini]
2009-06-26 10:28:16: sbc-3022: Info: Command line ["sbc" "-b" "-C" "qsbox3:11001" "-S" "qsbox3:11001" "-l" "copy" "-s" "SF20090626102812" "-d" "SMS:disk1" "-t" "weekly00001:1" "-j" "TEST_BACKUP" "-i" "job=TEST_BACKUP,nod=qsbox3,cmd=sbc,src=C/ /SEPsesam /var/ini,ptf=WNT,typ=Path,exc=" "C:/SEPsesam/var/ini" ]
2009-06-26 10:28:16: sbc-3003: Info: Opening saveset: SF20090626102812
2009-06-26 10:28:18: sbc-3104: Info: Saveset info: [SEGMENT=3]
2009-06-26 10:28:18: sbc-3004: Info: Begin writing to saveset...
2009-06-26 10:28:18: sbc-3074: Info: Backup start time [20090626102818]
2009-06-26 10:28:18: sbc-3143: Info: Starting with drive C:
2009-06-26 10:28:18: sbc-3006: Info: Saveset size: 98304 bytes. Throughput: 189.820 MB/Hour.
2009-06-26 10:28:18: sbc-3005: Info: Closing saveset.
2009-06-26 10:28:18: sbc-3052: Info: Items processed correctly: [25]. Not processed or incorrectly processed items: [0].
2009-06-26 10:28:18: sbc-3007: Info: Operation successful.
2009-06-26 10:28:19: sbc-3001: Info: Exiting.
The protocol shows 4 sections:
- About Module
- Operational Parameters
- Processing
- Summary
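Each entry in the protocol example has the shape {timestamp}: {module}-{number}: {level}: {text}. The following minimal sketch splits such an entry and returns the first entry that is not informational, loosely mirroring how the first identified warning or error message ends up in the summary. The names SBC_LINE, SbcEntry, parse_sbc_line and first_problem are hypothetical, and only the levels 'Info' and 'Error' appear in this guide; treating every other level as a problem is an assumption.

```python
import re
from typing import Iterable, NamedTuple, Optional

# {timestamp}: {module}-{number}: {level}: {text}
SBC_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): "
    r"(?P<module>[a-z]+)-(?P<number>\d+): (?P<level>\w+): (?P<text>.*)$"
)


class SbcEntry(NamedTuple):
    timestamp: str
    module: str
    number: int
    level: str
    text: str


def parse_sbc_line(line: str) -> Optional[SbcEntry]:
    """Split one protocol entry; return None for lines of another shape."""
    match = SBC_LINE.match(line.strip())
    if match is None:
        return None
    return SbcEntry(
        match["timestamp"],
        match["module"],
        int(match["number"]),
        match["level"],
        match["text"],
    )


def first_problem(lines: Iterable[str]) -> Optional[SbcEntry]:
    """Return the first entry whose level is not 'Info' (assumed to be a problem)."""
    for line in lines:
        entry = parse_sbc_line(line)
        if entry is not None and entry.level != "Info":
            return entry
    return None
```

The sketch expects one entry per line, for example the lines of a protocol file read with open(...).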
Verbose Levels of the Backup Modules
The verbose level of the backup protocol can be controlled by giving a specific log level with the switch '-v' in the 'Save options' field in the backup task properties in tab 'Options 1'. For instance it may be set to '-v 3' to get more protocol messages about processing of items.
The following log levels are provided:
0 print only standard and error messages together with summary
1 add a line for every item when processing starts for it: 'sbc-3008: Info: Processing item: [xxx]...'
2 add a line when processing for item finished: 'sbc-3108: Info: Item processed successfully: [xxx]'
3 add backup module processing information
4 add underlying module processing: XBSA and DB_API modules
5 add packing data (mtf, cpio, sidf) trace messages
Backup and Restore Protocols
Backup Protocols
If an error occurs then at the end of a backup protocol a summary is printed. The error message summary is prefixed by a short information string.
The full error message has the following form:
{Status}/{Amount}/{Saveset ID}/{SbcStart}/{message}
'Status' is one of
0 successful
1 warning
2 empty lis
3 broken during backup
C broken before data transfer
X failed
'Amount' is the data amount stored on media, 'Saveset ID' is the generated save set id and 'SbcStart' is the starting time on client host.
The following example shows a backup error summary with all 5 layers, prefixed by a short information string:
X/0/SF20060629233007/20060629232907/Error: XBSA Call BSAEndData (closing saveset) failed: System detected error, operation aborted. TRANSIENT or PERMANENT NEGATIVE reply: 553 STOR Failed. 1037: Writing data block on tape failed (23): Data error (cyclic redundancy check). 1039: Writing of Saveset Trailer failed.
Operating system error 23 (ERROR_CRC) occurs if the tape drive cannot write proper blocks to the media. The tape drive and the media must be checked.
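The summary format {Status}/{Amount}/{Saveset ID}/{SbcStart}/{message} can be taken apart as sketched below. This is only an illustration: the names STATUS_TEXT, BackupSummary and parse_backup_summary are hypothetical, the time format YYYYMMDDhhmmss for SbcStart is an assumption read off the example, and the amount is kept as text because its unit is not stated here.

```python
from dataclasses import dataclass
from datetime import datetime

# Status codes as listed in this guide
STATUS_TEXT = {
    "0": "successful",
    "1": "warning",
    "2": "empty lis",
    "3": "broken during backup",
    "C": "broken before data transfer",
    "X": "failed",
}


@dataclass
class BackupSummary:
    status: str
    status_text: str
    amount: str          # unit not stated in this guide, kept as text
    saveset_id: str
    sbc_start: datetime  # assumed format: YYYYMMDDhhmmss
    message: str


def parse_backup_summary(line: str) -> BackupSummary:
    """Split {Status}/{Amount}/{Saveset ID}/{SbcStart}/{message}."""
    # Split only four times so that slashes inside the message survive.
    parts = line.strip().split("/", 4)
    if len(parts) != 5:
        raise ValueError("expected 5 fields separated by '/'")
    status, amount, saveset_id, sbc_start, message = parts
    return BackupSummary(
        status,
        STATUS_TEXT.get(status, "unknown"),
        amount,
        saveset_id,
        datetime.strptime(sbc_start, "%Y%m%d%H%M%S"),
        message,
    )
```

For the example summary above, the sketch reports the status X (failed), the save set SF20060629233007 and the client start time 2006-06-29 23:29:07.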
Restore Protocols
At the end of a restore protocol a summary is always printed. The summary shows the results of the restore of the Backup Module and the POST processing. The final status is successful if both were successful.
See the following restore protocol example:
SBC-STATUS: Restore was successful
I009-RESTORE POST for restore SESAM_BACKUP-20090701_143152 from LTO00001 is not activated
RESTORE STATUS: Restore was successful
The following example shows a summary after an unsuccessful restore together with SBC error message:
SBC-STATUS: Restore was not successful.
2009-06-30 15:32:31: sbc-1067: Error: XBSA Call BSAGetObject failed with message: Access to the requested object is not possible. RETR failed. NEGATIVE reply: 553 RETR Failed. 1017: Invalid segment number. (0)
A "1017: Invalid segment number" error may occur if the wrong medium is mounted in the drive.
Error Messages of the STPD Module
For a list of all STPD messages, go to STPD Messages.
Error Messages of the SMS Module
A list of all SMS error messages is given under SMS Messages.