
ESS Faults

 $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$
 %$%                                                                     %$%
 $%$               Electronic Switching System Faults                    $%$
 %$%                                                                     %$%
 $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$




"Notes from No 2 ESS Administration and Maintenance Plan,"
"BSTJ Vol 48, 1969"

"Data Maintenance"


Memory mutilation results from hardware faults and program bugs.
During nonsynchronous operation, mismatch detection is not available, so
there may be a long period of time during which mutilation occurs.
Mismatch detection is useless in finding data mutilation caused by program bugs.

Data maintenance aided by
ease of communication among programs,
absence of linked lists, and
per call memory allocation (Call processing program addressing is relative to the allocated memory, reducing scope of data accesses).

Defensive programming techniques:

Range check table indexes,
Zero check derived transfer-to addresses, and
Distinct program and data errors prevent programs being read as data.
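
The first two checks in the list above can be illustrated in Python. This is only a sketch of the idea, not how the No. 2 ESS was programmed: the names TableIndexError, ZeroTransferError, checked_lookup and checked_transfer, and the sample tables, are all hypothetical.

```python
class TableIndexError(Exception):
    """Raised when a table index falls outside the table (hypothetical name)."""


class ZeroTransferError(Exception):
    """Raised when a derived transfer-to address is zero (hypothetical name)."""


def checked_lookup(table, index):
    # Range check the index before using it, so a mutilated index
    # cannot read outside the table. Negative values are rejected too,
    # because Python would otherwise wrap them around silently.
    if not 0 <= index < len(table):
        raise TableIndexError(f"index {index} outside 0..{len(table) - 1}")
    return table[index]


def checked_transfer(handlers, address):
    # Zero check a derived transfer-to address: zero is treated here as
    # "never filled in", so control is not transferred to it.
    if address == 0:
        raise ZeroTransferError("derived transfer-to address is zero")
    return handlers[address]()


# Illustrative data only.
routes = ["trunk-a", "trunk-b", "trunk-c"]
handlers = {0x10: lambda: "digit analysis", 0x20: lambda: "ringing"}
```

In both cases the check turns silent use of bad data into an explicit error that recovery code can act on.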
Audit programs detect bad data.
Audits run periodically or as requested from tty.
Separate audits for different memory blocks
Audits correct by idling memory blocks containing bad data.
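
As a rough illustration of "correct by idling", the sketch below audits a list of per-call memory blocks and resets any block whose fields contradict each other. The CallRegister layout and the consistency rules are assumptions made for the example; the source does not describe the actual block formats.

```python
from dataclasses import dataclass, field

IDLE, BUSY = "idle", "busy"


@dataclass
class CallRegister:
    # One block of per-call memory (hypothetical layout).
    state: str = IDLE
    trunk: int = 0
    digits: list = field(default_factory=list)


def is_consistent(reg, trunk_count):
    # A block is considered bad if its fields contradict each other
    # or point outside the equipment that exists.
    if reg.state not in (IDLE, BUSY):
        return False
    if reg.state == IDLE:
        return reg.trunk == 0 and not reg.digits
    return 1 <= reg.trunk <= trunk_count


def audit_call_registers(registers, trunk_count):
    # Correct by idling: a bad block is reset rather than repaired,
    # which drops that call but returns the memory to service.
    idled = []
    for i, reg in enumerate(registers):
        if not is_consistent(reg, trunk_count):
            registers[i] = CallRegister()
            idled.append(i)
    return idled
```

A separate function of this shape per kind of memory block would correspond to the "separate audits for different memory blocks" noted above.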
System recovery initiated by control unit switch during simplex operation;
a control unit switch can be caused by bad data or bugs that cause sanity time out.

System recovery Functions:
Make call store consistent with state of periphery.
Clear memory associated with program in control at time of recovery,
Run audits,
Repeat the above with widening scope of memory initialization until sanity obtained
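
The escalation described above can be sketched as a loop over widening scopes. Everything named here is hypothetical (ToySystem, the scope names, the notion of "bad regions"); the sketch only shows the control flow of repeating the same three steps with a wider initialization each time.

```python
# Scopes of memory initialization, narrowest first (names are made up).
SCOPES = [
    {"current-program"},
    {"current-program", "transient-calls"},
    {"current-program", "transient-calls", "stable-calls"},
]


class ToySystem:
    """Stand-in for the switch: tracks which memory regions hold bad data."""

    def __init__(self, bad_regions):
        self.bad_regions = set(bad_regions)
        self.log = []

    def sync_call_store_with_periphery(self):
        self.log.append("sync")

    def clear_memory(self, scope):
        # Initializing a region removes whatever bad data it held.
        self.bad_regions -= scope
        self.log.append("clear")

    def run_audits(self):
        self.log.append("audit")

    def is_sane(self):
        return not self.bad_regions


def recover(system, scopes=SCOPES):
    # Repeat the three recovery steps, widening the scope each pass,
    # until the system reports sanity. Returns the level that worked.
    for level, scope in enumerate(scopes):
        system.sync_call_store_with_periphery()
        system.clear_memory(scope)
        system.run_audits()
        if system.is_sane():
            return level
    raise RuntimeError("sanity not obtained at the widest scope")
```

The narrow passes come first because they disturb the fewest calls; wider passes are only paid for when the cheaper ones fail.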




"Notes from Design of Recovery Strategies for A Fault Tolerant No. 4 ESS"
"by R. J Willet - BSTJ vol 61, no 10, 4-13-82"

"Objectives"

616,000 call attempts/hour
100,000 active terminations
Downtime less than 2 hours in 40 years
Not cost-effective (or possible) to remove all software errors - minimize
number of service-affecting errors and analyze data for cause.
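
To put the objectives in more familiar units, the arithmetic can be worked in a few lines of Python. The only assumption is the use of an average year of 365.25 days.

```python
HOURS_PER_YEAR = 365.25 * 24          # 8766 hours, averaging in leap years

downtime_hours = 2
service_years = 40
call_attempts_per_hour = 616_000

minutes_per_year = downtime_hours * 60 / service_years
unavailability = downtime_hours / (service_years * HOURS_PER_YEAR)
attempts_per_second = call_attempts_per_hour / 3600

print(f"allowed downtime: {minutes_per_year:.1f} minutes per year")
print(f"unavailability:   {unavailability:.2e}")
print(f"call attempts:    {attempts_per_second:.0f} per second")
```

That is 3 minutes of downtime per year on average, against a load of roughly 171 call attempts every second.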


"Software Recovery"
Reconstruct data from associated information - slow, disturbs few calls.
Reinitialize memory structure - fast, disturbs many calls.


"Audit Programs"
Provide for integrity of system memory
Structured into mutilation detection and correction modules
Detection modules run continuously in background
Detection modules augmented by defensive checks in operational programs
Call correction modules to correct errors found by background audits or
defensive checks.
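
The split between detection and correction modules might look like the following sketch. The trunk-state table, the state names, and all function names are invented for illustration; the point is only that the background audit and the defensive check in an operational program both call the same correction module.

```python
VALID_STATES = {"idle", "seized", "talking"}


def correct_entry(table, key):
    # Correction module: the single place that repairs an entry.
    table[key] = "idle"


def detect(table):
    # Detection module: scans without modifying anything.
    return [k for k, v in table.items() if v not in VALID_STATES]


def background_audit(table):
    # Background path: detection finds errors, then calls correction.
    bad = detect(table)
    for key in bad:
        correct_entry(table, key)
    return bad


def seize(table, key):
    # Operational program with a defensive check: on bad data it calls
    # the same correction module rather than repairing inline.
    if table[key] not in VALID_STATES:
        correct_entry(table, key)
    if table[key] != "idle":
        return False
    table[key] = "seized"
    return True
```

Keeping repair logic in one module means an error is fixed the same way whichever path happens to find it first.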


"System Integrity Programs"
Provide for integrity of programs
Monitor job scheduling and sequencing for frequency and execution times
Use sanity timers
Call audits or reinitialize system to correct errors.
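
A sanity timer can be sketched as a countdown that a healthy program keeps resetting; if the program stops reaching its checkpoints, the countdown expires and a recovery action runs. The class below is a simulation with made-up names and tick counts, driven by explicit calls rather than a real clock.

```python
class SanityTimer:
    def __init__(self, limit_ticks, on_timeout):
        self.limit = limit_ticks
        self.on_timeout = on_timeout      # e.g. call audits or reinitialize
        self.remaining = limit_ticks

    def reset(self):
        # Called by a healthy program at its normal checkpoints.
        self.remaining = self.limit

    def tick(self):
        # Called by an independent clock; fires if no reset arrived in time.
        self.remaining -= 1
        if self.remaining <= 0:
            self.remaining = self.limit
            self.on_timeout()


events = []
timer = SanityTimer(3, lambda: events.append("recover"))

# Healthy operation: every tick is followed by a reset, so nothing fires.
for _ in range(5):
    timer.tick()
    timer.reset()
healthy_events = list(events)

# A program stuck in a loop never resets, so the third tick fires.
for _ in range(3):
    timer.tick()
```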


"Recovery from software problems"
Software problems caused by program errors or bad data
Out-of-range accesses trigger hardware interrupt, recovery
requires correction of data, or killing of call and return of control
to a safe point.
Inhibit (pest) interrupts while audits are correcting problem,
risky, but assumes single software fault.
In cases where the out-of-range error can be isolated to a single unit can use frame level pesting, otherwise use system level pesting.
Software recovery does not consider the possibility of a hardware fault.
Recovery cannot fix a program bug.  Running pested may allow the system to
operate in a degraded fashion while maintenance personnel analyze data and
correct program.
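
The idea of frame-level versus system-level pesting can be sketched as below. The class, the frame names, and the rule that the pest is lifted only if the audits report success are assumptions for the example; the source does not say when pesting ends.

```python
class OutOfRangeInterrupts:
    """Toy model of an interrupt source that can be inhibited (pested)."""

    def __init__(self):
        self.system_pested = False
        self.pested_frames = set()

    def pest(self, frame=None):
        # frame=None means the error could not be isolated to one unit.
        if frame is None:
            self.system_pested = True
        else:
            self.pested_frames.add(frame)

    def unpest(self, frame=None):
        if frame is None:
            self.system_pested = False
        else:
            self.pested_frames.discard(frame)

    def delivers(self, frame):
        # True if an out-of-range access in this frame would interrupt.
        return not (self.system_pested or frame in self.pested_frames)


def recover_out_of_range(ints, run_audits, frame=None):
    # Inhibit the interrupt while audits correct the data. If the audits
    # cannot correct it, stay pested and keep running degraded.
    ints.pest(frame)
    corrected = run_audits()
    if corrected:
        ints.unpest(frame)
    return corrected
```

Frame-level pesting leaves the protection active everywhere else, which is why it is preferred when the error can be isolated.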
The buffer overflow problem - may be caused by program error.
Buffers protected by hardware overflow interrupts.
Recovery runs the buffer unloader program to unload the buffer and audits the task dispenser program to ensure the unloader is scheduled properly.
The overflow interrupt is pested.
If problem continues, hardware is suspect.
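
The buffer overflow recovery steps can be sketched as follows. The buffer class, the schedule list standing in for the task dispenser, and the entry name "unloader" are all hypothetical; when the pest is later lifted is left outside the sketch because the source does not say.

```python
from collections import deque


class GuardedBuffer:
    def __init__(self, capacity, on_overflow):
        self.capacity = capacity
        self.items = deque()
        self.on_overflow = on_overflow    # stands in for the hardware interrupt
        self.overflow_pested = False

    def put(self, item):
        if len(self.items) >= self.capacity:
            if not self.overflow_pested:
                self.on_overflow(self)
            if len(self.items) >= self.capacity:
                return False              # still full: the item is dropped
        self.items.append(item)
        return True


def unload(buffer, sink):
    # Buffer unloader program: move everything out of the buffer.
    while buffer.items:
        sink.append(buffer.items.popleft())


def overflow_recovery(buffer, sink, schedule):
    buffer.overflow_pested = True         # inhibit further overflow interrupts
    unload(buffer, sink)
    # Audit of the task dispenser: make sure the unloader is scheduled.
    if "unloader" not in schedule:
        schedule.append("unloader")
```

If overflows keep occurring after the unloader has been run and its scheduling repaired, software has done what it can, which matches the note that hardware is then suspect.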



"No. 4 ESS: Maintenance Software"
"by M. N. Meyers, W. A. Routt and K. W. Yoder,"
"BSTJ Vol. 56, No. 7, September 1977"


"Software Error Recovery"
Since system operation is dependent on data in memories, and memories can be written, there is a possibility the memory will be in a state that precludes operation.
System must be as error-free as possible.
Since system cannot be completely error-free, it must be error tolerant.


"Classification of software errors"
Errors in interfaces between software modules.
Non-conformity to systems rules.