- All Best Essays, Term Papers and Book Report


Essay  •  September 3, 2015  •  Coursework  •  557 Words

Essay Preview: Failures



Randall Anderson


August 17, 2015

Alicia Pearlman


A distributed system is a piece of software that ensures that a collection of independent computers appears to its users as a single coherent system. This type of system does not come without issues. These issues are called failures, and four of them are Byzantine failures, omission failures, crash failures, and hardware failures.

Arbitrary failures, also known as Byzantine failures, are failures that happen at the server level of a distributed system. This type of failure can cause a server to act randomly. Such inappropriate and random behavior increases the chances of malicious events and of duplicated messages and updates from the server (Agbaria and Friedman). To overcome this type of failure, one can use intrusion detection systems. When the failure is detected, the faulty process is removed and replaced with another process.
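The detect-and-replace step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a simple majority vote over replies stands in for a real intrusion detection system, it works only while most servers are honest, and the server names, spare names, and function names are all hypothetical.

```python
from collections import Counter


def majority_reply(replies):
    """Return the reply value reported by the most replicas."""
    value, _count = Counter(replies.values()).most_common(1)[0]
    return value


def find_suspects(replies):
    """Return the names of replicas whose reply disagrees with the majority."""
    agreed = majority_reply(replies)
    return sorted(name for name, value in replies.items() if value != agreed)


def replace_suspects(pool, suspects, spares):
    """Remove each suspect from the pool and substitute a spare process."""
    spares = list(spares)  # copy so the caller's list is not modified
    healthy = [name for name in pool if name not in suspects]
    replacements = [spares.pop(0) for _ in suspects if spares]
    return healthy + replacements


# Hypothetical replies to the same request; server-c answers arbitrarily.
replies = {"server-a": 42, "server-b": 42, "server-c": 17, "server-d": 42}
suspects = find_suspects(replies)
pool = replace_suspects(list(replies), suspects, ["spare-1", "spare-2"])
print(suspects)  # ['server-c']
print(pool)      # ['server-a', 'server-b', 'server-d', 'spare-1']
```

A production system would need a much stronger detector than this, since a Byzantine server can answer correctly some of the time and still misbehave later.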

Messages lost in transit are known as omission failures. These failures appear as a lack of reply from a server in a distributed system. The lack of response can be attributed to causes such as MAC layer collisions or a receiver being out of range.
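One common way to cope with a missing reply is to resend the message after a timeout. The essay does not prescribe this remedy, so the sketch below is an assumption added for illustration. The `send` function is a hypothetical transport that returns `None` when no reply arrives, and the lossy channel is a fake that simulates messages lost in transit.

```python
def send_with_retry(send, message, max_attempts=3):
    """Call send(message) until it returns a reply or attempts run out.

    `send` is a hypothetical transport function that returns the reply,
    or None when no reply arrives before its timeout (an omission).
    """
    for attempt in range(1, max_attempts + 1):
        reply = send(message)
        if reply is not None:
            return reply, attempt
    return None, max_attempts


def make_lossy_channel(drop_first):
    """Build a fake channel that loses the first `drop_first` messages."""
    state = {"sent": 0}

    def send(message):
        state["sent"] += 1
        if state["sent"] <= drop_first:
            return None  # message or reply lost in transit
        return "ack:" + message

    return send


reply, attempts = send_with_retry(make_lossy_channel(2), "update-7")
print(reply, attempts)  # ack:update-7 3
```

Note that resending can deliver the same message more than once, so the receiver should be able to recognize and ignore duplicates.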

Crash failures can occur in both distributed and centralized systems. They are usually attributed to server faults, which interrupt the operation of a server and can halt the system for a long time. Failures at the operating system level and the software level are examples of this type. In a distributed system, fault-tolerant systems are designed to handle the effects of this type of failure. In a centralized system such as a database, backing up data to mass storage such as a SAN helps recover the data if the hardware or software crashes.
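The backup-and-recover idea can be illustrated with a small checkpoint sketch. It is a simplification under stated assumptions: a JSON file in a temporary directory stands in for real mass storage, and the file name, state fields, and function names are hypothetical.

```python
import json
import os
import tempfile


def save_checkpoint(state, path):
    """Write state to a temporary file, then move it into place in one step."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    # Replacing the old file last means a crash mid-write leaves it intact.
    os.replace(tmp_path, path)


def load_checkpoint(path, default):
    """Return the last saved state, or `default` if no backup exists."""
    try:
        with open(path) as handle:
            return json.load(handle)
    except FileNotFoundError:
        return default


backup_dir = tempfile.mkdtemp()
backup_path = os.path.join(backup_dir, "orders.json")
save_checkpoint({"orders_processed": 128}, backup_path)
# ... the server crashes here and its in-memory state is lost ...
recovered = load_checkpoint(backup_path, default={"orders_processed": 0})
print(recovered)  # {'orders_processed': 128}
```

Anything that happened after the last checkpoint is still lost, which is why how often backups are taken matters.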

Hardware failures are another type of failure shared by distributed and centralized systems. With innovations in hardware, these failures are not as common as they once were. In a distributed system, the failure of hardware on one machine does not take down the whole network. Communication continues, which allows the one machine to be repaired and put back online within the system. In a centralized system, the failure of a piece of hardware such as a hard drive can be crippling, especially if the company relies on that system to keep operating.
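The way a distributed system keeps working while one machine is repaired can be shown with a toy router. This is a minimal sketch, not a real load balancer: the `Cluster` class, the node names, and the round-robin rule are all assumptions made for illustration.

```python
class Cluster:
    """A toy cluster that routes requests to machines that are still up."""

    def __init__(self, machines):
        self.status = {name: True for name in machines}

    def mark_down(self, name):
        """Record that a machine has failed and should receive no requests."""
        self.status[name] = False

    def mark_up(self, name):
        """Record that a machine has been repaired and is back online."""
        self.status[name] = True

    def route(self, request_id):
        """Pick a healthy machine for the request, round-robin by request ID."""
        healthy = [name for name, up in self.status.items() if up]
        if not healthy:
            raise RuntimeError("no machine available")
        return healthy[request_id % len(healthy)]


cluster = Cluster(["node-1", "node-2", "node-3"])
cluster.mark_down("node-2")  # simulated hard drive failure
print([cluster.route(i) for i in range(4)])
# ['node-1', 'node-3', 'node-1', 'node-3']
cluster.mark_up("node-2")    # repaired and put back online
print(cluster.route(1))      # node-2
```

A centralized system corresponds to a cluster with a single machine: once that machine is marked down, `route` has nowhere to send the request.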

The management of failures is very important in both types of systems. This is even more true in distributed systems, where a failure can affect several hundred to several thousand computers. Centralized systems have the advantage of being able to restart if needed in the event of a failure.

Restarting a distributed system is not always possible, as it may be handling thousands of different processes that depend on the system being up. Deploying a system that uses fault protection and recovery can help protect it in the event of a failure.


