Posted: December 2nd, 2013
In October 2012, Amazon Web Services experienced an outage that ended up affecting other websites and servers. The problem began when Amazon's Elastic Block Store (EBS) service was interrupted. The main problem was a memory leak in the system, and the monitoring system that should have caught it also failed. Before the outage, technicians had replaced a data collection server. The replacement did not work well with the rest of the system because its domain name system (DNS) record was not propagated correctly, which meant that the update did not reach some servers. Because the monitoring system failed to raise an alarm, the memory leak went unnoticed and grew out of control (Williams, 2012). EBS provides a much-needed service to the Elastic Compute Cloud (EC2) because it supplies EC2 with storage space. EBS is one of the components of EC2.
EC2 is one of Amazon's most important services because it provides the computing capacity and network bandwidth for websites and web applications. Interruptions to this service therefore caused major disruptions and outages on the websites that rely on it. The affected websites included Reddit, Heroku, Imgur, Minecraft, HipChat, and foursquare, among others. Customers of these sites could not access them, and the systems could not process customers' requests. Initially, the company's technicians were able to solve the problem in some areas, and the outage lasted for a few hours in most areas. However, they were not able to resolve the problem completely, and the outages lasted for several more hours in many areas (Hutchinson, 2012).
The outage caused a lot of customer dissatisfaction. The company was able to find the main cause of the problem and resolve it, but not before many customers had been affected. It then initiated measures that it hoped would prevent such problems from happening in the future. Among them was developing a monitoring system that raises an alert when such a problem occurs; the new system will sound the alarm whenever there is a memory leak. The company also had to find a way to fix the memory leak itself. This process took some time, and some servers were down for more than three days before the technicians could resolve the problem (Williams, 2012).
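As an illustration only, the kind of check such a monitoring system might run can be sketched in Python. The function name, the thresholds, and the idea of sampling memory use in megabytes are assumptions made for this example; they are not taken from Amazon's reports. The sketch flags a possible leak when memory use never drops across the most recent readings and grows by a set amount, and it treats missing readings as a warning in their own right, since the outage showed that a silent monitor can let a leak go unnoticed.

```python
from typing import Sequence


def check_memory(samples_mb: Sequence[float],
                 window: int = 5,
                 min_growth_mb: float = 50.0) -> str:
    """Classify recent memory readings (hypothetical helper).

    Assumes window >= 2. Returns "no-data" when fewer than `window`
    readings arrived, "leak-suspected" when usage never dropped across
    the window and grew by at least `min_growth_mb`, and "ok" otherwise.
    """
    if len(samples_mb) < window:
        # Too few readings: the monitor itself may be failing.
        return "no-data"
    recent = list(samples_mb[-window:])
    # True only if every reading is at least as high as the one before it.
    never_dropped = all(b >= a for a, b in zip(recent, recent[1:]))
    growth = recent[-1] - recent[0]
    if never_dropped and growth >= min_growth_mb:
        return "leak-suspected"
    return "ok"


if __name__ == "__main__":
    print(check_memory([100, 120, 140, 160, 180]))  # steady climb
    print(check_memory([100, 120, 110, 130, 125]))  # rises and falls
    print(check_memory([100, 120]))                 # too few readings
```

A real system would tune the window and the growth threshold to its own workload; the values here are placeholders.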
One has to know the root cause of a problem when dealing with web outages. Technicians who fail to identify the main cause end up taking longer to deal with the problem. Websites perform many valuable functions for companies, so companies have to find ways of preventing web outages, detecting them when they are about to occur, and dealing with them whenever they do occur. One way to prevent web outages is to use an online processing system that operates in real time. This reduces the chance of errors caused by improper transmission within the system, so the server receives all the information (Gelinas et al., 2011). Companies can also rely on customers' input to detect problems with the system by providing a place where customers can report bad experiences with the server. Many of the problems that customers complain about may seem minor, but recording and investigating them early prevents a small problem from escalating. Companies need to use flexible technologies that accommodate changes within the system; this heads off future problems if the company decides to increase the amount of input data or the amount of processing (Croll & Power, 2009). Other measures include encouraging customers to use web caches and reducing the server load. Checking for less obvious problems, such as whether the domain name system is functioning properly, also helps to prevent possible outages (Safe Resolve, 2011).
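The point about checking the domain name system can be made concrete with a small Python sketch. It assumes that the addresses returned by each DNS server have already been collected by some other means; the function name, the server names, and the addresses are hypothetical and serve only as an example. The function lists every DNS server whose answer differs from the expected one, which is the kind of mismatch that can follow an incomplete update after a server is replaced.

```python
from typing import Dict, List, Set


def find_stale_resolvers(expected: Set[str],
                         answers: Dict[str, Set[str]]) -> List[str]:
    """Return the names of DNS servers whose answer differs from `expected`.

    `answers` maps a DNS server name to the set of addresses it returned
    for the host being checked (hypothetical input format).
    """
    return sorted(name for name, addrs in answers.items()
                  if addrs != expected)


if __name__ == "__main__":
    # Example data only: the replacement server is expected at 192.0.2.20.
    expected = {"192.0.2.20"}
    answers = {
        "dns-a": {"192.0.2.20"},  # updated correctly
        "dns-b": {"192.0.2.10"},  # still points at the old server
        "dns-c": set(),           # returned nothing
    }
    print(find_stale_resolvers(expected, answers))
```

Running such a comparison after every server replacement would let technicians see an incomplete update before the servers that depend on it start to fail.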
Croll, A., & Power, S. (2009). Complete web monitoring: Watching your visitors, performance, communities, and competitors. Sebastopol, CA: O’Reilly Media, Inc.
Gelinas, J. U., Dull, R. B., & Wheeler, R. P. (2011). Accounting information systems. New York, NY: Cengage Learning.
Gibson, D. (2010). Managing risk in information systems. Sudbury, MA: Jones & Bartlett Publishers.
Hutchinson, L. (2012). Amazon web services outage once again shows reality behind “the cloud”. Retrieved from http://arstechnica.com/information-technology/2012/10/amazon-web-services-outage-once-again-shows-reality-behind-the-cloud/
Safe Resolve (2011). Prevent internet outages. Retrieved from http://help.saferesolve.com/index.html?internet_outages.htm
Williams, A. (2012). Amazon web services outage caused by memory leak and failure in monitoring alarm. Retrieved from http://techcrunch.com/2012/10/27/amazon-web-services-outage-caused-by-memory-leak-and-failure-in-monitoring-alarm/