Big data challenges & opportunities for development using Hadoop 2.0 platform

Hegazi, Abdel Rahman Farag (2014) Big data challenges & opportunities for development using Hadoop 2.0 platform. [Dissertation (University of Nottingham only)]

Abstract

Data analytics has become one of the fastest-growing research topics in computing, thanks to technological advances over the past few years that have led to the current deluge of data. Consequently, processing such large volumes of data has become a very complicated job, as the data keeps growing at enormous rates. Traditional data analytics systems cannot process these volumes because of the difficulty of managing system resources at such scale. Enhancing overall system performance is a further issue, as performance keeps degrading while the queue of tasks waiting to be processed grows longer. The term “big data” was coined to describe this trend of fast-growing data-sets. These obstacles add extra layers of complexity to the underlying platforms that perform data acquisition, transmission, storage management, and large-scale data processing.

In this project we address the possibility of building a platform that processes large data-sets with a fair, dynamic resource allocation scheme shared among all running jobs, instead of static allocation schemes that waste resources through unbalanced allocation and lower resource utilization. In addition, overall system performance should remain the same, if not improve, with no degradation as more running applications request additional resources. We present a performance tuning scheme for the Apache Hadoop platform, an open-source framework for processing large-scale data-sets on clusters of commodity hardware. Our results show an overall performance improvement of 11.7% and an improvement in overall resource utilization of 17% compared with untuned and older traditional data analytics systems.

Item Type: Dissertation (University of Nottingham only)
Keywords: Apache Hadoop, MapReduce, YARN, Dynamic Allocation, Big Data, Cluster, Data-sets, Data Warehouse, Resource Allocation, Data Analysis, Cloud Computing
Depositing User: Gonzalez-Orbegoso, Mrs Carolina
Date Deposited: 13 Nov 2015 09:40
Last Modified: 19 Oct 2017 15:04
URI: https://eprints.nottingham.ac.uk/id/eprint/30755
