Abstract:
Efficiently scheduling MapReduce tasks is considered one of the major challenges facing MapReduce
frameworks. Many algorithms have been introduced to tackle this issue. Most of these algorithms focus on
the data locality property for task scheduling. Data locality may lead to lower physical resource utilization
and higher power consumption in non-virtualized clusters. Virtualized clusters provide a viable solution
to support both data locality and better cluster resource utilization. In this paper, we evaluate the major
MapReduce scheduling algorithms such as FIFO, Matchmaking, Delay, and multithreading locality (MTL)
on virtualized infrastructure. Two metrics are used to test the evaluated algorithms: simulation time
and energy consumption. The evaluated schedulers are compared, and the results show that the MTL
scheduler outperforms the other existing schedulers. We also present a comparative study of
virtualized and non-virtualized clusters for MapReduce task scheduling.