Big data processing meets non-volatile memory: opportunities and challenges

Wednesday, June 20
4:00 PM - 4:40 PM
Meeting Room 230C

Advanced big data processing frameworks have been proposed to harness the fast data transmission capability of remote direct memory access (RDMA) over InfiniBand and RoCE. However, with the introduction of non-volatile memory (NVM), these designs, along with default execution models such as MapReduce and Directed Acyclic Graph (DAG), need to be re-assessed to determine how NVM can further improve performance.

In this context, we propose NVMD, an accelerated execution framework for MapReduce and DAG that leverages the benefits of both NVM and RDMA. NVMD introduces novel features for MapReduce and DAG, such as a hybrid push and pull shuffle mechanism and dynamic adaptation to network congestion. The design has been incorporated into Apache Hadoop and Tez. Performance results show that NVMD achieves improvements of up to 3.65x for Hadoop and up to 3.18x for Tez. In this talk, we will also present an NVM-aware HDFS design and its benefits for MapReduce, Spark, and HBase.
Shashank Gugnani
PhD Student
The Ohio State University
Shashank Gugnani is a PhD student in the Department of Computer Science and Engineering at The Ohio State University. His research focuses on designing high-performance storage systems for cloud middleware.