Recent DaMRL Projects


EVOLVING FLASH-BASED STORAGE

Abstract:

Flash-based solid-state drives (SSDs) are widely used to accelerate applications because of their superior overall performance compared to hard-disk drives (HDDs). With SSDs, the bottleneck is no longer device speed but the storage stack overhead imposed by the operating system (OS), and addressing that overhead is a key research priority. It is critical to develop new techniques that take full advantage of the unique characteristics of flash memory and flash-based persistent storage. However, existing operating systems cannot exploit such techniques because they are designed generically to support a broad class of storage devices. There is thus a critical need to rethink the system infrastructure so that it exploits the best, and potentially unique, aspects of flash-based memory and NVMe SSDs as persistent storage. The primary objective of our research is to design new system infrastructures that take advantage of the flash characteristics exposed by new storage devices to accelerate various applications.

Publications:

  1. Janki Bhimani, Tirthak Patel, Ningfang Mi, and Devesh Tiwari, What does Vibration do to Your SSD?, 2019 Design Automation Conference (DAC19), Las Vegas, NV, 2019. Acceptance Rate: 24.3%.
  2. Mahsa Bayati, Janki Bhimani, Ronald Lee, and Ningfang Mi, Exploring Benefits of NVMe SSDs for Big Data Processing in Enterprise Data Centers, International Conference on Big Data Computing and Communication (BIGCOM19), Qingdao, China, 2019.

Public Software:

  1. https://github.com/bhimanijanki/SSD_Vibration

Acknowledgments:

NSF


DATACENTER SCHEDULING AND RESOURCE MANAGEMENT

Abstract:

In the era of big data and cloud computing, large amounts of data are generated by user applications and must be processed in the datacenter. High-performance, scalable frameworks have become essential for data-intensive processing and analytics in both industry and academia. More and more applications use new parallel-data computing frameworks such as TensorFlow and Apache Spark. Maximizing resource utilization while minimizing big data processing time is an interesting research problem. However, given the limited resources in a cluster and the complex dependencies in data flow, designing scheduling and resource management techniques is challenging. The primary focus of our research is therefore to develop new schemes for job scheduling and resource management for evolving parallel-data computing frameworks and applications.

Publications:

  1. Danlin Jia, Janki Bhimani, Son Nam Nguyen, Bo Sheng, and Ningfang Mi, ATuMm: Auto-tuning Memory Manager in Apache Spark, 2019 International Performance Computing and Communications Conference (IPCCC19), London, UK, 2019. Acceptance Rate: 29.2%.

Public Software:

  1. https://github.com/DanlinJia/spark_core_ATMM

Acknowledgments:

NSF


I/O BEHAVIOR MODELING & PERSISTENT STORAGE DEVICE CONFIGURATION

Abstract:

This project makes empirical contributions to storage systems by addressing challenges posed by large-scale data-intensive applications. Specifically, it investigates (1) how to analyze the impact of various system components while running multiple workloads on emerging storage systems; (2) how to design interactive frameworks that allow users to modify the internal algorithms and parameters of modern storage devices; (3) how to enable novices to configure storage systems with respect to their workloads and data processing requirements; and (4) how to derive I/O models that predict future I/O workload patterns so that storage systems can be configured in advance for better performance.

This project will enable the design of storage systems with higher performance and reliability. Its outcome will have a significant impact on many areas that depend on processing large amounts of data. The project will share its findings with undergraduate and graduate students through computer science and engineering programs and open up career opportunities for female students, underrepresented minorities, and first-generation college students. It will also disseminate the proposed techniques to industry and foster technology transfer through new industrial collaborations. The developed infrastructure will be available to the research community through a web-based portal.

Publications:

  1. Janki Bhimani, Ningfang Mi, Miriam Leeser, and Zhengyu Yang, New Performance Modeling Methods for Parallel Data Processing Applications, ACM Transactions on Modeling and Computer Simulation (TOMACS), 2019. DOI 10.1145/3309684.
  2. Janki Bhimani, Rajinikanth Pandurangan, Ningfang Mi, and Vijay Balakrishnan, Emulate Processing of Assorted Database Server Applications on Flash-Based Storage in Datacenter Infrastructures, 2019 International Performance Computing and Communications Conference (IPCCC19), London, UK, 2019. Acceptance Rate: 29.2%.

Public Software:

  1. https://github.com/bhimanijanki/ms_ssds_sim
  2. https://github.com/bhimanijanki/FiM
  3. https://github.com/bhimanijanki/KMeans

Acknowledgments:

NSF