“The challenge of managing varied research needs is accommodating both very large parallel I/O jobs and millions of small, random read requests without imposing performance penalties on anyone,” said Mike Shuey, research infrastructure architect at Purdue University. “With DDN’s scalable storage platform and SFX technology, we can sustain the highest levels of performance for all researchers by supporting all types of workloads at the same time.”
To meet its multidisciplinary research demands, Purdue sought a powerful yet flexible storage solution that could keep pace with the traditional big data volumes generated by its top research areas, including computational nanotechnologies, aeronautical and astronautical engineering, mechanical engineering, genomics and structural biology. The university also needed to handle emerging requirements that were causing an exponential surge in data volume, velocity and variety. For example, Purdue’s College of Agriculture recently teamed with the School of Mechanical Engineering to use sensor-equipped unmanned aircraft to collect critical data from acres of fields. The data repository also needed to accommodate new research outside the typical HPC realm, such as projects from the College of Liberal Arts and the Department of Sociology.
To best address its diverse set of stakeholders, Purdue deployed a pair of DDN SFA12KX storage systems with 6.4 PB of raw capacity for the university’s GPFS parallel file system. To ensure predictable, fast access to the Data Depot, Purdue also deployed DDN SFX software to extend the storage cache with solid-state memory. As a result, the system loads the right data into flash storage at the right time to maximize cache hit rates and deliver a fast response.
By pre-loading data into solid-state storage, Purdue has been able to realize the performance benefits of flash storage for handling big data sets at a price point closer to that of lower-cost, high-density hard disk drives. “DDN’s SFX delivered a 900 percent improvement in read capability at a low cost while enabling us to access millions of small files on dedicated solid-state modules while continuing to stream very large data files simultaneously,” Shuey continued. “Simple data queries that used to take two minutes now take two seconds.”