Over the past decade, expansion in Imperial College London’s Research Computing Services had resulted in more than 30 separate, independently managed islands of storage: silos that were difficult to access, manage and use. The group needed to give researchers fast, easy access to data so that they could stay focused on their research projects, while integrating legacy and future compute systems and managing data throughout its lifecycle in line with stringent regulatory compliance guidelines.
The new Research Data Store (RDS) is built around ArcaStream PixStor, a high-performance, scalable storage platform based on the IBM Spectrum Scale parallel file system, which combines flash, disk, tape, and cloud storage into a single global namespace. The infrastructure is geographically dispersed, with a 5PB storage repository at a primary site and a secondary site for disaster recovery, served by PixStor with asynchronous replication and intelligent tiering to external storage targets. RDS also includes Excelero’s NVMesh®, software that enables NVMe flash storage resources to be shared across any network and supports any local or distributed file system. Within RDS, NVMesh provides a scalable NVMe tier for extreme metadata performance. Users get the performance of local flash with the convenience of centralised storage, and the overall total cost of ownership for storage is reduced.
With this robust new infrastructure, Imperial College London’s RDS now simultaneously serves 2,000 existing HPC nodes and more than 3,000 users, delivering 20GB/s of throughput with no loss of interactive user performance.
“The usability of the systems for interactive use in particular has improved significantly,” explains Matthew Harvey, RDS project lead and RCS Manager at Imperial College. “Previously, there were frequent interruptions to interactive use because the file system load for some compute jobs effectively squeezed out users. Users would log into the system, type in their search criteria but it could take more than 10 seconds to respond. Now that is a thing of the past.”
The new RDS supports a charge-back strategy in which researchers cost storage as a service on their grants, instead of users being charged based on reserved capacity. More effective management of storage capacity also allowed the College to avoid costly additions. The ArcaStream platform provides the tools and insight needed to understand the access patterns of data on the file system for each project allocation. “This information governance is enabling us to store valuable data more intelligently and economically,” Harvey continued.