Petabyte-scale prototype for data archiving

Arkivum is now building an EC-co-funded prototype for petabyte-scale archiving and preservation of valuable research data, to determine how best to ensure its long-term integrity and accessibility and so further the cause of future scientific discovery.

Arkivum has embarked on the second, prototyping phase of the EUR 4.8m ARCHIVER project, which was launched in June 2020 by a multinational scientific buyer group led by CERN, operator of the Large Hadron Collider near Geneva.

Arkivum, in partnership with Google Cloud, successfully completed the initial design phase of the three-year project in October 2020. The eight-month prototype phase of ARCHIVER was officially announced in December.

The aim of ARCHIVER (Archiving and Preservation for Research Environments, www.archiver-project.eu) is to achieve radically improved archiving and digital preservation for petabyte-scale, data-intensive research. Supporting the IT requirements of European scientists, ARCHIVER will provide end-to-end archival and preservation services for the vast and ever-growing datasets generated by world-leading research institutions.

Addressing issues such as extreme data scaling, network connectivity, service interoperability and business models, the multidisciplinary project is leveraging best practice and economies of scale to produce solutions for the European Open Science Cloud (EOSC) and other resources.

In addition to CERN, the members of the ARCHIVER buyer group are DESY (the Deutsches Elektronen-Synchrotron, based in Hamburg and Berlin), EMBL-EBI (European Bioinformatics Institute, based in Cambridge), and PIC (Port d’Informació Científica, situated near Barcelona). The ARCHIVER project receives European Commission funding from the European Union’s Horizon 2020 research and innovation programme (grant agreement No 824516).

■  The prototyping mission

Arkivum and Google Cloud are prototyping a fully hosted, user-friendly software-as-a-service (SaaS) solution that can be automated end to end and integrated with the working environments of ARCHIVER’s four buyer organisations, each active in a different area of scientific research.

Scalability is a priority, because in the prototype phase the buyer organisations will ingest petabytes of data at rates of up to 100 terabytes a day. (A petabyte equates to one million gigabytes.) At such volumes, the prototype must be both highly scalable and highly cost-effective. Arkivum will therefore use a factory-like model of efficiency, automation and high throughput to develop ways of industrialising digital preservation and archiving. The result will enable organisations to ensure their data is Findable, Accessible, Interoperable and Reusable (FAIR) at an unprecedented scale.
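To put the ingest figure in perspective, the short sketch below converts a daily ingest volume into the sustained network throughput it implies. It is a back-of-the-envelope illustration only, assuming decimal (SI) units, a perfectly uniform ingest rate around the clock, and no protocol, checksumming or replication overhead; these assumptions and the helper names `sustained_rate_bytes_per_s` and `days_to_ingest` are hypothetical and do not come from Arkivum or the ARCHIVER project. Under those assumptions, 100 terabytes a day works out to roughly 1.16 GB/s (about 9.26 Gbit/s), and one petabyte takes ten days to ingest.

```python
# Back-of-the-envelope ingest arithmetic (illustrative assumptions only):
# decimal SI units, uniform 24-hour ingest, no transfer or storage overhead.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds
TB = 10**12  # one decimal terabyte, in bytes
PB = 10**15  # one decimal petabyte, in bytes (one million gigabytes)


def sustained_rate_bytes_per_s(tb_per_day: float) -> float:
    """Average bytes per second needed to ingest tb_per_day terabytes each day."""
    return tb_per_day * TB / SECONDS_PER_DAY


def days_to_ingest(petabytes: float, tb_per_day: float) -> float:
    """Days needed to ingest the given number of petabytes at tb_per_day."""
    return petabytes * PB / (tb_per_day * TB)


if __name__ == "__main__":
    rate = sustained_rate_bytes_per_s(100)
    # Convert bytes/s to GB/s and to Gbit/s (8 bits per byte).
    print(f"{rate / 1e9:.2f} GB/s = {rate * 8 / 1e9:.2f} Gbit/s")
    print(f"1 PB at 100 TB/day takes {days_to_ingest(1, 100):.0f} days")
```

In practice a real deployment would need headroom above this average for retries, integrity checks and any replicated copies, which is one reason the article stresses scalability alongside cost.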

In 2020, the response to COVID-19 highlighted the importance of reliable repositories of research data in the life sciences: vaccine development was facilitated and accelerated by the availability of internationally shared research data that had been productively repurposed. With every sector of scientific research generating ever-greater volumes of data of long-term value, good practice and efficiency in archiving and digital preservation have risen up the agenda. The ARCHIVER project, looking to the future of scientific discovery, is therefore setting an example to the world.

■  Arkivum’s view on the prototype: scalability is key

Matthew Addis, Chief Technology Officer and Co-Founder of Arkivum, has been leading Arkivum’s work on the ARCHIVER project. “It means a great deal to all of us at Arkivum to be chosen to build a prototype for ARCHIVER,” he says. “Scalability, performance and cost-effectiveness are clearly some of the key challenges for the ARCHIVER project. To address these challenges, in partnership with Google Cloud we are creating a new ‘cloud factory’ approach for the archiving, digital preservation and access of huge research datasets. We are looking to drive efficiency and reduce costs – to make preservation and archiving viable and affordable at an immense scale. Ultimately, this will mean that research organisations will be able to use our solution as part of their Trusted Digital Repositories to ensure the long-term availability, integrity and reusability of their research data for decades to come.”

Chris Sigley, Arkivum’s Chief Executive Officer, said: “Beyond affirming the credentials Arkivum has gained in digital archiving and preservation across several industry sectors, the initial design phase of the project gave us an opportunity to demonstrate our customer focus, our responsiveness, and our flexibility when it came to adding a feature or functionality. It is a great privilege – and very exciting – for Arkivum’s team to be working on this ambitious project with four world-leading research organisations. ARCHIVER is setting major precedents for large-scale digital archiving and preservation, in the field of scientific discovery and more broadly in the rapidly evolving discipline of data management across every sector.”
