Petabyte-scale prototype for data archiving

Arkivum is now building an EC-co-funded prototype for petabyte-scale archiving and preservation of valuable research data, determining how best to ensure its long-term integrity and accessibility, and so further the cause of future scientific discovery.

Tuesday, 16th February 2021. Posted in BC/DR Storage + Servers Storage Technology by Phil Alsop

Arkivum has embarked on the second, prototyping phase of the EUR 4.8m ARCHIVER project, which was launched in June 2020 by a multinational scientific buyer group led by CERN, operator of the Large Hadron Collider near Geneva.

Arkivum, in partnership with Google Cloud, successfully completed the initial design phase of the three-year project in October 2020. The eight-month prototype phase of ARCHIVER was officially announced in December.

The aim of ARCHIVER (Archiving and Preservation for Research Environments, www.archiver-project.eu) is to achieve radically improved archiving and digital preservation for petabyte-scale, data-intensive research. Supporting the IT requirements of European scientists, ARCHIVER will provide end-to-end archival and preservation services for the vast and ever-growing datasets generated by world-leading research institutions.

Embracing such issues as extreme data-scaling, network connectivity, service interoperability and business models, the multidisciplinary project is leveraging best practice and economies of scale to produce solutions for the European Open Science Cloud (EOSC) and other resources.

In addition to CERN, the members of the ARCHIVER buyer group are DESY (the Deutsches Elektronen-Synchrotron, based in Hamburg and Berlin), EMBL-EBI (European Bioinformatics Institute, based in Cambridge), and PIC (Port d’Informació Científica, situated near Barcelona). The ARCHIVER project receives European Commission funding from the European Union’s Horizon 2020 research and innovation programme (grant agreement No 824516).

■ The prototyping mission

Arkivum and Google Cloud are prototyping a fully hosted, user-friendly SaaS solution that can be fully automated and integrated with the working environments of ARCHIVER’s four buyer organisations, each active in a different area of scientific research.

The scalability of the solution is a priority, since in the prototype phase of the project the buyer organisations will ingest petabytes of data at rates of up to 100 terabytes a day. (A petabyte equates to one million gigabytes.) In the context of such huge volumes of data, it is imperative for the prototype to be both highly scalable and highly cost-effective. Arkivum will therefore use a factory-like model of efficiency, automation and high throughput to develop ways of industrialising digital preservation and archiving. The result will enable organisations to ensure their data is Findable, Accessible, Interoperable and Reusable (FAIR) at an unprecedented scale.
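To put the quoted ingest rate in perspective, the short sketch below works out what 100 terabytes a day means as a sustained transfer rate, and how long one petabyte takes at that pace. It is a back-of-envelope illustration only: decimal (SI) units and a perfectly even, round-the-clock transfer are assumptions, and the function names are hypothetical rather than anything drawn from the ARCHIVER project.

```python
# Back-of-envelope sketch. Assumes decimal (SI) units and a constant,
# uninterrupted transfer; real ingest rates will fluctuate.
TB = 10**12  # bytes in a terabyte
PB = 10**15  # bytes in a petabyte (one million gigabytes)
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_rate_gb_per_s(tb_per_day: float) -> float:
    """Average rate in gigabytes per second for a given daily volume."""
    return tb_per_day * TB / SECONDS_PER_DAY / 10**9


def days_to_ingest(petabytes: float, tb_per_day: float) -> float:
    """Days needed to ingest a volume at a constant daily rate."""
    return petabytes * PB / (tb_per_day * TB)


if __name__ == "__main__":
    rate = sustained_rate_gb_per_s(100)
    print(f"{rate:.2f} GB/s, {rate * 8:.2f} Gbit/s")          # 1.16 GB/s, 9.26 Gbit/s
    print(f"{days_to_ingest(1, 100):.1f} days per petabyte")  # 10.0 days per petabyte
```

On these assumptions, 100 terabytes a day corresponds to a little over one gigabyte every second, sustained around the clock, with each petabyte taking ten days to ingest.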

In 2020, the response to COVID-19 highlighted the importance of reliable repositories of research data in the field of life sciences: the development of vaccines was facilitated and accelerated by the availability of internationally shared research data that was productively repurposed. With all sectors of scientific research generating ever-greater volumes of data of long-term value, good practice and efficiency in archiving and digital preservation have risen up the agenda. The ARCHIVER project, looking to the future of scientific discovery, is therefore setting an example to the world.

■ Arkivum’s view on the prototype: scalability is key

Matthew Addis, Chief Technology Officer and Co-Founder of Arkivum, has been leading on the ARCHIVER project. “It means a great deal to all of us at Arkivum to be chosen to build a prototype for ARCHIVER,” he says. “Scalability, performance and cost-effectiveness are clearly some of the key challenges for the ARCHIVER project. To address these challenges, in partnership with Google Cloud we are creating a new ‘cloud factory’ approach for the archiving, digital preservation and access of huge research datasets. We are looking to drive efficiency and reduce costs – to make preservation and archiving viable and affordable at an immense scale. Ultimately, this will mean that research organisations will be able to use our solution as part of their Trusted Digital Repositories to ensure the long-term availability, integrity and reusability of their research data for decades to come.”

Chris Sigley, Arkivum’s Chief Executive Officer, said: “Beyond affirming the credentials Arkivum has gained in digital archiving and preservation across several industry sectors, the initial design phase of the project gave us an opportunity to demonstrate our customer focus, our responsiveness, and our flexibility when it came to adding a feature or functionality. It is a great privilege – and very exciting – for Arkivum’s team to be working on this ambitious project with four world-leading research organisations. ARCHIVER is setting major precedents for large-scale digital archiving and preservation, in the field of scientific discovery and more broadly in the rapidly evolving discipline of data management across every sector.”
