Sweden's National Library begins a new chapter

The library is training state-of-the-art AI models on a half-millennium of Swedish text to support humanities research in history, linguistics, media studies and more.

Thanks to a centuries-old law that requires a copy of everything published in Swedish to be submitted to the library — also known as Kungliga biblioteket, or KB — its collections span from the obvious to the obscure: books, newspapers, radio and TV broadcasts, internet content, Ph.D. dissertations, postcards, menus and video games. It’s a wildly diverse collection of nearly 26 petabytes of data, ideal for training state-of-the-art AI.

“We can build state-of-the-art AI models for the Swedish language since we have the best data,” said Love Börjeson, director of KBLab, the library’s data lab.

Using NVIDIA DGX systems, the group has developed more than two dozen open-source transformer models, available on Hugging Face. The models, downloaded by up to 200,000 developers per month, enable research at the library and other academic institutions.

“Before our lab was created, researchers couldn’t access a dataset at the library — they’d have to look at a single object at a time,” Börjeson said. “There was a need for the library to create datasets that enabled researchers to conduct quantity-oriented research.”

With the lab's datasets and models, researchers will soon be able to create hyper-specialized datasets: for example, pulling up every Swedish postcard that depicts a church, every text written in a particular style or every mention of a historical figure across books, newspaper articles and TV broadcasts.
