
Better Late Than Never: Adding Checksums to 16 Million Legacy Files

Submitted by
Style Pass
2024-11-07 14:00:04

The National Library of Norway has used SAM-FS[1] for long-term storage and archiving of large amounts of data since 2007. SAM-FS holds 14 petabytes of data and will soon reach "end of life" status as a product.

In 2022, the National Library decided to replace SAM-FS with a more modern preservation solution for digital material. The new solution is built on DPS (Digital Preservation Services), software developed in-house, and uses IBM-HPSS as the underlying storage system.

Over the last 10 years, the National Library has used checksums[2] to verify preserved data. In this context, a checksum is a hash string calculated from a file's contents and used to verify that the file has not changed. Common checksum algorithms include MD5, SHA-1, SHA-256, and SHA-512. The National Library uses MD5[3].
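As an illustration of the idea, here is a minimal Python sketch that computes a file's MD5 checksum and later checks the file against it, using the standard-library `hashlib` module. The function names `md5_checksum` and `verify` and the 1 MiB chunk size are illustrative assumptions. They are not the National Library's actual tooling.

```python
import hashlib
from pathlib import Path


def md5_checksum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Stream the file so very large archive objects never need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_md5: str) -> bool:
    """Return True if the file's current MD5 matches the checksum recorded earlier."""
    return md5_checksum(path) == expected_md5.lower()
```

The checksum is recorded once, when the file is ingested. Every later call to `verify` recomputes the digest and compares it with that recorded value. Any change to the file's bytes produces a different digest.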

Many of the oldest files in SAM-FS lacked checksums when they were stored. Every file in SAM-FS is stored in three copies, but without an accompanying checksum those copies effectively exist independently of one another. If a discrepancy arose between them, there would be no original checksum against which to verify which copy is correct.
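The small Python sketch below shows why the original checksum matters. It assesses three replicated copies of one file, first with a recorded MD5 and then without one. The names `CopyReport` and `check_copies` are hypothetical. For brevity, the copies are held in memory as `bytes`. A real system would stream each copy from storage.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CopyReport:
    copies_agree: bool            # do all copies produce the same digest?
    valid: Optional[List[bool]]   # per-copy verdict, or None if it cannot be decided


def check_copies(copies: List[bytes], reference_md5: Optional[str]) -> CopyReport:
    """Assess replicated copies of one file, with or without an original checksum."""
    digests = [hashlib.md5(copy).hexdigest() for copy in copies]
    copies_agree = len(set(digests)) == 1
    if reference_md5 is None:
        # No original checksum: a discrepancy can be detected,
        # but nothing says which copy matches the file as first stored.
        return CopyReport(copies_agree, None)
    # With an original checksum, each copy can be verified on its own.
    return CopyReport(copies_agree, [d == reference_md5.lower() for d in digests])


original = b"archived object"
damaged = b"archived objecT"
reference = hashlib.md5(original).hexdigest()

print(check_copies([original, damaged, original], reference))
# CopyReport(copies_agree=False, valid=[True, False, True])
print(check_copies([original, damaged, original], None))
# CopyReport(copies_agree=False, valid=None)
```

Both calls detect that the copies disagree. Only the call that has the reference checksum can tell which copy is damaged.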
