I thought we were talking about cassettes at a consumer/prosumer level.
The article and the tech involved are VERY interesting, but the issues CERN faces when storing data bear no relation to the ones we face, especially given the hardware available to us.
Moreover, CERN has been around since 1953, so I imagine the data center was there long before modern digital storage came along. After a bit of research, it looks like SHIFT was developed and operated in the 90s, then came CASTOR I (1998 to 2007), CASTOR II (2005 to 2022), and CTA (2020) as the latest successor. Each of these projects is a continuation of the previous one, not a full replacement:
The CTA Tape Server is an evolution from the CASTOR Tape Server and the underlying data format of files on tape is exactly the same. This means that migration from CASTOR to EOSCTA is a pure metadata operation without physical data movement.
They have also used disk storage since 2011, which at some point held 140 PB vs 20 PB stored on CASTOR (tape); weirdly, the paper didn't give a date for those figures.
But there’s this as well:
- As of the end of May 2021, we had 380 PB of data on tapes.
- As of the end of May 2021, 271 PB were stored on disks for a total capacity of 487 PB on disks. However, as disks are less reliable than tapes (a thousand times less, 30 disks fail each week in the data centre), we always copy twice the data on disk so that if a disk fails we do not lose data => so ‘only’ 135 PB of ‘real’ data is on disks at the moment.
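To make the arithmetic in those two quotes explicit, here is a rough sketch in Python. The variable names are mine, and the "tape share" line assumes one copy per file on tape and no overlap between what sits on tape and on disk, which the quotes don't actually state, so treat that number as a ballpark only.

```python
# Figures taken from the quotes above (end of May 2021), all in petabytes.
tape_pb = 380            # data on tapes
disk_raw_used_pb = 271   # raw space occupied on disks
disk_copies = 2          # every file is written twice on disk, per the quote

# Unique ("real") data on disk once the duplicate copy is factored out.
disk_unique_pb = disk_raw_used_pb / disk_copies

# ASSUMPTION: single copy on tape and no tape/disk overlap (not stated in the source).
tape_share = tape_pb / (tape_pb + disk_unique_pb)

print(f"unique data on disk: {disk_unique_pb:.1f} PB")
print(f"tape share of unique data: {tape_share:.0%}")
```

So the 'only 135 PB' in the quote is just 271 divided by the two copies, and under those assumptions tape holds roughly three quarters of the unique data.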
It does look like they changed course, gradually increasing tape storage capacity because of its cost (up to 5x cheaper for the same capacity) and its reliability compared to disks. Another crucial thing to consider about the disk failure rate is that the disks are under extremely intensive use:
Within one year, when the LHC is running, more than one exabyte (the equivalent to 1000 petabytes) of data is being accessed (read or written).
The CERN storage system, EOS (disk), was created for the extreme LHC computing requirements. EOS in 2020 has served 2.5 exabyte of physics data to the experiments and at the end of 2019, EOS instances at CERN exceeded five billions files, matching the exceptional performances of the LHC machine and experiments.
That said, their tape system still doesn't compare to the tape read/write/storage hardware available to us at consumer level, so I still hold that disks/cloud are the better option.