CIPHERER

Case Study

CERN and GAIA, via STFC: data infrastructure for particle physics and a billion stars

Client Science and Technology Facilities Council (STFC), supporting CERN and ESA's GAIA mission
Mission Keep mission-critical scientific compute running to the cadence the science requires, across decade-long programmes.
~90 PB/yr to a ~1 EB tape archive

Stakes

Two of the most demanding scientific programmes in the world depend on the infrastructure behind their data. At CERN, the Large Hadron Collider generates raw collision data on the order of a petabyte per second across its four main experiments at peak luminosity; hardware and software triggers discard well over 99.99% of it in real time, leaving roughly 1 PB/day written to permanent storage and ~90 PB/year added to a tape archive that now totals around 1 exabyte. The European Space Agency's GAIA spacecraft is mapping the positions, motions and properties of more than a billion stars. Both programmes require data infrastructure that simply cannot fail. A pipeline outage is not a service ticket; it is lost science. Capacity has to be planned for missions that run for a decade or more.
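The volumes above can be sanity-checked with back-of-envelope arithmetic. This sketch works only from the approximate public figures quoted in the text (1 PB/s raw, ~1 PB/day archived, ~90 PB/yr to tape, ~1 EB total); it is illustrative, not a measurement from the engagement.

```python
# Back-of-envelope check of the approximate figures quoted above.

PB = 1.0  # work in petabytes throughout

raw_rate_pb_per_s = 1 * PB       # ~1 PB/s raw collision data at peak
archived_pb_per_day = 1 * PB     # ~1 PB/day to permanent storage

seconds_per_day = 86_400
kept_fraction = archived_pb_per_day / (raw_rate_pb_per_s * seconds_per_day)
discard_pct = (1 - kept_fraction) * 100
print(f"triggers keep ~1 part in {1 / kept_fraction:,.0f} "
      f"(discard ~{discard_pct:.4f}%)")

# ~90 PB/yr to tape implies roughly 90 effective archival days per year,
# consistent with the LHC not running (or not archiving) year-round.
tape_added_pb_per_yr = 90 * PB
effective_days = tape_added_pb_per_yr / archived_pb_per_day
print(f"~{effective_days:.0f} effective archival days/year")

# At ~90 PB/yr, a ~1 EB (1,000 PB) archive represents roughly a decade
# of accumulation.
archive_pb = 1_000 * PB
print(f"~{archive_pb / tape_added_pb_per_yr:.0f} years to reach ~1 EB")
```

The kept fraction works out to about 1 part in 86,400, which is why the discard rate is "well over 99.99%" rather than exactly that figure.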

Constraints

  • Mission lifecycles measured in decades, not quarters
  • Heterogeneous compute estate: Linux and Windows servers across scientific workloads
  • Secure network infrastructure for research communications: firewalls, switches, VoIP
  • Operating within UK government scientific computing standards (STFC, Royal Observatory Edinburgh)
  • Capacity planning under uncertainty: scientific demand grows in ways business demand does not

Approach

Treat infrastructure as a long-lifecycle asset

Scientific computing rewards infrastructure designed for ten- and twenty-year horizons. We approached compute, storage and network as long-lifecycle assets, with clear hardware refresh cadences, capacity buffers and operational documentation that would survive multiple staff generations.

Engineer for known and unknown demand

Some workloads (LHC analysis, GAIA mission processing) had predictable shapes. Others would emerge as scientists found new questions to ask. We sized for both: deterministic provisioning for the known, headroom and elasticity for the unknown.
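The sizing rule above can be sketched as a small capacity model: project the known workloads over the planning horizon, then add a fixed headroom fraction for demand that cannot be forecast. The function, growth rate and headroom figure below are hypothetical illustrations of the approach, not the actual STFC capacity model.

```python
def provision(known_demand_tb: float,
              annual_growth: float,
              years: int,
              headroom: float = 0.30) -> float:
    """Capacity to procure: known demand grown deterministically over
    the planning horizon, plus a headroom buffer for the unknown."""
    projected = known_demand_tb * (1 + annual_growth) ** years
    return projected * (1 + headroom)

# e.g. 500 TB of known workloads, 20%/yr growth, a 5-year refresh cycle
print(f"{provision(500, 0.20, 5):.0f} TB")
```

The point of the split is that the deterministic term is defensible in procurement, while the headroom term is an explicit, reviewable allowance rather than padding hidden inside the forecast.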

Operate to research-grade reliability

Mission-critical scientific compute does not get a maintenance window during an LHC run. We designed for in-place operations, partial failure tolerance and rapid recovery. Every change was reviewed against the mission calendar.
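The review rule above, that every change is checked against the mission calendar, can be sketched as a simple overlap test: a proposed change window is rejected if it intersects any scheduled run. The dates and function names are hypothetical, not the real LHC schedule or the tooling used.

```python
from datetime import date

def overlaps(a_start: date, a_end: date,
             b_start: date, b_end: date) -> bool:
    """True if the two closed date intervals intersect."""
    return a_start <= b_end and b_start <= a_end

def change_allowed(change_start: date, change_end: date,
                   mission_runs: list[tuple[date, date]]) -> bool:
    """A change is only allowed outside every scheduled run."""
    return not any(overlaps(change_start, change_end, r0, r1)
                   for r0, r1 in mission_runs)

# Illustrative run window only (not the actual LHC schedule)
runs = [(date(2025, 3, 1), date(2025, 11, 15))]
print(change_allowed(date(2025, 12, 1), date(2025, 12, 2), runs))  # True
print(change_allowed(date(2025, 6, 1), date(2025, 6, 2), runs))    # False
```

In practice the calendar check is one gate among several in change review, alongside failure-tolerance and rollback criteria.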

Document for mission longevity

Configuration documentation, operational procedures and capacity baselines were treated as deliverables, not by-products. The next engineer through the door, in five or ten years, needed to inherit a runnable system.

Deliverables

  • Designed and operated 20+ mission-critical Linux and Windows servers at the Royal Observatory Edinburgh
  • Maintained secure network infrastructure including firewalls, switches and VoIP for research communications
  • Capacity planning and lifecycle management for scientific compute and storage hardware
  • Operational procedures and configuration documentation for long-lifecycle handover
  • Disaster recovery and business continuity validation across scientific workloads

Outcome

Reliable data pipelines fed two of the most demanding scientific programmes on Earth: CERN's particle physics analysis and ESA's GAIA stellar census. The infrastructure ran to the cadence the science required, not the cadence enterprise IT defaults to. The work has shaped how Cipherer approaches every mission-critical engagement since: design for the longest reasonable lifetime, document for the next engineer, and treat reliability as a scientific instrument, not a service-level agreement.

Stack

  • Linux
  • Windows Server
  • Hardware lifecycle management
  • Firewall and switch infrastructure
  • VoIP

Compliance posture

  • STFC scientific computing standards
  • Royal Observatory Edinburgh operational standards