SLAC expands and centralizes computing infrastructure to prepare for future data challenges

The computing facility at the Department of Energy’s SLAC National Accelerator Laboratory is doubling in size, preparing the laboratory for new scientific endeavors that promise to revolutionize our understanding of the world from atomic to cosmic scales but also require dealing with unprecedented data flows.

When SLAC’s superconducting X-ray laser comes online, for example, it will eventually collect data at a staggering rate of terabytes per second. The world’s largest digital camera for astronomy, under construction at the lab for the Vera C. Rubin Observatory, will capture 20 terabytes of data each night.

“The new computing infrastructure will meet these challenges and more,” said Amedeo Perazzo, who leads the Controls and Data Systems division within SLAC’s Technology Innovation Directorate. “We’re adopting some of the latest and greatest technologies to build computing capabilities that will serve all of SLAC for years to come.”

The Stanford-led construction adds a second building to the existing Stanford Research Computing Facility (SRCF). SLAC will be a principal tenant of SRCF-II, a modern data center designed to operate 24/7 without service interruptions and with data integrity in mind. SRCF-II will double the existing data center capabilities, providing a total of 6 megawatts of power capacity.

“Computing is a core competency of a science-driven organization like SLAC,” said Adeyemi Adesanya, head of the Scientific Computing Systems department within Perazzo’s division. “I’m pleased to see our vision of an integrated computing facility come to life. It’s a necessity for analyzing data at huge scales, and it will also pave the way for new initiatives.”

SLAC’s Big Data Center

Adesanya’s team is gearing up to install the hardware for the SLAC Shared Science Data Facility (S3DF), which will find its home inside SRCF-II. It will become the computational hub for all data-intensive experiments performed at the lab.

First and foremost, it will benefit future users of LCLS-II, an upgrade to the Linac Coherent Light Source (LCLS) X-ray laser that will fire up to a million pulses per second, about 8,000 times more than the first-generation machine. Researchers hope to use LCLS-II to gain new insights into atomic processes that are central to some of the most pressing challenges of our time, including the chemistry of clean energy technologies, the molecular design of drugs, and the development of quantum materials and devices.

The new capabilities bring daunting computational challenges, said Jana Thayer, head of the LCLS Data Systems division. “To get the best scientific results and make the most of their time on LCLS-II, users will need quick feedback, within minutes, on the quality of their data,” she said. “To do this with an X-ray laser that produces thousands of times more data every second than its predecessor, we need the petaflops of computing power that S3DF will provide.”

Another problem researchers will have to grapple with is that LCLS-II will collect far too much data to store it all. The new data facility will therefore run an innovative data reduction pipeline that discards unneeded data before the rest is saved for analysis.
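As a rough illustration of the idea (this is not LCLS’s actual pipeline; the threshold, frame format and simulated data source below are all hypothetical), on-the-fly reduction can be as simple as vetoing detector frames that carry no useful signal before anything is written to disk:

```python
import numpy as np

# Hypothetical sketch of on-the-fly data reduction: keep only detector frames
# whose total signal clears a threshold, so empty shots never reach storage.

HIT_THRESHOLD = 5_000  # arbitrary cutoff in detector counts; a real pipeline would calibrate this

def reduce_stream(frames):
    """Yield only the frames worth saving from a stream of 2D detector images."""
    for frame in frames:
        if frame.sum() >= HIT_THRESHOLD:   # crude "hit finding"
            yield frame.astype(np.uint16)  # downcast before storage to save space

# Simulated stream: mostly empty frames, a few containing real signal.
rng = np.random.default_rng(0)
stream = (rng.poisson(lam=10 if rng.random() < 0.05 else 0.01, size=(64, 64))
          for _ in range(1_000))

saved = list(reduce_stream(stream))
print(f"kept {len(saved)} of 1000 frames")
```

In practice the reduction runs close to the detector and is far more sophisticated, involving feature extraction and compression as well as vetoing, but the principle is the same: decide what is worth keeping before it ever hits disk.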

Another technology with computational needs that will benefit from the new infrastructure is cryo-electron microscopy (cryo-EM) of biomolecules such as proteins, RNA and virus particles. In this method, scientists take snapshots of how a beam of electrons interacts with a sample containing the biomolecules. Sometimes they need to analyze millions of images to reconstruct the 3D molecular structure in near-atomic detail. In the future, researchers also hope to visualize molecular components inside cells, not just chemically purified molecules, at high resolution.

Complex image reconstruction requires a lot of CPU and GPU power and involves sophisticated machine learning algorithms. Performing these calculations in S3DF will open up new opportunities, said Wah Chiu, director of the Stanford-SLAC Cryo-EM Center.

“I really hope S3DF will become a computational think tank, where experts come together to write code that allows us to visualize increasingly complex biological systems,” Chiu said. “There is a lot of potential for discovering new structural states of molecules and organelles in normal and diseased cells at SLAC.”

In fact, anyone at the lab will be able to use the available computing resources. Other potential “clients” include SLAC’s instrument for ultrafast electron diffraction (MeV-UED), the Stanford Synchrotron Radiation Lightsource (SSRL), the lab-wide machine learning initiative and accelerator applications. Overall, S3DF will be able to support 80% of SLAC’s computing needs, while the most demanding 20% of scientific computing will be performed at offsite supercomputing facilities.

Multiple services under one roof

SRCF-II will host two other major data facilities.

One is the Rubin Observatory’s US Data Facility (USDF). In a few years, the observatory will begin capturing images of the southern night sky from a mountaintop in Chile with its SLAC-built 3,200-megapixel camera. For the Legacy Survey of Space and Time (LSST), it will take a pair of images every 37 seconds for 10 years. The resulting information may contain answers to some of the biggest questions about our universe, including what exactly is speeding up its expansion, but that information will be buried in a 60-petabyte catalog that researchers will have to scrutinize. The resulting image archive will be around 300 petabytes, which will dominate storage use in SRCF-II. The USDF, together with two other centers in the UK and France, will handle production of the massive data catalog.
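As a back-of-envelope illustration of that scale (the bytes per pixel and nightly observing window below are assumptions made for the sake of the estimate, not Rubin’s published figures), the camera size and survey cadence alone imply a nightly haul of tens of terabytes and a raw ten-year total measured in petabytes:

```python
# Rough, illustrative estimate of the Rubin/LSST raw data volume.
# Assumptions: 16-bit raw pixels, ~10 hours of observing per night.

pixels_per_image = 3_200_000_000                      # 3,200-megapixel camera
bytes_per_pixel = 2                                    # assumed 16-bit raw pixels
image_size_tb = pixels_per_image * bytes_per_pixel / 1e12   # ~0.0064 TB per image

seconds_per_pair = 37                                  # two exposures every 37 seconds
hours_per_night = 10                                   # assumed observing window
images_per_night = 2 * hours_per_night * 3600 / seconds_per_pair

nightly_tb = images_per_night * image_size_tb
ten_year_pb = nightly_tb * 365 * 10 / 1000

print(f"~{nightly_tb:.0f} TB of raw images per night")        # on the order of 10-20 TB
print(f"~{ten_year_pb:.0f} PB of raw images over 10 years")   # tens of PB, before processing
```

The actual archive grows well beyond the raw total once processed data products and calibration images are included, but the point stands: petabyte-scale storage follows directly from the camera size and observing cadence.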

A third data system will serve the user community of SLAC’s first-generation X-ray laser. The current computing infrastructure for LCLS data analysis will gradually migrate to SRCF-II and grow into a much larger system there.

Although each data facility has specific technical requirements, they all rely on a basic set of shared services: data always needs to be transferred, stored, analyzed and managed. Working closely with Stanford, Rubin Observatory, LCLS and other partners, Perazzo’s and Adesanya’s teams are setting up all three systems.

For Adesanya, this unified approach, which incorporates a cost model that will help drive future upgrades and growth, is a dream come true. “Historically, computing at SLAC was highly distributed and every facility had its own specialized system,” he said. “The new, more centralized approach will help catalyze new lab-wide initiatives, such as machine learning, and by breaking down silos and converging on an integrated data facility, we’re building something more capable than the sum of everything we were doing before.”

The construction of SRCF-II is a Stanford project. Significant portions of the S3DF infrastructure are funded by the Department of Energy’s Office of Science. LCLS and SSRL are Office of Science user facilities. Rubin Observatory is a joint initiative of the National Science Foundation (NSF) and the Office of Science. Its primary mission is to carry out the Legacy Survey of Space and Time, providing an unprecedented data set for scientific research supported by both agencies. Rubin is operated jointly by NSF’s NOIRLab and SLAC. NOIRLab is managed for NSF by the Association of Universities for Research in Astronomy, and SLAC is operated for DOE by Stanford. The Stanford-SLAC Cryo-EM Center (S2C2) is supported by the National Institutes of Health’s (NIH) Transformative High-Resolution Cryo-Electron Microscopy program.

SLAC is a vibrant multi-program laboratory that explores how the universe operates at the largest, smallest, and fastest scales and creates powerful tools used by scientists around the world. Through research that includes particle physics, astrophysics, cosmology, materials, chemistry, biosciences, energy, and scientific computing, we help solve real-world problems and advance the interests of the nation.

SLAC is operated by Stanford University for the US Department of Energy’s Office of Science. The Office of Science is the largest supporter of basic research in the physical sciences in the United States and works to address some of the most pressing challenges of our time.

