CERN modernizes IT infrastructure with OpenStack and Puppet
CERN is making the infrastructure that handles the data from the Large Hadron Collider (LHC) more flexible by upgrading it with OpenStack for virtualization and Puppet for configuration management.
The research organization’s objective is to change how it provides services to scientists working at the LHC, which runs in a 27-kilometer circular tunnel about 100 meters beneath the Swiss-French border near Geneva.
“One of the things we have to contend with is how to scale our infrastructure fairly significantly with a fixed staff and fixed costs. With a fixed budget you can buy more and more equipment, but you can’t provide more and more services with the same number of people,” said Ian Bird, LHC computing grid project leader.
But that may be possible if you change the way things are done. CERN’s goal is to become more efficient by moving toward infrastructure-as-a-service and platform-as-a-service with a private cloud, so that it can change how the infrastructure is used more dynamically. Right now the accelerator is shut down, so the CERN data center has a different workload than it did last year when the LHC was running, according to Bird.
“Users also want to provision an analysis cluster with 50 machines themselves for an afternoon that then goes away again. It is about providing those kinds of services,” Bird said.
CERN chose OpenStack because it appears to be the cloud platform with the most traction. OpenStack’s popularity also makes it attractive from a staffing point of view, according to Bird.
“We have a transient staff, because not everybody has permanent contracts. So it’s good to have people that come in with that expertise or can leave with it, and then sell it somewhere else,” Bird said.
CERN is also moving away from the custom in-house software it uses to manage the cluster, toward widely used tools such as Puppet.
“When we started scaling up the cluster for LHC, the large scale Googles and Amazons didn’t really exist. So we invested quite a lot of effort in configuration management and monitoring, but a couple of years ago we decided to instead go with something that had a larger support community,” Bird said.
CERN looked at Chef and Puppet, and chose the latter because it worked in a way that was closer to its own management model. The rollouts of Puppet and OpenStack are both underway.
Today CERN’s infrastructure is distributed across about 160 data centers of different sizes located around the world.
“The reason behind that is twofold; one is given the size of the data center we have here there is no way we could have done all the computing for the LHC, and the other is political and sociological. We are given money to do computing, but it is preferred that the funding stays where it is coming from,” said Bird.
CERN’s own data center and a recently announced data center in Budapest make up tier 0, and the next tier consists of 11 data centers that are typically located at large national labs, such as FermiLab in the U.S., according to Bird. The last tier mostly consists of computing resources at universities.
To make OpenStack a better fit for CERN’s distributed computing resources, the organization will collaborate with the community on data center federation.
“If we at CERN are running OpenStack and other of our grid centers are also running OpenStack we would like to federate the cloud parts ... So if you have your credentials at CERN, you ought to be able to let your work migrate to FermiLab, for example,” Bird said.
Storage is a very important part of what CERN does, and the demands are huge. The two big detectors at the LHC, CMS and ATLAS, produce about 1 petabyte, or 1,000 terabytes, of data per second. The detectors track the motion and measure the energy and charge of particles thrown out in all directions after a collision in the accelerator. That data is then whittled down to a few hundred megabytes per second of the most interesting events by a farm of Linux machines with 15,000 processing cores located at each detector.
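To put those rates in perspective, the size of the reduction can be estimated with a quick back-of-the-envelope calculation. The sketch below uses decimal units and takes 300 megabytes per second as a stand-in for “a few hundred megabytes per second”; that figure is an illustrative assumption, not a number from CERN, and both rates are treated as rough aggregates.

```python
# Back-of-the-envelope estimate of the data reduction at the detectors.
PB = 10**15  # bytes in a petabyte (decimal units)
MB = 10**6   # bytes in a megabyte (decimal units)

raw_rate = 1 * PB     # about 1 PB per second coming off the detectors
kept_rate = 300 * MB  # ASSUMPTION: stand-in for "a few hundred MB/s"

reduction_factor = raw_rate / kept_rate  # how many times smaller the kept stream is
kept_fraction = kept_rate / raw_rate     # share of the raw data that survives

print(f"Reduction factor: roughly {reduction_factor:,.0f} to 1")
print(f"Fraction of raw data kept: {kept_fraction:.7%}")
```

Under that assumption, only a few bytes in every ten million are kept, which is why the filtering has to happen at the detectors before anything is stored.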
Still, in 2012 about 30PB of data from the LHC was saved. The data is cached on disk, but then archived on tape. The archive stores about 100PB of data, of which about 70PB comes from the accelerator, according to Bird, who calls the archiving “a non-trivial exercise.”
Bird is a big fan of tape storage for three main reasons: cost, error rates and power consumption.
Tape is still a factor of 10 cheaper than the equivalent space on disk. Hosted storage services such as Glacier from Amazon Web Services are much too expensive, Bird said. And the error rate on tapes is extremely low compared with the failure rate of disks, he said.
It’s also important to keep down power consumption, which is a limiting factor in today’s data centers. The data center in Budapest was added not because CERN ran out of space, but because it ran out of power. The tape robots use very little power compared to disks, according to Bird.
“Tape is quite significantly underrated. Probably for the last 15 years people have been saying that it is dead, and will be replaced by disk. But it hasn’t gone away, and I don’t see it going away any time soon. For large archives you can’t really compete,” Bird said.
But tape has to be managed well for it to work.
“You can’t just put it on tape and leave it for 20 years. Tape media changes every two or three years, so we are continually reading it from one generation and copying it to the next generation. We also read it actively to make sure it is still readable,” Bird said.
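The routine Bird describes, copying data to a new generation of media and re-reading it to confirm it is still intact, can be illustrated with checksums. The following is a minimal sketch and not CERN’s actual tooling: ordinary files stand in for tape cartridges, and the function names `sha256_of`, `migrate_and_verify` and `is_still_readable` are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_and_verify(source: Path, target: Path) -> bool:
    """Copy data to the 'next generation' and check the copy matches."""
    expected = sha256_of(source)
    shutil.copyfile(source, target)
    return sha256_of(target) == expected


def is_still_readable(path: Path, expected_digest: str) -> bool:
    """Re-read stored data and compare it with the digest recorded earlier."""
    try:
        return sha256_of(path) == expected_digest
    except OSError:
        # An unreadable file counts as a failed check (assumed policy).
        return False
```

In this sketch, a digest recorded when the data is first written is kept alongside it, so that later re-reads can detect silent corruption as well as outright read failures.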