Yabi: Bringing Drag-and-drop to Supercomputers
Supercomputers are powerful tools for scientists. They are also very expensive, so wasted time can mean a lot of wasted resources. But making the most efficient use of them is not the easiest proposition in the world; it's not just a case of clicking a button to analyze a protein. However, fitting out the world of supercomputers with a user-friendly, web-based interface is the focus of an open source project based at Western Australia's Murdoch University.
Last year Murdoch publicly launched Yabi, a tool equipped with a web interface to make using supercomputers simpler.
The computational physics community, as an example, may be very proficient in the intricacies of shell scripts and working with a command line, says Professor Matthew Bellgard, Director of Murdoch's Centre for Comparative Genomics. "They've had a lot of experience in the past running their Fortran code using 4000 cores or 10,000 cores," he says. However, "there are other domains where scientists don't necessarily have that skill running command line code or porting their code from one supercomputer to another."
Learning at least a smattering of Perl or some other scripting language is often the norm for life scientists, but it can require a significant investment of time to get up to speed and make a supercomputer do what a scientist wants.
"When we started down this particular path of building Yabi, we wanted to simplify access to supercomputing infrastructure for end users. And the end users typically are non-IT-proficient; consider life science researchers or geoscientists," Professor Bellgard says. "While some of them have the ability to write scripts and use programming languages, a lot would prefer to be able to just drag and drop and have access to tools that you could access via the command line, but in a web-based environment.
"So I guess our first remit was, 'Can we simplify access to high performance computing (HPC) infrastructure?'"
Yabi has already been used to make life easier for scientists studying metagenomics (genomics is the study of the DNA of living organisms; metagenomics looks at the profile of organisms in a particular sample, for example a soil or sedimentary sample). "Metagenomics is a relatively new area and the tools are just being developed. The tools for data analysis of DNA sequences are readily available, but metagenomics is relatively new."
Professor Bellgard says that there have been marked improvements in productivity thanks to Yabi. "We are able to use the Yabi environment to make available tools in a very accessible way for life science researchers."
The team has made resources available for other research communities, for example scientists studying the cattle tick genome. "We've created a bioinformatics resource which behind the scenes uses the Yabi environment," Professor Bellgard says.
Although Yabi comes out of a life science environment, the tool can be used across scientific disciplines. One of its strengths, according to Professor Bellgard, is its flexibility.
"Pretty much any tool that can be run on a command line can actually be incorporated into the Yabi environment. So any command line tool whether it's a statistical tool, whether it's a genomics or a bioinformatics tool, whether it be a remote sensing tool, whether it be an astronomy tool can be incorporated."
And although its focus is on HPC, Yabi utilizes standard tools and protocols so it can be deployed in non-supercomputing environments. The developers are now targeting the cloud: "We are getting Yabi ready for a cloud-based deployment, which demonstrates that it is a very scalable system," Professor Bellgard says.
"Imagine a workflow, for example, of ten different tools that a user might pull together to conduct analysis on some data. The first tool is selecting a file from a computer that is connected to a remote sensor somewhere in outback Australia, then the next tool is running a file format process that may end up being computationally intensive. The third item is a tool that is on a supercomputer local to the researcher.
"The next tool is remotely accessing another supercomputer, then the fourth or fifth tool is actually utilising the cloud resources and the results are stored in the cloud somewhere. There is no lock-in, so you get to pick and choose where you run the tools, or the administrator does that for the users. You do not have to log in to Yabi in order to get access to your data, so you can choose where the results are stored."
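The kind of workflow Professor Bellgard describes, with each tool free to run on a different machine, can be pictured with a short sketch. The Python below is purely illustrative and is not Yabi's code or API: the Step class, the plan function, the tool names and the backend names are all hypothetical, chosen only to show a chain of tools in which each step is bound to its own execution location and passes its output to the next.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One tool invocation in a workflow (hypothetical model, not Yabi's)."""
    tool: str      # name of a command-line tool (made up for illustration)
    backend: str   # where the tool runs (made up for illustration)


def plan(steps):
    """Return (tool, backend, input) triples, feeding each output to the next step."""
    triples = []
    previous_output = None  # the first step has no upstream input
    for index, step in enumerate(steps, start=1):
        triples.append((step.tool, step.backend, previous_output))
        previous_output = f"step{index}.out"
    return triples


# Four steps, each bound to a different (hypothetical) execution location.
workflow = [
    Step("select_file", "remote-sensor-host"),
    Step("convert_format", "local-supercomputer"),
    Step("analyse", "remote-supercomputer"),
    Step("store_results", "cloud"),
]

for tool, backend, source in plan(workflow):
    print(f"{tool} on {backend} (input: {source})")
```

Because the backend is just a field on each step, moving a tool from a local supercomputer to the cloud means changing one value rather than rewriting the workflow, which is the "no lock-in" idea in miniature.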
Part of the philosophy driving Yabi is simplifying workflow management not just for end users but for system administrators too. "You can imagine one scientist comes and says 'please install 10 tools for me into this Yabi environment', but then if 50 scientists come and they all have different tools, the last thing you want is the bottleneck to be at the system administration level. We can abstract away the complexities for the user but we also want to abstract away the complexities for the sysadmin."
Yabi was open sourced in July last year. "We were mindful not to open source it too early, because there is a level of support that is required for the international community and we were making sure that the system was in a state that would be able to be supported by our group," Professor Bellgard says. It has already been deployed at QFAB, the Queensland Facility for Advanced Bioinformatics, and the Murdoch team is "in conversation" with scientists at UTS in Sydney as well as at other universities and CSIRO.