The R programming language is quickly gaining ground on traditional statistics packages such as SPSS, SAS and MATLAB, at least according to one statistician who teaches the language.
“It is very likely that during the summer of 2014, R became the most widely used analytics software for scholarly articles, ending a spectacular 16-year run by SPSS,” wrote Robert Muenchen, in a blog post summarizing his analysis.
Muenchen gauged the popularity of statistical software packages by tracking how often they are used in published scientific research and how many mentions they get in online discussion forums, blogs, job listings and other sources.
Scholarly citations are a “good leading indicator of where things are headed,” Muenchen wrote. Students who learn to use these software packages later go on to use them in their professional careers, either in academia or industry.
In his latest survey, Muenchen found that researchers continue to do most of their work in traditional software packages, namely SAS Institute’s SAS, MathWorks’ MATLAB and IBM’s SPSS.
SPSS led the pack with more than 75,000 citations in scientific papers, found through a search of Google Scholar. SAS came second with almost 40,000 citations, and R was used in well over 20,000 research projects.
When Muenchen examined the number of citations since 1995, he also found that SPSS citations had been declining since 2007. SAS, which trailed SPSS in usage, peaked in 2008. The use of R, in contrast, has been growing dramatically, faster than that of other packages such as Statistica and Stata.
“Extending the downward trend of SPSS and the upward trend of R make it likely that sometime during the summer of 2014 R became the most dominant package for analytics used in scholarly publications,” Muenchen wrote. “Due to the lag caused by the publication process, getting articles online, indexing them, etc. we won’t be able to verify that this has happened until well into 2015.”
R is an open-source functional programming language designed for statistical computing and graphics.
Muenchen, a certified statistician who manages research computing support at the University of Tennessee, may not be the most impartial person to declare victory for R, since he also works as an R instructor on behalf of Revolution Analytics. But he has long been recognized as an expert in computer analytics, contributing code to SAS, SPSS and various R packages. He has also served on the advisory boards of SAS and of SPSS, the latter before it was acquired by IBM in 2009.
In the blog post summarizing his findings, Muenchen did not speculate about why R is gaining popularity.
The fact that R implementations are available as open source, and can be downloaded at no cost by researchers starting a project, may be a factor in its popularity, said Al Hilwa, who covers enterprise software development for IT analyst firm IDC.
“Like many open source projects with active communities, R has gotten better with time,” Hilwa wrote in an email. “I think what we are seeing are trends that are long in motion. Acquiring of developer skills around programming languages takes time and so what we are seeing is a delayed effect reflected in actual use data.”
Muenchen’s study did not distinguish among the R distributions that users cited, which could include Revolution Analytics’ open-source or enterprise editions as well as the open-source release from the volunteer-led R Project.
Other indicators also seem to point to the growing popularity of R, Muenchen noted. Job postings on Indeed.com requiring R skills now outnumber those asking for SPSS experience, though they still trail ads calling for SAS expertise. More books and discussion forums are devoted to R than to either SAS or SPSS.
Hilwa also noted that there is increasing demand for workers with statistical and data analysis skills in general, which can be seen as the “tide that lifts all boats in this ecosystem of languages,” he wrote.