Sometimes you need to combine graphical representation of your data with heftier numerical analysis.
What it does: R is a general statistical analysis platform (the authors call it an "environment") that runs on the command line. Need to find means, medians, standard deviations, correlations? R can handle that and much more, including "linear and generalized linear models, nonlinear regression models, time series analysis, classical parametric and nonparametric tests, clustering and smoothing," according to the project website.
R also graphs, charts and plots results. There are numerous add-ons to this open-source project that significantly extend functionality. For users who prefer a GUI, Peter Aldhous, San Francisco bureau chief for New Scientist magazine, suggests RExcel, which offers access to the R engine through Excel.
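To make the basic statistics mentioned above concrete, here is a minimal sketch written in Python rather than R, using only the standard library. The data values and the helper name `pearson_correlation` are invented for illustration. In R itself, the built-in functions `mean()`, `median()`, `sd()` and `cor()` cover the same ground.

```python
import math
import statistics

# Hypothetical sample data, invented purely for illustration.
hours_studied = [2, 4, 6, 8, 10]
test_scores = [50, 60, 65, 80, 95]


def pearson_correlation(xs, ys):
    """Return the Pearson correlation of two equal-length sequences."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    # Sum of the products of each pair's deviations from the means.
    cross_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cross_products / (spread_x * spread_y)


summary = {
    "mean": statistics.mean(test_scores),
    "median": statistics.median(test_scores),
    "stdev": statistics.stdev(test_scores),  # sample standard deviation (n - 1)
    "correlation": pearson_correlation(hours_studied, test_scores),
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Like R's `sd()`, `statistics.stdev` computes the sample standard deviation, so the two should agree on the same data.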
What's cool: There is a great deal of functionality in R, including quite a number of visualization options as well as numerical and spatial analysis.
Drawbacks: Because R runs on the command line, users will have to take the time to learn which commands do what, and not all users will be comfortable with a text-only interface. In addition, Aldhous says those dealing with large data sets may hit a memory barrier (if so, there's a commercial option from Revolution Analytics).
Skill level: Intermediate to advanced. Comfort with command-line prompts and a knowledge of statistics are musts for the core application.
Runs on: Linux, Mac OS X, Unix, Windows XP or later.
Learn more: Try R for Statistics: First Steps (PDF) by Peter Aldhous, Hands-on R, a step-by-step tutorial (PDF) by Jacob Fenton, and the project's own An Introduction to R. The R Statistics blog has a number of visualization samples.
Visualization applications and services
These tools offer a number of different visualization options. While some stick to conventional charts and graphs, many offer a range of other choices such as treemaps and word clouds. A few offer geographical mapping as well, although if you're interested in maps, our sections on GIS/mapping focus specifically on that.
What it does: This is one of the simplest ways I've seen to turn data into a chart or map. You can upload a file in several different formats and then choose how to display it: table, map, heatmap, line chart, bar graph, pie chart, scatter plot, timeline, storyline or motion (animation over time). It's somewhat customizable, allowing you to change map icons and style info windows.
There are some data editing functions within Fusion Tables, although changing more than a few individual cell entries can quickly become tedious. You can also join tables (which is important when the data you want to map is in multiple tables), and filter, sort and add columns and so on. There are also options to allow others to make comments on the data itself.
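For readers new to the idea of joining tables, the following is a rough sketch in Python of what a join, filter and sort accomplish. It is not how Fusion Tables works internally. The table contents, column names and the `join_tables` helper are all hypothetical.

```python
# Hypothetical tables, invented for illustration. One holds the values to
# map and the other holds the locations; both share a "region" column.
internet_access = [
    {"region": "A", "pct_online": 60.0},
    {"region": "B", "pct_online": 70.0},
    {"region": "C", "pct_online": 55.0},
]
locations = [
    {"region": "A", "lat": 10.0, "lon": 20.0},
    {"region": "B", "lat": 11.0, "lon": 21.0},
    {"region": "D", "lat": 12.0, "lon": 22.0},
]


def join_tables(left, right, key):
    """Inner join: keep rows from `left` that have a match in `right` on `key`."""
    right_by_key = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            # Combine the columns of both rows into one record.
            joined.append({**row, **match})
    return joined


merged = join_tables(internet_access, locations, "region")

# Filter and sort the joined rows, as the paragraph above describes.
filtered = [row for row in merged if row["pct_online"] >= 50]
ordered = sorted(filtered, key=lambda row: row["pct_online"], reverse=True)

for row in ordered:
    print(row)
```

Regions "C" and "D" drop out because each appears in only one table, which is why it matters that both tables use the same key values.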
Mapping goes beyond just placing points, as many of us are accustomed to with Google Maps. Fusion Tables can also map multiple polygons with variations in color based on underlying data, such as this intensity map showing the percentage of households with Internet access by state from 2007 U.S. Census Bureau data.
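The idea behind an intensity map, where color varies with the underlying data, can be sketched in a few lines of Python. This is only an illustration of the general technique of equal-width color bands; the function name `shade_for`, the color values and the 0-to-90 range are assumptions, not anything taken from Fusion Tables.

```python
# Hypothetical palette, ordered from light to dark.
SHADES = ["#deebf7", "#9ecae1", "#3182bd"]


def shade_for(value, low, high, shades):
    """Pick a shade for `value` by splitting [low, high] into equal bands."""
    if high <= low:
        raise ValueError("high must be greater than low")
    # Values outside the range are pulled back to the nearest edge.
    clamped = min(max(value, low), high)
    band_width = (high - low) / len(shades)
    # The top edge of the range would index one past the end, so cap it.
    index = min(int((clamped - low) / band_width), len(shades) - 1)
    return shades[index]


# Example percentages (invented) mapped onto three bands of 30 points each.
for pct in (12, 45, 88):
    print(pct, shade_for(pct, 0, 90, SHADES))
```

Each polygon on the map would then be filled with the shade returned for its value, so darker regions indicate higher percentages.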
Unlike IBM's Many Eyes, Google lets you designate your data as private or unlisted as well as public, although your data still resides on Google's servers -- a benefit or drawback, depending on whether server bandwidth costs or data privacy is more important to you.
What's cool: Fusion Tables offers relatively quick charting and mapping, including geographic information system (GIS) functions to analyze data by geography. The service also automatically geocodes addresses, which is useful when trying to place numerous points on a map. This is an excellent tool for beginners and advanced beginners to use to get comfortable with analyzing data; it's also a good fit for people who don't program. For more advanced users, there's an API.
Drawbacks: Functionality, customization and data capacity are all limited compared with desktop applications or custom code, and interacting with large data sets on the site can be sluggish. Reliability can also be an issue -- the site choked on March 11, the day of the devastating earthquake and tsunami in Japan. (It is still a Google Labs beta project.)
Skill level: Beginner.
Runs on: Any Web browser.
Learn more: A Google Fusion Tables tour and several tutorials are available. We've also got some examples of what it can do in our story "H-1B Visa Data: Visual and Interactive Tools." Also see the Fusion Tables Example Gallery.
What it does: Impure is sort of a Yahoo Pipes for data visualization, designed for creating numerous types of highly polished graphical representations of data using a drag-and-drop workspace. The service includes a library of objects and various methods, and -- as with Yahoo Pipes -- it allows you to click and drag to connect modules so that the output of one becomes the input of another. It was developed by Spanish analytics firm Bestiario.
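The module-chaining idea, where the output of one module becomes the input of the next, can be approximated in ordinary code. The Python sketch below is only an analogy for that style of workflow, not Impure's own code; the module names and the sample input are hypothetical.

```python
from functools import reduce


def make_pipeline(*modules):
    """Chain modules so that each one receives the previous one's output."""
    def run(data):
        return reduce(lambda current, module: module(current), modules, data)
    return run


# Three small hypothetical "modules", each taking one input and returning one output.
def parse_numbers(text):
    return [float(part) for part in text.split(",")]


def drop_negatives(values):
    return [value for value in values if value >= 0]


def scale_to_unit(values):
    """Divide by the largest value so results fall between 0 and 1."""
    top = max(values, default=0)
    if top == 0:
        return values
    return [value / top for value in values]


pipeline = make_pipeline(parse_numbers, drop_negatives, scale_to_unit)
result = pipeline("4,-2,8,2")
print(result)  # [0.5, 1.0, 0.25]
```

Because each module only needs to agree with its neighbors about what it receives and returns, modules can be rearranged or swapped without touching the rest of the chain, which is the appeal of the drag-and-drop approach.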
What's cool: Impure offers a highly visual interface for the task of creating visualizations -- which is not as common as you might expect. It has a sleek user interface and numerous modules, including quite a few APIs that are designed to pull data from the Web. It features numerous visualization types that are searchable by keywords like numeric, tables, nodes, geometry and map. And although it saves your workspaces to the Web, you can copy and save the code behind your workspaces locally, so you can back up your work or maintain your own libraries of code snippets.
Drawbacks: Users of Impure face a surprisingly steep learning curve despite its drag-and-drop functionality. The documentation is detailed in some areas, but lacking in others. For instance, while it was easy to find a list of APIs, it was more difficult to find basic instructions on how to use the workspace -- or even figure out that there was a workspace, let alone how to use the various objects and methods.
Once you save your workspace, it's on the public Web, although it's unlikely that anyone else will be able to find it unless you share the URL. And I found some of the samples not all that helpful in understanding the underlying data, even if they were visually striking.
Skill level: Intermediate.
Runs on: Any Web browser.
Learn more: To get started, I'd suggest the videos "Interface Basics" (7 minutes) and "Workspaces and Code." You can find a sample called The Pay Gap Between Men and Women Mapped at the website of British newspaper The Guardian.
What it does: This tool can turn data into any number of visualizations, from simple to complex. You can drag and drop fields onto the work area and ask the software to suggest a visualization type, then customize everything from labels and tool tips to size, interactive filters and legend display.
Drawbacks: In the free version of Tableau's business intelligence software, your visualization and data must reside on Tableau's site. Whenever you save your work, it gets sent up to the public website, which means you can't save work in progress without running the risk that it will be seen before it's ready. Tableau's site won't deliberately expose your work, but it relies on security by obscurity, so someone could see your work if they guess your URL. And once it's saved, viewers are invited to download your entire workbook with data. Upgrading to a single-user desktop edition costs $999.
Not surprisingly, all that functionality comes at a cost: Tableau's learning curve is fairly steep compared to that of, say, Fusion Tables. Even with the drag-and-drop interface, it'll take more than an hour or two to learn how to use the software's true capabilities, although you can get up and running doing simple charts and maps before too long.
Skill level: Advanced beginner to intermediate.
Runs on: Windows 7, Vista, XP; Windows Server 2008, 2003.
Learn more: There are seven short training videos on the Tableau site, where you can also find downloadable data files that you can use to follow along.
You can see a sample in our article "Tech Unemployment Climbs; Self-employment Steady."