Sunday, March 6, 2011 at 08:05PM
The annual Computer-Assisted Reporting Conference that concluded in Raleigh, N.C., on Sunday was extraordinarily rich in useful free tools for all sorts of data analysis and visualization, thanks to invitations accepted by computer scientists from Google, MIT, Stanford and the like.

Here are links to 13 of these free tools that I found to be particularly useful for data analysis in journalism:

  1. Exhibit: This link goes to a page with a collection of links from MIT’s Simile project. Exhibit is a JavaScript program that will let you take data that is in tabular form (rows and columns) and see it in a variety of formats: sortable and filterable table, map, time line, etc. Look at the Examples to get an idea of what can be done. Note also the DataPress link, which goes to a WordPress add-in that will let you put Exhibit visualizations on your blog. Using Exhibit requires some knowledge of HTML, though it’s not hard to simply copy its code and tweak it. Dido is a prototype for a user interface that will let you specify your Exhibit without having to get into the code.
  2. TimeFlow: This Java desktop program will take tables that have a time element and create interactive time lines that can be filtered and colored by multiple variables. It’s purely an analysis tool, not designed for Web presentation. I can see it being useful for investigative projects.
  3. ManyEyes: This site lets you upload data and visualize it using a wide variety of interesting displays: maps, word trees, tag clouds, tree maps, bubble charts, matrix charts, network diagrams, etc. Check out the more than 84,000 such visualizations that people have created for ideas of what to do with your own data.
  4. Gapminder: This site was created by Swedish scientist Hans Rosling, who wanted to make it easy for people to explore the effects of literally hundreds of variables on the world’s nations. Basically, these are multivariate x-y scatter plots that are animated across time. What’s cool, wholly aside from playing with the variables already attached to the site, is that Google recently bought it and added a way for users to upload their own time-stamped data.

    The Wall Street Journal used Tableau Public to create this visualization that compares the performance of 100 of the largest publicly traded software companies across more than three decades.

  5. Tableau Public (Windows only): The full version of Tableau is commercial, but the company has made this public version available to anyone. It does many of the same things that Exhibit does; click on the Gallery link to see examples. The free version is hardly lame — it’ll hold up to 100,000 records. Tableau Public is offering a free Webinar on March 3 on Finding, Scraping and Using Online Data.
  6. Protovis: This downloadable program from Stanford’s visualization lab is like ManyEyes but with an even wider variety of charts and graphs that can be made; click on the Examples link to see. But using Protovis does require some heavy-lifting code-writing.
  7. Google Refine: Google calls this “a power tool for working with messy data.” Data journalists are using it to clean up inconsistencies in campaign-contributions databases, for example. It will quickly standardize names such as “Steve Doig” and “Stephen Doig” and “Steven Doig” into one of those. It also will help you with some basic analysis, like the equivalent of Excel’s pivot tables.
  8. Google Fusion Tables: This is another Exhibit-like application that will show and publish your data as maps, time lines and/or charts. Unlike with Exhibit, you don’t have to get your hands dirty working under the hood with the HTML and JavaScript.
  9. DataWrangler: Another goody from Stanford, it is similar to Google Refine. It is good for transforming data that’s in an awkward format into something that can be analyzed more readily.
  10. TwitInfo: Another MIT tool, this one is designed to take a large volume of tweets in real time and show them as a graphical timeline with peaks of activity. It includes a sentiment algorithm that classifies tweets as negative or positive. The research paper describes how it could be used for news gathering, such as pinpointing earthquakes ahead of the geologists.
  11. NodeXL: This is an Excel add-in that does graphical displays of social networks. These displays are designed to show how people are interconnected. An example would be connections among members of various corporate boards. Another example is one my son Matt created for the Sarasota Herald-Tribune last year involving groups of home buyers and sellers who were involved in massive mortgage-fraud schemes. These displays can get ungainly quickly; one name used by journalists for the complicated ones is “the hairball.”
  12. Jigsaw: I haven’t played with this offering from Georgia Tech, but it is supposed to be useful for making sense of large collections of documents. It’s not a downloadable program or uploadable app, but the folks at Tech invite people with data to get in touch with them about using it.
  13. DocumentCloud: This is a service created by news-app developers from The New York Times and ProPublica.org. It’s designed for putting original documents (even handwritten ones) up on the Web and annotating them. The idea is to let news consumers see those documents (court filings, legislation, transcripts, arrest records, audits, e-mails, etc.) with detailed annotation and explanation added by the reporters covering the story or by other experts. Here’s an example from The Arizona Republic on Arizona’s immigration-enforcement law, SB 1070.
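
For readers curious about what the name cleanup described under Google Refine (item 7) involves, here is a minimal Python sketch of the general idea. It is not Refine's actual method or code. The function names and the tiny NAME_VARIANTS lookup are made up for illustration, and a real contributions database would need a far fuller list of name variants plus a human checking the matches.

```python
import re
from collections import defaultdict

# Hypothetical lookup of first-name variants (an assumption for this sketch);
# a real project would need a much longer, carefully reviewed list.
NAME_VARIANTS = {"steve": "stephen", "steven": "stephen"}


def name_key(name):
    """Build a comparison key: lowercase, drop punctuation,
    map known variants to one spelling, and sort the words."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    tokens = [NAME_VARIANTS.get(token, token) for token in tokens]
    return " ".join(sorted(tokens))


def group_names(names):
    """Group raw names that produce the same comparison key."""
    groups = defaultdict(list)
    for name in names:
        groups[name_key(name)].append(name)
    return dict(groups)


# Made-up sample rows, in the spirit of a campaign-contributions table.
donors = ["Steve Doig", "Stephen Doig", "Steven Doig", "Doig, Steve", "Jane Roe"]
groups = group_names(donors)

for key, members in groups.items():
    print(key, "<-", members)
```

Each group can then be reviewed by hand and given one preferred spelling. Sorting the words in the key is what lets "Doig, Steve" land in the same group as "Steve Doig."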

Not enough to keep you busy? Conference attendee Chrys Wu put together links to most of the presentations, slides, websites and reference books that were demonstrated or discussed at the conference, which was put on by Investigative Reporters and Editors.

