When the Visualization Eclipses The Data...

In his 2003 novel Pattern Recognition, William Gibson created a character named Cayce Pollard with an unusual psychosomatic affliction: She was allergic to brands. Even the logos on clothing were enough to make her skin crawl, but her worst reactions were triggered by the Michelin Tire mascot, Bibendum.

Although it’s mildly satirical, I can relate to this condition, since I have a similar visceral reaction to word clouds, especially those produced as data visualization for stories.

If you are fortunate enough to have no idea what a word cloud is, here is some background. A word cloud represents word usage in a document by resizing individual words in said document proportionally to how frequently they are used, and then jumbling them into some vaguely artistic arrangement. This technique originated online in the 1990s as tag clouds (famously described as “the mullets of the Internet”), which were used to display the popularity of keywords in bookmarks.
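To make the sizing step concrete, here is a minimal sketch in Python. It assumes a simple linear scale between a smallest and largest font size; the function name and the point sizes are illustrative choices, not how Wordle or any particular tool actually works, and the "artistic" layout step is left out entirely.

```python
from collections import Counter
import re

def word_sizes(text, min_pt=10, max_pt=48):
    """Map each word to a font size that scales linearly with its count.

    min_pt and max_pt are hypothetical defaults chosen for illustration.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    if not counts:
        return {}
    top = max(counts.values())
    # The most frequent word gets max_pt; everything else is scaled
    # by its share of that top count.
    return {
        word: min_pt + (max_pt - min_pt) * count / top
        for word, count in counts.items()
    }

sizes = word_sizes("car blast car bomb blast car")
```

Note what the output throws away: only the counts survive, so nothing in it says which words appeared next to each other.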

More recently, a site named Wordle has made it radically simpler to generate such word clouds, ensuring their accelerated use as filler visualization, much to my personal pain.

So what’s so wrong with word clouds, anyway? To understand that, it helps to understand the principles we strive for in data journalism. At The New York Times, we strongly believe that visualization is reporting, with many of the same elements that would make a traditional story effective: a narrative that pares away extraneous information to find a story in the data; context to help the reader understand the basics of the subject; interviewing the data to find its flaws and be sure of our conclusions. Prettiness is a bonus; if it obliterates the ability to read the story of the visualization, it’s not worth adding some wild new visualization style or strange interface.

Of course, word clouds throw all these principles out the window. Here’s an example to illustrate. About six months ago, I had the privilege of giving a talk about how we visualized civilian deaths in the WikiLeaks War Logs at a meeting of the New York City Hacks/Hackers. I wanted my talk to be more than “look what I did!” and to touch on some key principles of good data journalism. What better way to illustrate these principles than with a foil, a Goofus to my Gallant?

And I found one: the word cloud. Please compare these two visualizations — derived from the same data set — and the differences should be apparent:

I’m sorry to harp on Fast Company in particular here, since I’ve seen this pattern across many news organizations: reporters sidestepping their limited knowledge of the subject material by peering for patterns in a word cloud — like reading tea leaves at the bottom of a cup. What you’re left with is a shoddy visualization that fails all the principles I hold dear.

For starters, word clouds support only the crudest sorts of textual analysis, much like figuring out a protein by getting a count only of its amino acids. This can be wildly misleading; I created a word cloud of Tea Party feelings about Obama, and the two largest words were implausibly “like” and “policy,” mainly because the important word “don’t” was automatically excluded. (Fair enough: Such stopwords would otherwise dominate the word clouds.) A phrase or thematic analysis would reach more accurate conclusions. When looking at the word cloud of the War Logs, does the equal sizing of the words “car” and “blast” indicate a large number of reports about car bombs or just many reports about cars or explosions? How do I compare the relative frequency of lesser-used words? Also, doesn’t focusing on the occurrence of specific words instead of concepts or themes miss the fact that different reports about truck bombs might use the words “truck,” “vehicle,” or even “bongo” (since the Kia Bongo is very popular in Iraq)?
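The stopword problem is easy to reproduce. The sketch below uses two made-up sentences (not the Tea Party data described above) and a hypothetical stopword list; it compares single-word counts after stopword removal with simple two-word phrase counts.

```python
from collections import Counter
import re

# Hypothetical stopword list for illustration; real tools ship their own.
STOPWORDS = {"i", "the", "his", "a", "of", "don't"}

def tokens(text):
    """Lowercase the text and split it into words, keeping apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def word_counts(text, stopwords=STOPWORDS):
    """Count single words after dropping stopwords, as a word cloud would."""
    return Counter(t for t in tokens(text) if t not in stopwords)

def bigram_counts(text):
    """Count adjacent word pairs (pairs may span sentence boundaries)."""
    toks = tokens(text)
    return Counter(zip(toks, toks[1:]))

# Invented example sentences, not real survey responses.
responses = "I don't like his policy. I don't like the policy."
singles = word_counts(responses)
pairs = bigram_counts(responses)
```

In this toy input the single-word counts leave only “like” and “policy,” while the pair counts still contain (“don't”, “like”), which is the whole point of the sentences. Counting pairs is only one crude form of phrase analysis, but it shows how much the single-word view can hide.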

Of course, the biggest problem with word clouds is that they are often applied to situations where textual analysis is not appropriate. One could argue that word clouds make sense when the point is to specifically analyze word usage (though I’d still suggest alternatives), but it’s ludicrous to make sense of a complex topic like the Iraq War by looking only at the words used to describe the events. Don’t confuse signifiers with what they signify.

And what about the readers? Word clouds leave them to figure out the context of the data by themselves. How is the reader to know from this word cloud that LN is a “Local National” or COP is “Combat Outpost” (and not a police officer)? Most interesting data requires some form of translation or explanation to bring the reader quickly up to speed; word clouds provide nothing in that regard.

Furthermore, where is the narrative? For our visualization, we chose to focus on one narrative out of the many within the Iraq War Logs, and we displayed the data to make that clear. Word clouds, on the other hand, require the reader to squint at them like stereograms until a narrative pops into place. In this case, you can figure out that the Iraq occupation involved a lot of IEDs and explosions. Which is likely news to nobody.

As an example of how this might lead the reader astray, we initially thought we saw a surprising and dramatic rise in sectarian violence after the Surge, because the word “sect” was appearing in many more reports. We soon figured out that what we were seeing had less to do with violence levels and more to do with bureaucracy: the adoption of new Army rules requiring the reporting of the sect of detainees. Of course, the horrific violence we visualized in Baghdad was sectarian, but this was not something indicated in the text of the reports at the time. If we had visualized the violence in Baghdad as a series of word clouds for each year, we might have thought that the violence was not sectarian at all.

In conclusion: Every time I see a word cloud presented as insight, I die a little inside. Hopefully, by now, you can understand why. But if you are still sadistically inclined enough to make a word cloud of this piece, don’t worry. I’ve got you covered.

This is an insightful and rather shrewd criticism of word clouds, and I think it applies to much of the infographic, data-visualization-obsessed tech culture we live in.

I find myself fascinated by many of the new and innovative ways to graphically represent data. Yet, as Jacob Harris points out, many of these sleek new techniques (if they don't miss the point entirely) strip supposedly core ideas from the very context that lends them meaning... and we are left with an aesthetically pleasing series of pretty graphs and pie charts that convey very little actual information (see my post on the Infographic Idiom).

And even though CNN, Fox and other news networks are now embracing new visualization tools, tag clouds are ultimately useless measures of political sentiment, because concepts themselves really cannot be reduced to their most elemental articulation: a single word.

Joining the Real Estate Search Party Online

UNTIL recently, real estate brokers in New York City rarely shared information about one another’s listings. As a result, buyers had no way of knowing whether their agent was showing them every property available, and sellers wondered whether their homes were getting the exposure necessary to secure the best deal.

 

Companies like StreetEasy, Zillow and The New York Times have helped open up the market by gathering listing information from various real estate databases and making it easy for consumers to search for homes online. But many brokerages still display only the firm’s exclusive listings on their Web sites, either because they are focusing on selling their own properties or because they are resigned to the fact that customers have migrated elsewhere to research what is on the market.

Other brokerage firms are getting into the digital game themselves, creating a “virtual office Web site” or VOW. These are sites operated by brokers that enable clients to search for most of the available properties in a particular market, not just the firm’s exclusive listings.

While brokers have mixed feelings about whether these sites are worth the investment, the emergence of the VOW is yet another sign that once tightly guarded listing information has finally been set free in New York.

“Five years ago, protecting listings was the single most important thing, and people were very selective about where their listings ended up,” said Eric Gordon, the managing director of RealPlus, which develops VOWs for clients as well as operating the listings database used by members of the Real Estate Board of New York. “Now they want us to send their listings to every site we could possibly send them to. There are exceptions, but in general, the feeling is, ‘just get our listings out there as quickly and efficiently as possible.’ ”

The virtual office Web site concept was spurred by a 2008 settlement between the Justice Department and the National Association of Realtors, which forced brokerages to share listing data with their rivals, including Internet-based firms that offer rebates or other discounts to buyers willing to do most of the legwork to find a home.

In most parts of the country, brokers share information about properties through a multiple listing service, or M.L.S., a database operated by a real estate association on behalf of its members. Although Manhattan, Queens, Brooklyn and the Bronx each have a multiple listing service, many agents in New York City are not members and instead participate in a similar service managed by the Real Estate Board of New York, called R.L.S.

Agents who belong to these services are typically required to share property information with other brokers within a day or two of signing an exclusive listing, and these databases now share listings with each other as well as sites like StreetEasy, The New York Times, and hundreds of national and international portals like Yahoo and Google.

Sites like StreetEasy have free rein to publish listing information online for customers to search, aggregating data from various sources to create fairly comprehensive databases of properties available in New York City, including homes for sale by owner. But if brokerages want to post other firms’ listings on their Web sites, they must go through the process of becoming a virtual office Web site.

For prospective buyers the main difference between a VOW and other real estate search sites is that a VOW has to adhere to rules dictated by the Justice Department settlement, including a requirement that customers register with a name, an e-mail address and a password before they can search for listings.

Required registration can be a turn-off, some agents say, especially for casual shoppers.

“The problem I have with VOWs is that they force you to register before you can get information about properties,” said Douglas Heddings, the president of the Heddings Property Group. “To me it seems like a step backward, in that it’s holding the information hostage.”

Although the Heddings Property Group was one of the first firms in New York City to create a virtual office Web site just last year, Mr. Heddings said he was planning to abandon it in favor of a partnership with Buyfolio, a company that allows agents and their customers to search for listings as well as share feedback about properties.

For Buyfolio, a relatively new company focusing on the New York market, these collaboration tools are a key selling point. But now that real estate brokerages, technology start-ups like StreetEasy and media companies like The New York Times all have access to the same basic data about listings, the competition to attract buyers searching for homes online is heating up.

“You’ve still got to bring people to your Web site,” said Steven Spinola, the president of the Real Estate Board of New York, or Rebny. “Just creating a VOW doesn’t mean people are going to come and use it.”

So far, 98 of the 484 residential brokerage firms that are members of Rebny have created a virtual office Web site, Mr. Spinola said. This involves paying a fee to have the board audit the site to ensure it complies with the standards, like how client registration is handled and how listings data is managed.

But when the board recently overhauled its own Web site, now called NY1Residential.com, it partnered with the local news channel NY1 and decided not to create a VOW, partly to avoid the registration requirement.

“We made a decision that we weren’t going to ask people to sign in,” Mr. Spinola said. “It was just the sense of the members that they wanted to keep it an open Web site that anyone could search.”

Among New York City real estate firms, there are mixed feelings about whether a VOW delivers enough benefits to justify either the cost of creating one or the trade-offs involved in complying with rules about how these sites interact with clients.

After signing up for a VOW, customers have to wait for an e-mail to confirm that they have registered, and must also agree to terms and conditions that can run as long as a dozen pages. Those terms typically include an acknowledgment that the customer is entering into a lawful consumer-broker relationship with the agency, legally required language that does not obligate the buyer to work with the agency. This can seem like overkill just to search for, say, two-bedroom apartments in Chelsea.

But some brokerages are wagering that the hurdles are worth jumping, that there is money to be made from providing clients with a comprehensive set of properties rather than just the firm’s own listings, the traditional practice. Most VOWs also include tracking features that allow the agency to monitor customers’ searches, potentially producing useful data about what clients are looking for online.

“It was complicated to become a VOW, and it was costly,” said Dottie Herman, the president of Prudential Douglas Elliman. But, she said, the company’s Web site is more client-friendly now, allowing searches for properties in Manhattan, Brooklyn, Queens, Long Island, the Hamptons and Westchester County, including listings from other firms.

As for the registration requirement, Ms. Herman said she would have preferred that it be optional, but she doesn’t view it as a major deterrent. “I think most people don’t have a problem with it, because everybody asks for your e-mail address today,” she said.

Visitors to the Prudential Douglas Elliman site, once they have signed up, can search, sort and save listings. Results are displayed with the firm’s exclusive listings first, then those of other firms. But unlike, say, StreetEasy, there is no direct link to the other firm’s site.

Bellmarc Realty is another big firm that is embracing the virtual office approach. Neil Binder, the president of Bellmarc, said that after experimenting with allowing individual agents to offer a VOW, generally using third-party software, the firm decided to develop a company-wide site instead, which will debut once it gets Rebny’s approval.

Mr. Binder said that the Bellmarc agents who tried VOWs created by third-party vendors did not find they generated much business, but he believes the new VOW will be more effective.

“It’s going to be more of an evaluation tool than an information tool,” Mr. Binder said. “I’m trying to create a process of comparison to show how properties stand up next to each other.”

Other large firms in the city are taking a wait-and-see approach. Diane M. Ramirez, the president of Halstead Property, said that about a quarter of the company’s agents had incorporated a VOW into their individual pages, but that Halstead had not developed one for its corporate site.

Corcoran has not jumped on the VOW bandwagon at all, said Pamela Liebman, the company’s president, partly because listing information is already widely available and partly because of doubts about VOWs.

“We’ve watched the traffic of some of the firms that have put VOWs on their site, and from what we can see it hasn’t increased,” Ms. Liebman said, adding that Corcoran also had not experienced an uptick in the number of deals it is doing with buyers’ brokers who have virtual office Web sites.

Beyond basic listing data, real estate Web sites compete for buyers, and page views, by offering additional information: price histories, recorded sales, building details and school district data — as well as discussion forums, mapping tools and features that make it easier to search for homes and then sort the results.

Zillow and The New York Times offer real estate apps for mobile devices, and these mobile users now account for a third of Zillow’s traffic on weekends, said Amy Bohutinsky, the company’s chief marketing officer, a trend that could put VOWs at a disadvantage as more people embrace smartphones.

However, brokers say they are not trying to compete with these sites, which are viewed more as information distributors than rivals, especially at a time when so much data has been digitally set free.

“Now listings are all over the place — all that information is published by a million different sites,” Ms. Herman said. “This is the world we’re in today, and if you don’t embrace change I don’t think you can be in business.”

via NYTimes

Project Cascade: Tracking Content From Inception Thru Dissemination

Cascade allows for precise analysis of the structures which underlie sharing activity on the web.

This first-of-its-kind tool links browsing behavior on a site to sharing activity to construct a detailed picture of how information propagates through the social media space. While initially applied to New York Times stories and information, the tool and its underlying logic may be applied to any publisher or brand interested in understanding how its messages are shared.

via nytlabs.com

This is absolutely fascinating, and shines a little light on the social loopholes in the NYTimes paywall.

When the Data Struts Its Stuff

 

IN an uncharted world of boundless data, information designers are our new navigators.

They are computer scientists, statisticians, graphic designers, producers and cartographers who map entire oceans of data and turn them into innovative visual displays, like rich graphs and charts, that help both companies and consumers cut through the clutter. These gurus of visual analytics are making interactive data synonymous with attractive data.

“Statistics,” says Dr. Hans Rosling, a professor of international health at the Karolinska Institute in Sweden, “is now the sexiest subject around.”

Dr. Rosling is a founder of Gapminder, a nonprofit group based in Stockholm that works to educate the public about disparities in health and wealth around the world — by offering animated interactive statistics online that help visitors spot trends on their own.

Hit the play button and an animated graphic, called Gapminder World, shows a constellation of brightly colored bubbles, each representing a different country, bouncing along over two centuries. Without ever having to view yawn-inducing numbers on gross domestic product per capita, you can watch some countries, like the United States, rapidly growing healthier and wealthier before your eyes while smaller bubbles, for countries like Congo, rise on the life expectancy axis even as they dip on the income line.

The advanced animation has let Dr. Rosling make wonky statistics about poverty as intuitive and potentially fascinating for viewers as a nature program about the Serengeti on TV. “If we show a herd of zebras, and one zebra has a bad leg and lags behind, you can see that immediately,” says Dr. Rosling, whose video clip from the BBC on health and wealth statistics has been viewed more than four million times on YouTube. “If one country gets left behind, you can see that, too.”

Visual analytics play off the idea that the brain is more attracted to and able to process dynamic images than long lists of numbers. But the goal of information visualization is not simply to represent millions of bits of data as illustrations. It is to prompt visceral comprehension, moments of insight that make viewers want to learn more.

“The purpose of visualization,” says Ben Shneiderman, founding director of the Human-Computer Interaction Laboratory at the University of Maryland, “is insight, not pictures.”

The growing field has implications for companies, governments, academic institutions, nonprofit groups, news organizations and marketers — just about anybody who tries to convey huge amounts of information in visual, interactive forms. But advances, he says, come with both benefits and risks.

On the benefit side, people become more engaged when they can filter information that is presented visually and make discoveries on their own.

On the risk side, Professor Shneiderman says, tools as powerful as visualizations have the potential to mislead or confuse consumers. And privacy implications arise, he says, as increasing amounts of personal, housing, medical and financial data become widely accessible, searchable and viewable.

“The visual analytics research community works on these issues,” he says, “but more needs to be done.”

In the 1990s, Professor Shneiderman developed tree mapping, which uses interlocking rectangles to represent complicated data sets. The rectangles are sized and colored to convey different kinds of information, like revenue or geographic region, says Jim Bartoo, the chief executive of the Hive Group, a software company that uses tree mapping to help companies and government agencies monitor operational data. When executives or plant managers see the nested rectangles grouped together, he adds, they should be able to immediately spot anomalies or trends.

In one tree-map visualization of a sales department on the Hive Group site, red tiles represent underperforming sales representatives while green tiles represent people who exceeded their sales quotas. So it’s easy to identify the best sales rep in the company: the biggest green tile. But viewers can also reorganize the display — by region, say, or by sales manager — to see whether patterns exist that explain why some employees are falling behind.
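As a rough illustration of the sizing and coloring idea, the Python sketch below lays out a single row of tiles whose widths are proportional to each value. It is a deliberately simplified, one-level version of the technique (real tree maps nest rectangles inside rectangles), and the rep names, sales figures and quotas are hypothetical, not taken from the Hive Group demo. Treating a rep who exactly meets quota as red is also an assumption.

```python
def slice_layout(items, x, y, width, height):
    """Split a rectangle into side-by-side tiles.

    items is a list of (label, value) pairs; each tile's area is
    proportional to its value. Returns (label, x, y, width, height) tuples.
    """
    total = sum(value for _, value in items)
    if total <= 0:
        raise ValueError("values must sum to a positive number")
    rects = []
    offset = x
    for label, value in items:
        strip = width * value / total
        rects.append((label, offset, y, strip, height))
        offset += strip
    return rects

def tile_color(sales, quota):
    """Green for reps who exceeded quota, red otherwise (assumed rule)."""
    return "green" if sales > quota else "red"

# Hypothetical sales reps: (name, sales, quota)
reps = [("Ana", 300, 250), ("Ben", 100, 150), ("Cy", 200, 180)]
tiles = slice_layout([(name, sales) for name, sales, _ in reps], 0, 0, 600, 100)
colors = {name: tile_color(sales, quota) for name, sales, quota in reps}
```

With these numbers the best rep is the widest green tile, which is the reading described above. Regrouping by region or manager would mean applying the same split recursively inside each group's rectangle.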

“It’s the ability of the human brain to pick out size and color” that makes tree mapping so intuitive, Mr. Bartoo says. Information visualization, he adds, “suddenly starts answering questions that you didn’t know you had.”

For entertainment value, the Hive Group has also posted a tree map of the 100 most popular songs on iTunes, updated every 24 hours.

The fact that serious software companies are now tree mapping the pop charts is a sign that data visualization is no longer just a useful tool for researchers and corporations. It’s also an entertainment and marketing vehicle.

In 2009, for example, Stamen Design, a technology and design studio in San Francisco, created a live visualization of Twitter traffic during the MTV Video Music awards. In the animated graphic, floating bubbles, each displaying a photograph of a celebrity, expanded or contracted depending on the volume of Twitter activity about each star. The project provided a visceral way for viewers to understand which celebrities dominated Twitter talk in real time, says Eric Rodenbeck, the founder and creative director of Stamen Design.

Information visualization has changed substantially in the 10 years the studio has been in business, Mr. Rodenbeck says. Designers once created visual representations of data that would steer viewers to information that seemed the most important or newsworthy, he says; now they create visualizations that contain attractive overview images and then let users direct their own interactive experience, wherever it may take them.

“It’s not about leading with a certain view anymore,” he says. “It’s about delivering the view that gets the most participation and engagement.”

 

TO that end, the company has just designed a site for a client,  mondowindow.com, that shows airline passengers a detailed satellite map of the landscape they are flying over — and lets them direct the view.

For passengers with Wi-Fi access who enter their airline and flight number on the site, mondowindow.com displays more than just the terrain below. It also offers information bubbles highlighting different place names, local landmarks and tourist attractions like schools and botanical gardens, and photos of native fauna, like a blue jay.

On the ground, we may live in a world of T.M.I. — too much information. But Stamen Design is betting that we will relish rich images of ground data when we are flying several miles high.  

New York Times Paywall Launches Today In Canada; Globally March 28

The Canadians always complain they don’t get things first. Well, this time they’re at the head of the line. The New York Times (NYSE: NYT) digital subscriptions kick in there today; March 28 for the rest of us, allowing the NYT to avoid April Fools’ jokes. (We reported this morning that the long-awaited announcement could come as early as today.)

Canada is the testing ground, giving the Times roughly 10 days to fine-tune the complicated product (and the message) before it goes global. Access is cut off after 20 article views in a month; the three plans for non-print subscribers start at $15 a month. Home delivery subscribers will get full access to the site and the full content of certain apps; the Times has been a long-time believer in the concept of adding value to its expensive print subscriptions through bundles. Access to TimesSelect was included, for instance, as were earlier efforts to create additional revenue digitally. But the all-access pass doesn’t include everything; the e-edition and premium crosswords are excluded. (Drat.)

As promised, the Times has tried to carve out as much space as possible for links to its stories. In a letter to NYTimes.com registered users that just went out, Publisher Arthur Sulzberger, Jr. explains:

Readers who come to Times articles through links from search, blogs and social media like Facebook and Twitter will be able to read those articles, even if they have reached their monthly reading limit. For some search engines, users will have a daily limit of free links to Times articles.

That should make access to far more than 20 articles a month possible—and should quiet some of the people who claim the Times is walling itself off or becoming irrelevant.

Access to some news stays open: The Times also has carved out spaces that will remain free. At NYTimes.com that includes the home page and all section fronts; for mobile apps, the “Top News” sections will be accessible.

Complying with Apple: The Times promises to add one-click access in iOS apps by June 30, which gives it a head start to get as many users as possible converted to subscriptions before the 30 percent share with Apple (NSDQ: AAPL) kicks in.

Full press release below; more to come.

From the release:

The New York Times announced today that it is launching digital subscriptions, which will affect some users of its award-winning Web site, NYTimes.com, and its applications for smartphone and tablet. The subscription plan allows for free access to a set amount of content across digital platforms. When the monthly reading limit is reached, users who are not already home delivery subscribers will be asked to become digital subscribers.

Digital subscriptions will be available in the United States and globally on March 28, 2011. The Times is launching digital subscriptions in the Canadian market beginning today in order to fine-tune the customer experience prior to the global launch.

For non-home delivery subscribers, the basic package - NYTimes.com plus Smartphone App - will start at $15 every four weeks. The NYTimes.com plus Smartphone App package is currently available for purchase by users in Canada. On March 28, the global launch, The Times will offer three digital subscription packages, all of which include access to the Web site. Details are outlined below.

In making today’s announcement, Arthur Sulzberger, Jr., chairman of The New York Times Company and publisher of The New York Times, said, “Today marks a significant transition for The Times, an important day in our 159-year history of evolution and reinvention. Our decision to begin charging for digital access will result in another source of revenue, strengthening our ability to continue to invest in the journalism and digital innovation on which our readers have come to depend. This move will enhance The Times’s position as a source of trustworthy news, information and high-quality opinion for many years to come.”

Janet L. Robinson, president and chief executive officer of The New York Times Company, added, “As the market for and delivery of digital content evolves, we believe that supplementing advertising revenue with digital subscription revenue makes tremendous sense. The step we are taking today will further improve our ability to provide high-quality journalism to readers across the world on any platform, while maintaining the large and growing audience that supports our robust advertising business.”

Details about the digital subscription:

  * All users of NYTimes.com are able to enjoy 20 articles at no charge each month (including slideshows, videos and other forms of content). Beyond 20 articles and for open access to the site, users will be asked to become digital subscribers.

  * On The Times’s smartphone and tablet applications, the Top News section will remain free. To delve deeper into the apps’ other sections, users will be asked to become digital subscribers.

  * The Times is offering three digital subscription packages that allow users to choose the devices on which they want to access Times content. NYTimes.com will be included as part of any subscription. Details and pricing for these plans is available at http://www.nytimes.com/access. Introductory offers will be available.

  * All New York Times home delivery newspaper subscribers receive free, unlimited access to NYTimes.com and the full content on all of The Times’s applications. Home delivery subscribers can go to http://homedelivery.nytimes.com to sign up for free access.

  * Readers who come to Times articles through links from search, blogs and social media will be able to access those individual articles, even if they have reached their reading limit. For some search engines, users will have a daily limit of free links to Times articles.

  * The homepage at NYTimes.com and all section fronts will remain free to browse for all users at all times.
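Read together, the rules in this list amount to a small decision procedure. The Python sketch below is one possible reading of them, not the Times's actual implementation: the function, its parameter names and the page and referrer labels are all hypothetical. The release does not state the daily search limit, so it is left as a parameter with no default value, and falling back to the monthly meter once that limit is hit is an assumption.

```python
MONTHLY_FREE_ARTICLES = 20  # the one figure stated in the release

def can_read(page_type, *, subscriber=False, articles_this_month=0,
             referrer=None, search_links_today=0, daily_search_limit=None):
    """Return True if the page should be shown without asking for payment.

    page_type: "home", "section_front" or "article" (hypothetical labels)
    referrer:  None, "search", "blog" or "social" (hypothetical labels)
    """
    # The homepage and section fronts stay free for everyone.
    if page_type in ("home", "section_front"):
        return True
    # Digital and home delivery subscribers have unlimited access.
    if subscriber:
        return True
    # Links from blogs and social media work even past the monthly limit.
    if referrer in ("blog", "social"):
        return True
    # Search links work too, up to an unspecified daily limit.
    if referrer == "search":
        if daily_search_limit is None or search_links_today < daily_search_limit:
            return True
    # Otherwise the monthly meter decides (assumed fallback).
    return articles_this_month < MONTHLY_FREE_ARTICLES
```

For example, `can_read("article", articles_this_month=20)` returns False, while the same call with `referrer="social"` returns True, which is the opening that makes far more than 20 articles a month reachable.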

In keeping with Apple’s new subscription service terms, The Times will make 1-click purchase available in the App Store by June 30 to ensure that readers can continue to access Times apps on Apple devices.

Subscribers to the print edition of the International Herald Tribune, the global edition of The New York Times, will receive free, unlimited access* to NYTimes.com.

For more details about The Times’s digital subscriptions, go to http://www.nytimes.com/access, or see the FAQ’s: http://www.nytimes.com/digitalfaq.

*Mobile apps are not supported on all devices. Does not include e-reader editions, Premium Crosswords or The New York Times Crosswords apps. Other restrictions apply.