How much is better?

I got my copy of Dona Wong’s “The Wall Street Journal Guide to Information Graphics: The Dos and Don’ts of Presenting Data, Facts, and Figures” two weeks ago and it is time to post a comment now …

The book is a typical “how-to” book and, like many other books (e.g., Graphing Statistics & Data: Creating Better Charts, Creating More Effective Graphs or The Elements of Graph Design), it tells us what to do and what not to do. This imperative style is well suited to the novice in visualization (it is always good to give those at the beginning of the learning curve clear guidance), but anybody with some experience in displaying quantitative information will feel a slight discomfort with it. As Dona Wong is a student of Edward Tufte, the look and feel of her book is reminiscent of Tufte’s work, which also invites a comparison of the two styles. Tufte’s books take us on an adventurous journey at the end of which we feel enlightened – “Guide to Information Graphics” is more like a boot camp that we leave like drilled elite soldiers.

In the end, successful visualization has much to do with creativity, and Wong’s approach will help more in avoiding the worst cases than in inspiring genius. Don’t get me wrong; the book is probably the best dos-and-don’ts book on visualization around – it is just that you need to like that style. If you already own one of the books mentioned above, this one might not be enough of an improvement to justify buying a copy.

There is also a review on Junk Charts.

Paradigm Shift or just a big iPhone

When Steve Jobs unveiled the Apple iPad, many of us expected some disruptive move regarding the user interface. In fact, there was nothing revolutionarily new compared to the iPhone – yes, the iPad is bigger.

There is one thing to keep in mind, though. The computer in its current form is a general-purpose computing device, which tries to offer everything from programming to multimedia. Given this general approach, we must assume that more and more current applications will move away from the general computing device to, e.g., a handheld device for reading and browsing the internet – let’s call it iPad for now …

Here is an entry from the latest HCI International Newsletter, which asks whether or not we will see this paradigm shift with the iPad.

Will the Apple iPad be a paradigm shift that will change the way we use portable computers?

On January 27, 2010, Apple launched the iPad, a device for browsing the web, reading and sending email, viewing photos, watching videos, playing games, reading e-books, and more. The device has a high-resolution Multi-Touch display and can be described as a bigger, more refined version of the iPod touch. Although the tablet computer is not new (other computer companies have offered similar tablet products before), the success of the Apple iPad, along with the Amazon Kindle for digital books and the iPhone/iPod touch with their haptic interfaces, indicates a paradigm shift in portable computer usage. This could be another major change in the way people perceive portable computers, following earlier major changes: graphical user interfaces, which made computer software accessible to everybody; portable music players, which changed the way people listen to music; portable computers, which made a portable office possible; the Internet, which broke down walls by connecting everything and making it available to everybody; and finally mobile phones, which made it technologically possible to connect anybody to everybody almost anywhere. Even though we still need to experience the exact behavior of the iPad physically, we can anticipate the following major changes in user behavior.

  • Reading from a fixed vertical screen versus a screen that can be physically positioned however the user wants to view or read.
  • Cut and paste versus keyboard data entry.
  • Direct access to the desired application versus free navigation.
  • Multitasking all the time. We can already see that many people usually have their eyes focused on the small screen of a handheld device even when they are walking or talking to someone.

So what will be the impact of this device on user behavior? And what other changes should we expect to see soon? These are the questions that HCI experts will need to research.
A.M.

Homeland Security and Uniform Priors

Now that the German Constitutional Court has ruled the so-called “Vorratsdatenspeicherung” (the retention of data on all phone calls and e-mails for six months as a precaution against potential terrorist threats) to be inconsistent with the German constitution, discussions will heat up again regarding the best, or at least the necessary, means of protecting against potential terrorist actions. The same court had already ruled out the use of dragnet investigations except in the case of an “actual threat”.


Things like dragnet investigations and passenger screening at airports are actually highly relevant to the application of statistical methods and data analysis. Whereas dragnet investigations rely on the assumption that “ordinary citizens” somehow “live” in the central 99% of some multivariate distribution (and someone who prepares a plot will stick out as a multivariate outlier), airport passenger screening makes the odd assumption that all passengers pose the same threat, as they are all processed the same way.
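The multivariate-outlier idea can be made concrete with the squared Mahalanobis distance. The sketch below is a minimal, pure-Python illustration for two variables only; it assumes the bulk of the data is roughly bivariate normal, in which case the squared distance follows a chi-square distribution with 2 degrees of freedom, whose 99% quantile has the closed form −2 ln(1 − 0.99) ≈ 9.21. The function name and the toy data are hypothetical.

```python
import math


def mahalanobis_outliers(points, p=0.99):
    """Flag 2-D points lying outside the central p-region of a fitted bivariate normal.

    For bivariate normal data the squared Mahalanobis distance is chi-square
    distributed with 2 degrees of freedom, whose p-quantile is -2 * ln(1 - p).
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    # Entries of the inverse covariance matrix
    ixx, iyy, ixy = syy / det, sxx / det, -sxy / det
    cutoff = -2.0 * math.log(1.0 - p)
    flags = []
    for x, y in points:
        dx, dy = x - mx, y - my
        d2 = ixx * dx * dx + 2.0 * ixy * dx * dy + iyy * dy * dy
        flags.append(d2 > cutoff)
    return flags


# Toy data: 21 "ordinary" points following a strong correlation plus one case
# that is unremarkable in each variable separately but breaks the correlation.
points = [(i, i) for i in range(-10, 11)] + [(5, -5)]
print([pt for pt, flag in zip(points, mahalanobis_outliers(points)) if flag])
```

Because the outlier itself inflates the estimated covariance (the masking effect), robust estimates such as the minimum covariance determinant are often preferred in practice.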

The luckily unsuccessful “Christmas Bomber” raised again the question of whether or not to adopt the Israeli airport security model. This model relies heavily on the personal judgement of the (psychologically) trained screening officer, who looks at the passenger and asks a set of well-prepared (from the passenger’s point of view apparently unstructured) questions. Depending on your answers and reactions (and of course other hard facts), you may walk straight through or miss your flight due to extensive further questioning and examination. (I am very much in favor of this system: the few times I have flown to Israel so far, I apparently looked so harmless that I always walked right through – which was actually the right decision by the officers …)

There are certainly many technical issues that would need to be addressed if (bigger) airports around the world tried to use this model, but it seems odd to assume that every passenger has the same a-priori probability of being a security problem, i.e., the white-haired granny from Iowa and the 30-year-old Arab male who departed from Syria. They are certainly treated equally by our constitutions, but boarding a plane is still no constitutional right.
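The role of the a-priori probability is Bayes’ theorem at work: P(threat | alarm) = P(alarm | threat) P(threat) / P(alarm). A minimal sketch with purely illustrative numbers (the sensitivity, the false-alarm rate, and both priors are hypothetical and not estimates for any real screening procedure) shows that the same screening result means very different things under different priors:

```python
def posterior_threat(prior, sensitivity, false_alarm_rate):
    """P(threat | alarm) by Bayes' theorem.

    prior            -- P(threat) before screening
    sensitivity      -- P(alarm | threat)
    false_alarm_rate -- P(alarm | no threat)
    """
    p_alarm = sensitivity * prior + false_alarm_rate * (1.0 - prior)
    return sensitivity * prior / p_alarm


# Identical screening procedure, two hypothetical priors differing by a factor of 100.
for prior in (1e-6, 1e-4):
    post = posterior_threat(prior, sensitivity=0.9, false_alarm_rate=0.01)
    print(f"prior={prior:g}  posterior={post:.2e}")
```

When the event is rare and false alarms dominate P(alarm), the posterior scales almost proportionally with the prior, so treating all priors as equal discards most of the available information.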

What is the name of the game?

This is from Daniel Keim’s “Call for Papers” for EuroVAST 2010:

“Visual Analytics is the science of analytical reasoning supported by interactive visual interfaces, which requires interdisciplinary science integrating techniques from

  • visualization and computer graphics,
  • statistics and mathematics,
  • data management and knowledge representation,
  • data analysis and machine learning,
  • cognitive and perceptual sciences,
  • and more.”

Hmm, that is a lot of disciplines, and my already vague understanding of what visual analytics might be did not get any clearer. Computer science has always been good at creating new buzzwords, which may generate money to fund new projects.

But I think there is more to it than just the buzz. The above list somehow shows that all these traditional disciplines may fall short when it comes to solving today’s data analysis problems. If we (as people who may influence the direction of one discipline or another) had more openly pursued the philosophy behind Exploratory Data Analysis (EDA), which John W. Tukey started in the late 60s, we would probably be a bit further by now than “Visual Analytics”.

Here is a sketch from a talk I gave 10 years ago which struggled with the quite similar problem of defining what the discipline of “Data Analysis” might look like:

[Figure: sketch from the talk on defining “Data Analysis”]

Apart from all the confusion around terms and disciplines, one thing is certain: if we do not start teaching different things to our students (including the titles of courses and subjects), things will hardly change.

Fundamentals: Learning to Cook

For those who are used to working with graphics on a regular basis, it is usually not a question which plot (or combination of plots) to use for a particular data problem. Nevertheless, many statistically trained researchers and practitioners have a hard time translating data problems into reasonable graphics (not to mention interpreting them correctly). To help select the “right” graphics to start with, the following table might be of some help:

[Table: which graphics to start with for which kind of data problem]

This is certainly only a default orientation – the “killer graphics” will usually take more effort to create. Maybe this “cookbook for graphics starters” should be read alongside the discussion of Andrew Gelman’s post.
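To illustrate what such a default lookup looks like in code, here is a minimal sketch. The mapping uses common defaults from interactive statistical graphics (histogram, bar chart, scatterplot, boxplots by group, mosaic plot) and is an assumption for illustration; it is not a transcription of the table above, and the function name is hypothetical.

```python
def default_plot(var_types):
    """Suggest a starting plot for variables typed 'continuous' or 'categorical'.

    The mapping is a hypothetical set of common defaults, not the table above.
    """
    if not var_types:
        raise ValueError("need at least one variable")
    n_cont = var_types.count("continuous")
    n_cat = var_types.count("categorical")
    if n_cont + n_cat != len(var_types):
        raise ValueError("types must be 'continuous' or 'categorical'")
    if (n_cont, n_cat) == (1, 0):
        return "histogram"
    if (n_cont, n_cat) == (0, 1):
        return "barchart"
    if (n_cont, n_cat) == (2, 0):
        return "scatterplot"
    if (n_cont, n_cat) == (1, 1):
        return "boxplots by category"
    if n_cont == 0:
        return "mosaic plot"  # two or more categorical variables
    if n_cat == 0:
        return "scatterplot matrix or parallel coordinate plot"  # three or more continuous
    # Mixed sets of several variables: use several plots, ideally linked
    return "combination of linked plots"


print(default_plot(["continuous", "categorical"]))
```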

Those who use graphics effectively are often inclined to think they are doing something “low-status”, and that the people who come along with the next nifty model for datasets that have never been observed so far are doing “serious work”. This thinking is definitely wrong, and good and useful graphics are not at all easy for most statisticians!

PS: Combinations of graphics are of course most useful if linked highlighting can be used!
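Technically, linked highlighting means that all plots share a single selection state: selecting cases in one plot highlights the same cases in every other plot, e.g., as highlighted portions of the bars in a bar chart. A minimal observer-pattern sketch; the class names are hypothetical and not taken from Mondrian or any other tool:

```python
class SelectionModel:
    """Holds the set of selected case indices and notifies all registered plots."""

    def __init__(self):
        self.selected = set()
        self._plots = []

    def register(self, plot):
        self._plots.append(plot)

    def select(self, cases):
        self.selected = set(cases)
        for plot in self._plots:
            plot.on_selection_changed(self.selected)


class BarchartPlot:
    """Stand-in for a bar chart: counts the highlighted cases per category."""

    def __init__(self, categories, model):
        self.categories = categories  # category of case i is categories[i]
        self.highlight_counts = {}
        model.register(self)

    def on_selection_changed(self, selected):
        counts = {}
        for case in selected:
            category = self.categories[case]
            counts[category] = counts.get(category, 0) + 1
        self.highlight_counts = counts


model = SelectionModel()
bars = BarchartPlot(["a", "b", "a", "b", "a"], model)
model.select({0, 1, 2})  # e.g., cases brushed in some other plot
print(bars.highlight_counts)
```

Keeping the selection in one shared model, rather than in each plot, is what makes every additional plot automatically “linked”.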

The Power to … What did you say?

It was just about a year ago (on January 6th, 2009, to be exact) that a New York Times article on R fueled the dispute over which statistical analysis tool is “the best”. One of the highlights of the article was a quote from SAS’ Anne H. Milley:

“I think it addresses a niche market for high-end data analysts that want free, readily available code,” said Anne H. Milley, director of technology product marketing at SAS. She adds, “We have customers who build engines for aircraft. I am happy they are not using freeware when I get on a jet.”

I recently found a SAS press release (dated March 23, 2009) entitled “SAS to offer R integration to support analytical innovation”, which reads:

“It is no secret that SAS has been working on interfacing with R,” said Anne Milley, SAS’ Senior Director of Technology Product Marketing. “SAS and R are here to stay, and as organizations work to harness the full potential of their data, an expanded set of analytics options can only help.”

First, let’s be cheerful about this move (whatever the actual solution will look like), but on the other hand, if Anne Milley’s quotes are representative of SAS’ reliability, I doubt they deserve their reputation.

Mondrian Version 1.1 released

It is pretty hard to get any attention while Steve is presenting the iPad 😉, but nonetheless I’d like to point to the new version 1.1 of Mondrian. Here are the most important new features:

  • Load data directly from R workspace files
  • New color schemes
  • Compatible with Java 6 on all platforms
  • Many bug fixes and minor features added

[Screenshot: a Mondrian session under Windows 7]

Everything about Mondrian can be found on the website and in the book. (Sorry, Steve, for the Windows 7 screenshot … it looks much nicer under Mac OS X 🙂)

“the rabbits in front of the snake”

On Wednesday, January 27th, (not only) the IT world will be looking westwards to what is coming from Cupertino. Apple will reveal their “latest creation”.


Nothing new regarding the staging: rumors have been piling up in blogs and news lists for months, and analysts predict the dark or golden future of Apple and their competitors (depending on who pays them).

And yet, there seems to be a different touch this time. The Apple iSlate has been rumored for more than a year now, and even Steve Ballmer talked about “… what we will call slate PCs …” at this year’s CES. He showed actual hardware by HP and others, but the devices were running “only” Windows 7, i.e., they were merely PCs reduced to a small(er) screen without a keyboard – nothing we would call innovation.

Not that the R&D departments of Microsoft, HP, Sony … have been shut down. No, even worse, they seem to be holding still until Apple has defined the new standard of interaction and services for a tablet PC. This somehow reminds me of what happened with the iPhone, but whereas with the iPhone it seemed to happen by chance, this time it looks deliberate.

Let’s see what we’ll get on Wednesday … !

PS: We don’t really know what the “iThing” will look like in detail, but take a look at this nice video from Bonnier R&D, which gives an idea of what it takes to move forward to new standards.

Welcome to the Ivory Tower

Here is the post-post-scriptum of one of Andrew Gelman’s blog entries. The post discussed how such an influential statistician as Brian Ripley could possibly have such an outdated webpage:

P.P.S. Somebody pointed out that you can search for B D Ripley’s recent papers using Google. Here’s what’s been going on since 2002. Aside from the R stuff, he seems to have been focusing on applied work. … I find that working with applied collaborators gives me insights that I never would’ve had on my own, and I’d be interested in hearing Ripley’s thoughts on his own successes and struggles on applied problems.

I am a bit puzzled that influential statisticians like Andrew Gelman seem to be surprised that very important input comes from real-life problems. But maybe this is mainly because in graphical data analysis there is not much of a theory. The next important development usually comes from the next dataset that we can’t analyze efficiently. Once we understand how the solution generalizes, a new piece can be put into the mosaic.

Anyway, life outside the ivory tower is different (but it is reality), and I think it is important to move in and out of the tower regularly.

The Costs of Exaggeration …

On an Apple-related list I found a pointer to this price comparison chart. Although the author put a disclaimer in his post that this graph was not intended to be “mathematically correct”, it is amazing how badly the actual information is hidden behind the rainbow chart.

[Figure: the original rainbow price comparison chart]

A simple bar chart just does not deliver any dramatic story at all, but hey, if prices are almost identical in Hong Kong and the US, please don’t show a difference. Here is the less appealing, but faithful chart:

[Figure: bar chart of the same prices]

Not to mention all the problems of including the correct taxes, which are not really solved for these prices …
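One way to put a number on such exaggeration is Edward Tufte’s “Lie Factor”: the size of the effect shown in the graphic divided by the size of the effect in the data, where the size of an effect is the relative change |second − first| / first. A faithful chart has a Lie Factor close to 1. A minimal sketch; the numbers in the example are hypothetical and not the prices from the chart above:

```python
def effect_size(first, second):
    """Relative change between two values, as used in Tufte's Lie Factor."""
    return abs(second - first) / first


def lie_factor(data_first, data_second, graphic_first, graphic_second):
    """Effect shown in the graphic divided by the effect in the data.

    1.0 means a faithful display; values far above 1 signal exaggeration.
    """
    return effect_size(graphic_first, graphic_second) / effect_size(data_first, data_second)


# Hypothetical: prices differ by 2%, but the drawn lengths differ by 50%.
print(lie_factor(100.0, 102.0, 4.0, 6.0))
```

For bars that start at zero, drawn lengths are proportional to the values, so the Lie Factor is 1 by construction.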

Thinking Statistics – Visually

I found this presentation on the infoaesthetics blog. There is one slide in it that made me think:

[Figure: slide with the modified H. G. Wells quote]

I have the impression that this quote from Herbert George Wells – better known for his science fiction – suffers badly when modified this way.

Statistical thinking – from my point of view – means the ability to understand figures (as in numbers) in a way that uses meaningful summaries and graphs and that can distinguish between signal and noise. I doubt that there is such a thing as “visual thinking”. Rather, there is statistical thinking (or, more generally, analytical thinking), which uses graphical representations of the data in order to summarize the essential information more easily.

What Alex Lundry probably wanted to express with this slide was replacing the presentation of statistical information in tables with its presentation in graphs.

Chicken and Egg Problem: Follow Up

Having gathered the data that was used to generate the visualization criticized in this post, it is only fair to prepare a better version. Tom Carden already showed some quick graphs that improve on the initial “pie chart”. Note that I only show the 7 most relevant diseases and group the rest into one group, “Rest”, for simplicity.
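Grouping everything beyond the top diseases into “Rest” is a simple lumping step. A minimal sketch, assuming “most relevant” means largest total costs (the post does not state the criterion) and a hypothetical dictionary of totals per disease:

```python
def top_k_plus_rest(totals, k=7, rest_label="Rest"):
    """Keep the k categories with the largest totals and sum all others into rest_label."""
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    result = dict(ranked[:k])
    if len(ranked) > k:
        result[rest_label] = sum(value for _, value in ranked[k:])
    return result


# Hypothetical totals, not the data behind the charts below
print(top_k_plus_rest({"A": 50, "B": 30, "C": 15, "D": 3, "E": 2}, k=3))
```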

A typical problem of the initial chart is that it tries to put many views into one single graph – this usually makes interpretation very hard. Looking at the data we can identify four major questions:

  1. How do the absolute total costs develop with age?
  2. Which diseases are dominant at what ages, i.e., a relative view?
  3. What share of the costs do the patients pay themselves, and how does it differ between diseases and change with age?
  4. Are the costs per patient very different between diseases, and how do they develop for older patients?

The absolute numbers of patients are only of secondary interest here. To answer the four questions above, four relatively simple graphs suffice, which are easy to create even in a simple (?) tool like MS Excel.

Absolute Costs

Not much to learn here that is not shown better in the next chart, except that Hypertension and Diabetes generate almost no costs up to age 25 but are the major cost drivers around the age of 60.

Relative Costs

The relative view tells most of the story of the data (of course in conjunction with the absolute plot above). There are basically three groups of diseases:

  • Decreasing relative costs:
    Chronic Sinusitis, Asthma and Depression
  • Almost constant relative costs:
    Acid Reflux
  • Increasing relative costs:
    Osteoporosis, Diabetes and Hypertension (and Rest)
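The relative view divides each disease’s costs in an age group by that age group’s total, so the shares within every age group sum to one. The sketch below computes these shares and sorts a disease into one of the three groups above by comparing its share in the youngest and the oldest age group. This endpoint comparison is a simplification, and the data layout, the tolerance, and the numbers are hypothetical.

```python
def relative_costs(costs_by_age):
    """Turn {age_group: {disease: cost}} into shares that sum to 1 within each age group."""
    shares = {}
    for age, costs in costs_by_age.items():
        total = sum(costs.values())
        shares[age] = {disease: cost / total for disease, cost in costs.items()}
    return shares


def classify_trend(shares, ages, disease, tol=0.02):
    """Compare a disease's share in the first and last age group of `ages`.

    tol is an arbitrary threshold below which the change counts as almost constant.
    """
    change = shares[ages[-1]][disease] - shares[ages[0]][disease]
    if change > tol:
        return "increasing"
    if change < -tol:
        return "decreasing"
    return "almost constant"


# Hypothetical costs for two age groups (not the data behind the charts above)
costs = {
    "20-29": {"Asthma": 60.0, "Diabetes": 10.0, "Acid Reflux": 30.0},
    "70-79": {"Asthma": 20.0, "Diabetes": 50.0, "Acid Reflux": 30.0},
}
shares = relative_costs(costs)
for disease in ("Asthma", "Diabetes", "Acid Reflux"):
    print(disease, classify_trend(shares, ["20-29", "70-79"], disease))
```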

Share of Personal Costs

The share of costs paid by patients declines on average from somewhat above 20% for younger patients to 15% for patients above 70. The share is almost constant for Acid Reflux and Osteoporosis up to age 60 and shows the strongest decline for Chronic Sinusitis.
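When a share is summarized “on average” across diseases, it matters whether the shares are weighted by costs. A minimal sketch, assuming cost weighting (the post does not say which average was used) and hypothetical (personal, total) cost pairs, shows that the two averages generally differ:

```python
def average_personal_share(rows):
    """Cost-weighted share paid by patients: sum(personal) / sum(total).

    rows -- iterable of (personal_cost, total_cost) pairs, e.g. one per disease
    within an age group.
    """
    rows = list(rows)
    personal = sum(p for p, _ in rows)
    total = sum(t for _, t in rows)
    return personal / total


def mean_of_shares(rows):
    """Unweighted mean of the per-row shares, for comparison."""
    rows = list(rows)
    return sum(p / t for p, t in rows) / len(rows)


rows = [(10.0, 50.0), (30.0, 100.0)]  # hypothetical (personal, total) pairs
print(average_personal_share(rows), mean_of_shares(rows))
```

The cost-weighted version answers “what fraction of all money spent comes from patients”, while the unweighted mean treats a cheap disease as equally important as an expensive one.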

Average Costs per Patient

The per-patient costs do not differ much between diseases and increase almost linearly up to age 55. Diabetes is by far the most costly disease, whereas Chronic Sinusitis is by far the cheapest. Per-patient costs increase sharply in the age range of 70-80 years no matter which disease we look at.
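Per-patient costs are total costs divided by the number of patients in each age group, and “almost linearly” can be checked by fitting a least-squares line over the age groups up to 55 and looking at the largest residual. A minimal sketch with hypothetical function names; using, e.g., age-group midpoints as x values is an assumption about how the data is binned:

```python
def per_patient_costs(costs, patients):
    """Costs divided by the number of patients, age group by age group."""
    return [c / n for c, n in zip(costs, patients)]


def ls_fit(xs, ys):
    """Ordinary least-squares intercept and slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def max_abs_residual(xs, ys):
    """Largest deviation from the fitted line; small values indicate near-linearity."""
    intercept, slope = ls_fit(xs, ys)
    return max(abs(y - (intercept + slope * x)) for x, y in zip(xs, ys))
```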

All in all, the four graphs are relatively simple and easy to read. Hopefully they make it easier to get something like a story out of the data.