Tableau contest, additional views

For the Tableau contest, I had prepared many other views which I didn’t include in the final dashboard. Here are a couple of them:

This one shows how many percentage points of each county’s adult obesity rate are not explained by the county’s median income. In other words, very green marks are counties where people are less obese than in counties with similar incomes and, conversely, red marks show counties where obesity is more widespread than income alone would predict.
This suggests that there could be cultural or regional explanations for the health situation. People in the mountain states, especially Colorado, show up as very green, while people in the Old South are very red. The Midwest and the Northwest are about average; New England and Florida tend to be better than average. Yesterday, a show on French TV called Houston the fat capital of the USA and attributed this to cultural factors. But the fact is, obesity rates in Harris County are lower than average, and on this map the corresponding mark is green, showing that factors other than income play a positive, not a negative, role. I love it when facts and numbers get in the way of a nice story.

And this one is, let’s say, mostly for aesthetic purposes. It categorizes counties by their median income (in $1,000 increments, X-axis) and their obesity rate (in 0.5% increments, Y-axis), and plots the total population of the counties that meet both criteria. Note that it doesn’t show the actual number of people in a given income and obesity bracket; it just adds up the populations of whole counties.
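The binning behind this view can be sketched in a few lines of Python, assuming each county row carries a median income, an obesity rate and a population figure. The function name and the sample counties are hypothetical; the bin sizes ($1,000 and 0.5%) are the ones described above.

```python
import math
from collections import defaultdict

def population_by_bracket(counties, income_step=1000, obesity_step=0.5):
    """Sum whole-county populations per (income bin, obesity bin) cell.

    Each county falls into exactly one cell, so a cell's total is the
    population of the counties in it, not the number of people who
    individually have that income and obesity status.
    """
    totals = defaultdict(int)
    for income, obesity, population in counties:
        # Use the lower edge of each bin as its label.
        income_bin = math.floor(income / income_step) * income_step
        obesity_bin = math.floor(obesity / obesity_step) * obesity_step
        totals[(income_bin, obesity_bin)] += population
    return dict(totals)

# Hypothetical counties: (median income $, obesity %, population).
counties = [(41250, 27.3, 10000), (41900, 27.4, 25000), (52000, 31.1, 5000)]
print(population_by_bracket(counties))
```

The first two counties share a cell, so their populations are simply added together, which is exactly the caveat mentioned above.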


My Tableau contest entry

So here it is. I chose to compete on the Activity Rates and Healthy Living data set, because after downloading it I really enjoyed exploring it.

If the viz doesn’t show well in the blog, here’s a link to its page

My main reason for entering the contest is to be able to see what others have done. There are obviously many, many ways to tackle this, and I am very much looking forward to seeing everyone’s work! My interactions with the Tableau community, especially through the forum, have always been very rewarding, and what better way to learn than by example?

So, for fellow contestants who will see my work, here is the train of thought behind my dashboard.

The dataset

I’m aware of the USDA’s Food Environment Atlas, an application where people can see various food-related indicators on a map. The dataset we were handed is actually the data behind it. So there is already a place where people can consult food indicators.

Now, this being Tableau and all, I wanted to create an analytical dashboard where people could understand whether and how the input variables affected the output variables.

The dataset consists mostly of input variables: various indicators that influence how healthy a local population is. That status (the output) is expressed through a few variables, such as adult and child obesity rates and adult diabetes rates. Those variables are highly correlated with each other, so in my work I chose to focus on adult obesity rates, which is the simplest one.

Now, the inputs. The rest of the variables fall into several categories:

  • income (median household income, poverty rates);
  • diet (consumption of various food items per capita);
  • shopping habits (for various types of stores or restaurants, the dataset gives their number and the money spent in each county, both in absolute numbers and per capita);
  • lifestyle information (data on households without cars and far from stores, on the physical activity level of the population, and on the facilities offered by the state);
  • pricing variables (price ratios between some “healthy” food items and some less healthy, equivalent food items, for instance fruits vs. snacks; tax information on unhealthy food);
  • policy variables (measuring participation in various programmes such as SNAP or WIC);
  • socio-demographic variables (ethnic groups in the population, “metro” status of the county, whether the county was growing or shrinking, and voting preferences).

Yes, that’s a lot of variables (about 90, plus the county and state dimensions).

Oddly enough, there was no population measure in the dataset, and many indicators were available in absolute value only. So I constructed a population proxy by dividing two variables on the same subject: for instance, “number of convenience stores” divided by “number of convenience stores / capita” gives the number of residents.

That enabled me to build indicators per capita for all subjects, so I could see if they were correlated with my obesity rates.
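The proxy works because an absolute count divided by the per-capita rate of the same thing leaves the population. Here is a minimal Python sketch with hypothetical names and numbers; the `per` parameter is an assumption added to cover indicators expressed per 1,000 residents rather than per person, so check how each indicator is scaled before using it.

```python
def population_proxy(absolute, rate, per=1):
    """Estimate population from an absolute count and its per-capita rate.

    `per` is the rate's denominator: 1 for "per person", 1000 for
    "per 1,000 residents" (an assumption about the indicator's scaling).
    Returns None when the rate is zero, since population can't be recovered.
    """
    if rate == 0:
        return None
    return absolute / rate * per

def per_capita(value, population, per=1):
    """Express an absolute county figure per `per` residents."""
    return value * per / population

# Hypothetical county: 15 convenience stores, 0.75 stores per 1,000 residents.
pop = population_proxy(15, 0.75, per=1000)
print(pop)
# Hypothetical: 30 fast-food restaurants, expressed per 1,000 residents.
print(per_capita(30, pop, per=1000))
```

Counties with a zero count for the chosen subject give no proxy, so in practice one would pick a subject that is nonzero almost everywhere, or fall back on a second subject.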

Findings – using Tableau desktop to make sense of the dataset

The indicators most correlated with obesity were the income ones, which came as no surprise. The income indicators were also strongly correlated with each other. In the USA, poverty means having an income below a threshold defined at the federal level. But in other contexts, poverty is most often defined relative to the median income (typically, a household is poor if its income is below half of the median income), so it can be used to measure inequality within a community and the dispersion of incomes.
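To make the difference between the two definitions concrete, here is a small Python sketch with made-up household incomes. The fixed threshold is a hypothetical round number, not the actual federal poverty guideline. Doubling every income leaves the relative poverty rate unchanged but lowers the absolute one, which is why a relative measure tracks the spread of incomes rather than their level.

```python
from statistics import median

def relative_poverty_rate(incomes, fraction=0.5):
    """Share of households earning less than `fraction` of the median income."""
    threshold = fraction * median(incomes)
    return sum(1 for x in incomes if x < threshold) / len(incomes)

def absolute_poverty_rate(incomes, threshold):
    """Share of households earning less than a fixed threshold."""
    return sum(1 for x in incomes if x < threshold) / len(incomes)

# Made-up household incomes for one community, and the same community
# after every income doubles.
incomes = [10000, 18000, 30000, 40000, 45000, 60000, 90000]
doubled = [2 * x for x in incomes]
FIXED_LINE = 25000  # hypothetical fixed threshold, not the real federal guideline

print(relative_poverty_rate(incomes), relative_poverty_rate(doubled))
print(absolute_poverty_rate(incomes, FIXED_LINE),
      absolute_poverty_rate(doubled, FIXED_LINE))
```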

As a result, many indicators appear to be correlated with obesity because they are not independent of income. This is the case for instance for most of the policy indicators: if a programme has many recipients in a county, it is because poverty is widespread, so residents are more likely to be affected by obesity. This makes it difficult to measure the impact of the programmes with this dataset. This is also the case, unfortunately, for racial indicators, as most of the counties with a very high black population have a low income.

Diet indicators, on the other hand, appear to be uncorrelated with obesity. This is counter-intuitive: isn’t eating vegetables or fresh farm produce the most certain way to prevent obesity? But one has to remember that this dataset is aggregated at the county level. Just because a county has a high level of, say, fruit consumption per capita doesn’t mean that every household is eating that much. Realistically, consumption will be very dispersed: the households where people cook, which are less likely to be affected by obesity, will buy all the fruit, and those where people don’t cook will simply buy none. Also, just because someone buys more vegetables than average doesn’t mean they don’t also buy other, less recommended foods.

The only diet indicator that appears to be somewhat correlated with obesity is the consumption of soft drinks.

When it comes to lifestyle habits, surprisingly, the proportion of households without a car and living far from a store (people likely to walk more, and therefore to be healthier) is positively correlated with obesity. This is because counties where this indicator is high are also poorer than average; again, income explains most of this. However, physical activity in general is associated with lower obesity. The states where people are most active, such as Colorado, enjoy the lowest obesity figures. In fact, all the counties with obesity rates below 15% are in Colorado.

Finally, pricing didn’t seem to have much impact on either obesity or consumption. Why is that? Economists would call this “low price elasticity”, meaning that price changes do not encourage people to switch products and habits. But there is another explanation. Again, people who can’t cook are not going to buy green vegetables just because they are cheaper. Also, consider the tax rates that are applied: no more than 7% in the most aggressive states. Compare that figure to the 400%+ levy applied to cigarettes in many countries of the world! Clearly, 4-7% is not strong enough to change habits. However, this money can be used to fund programmes that help people adopt healthier behaviors.
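For reference, the price elasticity of demand $\varepsilon$ is the percentage change in quantity divided by the percentage change in price. Assuming a tax at rate $t$ is fully passed through to the shelf price (an assumption; real pass-through varies), the price rises by about $t$, so consumption changes by roughly $\varepsilon t$. With a purely hypothetical elasticity of $-0.5$, even the most aggressive 7% tax would cut consumption by only about 3.5%:

```latex
\varepsilon = \frac{\Delta Q / Q}{\Delta P / P},
\qquad
\frac{\Delta P}{P} \approx t
\;\Rightarrow\;
\frac{\Delta Q}{Q} \approx \varepsilon\, t,
\qquad
\varepsilon = -0.5,\; t = 0.07
\;\Rightarrow\;
\frac{\Delta Q}{Q} \approx -0.035
```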

What to show? Making the visualization

First, I wanted to show all of those findings. If 2 variables that you expect to be correlated (say, consumption of vegetables and obesity) are in fact not correlated, a point is made! But visually, nothing is less interesting than a scatterplot that doesn’t exhibit correlation. It’s just a stupid cloud of dots.

So instead I chose to focus on the correlations I could establish, namely obesity and income, and obesity and activity. Those are the two lower scatterplots of my dashboard. For income, I chose the poverty rate measure, because I’d rather have a trend line going up than one going down.

I duplicated that finding with a bar chart made with median income bins. For each bin (which represents all the counties whose median income falls in that range), I plotted the average obesity rate and, miracle! it comes out as a markedly decreasing bar chart. Now, this figure doesn’t establish correlation, let alone causality, but it certainly suggests it more effectively than a scatterplot. It also doubles as a navigation aid: clicking on a bar highlights or selects the relevant counties.

Finally, I decided to do a map. Well, actually, it was the first thing I had done, but I had second thoughts about it and eventually put it in. Why? First, to allow people to look up their county. My county, for instance, is Travis County (Austin, TX), and I can find it easily on a map; much less so in a list of county names sorted by one of their indicators. I added a quick filter on county name for those who’d rather type than search visually.

I also wanted to see whether there was a link between geography and obesity. So try the following.

  • Where are the counties with obesity rates below 15%? Colorado only.
  • If we raise the threshold a little, we get San Francisco and New York. But up to 20%, these counties remain very localized.
  • Likewise, virtually all counties above 35% are in the South: Alabama, Louisiana, Mississippi.

Population also matters. Counties with a population above 1 million tend to have lower rates; their residents also usually have higher incomes.

I decided to zoom the map on the lower 48 states by default. It is possible to zoom out to see Alaska and Hawaii, but I don’t think the benefit of always seeing them outweighs the cost of a smaller view of everything else when they aren’t needed.

Regarding the marks: originally, I didn’t assign any variable to their size, but then I felt that the larger counties (e.g. LA, Harris (Houston), Cook (Chicago)…) were underrepresented. So I assigned my population proxy to size. But then the density of the marks competed with the intensity of the color, which was assigned to the obesity rate. So I removed that and chose a fixed size so that marks wouldn’t overlap each other too much.

Regarding color, I wasn’t happy with the default scale. If I left it as is, it would treat 12.5%, the minimum value in the dataset, as an extremely low number. But in absolute terms, it’s not: most developed countries have national obesity rates lower than that, and Japan and Korea are below 4%. So I made the scale start at 0. But I didn’t like the output: the counties with the highest values didn’t stand out. Eventually, I chose a diverging scale, which made counties with both high and low values more visible.
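Outside Tableau, the idea behind a diverging scale can be sketched as follows: pick a center, map values below it toward one color and values above it toward another, and scale each side separately so that both extremes reach full saturation. In this Python sketch the colors, the center (28%) and the upper end (40%) are hypothetical; only the 12.5% minimum comes from the dataset.

```python
GREEN, WHITE, RED = (0, 128, 0), (255, 255, 255), (200, 0, 0)  # hypothetical palette

def diverging_position(value, low, center, high):
    """Map value to [-1, 1]: -1 at low, 0 at center, +1 at high (clamped)."""
    if value <= center:
        pos = (value - center) / (center - low)
    else:
        pos = (value - center) / (high - center)
    return max(-1.0, min(1.0, pos))

def lerp(a, b, t):
    """Linearly interpolate between two RGB tuples, rounding to integers."""
    return tuple(round(x + (y - x) * t) for x, y in zip(a, b))

def diverging_color(value, low, center, high):
    """Green below the center, red above it, white at the center."""
    pos = diverging_position(value, low, center, high)
    if pos < 0:
        return lerp(WHITE, GREEN, -pos)
    return lerp(WHITE, RED, pos)

# Hypothetical scale: dataset minimum 12.5%, center 28%, upper end 40%.
for rate in (12.5, 20.0, 28.0, 35.0, 40.0):
    print(rate, diverging_color(rate, 12.5, 28.0, 40.0))
```

Because each side is stretched independently, the few counties near the top of the range get as much color contrast as those near the bottom, instead of being squeezed into the upper end of a single 0-to-max gradient.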

I customized the tooltip for the view. In another version of the dashboard, I had a sheet with the numbers written out, which would change depending on which county was last brushed. I like the idea that this information can stay on screen. But I got confused in the configuration of the actions and couldn’t completely prevent the filter on this sheet from being disabled at times; when that happened, the numbers for all counties overlapped and there was an annoying delay. So I used a tooltip instead. Anyway, it’s easier to format text that way. The problem is that a tooltip can hide a good portion of the dashboard, so I exercised restraint and included only the 15 or so variables I found most relevant.

Voilà! That’s it. I hope you like my dashboard, and I look forward to seeing the work of others! If you are a contestant, please leave a link to your entry in the comments. Good luck to all!!