Dimensionality reduction

10 September, 2012 (11:13) | Uncategorized | By: jerome

Following up on my Tableau politics contest entry, here is another view I developed but left out of the already full dashboard.

In the main view I tried to show how the candidates’ values relate to those of the French. That is difficult to convey graphically when these values are determined by the answers to as many as 19 questions (and many more could be used to that effect).

Enter a technique called dimensionality reduction. The idea is to turn a dataset with many dimensions into a dataset with far fewer dimensions, as few as one, two or three. To do that we compute new variables, each a combination of the original ones, chosen so that together they capture as much of the variability of the original dataset as possible. In other words, if two records have clearly different values in the original dataset, they should end up with different values in the transformed dataset too.

If you’re not allergic to words like eigenvalues, the math is actually pretty simple. But let’s not go into that. The point is that with this technique you can often represent a complex dataset as a two-dimensional one with very little loss of information.
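For readers who do want a peek at the math, here is a minimal sketch in Python with NumPy. It assumes the technique is principal component analysis (PCA), which this post does not actually name, and it is not how the view was built in Tableau. The function name `pca_2d` and the data are made up for illustration: 1,500 fake respondents answering 19 fake questions that are driven by two hidden traits plus some noise.

```python
import numpy as np


def pca_2d(answers, n_components=2):
    """Project each row of `answers` onto its top principal components.

    Hypothetical helper for illustration only. Returns the new coordinates
    and the share of total variance that the kept components capture.
    """
    X = np.asarray(answers, dtype=float)
    # Center each question so the components describe variation around the mean.
    centered = X - X.mean(axis=0)
    # Covariance between questions (columns are variables).
    cov = np.cov(centered, rowvar=False)
    # eigh is meant for symmetric matrices; it returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # Reorder so the directions with the most variance come first.
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    # The new variables are the data projected onto the leading eigenvectors.
    scores = centered @ eigenvectors[:, :n_components]
    explained = eigenvalues[:n_components].sum() / eigenvalues.sum()
    return scores, explained


# Made-up data (an assumption, not the survey used in the dashboard).
rng = np.random.default_rng(0)
traits = rng.normal(size=(1500, 2))      # two hidden traits per respondent
loadings = rng.normal(size=(2, 19))      # how each question reflects the traits
answers = traits @ loadings + 0.3 * rng.normal(size=(1500, 19))

scores, explained = pca_2d(answers)
print(scores.shape)         # one (x, y) pair per respondent
print(round(explained, 3))  # share of variance kept by the two new variables
```

The `explained` ratio is the number to watch: the closer it is to 1, the less is lost by plotting only two axes. On real survey answers it can be much lower than on this tidy fake data.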

Of course the technique doesn’t tell you what these new variables represent. Having got a feel for the data, I postulate that the one on the X axis represents the toughness of a candidate (pro-security measures, no sensitivity to minorities, etc.) and the one on the Y axis satisfaction with the previous government. Or possibly, love of capitalist doctrine.

Now you get a better feel for how close or distant the various candidates are from individual voters. You can also see which “spaces” remain empty and which are competitive. The top-right quadrant, for instance, looks tempting, but it is really nearly empty (about 200 respondents out of over 1,500). The right half of the matrix, that is, the half that is sensitive to strength, has only one possible competitor but also few voters (~400 respondents). It makes more sense to remain an acceptable choice for the top half (750 voters) and especially the top left (550). In other words, the Sarkozy mark should drift slightly towards the top left for optimal impact.
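Counts like the ones above can be tallied directly from the two new coordinates. Here is a rough sketch, assuming the quadrants are split at zero on both axes (the actual cut-offs in the view may differ); the function name `quadrant_counts` and the five sample points are made up for illustration.

```python
import numpy as np


def quadrant_counts(x, y):
    """Count points per quadrant, splitting at 0 on both axes (an assumption).

    Points lying exactly on an axis are counted as left / bottom.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    right = x > 0
    top = y > 0
    return {
        "top_left": int(np.sum(~right & top)),
        "top_right": int(np.sum(right & top)),
        "bottom_left": int(np.sum(~right & ~top)),
        "bottom_right": int(np.sum(right & ~top)),
    }


# Five made-up respondents, not the survey data.
x = [-1.2, 0.5, -0.3, 2.0, -0.7]
y = [0.8, 1.1, -0.4, -0.9, 0.2]

counts = quadrant_counts(x, y)
top_half = counts["top_left"] + counts["top_right"]
right_half = counts["top_right"] + counts["bottom_right"]
print(counts, top_half, right_half)
```

The halves are just sums of quadrants, which is how the figures above fit together: 550 in the top left plus about 200 in the top right gives the 750 in the top half.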


Comment from Ross A. Le Grande
Time July 13, 2013 at 7:25 pm


Could you e-mail me a copy of the above Dashboard? I am looking into using Principal Component Analysis (PCA) in Tableau and would like to see how you entered the math into Tableau.


