The Beautiful Duality of Topological Data Analysis

Topological data analysis (TDA) is special because its methods are both general and precise. Teams that use TDA in their work see the “art of the possible” more broadly and can attack problems that might otherwise be “too hard” for traditional techniques.

Since it is the product of human activity, this data set has some interesting features despite its relative simplicity. If we average the 1’s on the left side, and separately those on the right side, of the main body of 1’s, we see the following:

Left-side 1

Corresponding to:



Right-side 1

Corresponding to:


Put another way, even though there is an overall consensus on how a ‘1’ should be written in this sample, there are subgroups within the sample with their own ‘sub-consensus’. That variation can be seen in the average of all the ‘1’ images: the blurs at the top and bottom correspond to these two distinct slopes. (Perhaps the two versions correspond to right- and left-handed people?)
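To make the averaging idea concrete, here is a minimal sketch in Python with NumPy. It does not use the data set discussed here. Instead it draws synthetic one-pixel strokes with two opposite slants, so the helper `slanted_stroke`, the image size, and the slope values are all illustrative assumptions. It averages each subgroup on its own and then averages everything together.

```python
import numpy as np

def slanted_stroke(size, slope):
    """Draw a one-pixel-wide stroke through the image center.

    `slope` is the horizontal shift (in columns) per row; this is a toy
    stand-in for a handwritten '1', not real data.
    """
    img = np.zeros((size, size))
    center = size // 2
    for row in range(size):
        col = int(round(center + slope * (row - center)))
        img[row, col] = 1.0
    return img

size = 15  # hypothetical image side length

# Two synthetic subgroups that lean in opposite directions.
group_a = np.stack([slanted_stroke(size, s) for s in (-0.4, -0.3, -0.2)])
group_b = np.stack([slanted_stroke(size, s) for s in (0.2, 0.3, 0.4)])

# Pixel-wise averages: one per subgroup, and one over all images.
mean_a = group_a.mean(axis=0)
mean_b = group_b.mean(axis=0)
overall_mean = np.concatenate([group_a, group_b]).mean(axis=0)

# Count how many columns carry any ink in the top, middle, and bottom rows.
for name, row in (("top", 0), ("middle", size // 2), ("bottom", size - 1)):
    print(name, np.count_nonzero(overall_mean[row]))
```

In this toy setting the overall average is sharp in the middle row, where every stroke passes through the center, and spread over several columns at the top and bottom, where the two slants diverge. Each subgroup average stays comparatively tight, which is the same pattern as the blurred ends in the average of all the ‘1’ images.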

For the other numerals, the same kinds of variation occur, although they are not as easy to characterize. Here are the averages across three vertical stripes in the large group of ‘7’ images:




and for two stripes across the 5’s: