A few weeks ago, I started a new blog called The Shape of Data, which will focus on explaining the geometry behind modern data analysis, along the lines of the series of posts I wrote on this blog about a year ago. This involves very basic geometry/topology, so I didn’t think it would be appropriate for LDTopology. I will continue to post to LDTopology about pure topology, but today I wanted to write a few words about why I started the new blog and what I hope to accomplish with it.
First, you probably noticed that the title of the new blog is an allusion to/rip-off of the title of Jeff Weeks’ (awesome) book The Shape of Space. This book explains the fundamentals of three-dimensional hyperbolic geometry (a la Thurston) in a way that any reasonably bright adult (and many teenagers) can understand. Today, all the attention on “BIG DATA” makes for a great opportunity to introduce the public (including potential future mathematicians) to the geometry that arises naturally in data analysis. (For example, high-dimensional data is the perfect motivation for studying higher-dimensional spaces!) With The Shape of Data, I hope to explain the geometry of data analysis at a level of difficulty similar to that of Jeff Weeks’ book.
As I mentioned in my previous post, I think that the field of topology is moving into a stage (experienced by many fields) where it will be enriched by deep connections to applied mathematics. In addition to data analysis, there are now applications of topology to DNA knotting and even robotics. I am not suggesting that every topologist should start working on data analysis – it’s not like all number theorists work on encryption or all analysts work on physics. However, both number theory and analysis have benefited – directly and indirectly – from connections to applied mathematics. I hope that as the role played by topology in data analysis grows, it will lead to both greater public interest in (and understanding of) the field, and new and interesting problems to work on.
I also think that the field(s) of data analysis can benefit tremendously from a firmer foundation in geometry and topology. We all know that statistics can very easily mislead. As the data gets more complex, the interpretations rely more and more heavily on geometry (even if they do so only implicitly), and misleading statistics can often be caused by a misuse of geometry. (This may sound crazy, but read my upcoming blog posts if you don’t believe me.) The better we – academics, experts and the general public – understand the way that data is analyzed, the easier it will be to spot misleading statistics, including when “experts” use statistics to lie. I believe that geometry can foster a very intuitive understanding of data analysis, and this is what I hope to demonstrate with my new blog.