A few weeks ago, I started a new blog called The Shape of Data, which will focus on explaining the geometry behind modern data analysis, along the lines of the series of posts I wrote on this blog about a year ago. This involves very basic geometry/topology, so I didn’t think it would be appropriate for LDTopology. I will continue to post to LDTopology about pure topology, but today I wanted to write a few words about why I started the new blog and what I hope to accomplish with it.

First, you probably noticed that the title of the new blog is an allusion to/rip off of the title of Jeff Weeks’ (awesome) book *The Shape of Space*. This book explains the fundamentals of three-dimensional hyperbolic geometry (*a la* Thurston) in a way that any reasonably bright adult (and many teenagers) can understand. Today, all the attention on “BIG DATA” makes for a great opportunity to introduce the public (including potential future mathematicians) to the geometry that arises naturally in data analysis. (For example, high-dimensional data is the perfect motivation for studying higher-dimensional spaces!) With The Shape of Data, I hope to explain the geometry of data analysis at a level of difficulty similar to that of Jeff Weeks’ book.

As I mentioned in my previous post, I think that the field of topology is moving into a stage (experienced by many fields) where it will be enriched by deep connections to applied mathematics. In addition to data analysis, there are now applications of topology to DNA knotting and even robotics. I am not suggesting that every topologist should start working on data analysis – it’s not like all number theorists work on encryption or all analysts work on physics. However, both number theory and analysis have benefited – directly and indirectly – from connections to applied mathematics. I hope that as the role played by topology in data analysis grows, it will lead to both greater public interest in (and understanding of) the field, and new and interesting problems to work on.

I also think that the field(s) of data analysis can benefit tremendously from a firmer foundation in geometry and topology. We all know that statistics can very easily mislead. As the data gets more complex, the interpretations rely more and more heavily on geometry (even if they do so only implicitly) and misleading statistics can often be caused by a misuse of geometry. (This may sound crazy, but read my upcoming blog posts if you don’t believe me.) The better that we – academics, experts and the general public – understand the way that data is analyzed, the easier it will be to spot misleading statistics, including when “experts” use statistics to lie. I believe that geometry can foster a very intuitive understanding of data analysis and this is what I hope to demonstrate with my new blog.

Hey Jesse,

I recently gave a talk to a group of statisticians about what topologists are up to in this area. The way I put it was that most of the foundational problems in topology have largely been solved. So topologists are trying to find issues of the same magnitude and spirit to focus on.

I think much of what spurred on topology in the beginning was “the manifold problem”. A manifold is an object that has no local information; everything is global. Mathematics had little in the way of dealing with such concepts, so topology had to manufacture them (and in doing so, manufacture itself!). In the end there was a great set of tools created to go from the homogeneous to the specific.

Topological data analysis is essentially the reverse problem. You have loads of specific data, and you want to find the general trend that it fits into — one could imagine hoping to write computer software that would automatically deduce physical law from raw data. Inference machines, that kind of thing. In a sense this is the opposite direction of the same issues that got topology off the ground in the beginning.

It’s also close in spirit to some of the remaining big open problems in topology. I’d count finding an algorithmic Ricci flow on 3-manifolds as one of the big open problems in topology. This again has a “go from the specific to the general” feel to it.

Comment by Ryan Budney — March 21, 2013 @ 2:23 pm |

That’s a great point. I hadn’t thought of it from that perspective, but you’re absolutely right. It’s similar to the notion of inverse problems in PDEs. Hopefully the tools for going from homogeneous to specific will prove to be a good starting point for going from specific to homogeneous.

Comment by Jesse Johnson — March 21, 2013 @ 4:37 pm |

I agree very much with your assessment of the potential of low-dimensional topology/geometry for applied data analysis. In particular, I think 3-manifolds with metric structures may be a source of interesting spaces in which to simulate dynamical systems and to visualize high-dimensional data. I have just started to learn about 3-manifolds, and came across your interesting blog. Unfortunately, there are not many intuitive treatments of these topics. I am looking forward to more posts on LDTopology and “The Shape of Data”.

Comment by James Li — May 16, 2013 @ 5:38 pm |

Just wanted to mention a nice little application of low-dimensional topology (Thurston-Nielsen classification and braid theory) that I have had the chance to work on. There has been a series of papers, starting with Boyland, Aref, Stremler (Journal of Fluid Mechanics 403 (2000)), where topological concepts have been applied successfully. This program has now been extended quite a bit (e.g. http://arxiv.org/abs/1206.2321), and has led to a topological analysis of data from geophysical phenomena by other authors (http://arxiv.org/abs/1106.2231).

Comment by nonlinearism — August 8, 2013 @ 11:19 pm |