In the previous post in my continuing series on the topology of big data, I described how a set of techniques called *persistent homology* can be used to infer the large scale topological structure of a data set. The problem is that while the basic algorithm gives you generators of the homology groups, it doesn’t necessarily give you the nicest/shortest representatives. In this post, I’ll discuss a number of approaches to turning the information from persistent homology into a more concrete representation of the data.

As described last time, the persistent homology algorithm constructs a sequence of simplicial complexes and calculates an algebraic structure called a *bar code* that describes which homology elements are “persistent” through a long range of these complexes. If you pick a long element and a complex that it appears in, then you’re left with the problem of finding a homology cycle in the complex of minimal length/area/volume (depending on the dimension) in an equivalence class.

Zomorodian and Carlsson [1] (PDF) devised an algorithm to approximate a given cycle by finding open covers (in the topological sense) of the complex that reflect the given homology class. (Thanks to Mikael Vejdemo-Johansson for pointing me to this paper in the comments on my previous post.) A number of computer scientists have devised algorithms that are guaranteed to find the minimal cycle in a complex of a certain type (usually a surface). The most general result like this that I’ve found is by Dey, Hirani and Krishnamoorthy [2] (PDF) who solve the problem for any complex in which all the subcomplexes of a certain type have torsion free homology. They translate the problem into a linear programming problem and thus produce a polynomial time algorithm.

Of course, it’s probably more useful in general to have a simple representation for the whole space rather than just the homology generators. Adams, Atanasov and Carlsson [3] describe an algorithm for creating a simplified graph (a one-dimensional skeleton) representing the data based on a persistence complex and a density function. Their one-dimensional skeleton is, roughly, the one defined by the handle decomposition induced by the density function, thought of as a Morse function. (This is probably more closely related to Carlsson and Memoli’s multi-parameter clustering algorithm, which I’ll discuss in a future post.)

But perhaps you want even more than a one-dimensional skeleton. If you can use topological tools such as persistent homology to build a really thorough understanding of the topology of your data (or are willing to guess), you can also use an algorithm called a self organizing map (sometimes called a *Kohonen network*). This approach gets its inspiration from neural networks and has a very similar feel. It works as follows: Choose a simplicial complex whose topology matches the expected topology of the data and map it into the space in which the data is embedded. Then cycle through the data points and have each data point “pull” any nearby vertices of the complex towards it. When any vertex of the complex is moved, the adjacent vertices in the complex are moved in the same direction, but not as far. (This is how the algorithm makes use of the topology of the complex.) In the same way that a neural network (hopefully) converges by flowing along gradients defined by cycling through the data, the self organizing map (hopefully) pulls the complex toward a reasonable approximation of the data. If the complex happens to be a manifold then this is an example of manifold learning.

The self organizing map predates persistent homology, but it seems like the two ideas would be very compatible when used together – persistent homology to determine what topological model to use and the self organizing map to turn the topological model into a geometric model of the data. I don’t know if this has been done anywhere before, but if you do you can let me know in the comments.

Your description of the Dey–Hirani–Krishnamoorthy algorithm reminds me of Danny Calegari’s work on rationality of scl in a free group. The problem there is also to find an optimal representative of a (relative) homology class.

Comment by Henry Wilton — June 25, 2012 @ 4:18 am |

Very interesting. I haven’t looked at that paper. Is he looking for a shortest representative, or does he have a different definition of optimal?

Comment by Jesse Johnson — June 25, 2012 @ 12:31 pm |

It sounds like Danny’s algorithm is to find a minimal surface that demonstrates that a loop is homology trivial (or rather some power of the loop), rather than to find a representative for the loop. But I guess that’s equivalent to finding a minimal area representative for the two-dimensional relative homology class since the cut and past operations preserve the class. Is that correct?

Comment by Jesse Johnson — June 26, 2012 @ 1:51 pm

The basic idea is to maximise Euler characteristic.

The topological set-up is as follows (there’s an equivalent group-theoretic definition; scl stands for

stable commutator length). Let be a complex and a 1-boundary. An oriented surface is calledadmissibleif for $n_S$ an integer. Nowwhere is the sum of the Euler characteristics of the components of with negative Euler characteristic. An admissible surface is called

extremalif it realizes scl of its boundary.When is a free group, Danny proves that is always rational, by proving that extremal surfaces always exist. IIRC, his proof proceeds by showing that any admissible surface can be compressed and boundary compressed to one that can be built out of one of a finite number of pieces. The set of weights on these pieces that satisfy suitable gluing equations forms a rational convex polyhedron, on which is a linear function. So the problem is reduced to linear programming.

As an upshot, not only is scl rational, but there is a polynomial-time algorithm for computing scl, which has been implemented and used to great effect by Danny and his student Alden Walker.

Comment by Henry Wilton — June 25, 2012 @ 3:12 pm