Is information geometric, or is it fundamentally topological?
Information theory is a big, amorphous, multidisciplinary field which brings together mathematics, engineering, and computer science. It studies information, which typically manifests itself mathematically via various flavours of entropy. Another side of information theory is algorithmic information theory, which centers around notions of complexity. The mathematics of information theory tends to be analytic. Differential geometry plays a major role. Fisher information treats information as a geometric quantity, studying it by studying the curvature of a statistical manifold. The subfield of information theory centred around this worldview is known as information geometry.
But Avishy Carmi and I believe that information geometry is fundamentally topological. Geometrization shows us that the essential geometry of a closed 3-manifold is captured by its topology; analogously we believe that fundamental aspects of information geometry ought to be captured topologically. Not by the topology of the statistical manifold, perhaps, but rather by the topology of tangle machines, which is quite similar to the topology of tangles or of virtual tangles.
We have recently uploaded two preprints to ArXiv in which we define tangle machines and some of their topological invariants:
Our terminology is classical computer-science inspired- the term “tangle machine” imitates “Turing machine”, our connected components are “processes”, our strands are “registers”, and our crossings are “interactions”. Tangle machines are envisioned as a diagrammatic calculus for information theory, in a big amorphous multidisciplinary sense, which capture an underlying topological nature to information manipulation and transfer.
In what sense is information topological?
Information manipulation should fundamentally be causal, by which I mean that one unit of information causes another unit of information to change (we’ll call it’s updated state ). By how much? That depends on your method of measurement. In what direction? That depends on your chosen (perhaps arbitrary) system of coordinates. But the plain fact of causation, that causes to change to , doesn’t depend on any of that. I’d like to draw such an interaction as a crossing:
Note: Statistics gives us the tools to detect such causal interactions inside real-world data, in which one piece of information triggers a transition between two pieces of information. This means we can actually detect tangle machines inside e.g. Google Trends search data! As a single-interaction example, given graphs of number of searches and nothing else, we can detect with statistical significance that iPhone 5 caused Samsung to update from S2 to S3. Some graphics for another detection example are given below.
Our information is in the form of colours on strands (i.e. in registers). For example, each strand might be coloured by a real number representing the entropy of a `typical sequence’ of zeros and ones. I’m imagining `information’ sitting as colours on each of the strands, with each crossing representing an information fusion or its converse.
Note also that information plays a dual role e.g. in the classical paradigms of computing, such as a universal Turing machine. On the one hand, information is something that is manipulated by a computer, such as the input or the output of a computation. Such information is called a patient. On the other hand, the computer programme that does the manipulation is itself information. Information in this capacity is called an agent. A computer programme can modify another computer programme, so that an agent in one context may be a patient in another context. A labeled digraph (such are the classical diagrammatic languages for such things) does not capture this dual nature of information. But a strand in a tangle diagram may be an overstrand mediating between an input understrand and an output understrand in one crossing, and it may be an understrand itself in another crossing.
We claim that interactions, i.e. crossings, satisfy the Reidemeister relations, and that these represent fundamental properties of information fusion. Indeed, Reidemeister 1 tells us that information cannot generate new information from itself and so, for example, that the entropy of a closed system cannot drop, and in fact can’t increase either, without outside intervention. This seems to contradict the second law of thermodynamics, but thanks to e.g. the Poincaré recurrence theorem I think it’s actually fine.
Reidemeister 2 is what information theorists call `causal invertibility’, telling us that we can recover the input from the output and the agent , and that updating and then discounting by - adding information and then taking it away- brings us back to where we started as though we had done nothing.
And Reidemeister 3, which comes from distributivity, tells us that if we find a common cause for an interaction in which causes to change to , then that doesn’t change our causal relationships: still causes to change to . In information theory, this is equivalent to no double counting. If we update by then we obtain , so is counted towards the result `just once’.
One fun spinoff of this approach is that several classical information theoretical algorithms, such as Kalman filtering and covariance intersection, can be developed using Reidemeister move invariance for suitable choices of what we mean precisely when we say `information’. We can imagine a little Kalman filter fusing and discounting estimators, or a little covariance intersection, sitting at each crossing.
So what does this all give us? We now have a coordinate free `topological’ language with which to discuss fusion and discounting of information. Moreover, we can describe the same network in many different ways, which differ by finite sequences of Reidemeister moves. Different equivalent tangle machines may have different local performance- to fuse and then to discount may consume time and resources, although topologically we’ve done nothing. So tangle machines become a formalism for choosing between different ways to realize `the same’ network of information manipulation.
These ideas become quite concrete in the quantum physical context of adiabatic quantum computation (this is our Section 5.2). Here, the colours represent Hamiltonians, and the interaction is of the form , where . As we move from to (this is the computation), evolves from to . Concatenate many such interactions, and the machine describes a controlled evolution (quantum annealing) of an initial groundstate of a Hamiltonian towards a final state, where we are forcing the Hamiltonian to pass through each state described by a groundstate of a Hamiltonian which overcrosses its strand. The effect may be to speed up adiabatic quantum computations! Farhi et.al. have a recent preprint in which they discuss such speedups by `inserting intermediate Hamiltonians’, and it seems to be known that you can speed up classical quantum algorithms such as the Grover algorithm in this way. Essentially, the idea is that the speed of the computation is inversely proportional to the minimum distance between the lowest two eigenvalues of the Hamiltonian along the evolution path. The `straight line annealing’ of `classical adiabatic quantum computing’ is like walking in a straight line between two points on hilly ground- you might have to climb over hills and down gullies, and, despite being a straight line, it may be a strenuous path. I wouldn’t necessarily want to hike between two peaks of a high mountains by going in a straight line! Tangle machines give a diagrammatic language to describe the process of choosing the traverse.
From another perspective, information manipulation might be just another word for computation. The word `computation’ is a charged word, which, like `information’, doesn’t have clear mathematical meaning. One way to ascribe the word `computation’ mathematical meaning would be to define computation as the operation of a Turing machine. Can a tangle machine simulate a Turing machine?
Let’s define the computation of a tangle machine to be `input colours into a set of registers , and read off colours from another set of registers ‘. Assume that the colours of uniquely determine the colours of .
This notion of computation, unlike many others, makes no mention of the notion of `time’.
Let’s choose our set of colours to be , , and , represented as red, green, and blue correspondingly. A colour acts trivially on itself, but switches any colour other than itself- this is a Fox 3-colouring.
We first simulate a NOT gate, exchanging and . The arrow on the left is the input, and on the right is the output. The strand without the arrow is fixed at , and is neither input nor output, but is merely part of the gate.
We next simulate a multiplexer, which copies a register labeled of . This gate has one input and two outputs.
To simulate an AND gate, we need one more piece (and its inverse). This is a trivalent vertex which accepts two colours as input, and outputs their minimum. This sends , , and , to zero, and it sends to one.
And that’s it! With an AND gate, a NOT gate, and a multiplexer, we can compute any recursively computable function, and tangle machines (in this new extended sense, which I haven’t properly defined) are Turing complete.
I haven’t explicitly written down the Reidemeister moves that such a machine satisfies in this blog post.
Tangle machines can further simulate a Turing machine, tape and all, and can further simulate neural networks. If we allow the overcrossing colour to also be updated, so our colour set is not a quandle but is instead a biquandle, we can also simulate machine learning.
Besides our approach, there are other quite different approaches which relate low dimensional topology with the theory of computation (although I don’t know other works relating low dimensional topology with information theory). One approach is to view the tangle itself as a unit of data, and to compute by applying rewrite rules. This approach originates with Kauffman, who is I think the father of applying low dimensional topology to computing. It is tremendously exciting, with possible applications in internet architecture and in biology. A recent preprint by Kauffman and Buliga outlines one such idea. Marius Buliga has a very nice research blog in which he explains his research programme. There is also another approach of Kauffman in which colours evolve along a braid. I think that this concept of computation is similar in spirit to ours (with knots instead of with tangle machines), in that his crossings are also performing computations. Meredith and Snyder have an approach in which they encode knot diagrams using Milner’s π calculus, with a view to using process calculi formalisms to find and compute knot invariants. In this approach also, crossings play the role of switches. Then there are the category theory approaches, outlined nicely in Baez and Stay’s survey.
Thanks to Lou Kauffman and to Marius Buliga, for useful feedback regarding tangle machines and computation.
As today there is information geometry, Avishy and I strongly believe that there will be information topology. The low dimensional topology of information.