I’m excited by a number of new and semi-new papers by Greg Kuperberg and collaborators. From my point of view, the most interesting of all is:

G. Kuperberg,

Algorithmic homeomorphism of 3-manifolds as a corollary of geometrization, http://front.math.ucdavis.edu/1508.06720

This paper contains two results:

1) That Geometrization implies that there exists a recursive algorithm to determine whether two closed oriented 3-manifolds are homeomorphic.

2) Result (1), except with the words “elementary recursive” replacing the word “recursive”.

Result (1) is sort of a well-known folklore theorem, and is essentially due to Riley and Thurston (with many of its pieces receiving newer, fancier proofs in the interim), but until now no full self-contained proof of it had appeared in one place. It’s great to have one, moreover one which uses only the tools that were available in the 1970s.

Now that we know a recursive algorithm exists, the immediate and important question is the complexity class of the best algorithm. Kuperberg has provided a worst-case bound, but “elementary recursive” is a generous computational class. The real question, I think, and one that is asked at the end of the paper, is where exactly the homeomorphism problem falls in the hierarchy of complexity classes, and whether the corresponding result holds for compact 3-manifolds with boundary and for non-orientable 3-manifolds.
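To get a feel for how generous “elementary recursive” is: a function is elementary recursive when it can be computed in time bounded by a tower of exponentials of some fixed height in the input size. The minimal sketch below just computes such towers; the helper `tower` is our own illustration, and the particular height appearing in Kuperberg’s bound is not reproduced here.

```python
def tower(height: int, n: int) -> int:
    """Return a tower of `height` twos topped by n, i.e. 2^2^...^n.

    tower(0, n) == n and tower(k, n) == 2 ** tower(k - 1, n).
    Elementary recursive running times are those bounded by tower(k, n)
    for some fixed k, where n is the input size.
    """
    value = n
    for _ in range(height):
        value = 2 ** value
    return value


# Even with n = 2 the values explode: this prints 2, 4, 16, 65536 for k = 0..3.
for k in range(4):
    print(k, tower(k, 2))
```

Already `tower(4, 2)` is 2^65536, a number with almost 20,000 decimal digits, so an “elementary” bound can be astronomically far from a practical one.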


In “A Mathematician’s Apology”, published in 1940, G. H. Hardy argued that the study of pure mathematics could be justified entirely by its aesthetic value, independent of any applications. (He used the word “apology” in the sense of Plato’s Apology, i.e. a defense.) Of course, Hardy never had to apply for an NSF grant and his relatives probably never asked him why someone would pay him to solve problems without applications.

In the following decades, mathematics helped win the Second World War and send astronauts to the moon. Many mathematicians began to justify their work in abstract research by pointing to examples such as number theory in cryptography, where ideas from abstract mathematics that were developed based on aesthetics proved to be unexpectedly useful for real world problems. In the 1960s, as baby boomers headed off to college and PhD programs struggled to keep up with the need for new faculty members, some mathematicians began to argue that teachers who were involved in active research would be better equipped to teach students how to think mathematically.

But today, now that graduate programs produce more PhDs than can fill the available research and teaching positions, the reality has set in that most mathematics PhD students will not go on to careers that involve teaching, let alone abstract research. Moreover, the economic slowdown that followed the post-war boom has made it harder for governments to justify investments, whether in the form of grants or tenure lines, for research whose value won’t be apparent for decades or even centuries.

So the mathematics community faces a choice: either accept the new reality by cutting back PhD programs or rethink the way that abstract mathematics should fit into society.

In this post, I will argue that by changing the way we justify mathematics research and the ways we think about the role of the research community in the wider world, we can sustain or even increase graduate programs and research funding without changing our core values or the fundamentals of graduate education. I won’t attempt to distinguish between “pure” and “applied” mathematics; the term “abstract research” is intended to cover both. I will argue three points:

- The background one gets from a graduate degree in abstract mathematics is extremely, and increasingly, valuable in a wide range of non-academic careers, beyond the stereotypical security/military and financial sectors.
- The value of this background comes from time spent working within a large, active, academic community engaged in abstract research and is much greater than the external value of the research itself.
- Embracing this perspective will not cause a massive exodus of mathematicians from academia, but will instead cause an increase in the number and diversity of students entering graduate programs.

This perspective argues that students leaving academia for industry are the most valuable contribution that math PhD programs make to the rest of society. The changes the community would need to make in order to embrace this new perspective are not simple or easy, but they are mostly peripheral. The value of a background in mathematics comes from the way that students currently learn the ideas, research practices and thought processes. The required changes have to do with the way we recognize this value: The ways we talk to students about potential careers, the ways that we approach professional development and the ways that we talk to each other and to non-mathematicians about how our research fits into the rest of the world.

In particular, when we discuss the external value of mathematics research, we should de-emphasize the theorems we prove, and focus on the diversity of perspective that members of the research community bring to non-academic organizations. A great deal of research in the past few years has demonstrated the value of diversity in teams, and while most of the discussion has focused on ethnic and gender diversity, the same principle applies to intellectual diversity. In fact, a major benefit of ethnic and gender diversity is that it’s a proxy for diversity of perspective. Similarly, the perspective that one forms from engaging in mathematics research can be invaluable to a team of mostly non-mathematicians, not because it’s objectively better than any other perspective, but because it’s different.

While it can be hard to pin down exactly what makes a mathematical perspective different, here’s a partial list. Mathematicians are not the only people who can do these things, but engaging in abstract mathematics research trains students to do them well:

- Thinking at and between different levels of abstraction: Understanding how axioms fit together to form lemmas, then theorems, is good practice for understanding other complex systems that are too large to see all at once.
- Boiling systems down to their essentials: Abstracting systems into definitions and axioms requires determining what’s fundamental and what’s peripheral.
- Discovering parallels between unrelated systems: Solving a problem by transforming it into a previously-solved problem works in the real world too.

When combined with a bit of domain knowledge, these skills can be used to translate vague intuition into precise and usable statements, incorporate ideas from a range of perspectives and terminologies into a cohesive system and create a scaffolding that allows a team to reason about a complex system. Acquiring the domain knowledge that makes this possible is non-trivial – at the very least, it requires a number of years of working outside academia – but the mathematical perspective makes it an order of magnitude more powerful.

One can’t form a mathematical perspective from books and lectures alone. It can only come from working on abstract problems within an active research community. For a graduate student, the research problem is a lens that brings all the tools and problems of mathematics into sharp focus. Every conversation with another mathematician becomes a chance to learn how they would approach the problem. Every new idea must be understood well enough to determine whether it can be applied to help solve the problem.

These ideas, from across mathematics, are abstracted from problems in hundreds of other fields, and bring with them artifacts of the thought processes that spawned them. And while many ideas get written into papers and books, the folklore and meta-ideas that surround them are, arguably, much more important. They make the research community a living entity, and while individual mathematicians may turn coffee into theorems, it is the community that turns students into mathematicians.

Meanwhile, students are increasingly aware of the problems with the academic job market. Many promising undergraduates who love the subject never apply to graduate school because they don’t want to become a professor or don’t think they have what it takes. Moreover, women and members of underrepresented groups are much more likely to make such a decision because they’re more likely to perceive that the cards are stacked against them. If they never enter graduate school, we never get the chance to convince them that they can make valuable contributions to the mathematics community.

If attitudes change so that an academic career is seen as one of many acceptable career paths that a math PhD can lead to, the research community might lose a few would-be professors, but more importantly, it will gain graduate students who love the subject more than the career path. These students will bring a much broader diversity of background and interests, which will enrich the community with new ideas. Already, many PhD graduates choose non-academic careers late in the process, after they discover how limited their academic career options are. If they can make these decisions sooner, it will benefit them individually and the math community as a whole.

Changing the way the research community thinks about career paths and its relationship with the outside world will not be simple or easy, and a prescription for such change is far beyond the scope of this post. However, it won’t require changing the fundamentals of how we create and teach mathematics, since these are the things that make a math research background so valuable. We should help students to learn about non-academic careers and how mathematicians can fit into them. We should not push students into applied math and statistics courses or make them into “data scientists”. The value that a mathematician brings to a non-academic career comes from engaging with the mathematics community on an abstract dissertation problem. Today, the math PhDs who follow non-academic career paths are individually demonstrating that value. All we, as a community, need to do is find better ways to recognize it.


A new book has just come out, and it’s very good.

Office Hours with a Geometric Group Theorist, Edited by Matt Clay & Dan Margalit, 2017.

An undergraduate student walks into the office of a geometric group theorist, curious about the subject and perhaps looking for a senior thesis topic. The researcher pitches their favourite sub-topic to the student in a single “office hour”.

The book collects together 16 independent such “office hours”, plus two introductory office hours by the editors (Matt Clay and Dan Margalit) to get the student off the ground.

Given the number of authors and the variety of concepts that are presented, trying to assemble such a book would seem a recipe for disaster, but the actual result is a resounding success! The level never flies off into the stratosphere and never becomes patronizingly oversimplified – each office hour is at the right level, and the tone remains informal without being wishy-washy. As the researcher is aiming to hook students on their topic, each office hour provides a nice entry point into its topic, with “next steps” mapped out to help the student on their way.

The voice of the researcher is preserved, which is also nice. Aaron Abrams informs us that the hair of Thalia (presumably his daughter) and challah are both braided, and Johanna Mangahas explains the Ping-pong Lemma using ping-pong.

The greatest highlight of the book is perhaps the exercises, which are pitched at a good introductory level and help the student wrap their brains around the topic.

I think that the book isn’t only a collection of good lead-ins for undergraduates: these “office hours” are equally useful for graduate students and for mathematicians who don’t happen to specialize in those fields, but who want to sightsee some key ideas quickly.

I very highly recommend it!!


Mirror symmetry is a physical idea that relates two classes of problems:

- **A-Model:** Measurement of a “volume” of a moduli space. In particular, counting the number of points of a moduli space that is a finite set of points.
- **B-Model:** Computation of matrix integrals.

We may think of the A-model as “combinatorics and geometry” and of the B-model as “complex analysis”. Why might relating these classes of problems be important?

- Mirror symmetry might help us to compute a quantity of interest that we would not otherwise know how to compute. Sometimes enumeration may be simpler (e.g. the Argument Principle) and sometimes complex analysis may be simpler (when integrating by parts is easier than counting bijections).
- An object in one model may readily admit an interpretation, whereas its mirror dual’s meaning may be a mystery. This is the case in quantum topology: quantum invariants, which live on the B-model side, are powerful, but their topological meaning is a mystery. On the other hand, the A-model invariants (hyperbolic volume, A-polynomial) have readily understood geometric/topological meaning.
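The Argument Principle is a clean example of a complex integral doing enumeration: for a polynomial $p$ with no zeros on the circle $|z| = r$, the number of zeros inside the circle is $\frac{1}{2\pi i}\oint p'(z)/p(z)\,dz$. The sketch below evaluates that contour integral numerically with the trapezoidal rule. The function name and the test polynomial are our own illustrative choices, and nothing here is a mirror-symmetry computation.

```python
import cmath
import math


def count_zeros_inside(coeffs, radius=1.0, samples=4096):
    """Count zeros of a polynomial inside |z| < radius via the Argument Principle.

    coeffs lists the coefficients from highest degree down. Assumes no zero
    lies on the circle |z| = radius itself.
    """
    degree = len(coeffs) - 1
    # Coefficients of the derivative p'(z), also highest degree first.
    dcoeffs = [c * (degree - k) for k, c in enumerate(coeffs[:-1])]

    def horner(cs, z):
        acc = 0j
        for c in cs:
            acc = acc * z + c
        return acc

    # Trapezoidal rule for (1 / 2*pi*i) * contour integral of p'(z)/p(z) dz,
    # with z = radius * exp(i*theta), so dz = i * z * dtheta.
    dtheta = 2 * math.pi / samples
    total = 0j
    for k in range(samples):
        z = radius * cmath.exp(1j * k * dtheta)
        total += horner(dcoeffs, z) / horner(coeffs, z) * 1j * z * dtheta
    return round((total / (2j * math.pi)).real)


# p(z) = (z - 0.5)(z + 0.3)(z - 2) = z^3 - 2.2 z^2 + 0.25 z + 0.3
p = [1, -2.2, 0.25, 0.3]
print(count_zeros_inside(p))              # -> 2 (the zeros 0.5 and -0.3)
print(count_zeros_inside(p, radius=3.0))  # -> 3 (all three zeros)
```

The trapezoidal rule converges very quickly here because the integrand is smooth and periodic in the angle; the result is rounded to the nearest integer, which is what the Argument Principle guarantees the exact integral to be.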

Mirror symmetry (as currently understood) doesn’t in fact directly solve either problem, but it does provide heuristics. There is no known formula to compute the mirror dual of a given problem; mirror duals in mathematics have tended to be noticed post facto. Mirror symmetry is also not mathematically rigorous, so each prediction of mirror symmetry must be carefully analyzed and proven. In addition, the mathematical meaning of mirror symmetry is unclear.

Despite this, work in quantum topology in and around mirror symmetry has been recognized by a number of Fields Medals, including those of Jones (1990), Witten (1990), Kontsevich (1998), and Mirzakhani (2014). Several of our most celebrated conjectures, such as the AJ conjecture relating a quantum invariant to a classical invariant, stem from it.

Topological recursion observes that all known B-model duals of A-model problems can be framed in a common way (a holomorphic Lagrangian immersion of an open Riemann surface in the cotangent bundle, with some extra structure). This was observed first in special cases, and then it was noticed that the picture generalizes. Topological recursion thus reveals a common framework for all known mathematical examples of mirror symmetry. This simplifies B-model duals to A-model problems and places them in a common framework (a priori they are complex integrals in a lot of variables, without much else in common). It also provides tools to prove mirror duality in special cases. Explicitly, all of the information of an a priori complicated mirror dual can be recovered from an embedded open Riemann surface (plus some extra structure), whose information is in turn encapsulated, via an explicit formula, in information on lower-genus surfaces. Together with Mariño’s Remodeling Conjecture, we can say that topological recursion “tidies up” the B-model side of mirror symmetry, and elucidates what it means for something to be a “B-model dual” to an “A-model problem”.

One insight which topological recursion provides is that many of the simplest cases of mirror symmetry are Laplace transforms. Perhaps this is a window to understanding mirror symmetry itself? A vague conjecture along the lines of “in some contexts, mirror symmetry and the Laplace transform are the same thing in disguise” is given by Dumitrescu, Mulase, Safnuk, and Sorkin. For quantum topologists, another insight provided by topological recursion is that it suggests ways of reframing our favourite quantum invariants, such as the Jones polynomial, as objects which have more ready topological meaning, such as tau functions of integrable systems.

So, in conclusion, topological recursion provides a common framework for B-model objects such as quantum invariants. The hope is that this will elucidate their meaning and facilitate proving their mirror duality to better-understood mathematical objects. It does this by tidying up the B-model side into something structured which begins to look tractable.

Topological recursion has already led to several breakthroughs, including the simplest known proofs of Witten’s conjecture and of Mirzakhani’s recurrence, and the subject is still in its infancy. It fits well with what we know, recovering all the “right” invariants at low orders (hyperbolic volume, analytic torsion) and hitting some heuristically expected keywords. Topological recursion is white-hot at the moment.

Disclaimer: I’m not an expert and some things I said might be wrong; please correct mistakes, inaccuracies, and omissions in the comments!


Step 1: Diff(S^2) has the homotopy-type of O_3 x Diff(D^2,S^1). The latter object here is the group of diffeomorphisms of the 2-disc which are the identity on the boundary.

Step 2: Show Diff(D^2,S^1) is contractible.

Step 1 is a general argument, that Diff(S^n) has the homotopy-type of O_{n+1} times Diff(D^n, S^{n-1}), the proof of which is very much in the spirit of the isotopy extension theorem and the classification of tubular neighbourhoods theorem, but ‘with parameters’.

Step 2 is a rather specific argument which, at its core, involves the meatiest theorem in our understanding of first-order ODEs in the plane: the Poincaré-Bendixson theorem. Smale’s clever application of the Poincaré-Bendixson theorem allows him to reduce the proof to the theorem that Diff(D^1,S^0) is contractible, which has many simple and elegant proofs.
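For orientation, here is one of those simple proofs, sketched numerically. If $f$ is a diffeomorphism of $[0,1]$ fixing both endpoints, then $f' > 0$, and the straight-line homotopy $f_t = (1-t)f + t\,\mathrm{id}$ fixes the endpoints and has derivative $(1-t)f' + t > 0$. So $f_t$ stays in Diff(D^1,S^0) for every $t$, and the construction depends continuously on $f$, which gives a contraction. The code checks this on a grid for one sample $f$ of our own choosing; it illustrates the argument, is not a proof, and is not necessarily the argument Smale had in mind.

```python
import math


def f(x, a=3.0):
    """A sample diffeomorphism of [0, 1] fixing both endpoints (f' > 0)."""
    return (math.exp(a * x) - 1) / (math.exp(a) - 1)


def df(x, a=3.0):
    return a * math.exp(a * x) / (math.exp(a) - 1)


def h(t, x):
    """Straight-line homotopy: h(0, .) = f and h(1, .) = identity."""
    return (1 - t) * f(x) + t * x


def dh_dx(t, x):
    # (1 - t) f'(x) + t is a convex combination of two positive numbers.
    return (1 - t) * df(x) + t


def homotopy_stays_in_diff(grid=50):
    """Check on a grid that every h(t, .) fixes 0 and 1 and has positive derivative."""
    for i in range(grid + 1):
        t = i / grid
        if abs(h(t, 0.0)) > 1e-12 or abs(h(t, 1.0) - 1.0) > 1e-12:
            return False
        if any(dh_dx(t, j / grid) <= 0 for j in range(grid + 1)):
            return False
    return True


print(homotopy_stays_in_diff())  # -> True
```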

Smale’s proof has a bit of the spirit of an inductive proof. It leads one to the question, what about the homotopy-type of Diff(S^3)? Perhaps because we can’t imagine anything different, it would make sense for Diff(S^3) to have the homotopy-type of O_4. At the level of path-components this was proven by Cerf in 1968, in one of the first applications of the subject now called Cerf Theory. The full proof by Allen Hatcher was given in 1983. Around this time the problem of showing Diff(S^3) has the homotopy-type of O_4 began to be called “The Smale Conjecture”.

I think it’s fair to say that most major theorems in 3-manifold theory at present have several different proofs (the classification of Seifert-fibred manifolds with 3 singular fibres over S^2 might be one of the few cases where there is only one proof), or at least several variations on one proof. But the Smale Conjecture has found no alternate proofs. People have hoped that perhaps a ‘geometrization with parameters’ theorem could be used on the space of all metrics on S^3, but the metric collapses along families of spheres. This is much like the difficulties Hatcher encounters in his original proof, but Hatcher was just dealing with families of manifolds, while a geometrization proof would be in a category of Riemannian manifolds.

Hatcher suggested a possible alternative framework to prove the Smale Conjecture. The idea is to show that the component of the trivial knot in the space of smooth embeddings Emb(S^1, S^3) has the homotopy-type of the subspace of great circles. Hatcher gave a few other equivalent formulations of the Smale Conjecture; the one he used makes the Smale Conjecture look like ‘the Alexander Theorem with parameters’, i.e. that the space of smooth embeddings Emb(S^2, R^3) has the homotopy-type of the subspace of (parametrized) round spheres. Hatcher’s proof is essentially a souped-up version of Alexander’s proof; roughly speaking, it involves a rather careful cutting of families of spheres into simpler families.

The embedding space Emb(S^1, S^3) has been studied in many ways over the years. Jun O’Hara had the idea of putting a “potential function” on this space, much in the spirit of Morse theory. He used a function derived (in spirit) from electrostatics: imagine the knot as carrying a uniform electric charge along its length, and write down the integral for the potential energy of the system. Technically O’Hara allowed for less physically inspired “energies”, but this is the basic idea. In the 1980s and 1990s it was proven that the negative gradient flow of O’Hara’s potential function makes sense, and that there are local minimizers in the space. Recently it was proven that a C^1 embedding which is a critical point of this energy functional is necessarily a C^\infty smooth embedding. So there has been plenty of progress.
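The naive electrostatic integral diverges along the diagonal, so O’Hara’s energies subtract a regularizing term. The best-known member of the family is the Möbius energy studied by Freedman, He and Wang, $E(\gamma) = \iint \big(|\gamma(s)-\gamma(t)|^{-2} - D(s,t)^{-2}\big)\,|\gamma'(s)|\,|\gamma'(t)|\,ds\,dt$, where $D(s,t)$ is the shorter arc length between the two points; they showed $E \ge 4$, with equality exactly for round circles. Below is a crude $O(N^2)$ discretization for a closed polygon, our own sketch rather than anything from the cited papers. It works in R^3, which is harmless since the energy is invariant under Möbius transformations, and it skips the diagonal terms, where the integrand is bounded, so its error is roughly of order $1/N$.

```python
import math


def mobius_energy(points):
    """Approximate the Mobius energy of a closed polygon in R^3.

    points: (x, y, z) tuples sampled in order around a closed curve.
    Discretizes E = sum over i != j of (1/|p_i - p_j|^2 - 1/D_ij^2) * w_i * w_j,
    where w_i is the length of the edge leaving p_i and D_ij is the shorter
    polygonal arc length between p_i and p_j.
    """
    n = len(points)
    w = [math.dist(points[i], points[(i + 1) % n]) for i in range(n)]
    length = sum(w)
    # cum[i] is the arc length from p_0 to p_i along the polygon.
    cum = [0.0]
    for i in range(n - 1):
        cum.append(cum[-1] + w[i])
    energy = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # the integrand is bounded near the diagonal; skip it
            chord = math.dist(points[i], points[j])
            arc = abs(cum[i] - cum[j])
            arc = min(arc, length - arc)
            energy += (1.0 / chord ** 2 - 1.0 / arc ** 2) * w[i] * w[j]
    return energy


def circle(n, radius=1.0):
    """n equally spaced points on a round circle in the xy-plane."""
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n), 0.0) for k in range(n)]


print(mobius_energy(circle(400)))  # close to 4, the minimum value
```

Rescaling the circle leaves the discrete energy unchanged, mirroring the scale invariance of the continuous functional; a Morse-theoretic programme would need to understand all critical points of $E$ on the unknot component, not just this minimum.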

Of course, what one would really want to prove is that the only critical points of this functional on the component of the trivial knot are the great circles themselves. That would allow for a Morse-theoretic argument that the unknot component of Emb(S^1, S^3) has the homotopy-type of the great-circle subspace, and give a new, rather appealing proof of the Smale conjecture.

References:

O’Hara. Energy of a knot. Topology 30 (2): 241–247.

Freedman, He, Wang. Möbius energy of knots and unknots. Annals of Mathematics, Second Series, 139 (1): 1–50.

He. The Euler-Lagrange equation and heat flow for the Möbius energy. Communications on Pure and Applied Mathematics.

Blatt, Reiter, Schikorra. Harmonic analysis meets critical knots. Transactions of the American Mathematical Society 368 (9), September 2016: 6391–6438.


We’re looking toward developing applications, so we’re primarily searching for people who can program and maybe who have some signal processing knowledge. So primarily for computer science postdocs, I suppose.

An official announcement will be posted at relevant places in due time, but you heard it here first (^_^)


Chern-Simons theory is concerned with canonically associating functions to representations. In a typical topological context, we would be looking at the moduli space of representations of a knot group into a group such as . The associated functions are topological invariants, and such invariants are of primary interest in quantum topology. Examples of invariants arising as or from such functions are the Jones polynomial and the Alexander polynomial (indirectly; what actually shows up is the square root of analytic torsion).

I find it deeply unsatisfying that Chern-Simons theory for topologists (indeed, all of quantum topology) happens over the complex numbers (some stuff can happen over or , but that’s as far as it goes). There is no conceptual justification for introducing complex numbers that I can see; for technical reasons we just seem to need the complex structure on the moduli space in order to be able to prove anything. There have been various attempts to study quantum topology over other fields or over rings such as the integers, but as far as I know the results are weak. For example, a fundamental result in quantum knot topology, that Vassiliev invariants are uniquely specified by weight systems, is only known over a few fields, as lamented e.g. by Bar-Natan.

In analytic number theory the goal is once again to canonically associate functions to representations. The parallel nature of the task is striking: the role of the knot complement is taken by an arithmetic scheme , and the role of the group is played by a so-called motivic sheaf which is uniquely built up from a representation of the arithmetic fundamental group of . On the other side, the properties the canonical functions must satisfy nicely parallel those of Chern-Simons theory.

The functions thus constructed, assuming that they exist, are topological invariants of . No general construction for these functions is known, but L-functions, which have been constructed by ad hoc methods, are examples. In fact, two major conjectures in analytic number theory, the Iwasawa Main Conjecture and the Hasse-Weil Conjecture, can both be framed as conjectures that such a canonical assignment of functions to representations exists.

I ought to mention parenthetically that quantum topological analytic number theory has already happened, when Le and Murakami used the Kontsevich invariant to discover relations between multiple-zeta values that had not been known previously, which analytic number theorists have since assimilated (see this recent survey by Furusho).

Arithmetic Chern-Simons promises to be exciting for both communities. For topologists, we may dream of a more flexible version of Chern-Simons theory which works over more general rings than just the complex numbers (although sadly we have a dearth of good conjectures in this direction at the moment). For number theorists, perhaps quantum topology can provide ideas to help attack some conjectures of interest. One may fantasize that, by way of these goals, we will gain an understanding of how knots and 3-manifolds fit into the big picture of mathematics, and how they might act as special model cases not only in topology, but perhaps also in number theory.


A. Hope Jahren, She Wanted to Do Her Research. He Wanted to Talk ‘Feelings.’, New York Times, March 4, 2016.

What makes this piece especially interesting for me is that it’s written so that one understands the harasser, and is made to realize that “it could be me”. The pattern she describes sounds more common than one might like to admit, and the person writing the e-mail would almost certainly not be cognizant of it being harassment. A male TA, professor, or supervisor, using the excuse of an altered state of mind (haven’t slept, drank too much), e-mails a love confession to a female student or colleague in a way that blames her, is a total power play, and is creepy and maybe a bit threatening (although of course he doesn’t see it that way). A wrong response to this first e-mail might mean that the victim gets harassed for a long time.

The author says that this first e-mail must be answered by firmly telling him (not asking him) to stop. But, Jahren laments, it never, never stops. While surely Jahren’s suggestion is sensible, a firm, “Dude, I have zero romantic interest in you. In addition you might want to read this piece by Jahren,” might, I think, be even more effective.

What do you all think? How prevalent is this type of sexual harassment in mathematics, and what can be done to effectively nip such harassment patterns in the bud?


A.Y. Carmi and D.M.,

Statistics Limits Nonlocality, arXiv:1507.07514.

It offers a statistical explanation for a Physics inequality called Tsirelson’s bound (perhaps to be compared to a known explanation called Information Causality). Below, I sketch how it works.

A *binary channel* is a pair of Bernoulli ($0$- and $1$-valued) random variables, an *input* and an *output*, together with a conditional probability function (of the output given the input) representing *noise*. A channel is typically described by telling a story about how the output is constructed from the input and some additional random resources; but mathematically it’s really just the conditional probability function.

Usually it is a realization of the input, *i.e.* a zero or a one, that is the message we would like to send through the channel. So the random variables of channels usually represent distributions of a realization. But I’d like to consider a different setting, in which the message through the channel is the input random variable as a whole. In other words, the message is a real number, namely the input’s mean (the probability that it equals $1$). This parameter contains an infinite amount of information (all the digits of its binary expansion, for instance), as opposed to the content of a sample, which is one bit. So Bob’s “task” is to estimate the parameter to the best of his ability. To do this, he is allowed to sample the output a predetermined number of times.

I would like to partition what may happen into three (realistic) cases:

- There is no channel between input and output because they are independent. A fortiori, a finite number of samples of the output tells us nothing about the input’s mean.
- There is a channel between input and output, *i.e.* they are dependent, but what is being broadcast through the channel cannot be distinguished from noise. More precisely, consider Fisher information, a mathematical quantity measuring how much samples of a random variable tell us about a parameter. It measures this via the Cramér-Rao Theorem, which tells us that the variance of any unbiased estimate of the parameter which Bob can construct from the information at his disposal is bounded from below by one over the Fisher information. Our $\{0,1\}$-valued random variables have variance bounded above by $1/4$ (the variance of a Bernoulli random variable with mean $p$ is $p(1-p)$, whose maximum is attained at $p = 1/2$), therefore a Fisher information of at most $4$ is ‘no information’. Thus Bob would learn just as much about Alice’s variable by tossing a fair coin as he would by listening to the output of the channel.
- Alice and Bob are communicating!

I would like to draw your attention to the second case, in which there is a channel but the information broadcast through the channel is indistinguishable from noise. The situation is analogous to a long game of Chinese whispers, in which one person whispers a message to another until the final person announces the message to the entire group. A massive such game played in 2012 resulted in “Alice’s” message “Life must be lived as play” (a paraphrase of a quote from Plato) being relayed to “Bob” as “He bites snails”. In a long enough game, with probability one, Bob will receive only noise despite a channel undeniably existing.
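To make Case II concrete, model each whisper as a binary symmetric channel that flips the bit with probability `eps` (this noise model is our assumption; the game itself fixes none). If the input is Bernoulli($p$), then after $n$ whispers the output is Bernoulli with mean $q = \tfrac12 + (1-2\varepsilon)^n (p - \tfrac12)$, so one output sample carries Fisher information $(1-2\varepsilon)^{2n} / \big(q(1-q)\big)$ about $p$. For $0 < \varepsilon < 1/2$ the output always depends on the input, yet this information decays geometrically in $n$. The function names below are illustrative.

```python
def output_mean(p, eps, n):
    """P(output = 1) after n chained binary symmetric channels.

    The input is Bernoulli(p); each channel flips its bit with probability eps.
    """
    return 0.5 + (1 - 2 * eps) ** n * (p - 0.5)


def fisher_information(p, eps, n):
    """Fisher information about p carried by one sample of the output.

    Chain rule: a Bernoulli(q) sample carries 1 / (q (1 - q)) about q,
    and q depends linearly on p with slope (1 - 2 eps)^n.
    """
    q = output_mean(p, eps, n)
    dq_dp = (1 - 2 * eps) ** n
    return dq_dp ** 2 / (q * (1 - q))


for n in (0, 10, 50):
    print(n, fisher_information(0.3, 0.1, n))
```

With no whispers ($n = 0$) and $p = 1/2$ this gives exactly $4$, the reciprocal of the maximal Bernoulli variance $1/4$; after $50$ whispers with $\varepsilon = 0.1$ it is below $10^{-8}$.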

In a certain context, physicists refer to Case I (nonexistence of a channel) as “Locality”, in that Alice and Bob are effectively isolated from one another. But I think that Case II is also “Locality” according to my intuitive understanding of the term. If a tree falls in a forest and no one is around to hear it, does it make a sound? If samples of the output cannot be used to learn about the input, in what sense is it paradoxical that the input and output are dependent?

But the word “Locality” is taken to refer to Case I, therefore I’ll refer to Cases I and II together (in the physical context I’m just about to describe) as “Information Locality”.

In Newtonian Mechanics an object can only be in one place at one time. An arresting feature of Quantum Mechanics is there is a sense in which an object can be located in two places at once. More precisely:

Nonlocality: A pair of quantum systems which are shown not to be physically interacting may be impossible to describe as independent entities.

Such a pair of inseparable quantum systems must perforce be described as one system which is in two places at once. The archetype of nonlocality is a pair of distant agents, Alice and Bob, each of whom holds one half of a singlet. A measurement performed on Alice’s particle appears to have an instantaneous effect on Bob’s particle and vice versa. The strength of this perceived effect is quantified by a real number called the *Bell-CHSH correlation*. If (“Bell’s Inequality”) then we are in a *local* setting, and Alice’s system may be fully described independently of Bob’s system, and these two systems fully describe the joint system. Bell’s Theorem tells us that Alice and Bob’s halves can no longer be described as independent entities governed exclusively by local influences when exceeds .

Bell’s Theorem is proved using only Probability Theory and as such is independent of the functional analysis formalism of Quantum Mechanics. Why is this important? Besides aesthetic considerations, reliance on Probability Theory alone is good in the context of a search for a Grand Unified Theory to unify Quantum Mechanics with General Relativity. But the mathematical formalism of Quantum Mechanics (functional analysis) is different from the mathematical formalism of General Relativity (differential geometry). Thus, we would expect a grand unified theory to be described by a mathematical formalism which envelops both of these formalisms and more, and in particular we would not expect it to be based on functional analysis.

Bell’s Inequality is indeed violated experimentally. Nonlocality is real. Newtonian mechanics alone cannot describe the quantum world.

How large can be?

Within the Hilbert-space formalism of quantum mechanics, Tsirelson showed that . Tsirelson’s bound is supported experimentally.

We would love to understand Tsirelson’s bound in a broader context (*e.g.* probability or statistics), so that the same upper bound on the Bell-CHSH correlation continues to hold if and when the functional-analytic formalism of Quantum Mechanics is replaced by a more abstract language.

The basic building block of the so-called “context-free approach” to nonlocality is a pair of boxes, one held by Alice and one by Bob. These boxes abstract the notion of entangled particles. Into each box you can insert either a zero or a one, and the box responds by instantaneously spitting out either a zero or a one. We call the bit that Alice inserts her input and the bit that her box returns her output, and likewise for Bob. We assume various marginals, such as those of the individual inputs and outputs, to be random variables.

The Bell-CHSH correlation is now defined via the conditional probability that the sum of Alice’s and Bob’s outputs equals the product of their inputs.

Thus the Bell-CHSH correlation defines a binary channel, with addition and multiplication taken modulo $2$. This is kind of weird but also kind of cool: the channel in the Bell-CHSH setting isn’t between its “Alice” and its “Bob”, but rather between the product of Alice’s and Bob’s inputs and the sum of their outputs.
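Here is a small simulation of such a channel, assuming an “isotropic” noisy box in which, for every input pair, the outputs satisfy $a \oplus b = x \cdot y \pmod 2$ with the same probability $(1 + \text{corr})/2$. The parameter name `corr` and the helper functions are hypothetical. Alice’s output is a uniform bit and Bob’s is derived from it, so each output on its own is uniform whatever the inputs (the box cannot be used to signal); the only thing the box transmits is the relation between the product of the inputs and the sum of the outputs.

```python
import random


def noisy_box(x, y, corr, rng):
    """One use of a hypothetical isotropic box with correlation `corr`.

    Returns (a, b) with a uniform and a XOR b == x AND y with probability
    (1 + corr) / 2; either output alone is a uniform bit, whatever the inputs.
    """
    a = rng.randrange(2)
    flip = 1 if rng.random() < (1 - corr) / 2 else 0  # the box "misfires"
    b = a ^ (x & y) ^ flip
    return a, b


def estimate_channel(corr, trials=20000, seed=0):
    """Estimate P(a XOR b == x AND y) separately for each input pair."""
    rng = random.Random(seed)
    estimates = {}
    for x in (0, 1):
        for y in (0, 1):
            hits = 0
            for _ in range(trials):
                a, b = noisy_box(x, y, corr, rng)
                hits += (a ^ b) == (x & y)
            estimates[(x, y)] = hits / trials
    return estimates
```

With `corr = 0.5` each estimate should be near $0.75$, the classical CHSH value; `corr = 1/sqrt(2)` corresponds to Tsirelson’s $\cos^2(\pi/8) \approx 0.854$, and `corr = 1` to a perfect (PR) box.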

Having noted that the Bell-CHSH correlation can be used to define a channel, we can generalize to the case of multiple boxes. Each one of Alice’s boxes has a corresponding box on Bob’s side, and the coordination strength between each such pair of boxes is quantified by its Bell-CHSH correlation.

The classical protocol for multiple boxes is called “oblivious transfer”, and it is detailed in a paper by van Dam. Alice and Bob each hold in front of them an infinite family of boxes, such that each of Alice’s boxes is correlated with one of Bob’s boxes, with the same Bell-CHSH parameter for every matching pair. Alice holds an information source which is a Bernoulli random variable. We imagine its mean as encoding a message, perhaps in the digits of its binary expansion (because the mean is a real number, it can contain infinitely much information). Alice independently samples values from this source (the interesting case is the limit in which the number of samples tends to infinity).

We specialize to the case in which the number of samples is a power of two. Using the oblivious transfer protocol, which takes advantage of the full power of Alice’s and Bob’s boxes, we compress Alice’s samples into a single bit, which Alice sends through a channel to Bob. Using his boxes, Bob decompresses the bit he receives into a sequence of bits which are also independent and identically distributed (iid), and which we may consider as realizations of a Bernoulli random variable whose mean is their sample average (this variable depends on the number of samples, but we suppress this from the notation). We now have a noisy channel whose input is Alice’s source and whose output is Bob’s Bernoulli variable.
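A minimal sketch of the nested-box (“pyramid”) construction that is standard in the Information Causality literature, which we assume is essentially the oblivious-transfer protocol meant here. Assumptions, all ours: Alice holds $N = 2^n$ bits; every box is an isotropic noisy box with correlation `corr`, whose outputs satisfy $a \oplus b = x \cdot y$ with probability $(1 + \text{corr})/2$; and the helper names are hypothetical. Alice folds her bits into one message bit using $N - 1$ boxes. Bob, who wants one particular bit, walks down the tree using $n$ of his boxes, and his guess is wrong exactly when an odd number of those $n$ boxes misfire, so it is right with probability $(1 + \text{corr}^n)/2$.

```python
import random


def alice_compress(bits, boxes, rng, path=()):
    """Fold 2**n bits into a single message bit using one box per internal node.

    At the node `path`, Alice feeds x = m0 XOR m1 into its box, receives a
    uniform output a, and passes m0 XOR a upward. boxes[path] stores (x, a).
    """
    if len(bits) == 1:
        return bits[0]
    half = len(bits) // 2
    m0 = alice_compress(bits[:half], boxes, rng, path + (0,))
    m1 = alice_compress(bits[half:], boxes, rng, path + (1,))
    x, a = m0 ^ m1, rng.randrange(2)
    boxes[path] = (x, a)
    return m0 ^ a


def bob_decode(message, index, n, boxes, corr, rng):
    """Walk down the tree toward bit `index`, using one of Bob's boxes per level."""
    path = ()
    for level in range(n):
        y = (index >> (n - 1 - level)) & 1  # which half contains the wanted bit
        x, a = boxes[path]
        flip = 1 if rng.random() < (1 - corr) / 2 else 0  # this box misfires
        b = a ^ (x & y) ^ flip  # Bob's output for the matching box
        message ^= b  # now an estimate of the message of the chosen half
        path += (y,)
    return message


def success_rate(corr, n, trials=20000, seed=0):
    """Fraction of runs in which Bob recovers a uniformly chosen bit of Alice's."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        bits = [rng.randrange(2) for _ in range(2 ** n)]
        boxes = {}
        message = alice_compress(bits, boxes, rng)
        index = rng.randrange(2 ** n)
        hits += bob_decode(message, index, n, boxes, corr, rng) == bits[index]
    return hits / trials
```

For example, `success_rate(0.5, 3)` should come out near $(1 + 0.5^3)/2 = 0.5625$, and `corr = 1` (a perfect box) recovers every bit from a single transmitted bit.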

Almost all of our actual work went into figuring out the reformulation above: with everything well defined and phrased in terms of channels, the computations are routine.

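A quick computation shows that the correlation between the input and the output of this channel tends to zero as the number of samples grows. Thus the channel disconnects in the limit. On the other hand, the Fisher information about Alice’s mean contained in Bob’s output can also be computed explicitly. This term stays between zero and one (one because our random variables are $\{0,1\}$-valued) for all numbers of samples only when Tsirelson’s bound holds.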
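The qualitative behaviour can be seen from a back-of-the-envelope sketch. Assume, as in the standard nested-box construction, that each of Bob’s $N = 2^n$ reconstructed bits is correlated with Alice’s corresponding bit with strength $E^n$, where $E$ is the correlation of a single pair of boxes normalized so that Tsirelson’s bound reads $E \le 1/\sqrt{2}$. Then the channel correlation $E^n$ tends to zero for any $|E| < 1$, while the Fisher-information-like term $N E^{2n} = (2E^2)^n$ stays bounded exactly when $E \le 1/\sqrt{2}$. This is our illustrative model with hypothetical function names, not the preprint’s exact formula, and we do not try to match its normalization.

```python
import math


def channel_correlation(corr, n):
    """Correlation between Alice's bit and Bob's reconstruction after n levels of boxes."""
    return corr ** n


def fisher_like_term(corr, n):
    """N * (per-bit correlation)**2 with N = 2**n; equals (2 * corr**2) ** n."""
    return 2 ** n * channel_correlation(corr, n) ** 2


for corr in (0.6, 1 / math.sqrt(2), 0.8):
    terms = ", ".join(f"{fisher_like_term(corr, n):.3g}" for n in (1, 10, 40))
    print(f"corr={corr:.4f}: correlation at n=40 is {channel_correlation(corr, 40):.2e}; "
          f"term at n=1,10,40 is {terms}")
```

Running it, the term decays for `corr = 0.6`, stays at $1$ (up to rounding) at $1/\sqrt{2}$, and grows without bound for `corr = 0.8`, while the channel correlation is tiny in all three cases.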

[Note that the above formulae rely on a mild assumption, which is essentially a technicality.]

In other words, within the context of the oblivious transfer protocol, Tsirelson’s bound is interpreted as a necessary and sufficient condition for information locality. This interprets Tsirelson’s bound entirely in terms of statistics.

Note that if Tsirelson’s bound were violated, we would have a strange “Case 4” (which is actually Case I and Case III at once) in our 3-way division, in the limit of infinitely many boxes. Namely, in this limit the channel would disconnect, but Bob would nevertheless receive full information about Alice’s source. We suggest that such a case ought not to occur in the real world.

Informally, our result states that Bob may infer nontrivial information about Alice’s source if and only if Tsirelson’s bound is violated. A thought experiment which sharpens this point (but which isn’t in the present version of our preprint) is presented below.

Let’s consider the special case in which either the mean of Alice’s source is one half (Alice samples by flipping a fair coin) or it is zero or one (Alice’s samples are either all zeros or all ones), and Bob’s task is to determine which of these is the case. Is Alice sending random bits, or are her samples all the same? Say that in reality Alice’s samples are all the same. Bob’s null hypothesis is that Alice is flipping a fair coin, and his alternative hypothesis is that her samples are all identical. He conducts the likelihood ratio test in an attempt to rule out the null hypothesis. So he computes the likelihood of the null hypothesis divided by the likelihood of the alternative hypothesis: if this ratio tends to zero then the test succeeds, and if it tends to one then the test fails.

A computation of the asymptotic likelihood ratio in this case shows that it tends to zero (so that the test succeeds in the limit and Bob can infer the value of Alice’s mean) if and only if Tsirelson’s bound is violated. Remember that the channel is disconnected in this case.
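A rough way to see why the ratio behaves this way, under an illustrative model of our own rather than the preprint’s computation: suppose Bob ends up with $N = 2^n$ bits, each independently equal to $1$ with probability $1/2$ under the null hypothesis (fair coin) and with probability $(1 + c)/2$ under the alternative in which all of Alice’s samples are $1$ (all zeros is symmetric), where $c = E^n$ is the per-bit correlation after $n$ levels of boxes. When the alternative is true, the log-likelihood ratio concentrates around $-N \cdot \mathrm{KL}$, where KL is the Kullback-Leibler divergence between the two Bernoulli laws, so a typical value of the ratio is $\exp(-N \cdot \mathrm{KL})$. Since $N \cdot \mathrm{KL} \approx (2E^2)^n / 2$, this tends to zero when $E > 1/\sqrt{2}$ and to one when $E < 1/\sqrt{2}$. The function names are hypothetical.

```python
import math


def bernoulli_kl_from_fair(c):
    """KL divergence, in nats, of Bernoulli((1 + c) / 2) from Bernoulli(1/2), for 0 <= c <= 1."""
    if c >= 1.0:
        return math.log(2.0)  # the alternative is deterministic
    q = (1 + c) / 2
    return q * math.log1p(c) + (1 - q) * math.log1p(-c)


def typical_likelihood_ratio(corr, n):
    """exp(-N * KL): typical value of L(null) / L(alternative) when the alternative is true."""
    num_bits = 2 ** n
    per_bit_corr = corr ** n
    return math.exp(-num_bits * bernoulli_kl_from_fair(per_bit_corr))
```

With `n = 20`, a box correlation of `0.6` gives a ratio very close to one (the test fails), while `0.9` gives a ratio that underflows to zero (the test succeeds).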

Our result is related to a known criterion called Information Causality. Is it sufficiently novel compared to this criterion? I can’t express an opinion… “sufficiency of novelty” isn’t well-defined. Below I describe the relationship of our work with Information Causality.

The original paper which formulated Tsirelson’s bound in terms of information was *Information Causality as a Physical Principle* by Pawlowski, Paterek, Kaszlikowski, Scarani, Winter, and Zukowski. In the same context as the one we work in, that paper formulates a principle called Information Causality, which roughly states that the maximum information Bob can have about Alice’s bits is one bit, because Alice sent him only one bit (her message). So Bob can infer at most the amount of information, in bits, that Alice actually sent him classically. Nonlocality cannot be used to construct a superluminal telegraph.

Here is a rough formulation of Information Causality:

Information Causality: The amount of information potentially available to Bob about Alice’s bits is bounded above by the number of bits Alice sends to Bob through a classical channel.

And here, beside it, is a formulation of the principle we would like to suggest instead.

Statistical No-Signaling: No information can pass through a channel whose output is independent of its input (Case I).

This is equivalent to Tsirelson’s bound via the case of our experiment described above. Namely, the channel correlation converges to zero (so the channel disconnects at infinity), and the Fisher information stays bounded (in fact, between zero and one) if and only if Tsirelson’s bound holds.

The “Information Causality quantity” is the mutual information between Alice’s bits and Bob’s guesses of them. The result of that paper is that violation of Tsirelson’s bound allows violation of Information Causality, but not the converse: they cannot prove that violation of Information Causality implies violation of Tsirelson’s bound.

Concretely, the Information Causality quantity is less than or equal to a term which we interpret as Fisher information, and this term is less than or equal to one if and only if Tsirelson’s bound holds.
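As a numerical illustration of this chain of inequalities, here is a sketch under an assumed nested-box model with hypothetical function names: Bob guesses each of Alice’s $N = 2^n$ uniformly random bits through a binary symmetric channel that is correct with probability $(1 + c)/2$, where $c = E^n$. The mutual-information quantity is then $N\,(1 - h((1 + c)/2))$ bits, with $h$ the binary entropy. Because $1 - h((1 + c)/2) \le c^2$ for $0 \le c \le 1$, this is at most $N c^2 = (2E^2)^n$, which stays at or below one exactly when $E \le 1/\sqrt{2}$.

```python
import math


def binary_entropy(p):
    """Binary entropy h(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def ic_quantity(corr, n):
    """Sum over Alice's N = 2**n bits of 1 - h((1 + c) / 2), with c = corr**n."""
    c = corr ** n
    return 2 ** n * (1 - binary_entropy((1 + c) / 2))


def fisher_like_term(corr, n):
    """N * c**2 = (2 * corr**2) ** n, an upper bound on ic_quantity(corr, n)."""
    return 2 ** n * (corr ** n) ** 2
```

For a perfect box (`corr = 1`) the mutual-information quantity equals $N$ bits from a single transmitted bit, far beyond what Information Causality allows, while at `corr = 1/sqrt(2)` both quantities stay at or below one bit for every `n`.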

In the appendix to their paper:

In what sense, then, is our result more than a trivial restatement of Information Causality? Well, first of all, it’s technically a different mathematical result. It implies information causality (in the context of the protocols we both consider, anyway) but is not implied by it in any obvious way.

Perhaps I can argue that Statistical No-Signaling is more fundamental. Information Causality involves a whole Alice-and-Bob story (and I’m not sure how to formulate it rigorously in mathematical terms), whereas Statistical No-Signaling is a general statistical statement: if $X$ and $Y$ are independent random variables, then you cannot learn $X$ by sampling $Y$ countably many times (in oblivious transfer, $Y$ is obtained via a limiting construction). Mathematically you can (by oblivious transfer, if Tsirelson’s bound were violated), but the statement is that physically you can’t. It suggests that although we use words like Locality and No-Signaling, perhaps they shouldn’t mean what we think they mean.

Our ‘story’ is also different. We’ve interpreted the intermediate term as Fisher information, so that the task we are discussing is a statistical inference task as opposed to measuring mutual information between two strings. So, for us, Tsirelson’s bound is related to the Central Limit Theorem, by which we can characterize the convergence of the sample mean of Bob’s bits to its limit as the number of samples grows. Fundamental physics relates to fundamental statistics. Aesthetically, I like that.
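The Central Limit Theorem connection can be illustrated with a quick simulation under an assumed model of our own (names hypothetical): if Alice’s samples are all ones, Bob’s $N = 2^n$ bits have mean $(1 + c)/2$ with $c = E^n$, so the signal is a shift of $c/2$ away from $1/2$, while by the CLT the sample mean fluctuates by roughly $1/(2\sqrt{N})$. The signal-to-noise ratio is therefore about $c\sqrt{N} = (\sqrt{2}\,E)^n$, which grows or decays according to whether $E$ exceeds $1/\sqrt{2}$.

```python
import random
import statistics


def sample_mean(num_bits, q, rng):
    """Average of num_bits independent Bernoulli(q) draws."""
    return sum(rng.random() < q for _ in range(num_bits)) / num_bits


def signal_to_noise(corr, n, trials=200, seed=0):
    """Empirical (mean shift from 1/2) / (std of the sample mean) for Bob's bits."""
    rng = random.Random(seed)
    num_bits = 2 ** n
    q = (1 + corr ** n) / 2  # assumed model: Alice's samples are all ones
    means = [sample_mean(num_bits, q, rng) for _ in range(trials)]
    shift = statistics.fmean(means) - 0.5
    noise = statistics.stdev(means)
    return shift / noise
```

With `n = 10`, a box correlation of `0.9` gives a shift many standard deviations wide, whereas `0.6` gives a shift buried in the CLT noise.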

Because the CLT is so mathematically fundamental, many criteria can be formulated that will follow from it. Actually, I think that Statistical No-Signaling might philosophically be closer to Macroscopic Locality than to Information Causality, because we’re saying that a physical system ‘becomes local’ as the number of boxes grows to infinity. I don’t know how to rigorously derive Macroscopic Locality from our work, though.

Still on the ‘story’ front, Fisher information gives us the 3-way division discussed at the beginning of this post, interpreting the three relevant ranges of the Bell-CHSH parameter. In the lowest range there is no channel. In the middle range there is a channel, but its output is indistinguishable from noise. In the highest range there is communication through a channel. To the best of my understanding, Information Causality doesn’t interpret the different ranges in any meaningful way.

In addition, functionally, we have a thought experiment with a binary outcome which succeeds if and only if Tsirelson’s bound is violated, and I’m not sure how you could do that with Information Causality.

So why would I (D.M.) be looking at physics-y things like this?

Well, personally I think that the fundamental laws of nature ought to be distributive and nonassociative (this is an irrational bias, I know). One thing that this implies to me is that we should attempt to work as much as possible at the level of measures of information (*e.g.* Fisher information) rather than working in terms of vectors, functions, strings, and so on. We’ve worked on understanding information flow in a low-dimensional topological context in previous work, such as arXiv:1409.5505.

I’d like to suggest the philosophical idea that joint distributions are largely a fiction. We can’t meaningfully speak about joint distributions of distant objects that we can’t instantaneously compare. But marginals (conditional probabilities) are real. Nonlocality is a setting in which that is how things work. Estimators based on conditional probabilities behave like quandle elements (that’s in arXiv:1409.5505), so my dream is to link this all up to low-dimensional topology. But I’m not there yet.

If you want to know more, please read the preprint. Feedback is welcome!!
