## RL and the Game of Mathematics

I recently gave a lecture on Alphago in a deep learning class.  I had of course heard of the recent accomplishments — learning in a matter of days, or even hours, and entirely from self-play, to become superhuman in go, chess and shogi.  I had not, however, appreciated the magnitude of this advance until I read the papers (Alphgo Zero and AlphaZero). Alphago Zero is about 1400 ELO points above Alphago Lee — the program that defeated Lee Sedol. The ELO scale is the one used to rate chess players — 200 ELO points corresponds to 75% chance of victory by the stronger player.  Measuring a huge gap like 1400 points requires using a sequence of versions of the program of increasing strength.  Also, Alphago Zero runs on a single machine with four GPUs.  Alphago Lee runs on a supercomputer.  Also, the algorithm underlying Alphago Lee is new and simpler with no rollouts or hand designed features.  Also, the same algorithm does the same trick in chess and shogi — quickly learning to be far superior to any previous program for these games.  Also, the introduction of a fundamentally new algorithm seems likely to open up yet more progress in RL. The new algorithm can be viewed as tree-search bootstrapping  applied to policy gradient.

Ten years ago Jonathan Schaeffer at the University of Alberta proved that checkers is a draw.  His group computed and stored full drawing strategies for each player.  This was listed by Science magazine as one of the ten greatest breakthroughs of 2007. The game of chess is also believed to be a draw, although computing full drawing strategies is completely infeasible.   Empirically, as the level of play increases draws become more common.  Computers have been super-human in chess for the last 20 years.  The prevalence of draws at super-human levels causes the ELO scale to break down. (In practice go games are never drawn.) The strongest computer program, stockfish, was thought to be unbeatable — equivalent to perfect play.  AlphaZero defeated Stockfish 25/50 games playing from white (and lost none) and defeated Stockfish 3/50 games from black (and lost none).

DeepMind’s success in RL is based on perfect simulation.  Perfect simulation is possible in Atari games and in board games such as go and chess.  Perfect simulation allows large-scale on-policy learning.  Large-scale on-policy learning is not feasible in the “game” of human dialogue (the Turing test).  We don’t know how to accurately simulate human dialogue.  Mathematics, on the other hand, can be formulated as a game with rules defined by a formal foundational logic.  This immediately raises the question of whether the recent dramatic results in RL for games might soon lead to super-human open-domain mathematicians.

The foundations of mathematics has been largely ignored by AI researchers.  I have always felt that the foundations — the rules of “correct thought” — reflect an innate cognitive architecture.  But independent of such beliefs, it does seem possible to formulate inference rules defining the general notion of proof accepted by the mathematics community.  For this we have ZFC or type theory.  More on this later.  But in addition to rules defining moves, we need an objective— a reward function.  What is the objective of mathematics?

I believe that a meaningful quantitative objective for mathematics must involve an appropriate formal notion of a mathematical ontology — a set of concepts and statement made in terms of them. It is easy to list off  basic mathematical concepts — groups, rings, fields, vector spaces, Banach spaces, Hilbert spaces, manifolds, Lie algebras, and many more.  For some concepts, such as the finite simple groups or the compact two manifolds, complete classification is possible — it is possible to give a systematic enumeration of the instances of the concept up to isomorphism. Formulating an objective in terms of a concept ontology requires, of course, being clear (formal) about what we mean by a “concept”.  This brings us to the importance of type theory.

Since the 1920’s (or certainly the 1950s) the mathematics community has generally accepted Zermello-Fraenkel set theory with the axiom of choice (ZFC) as the formal foundation.  However, the ontology of mathematics is clearly organized around the apparent fact that the instances of a concept are only classified up to the notion of isomorphism associated with that concept. The classification of finite simple groups or compact two manifolds is done up to isomorphism. The fact that each concept comes with an associated notion of isomorphism has never been formalized in the framework of ZFC.  To formalize concepts and their associated notion of isomorphism we need type theory.

I have been working for years on a set-theoretic type theory — a type theory fully compatible with the ZFC foundations.  Alternatively there is constructive type theory and its extension to homotopy type theory (HoTT).  I will not discuss the pros and cons of different type theories here except to say that I expect mathematicians will be more accepting of a type theory fully compatible with the grammar of informal mathematical language as well as being compatible with the currently accepted content of mathematics.

Type theory gives a set of formal rules defining the “well-formed” concepts and statements.  I believe that a meaningful objective for mathematics can then be defined by an a-prior probability distribution over type-theoretically well-formed mathematical questions.  The well-formedness constraint ensures that each question is meaningful. We do not want to ask if 2 is a member of 3. The objective is to be able to answer the largest possible fraction of these well-formed questions.

Of course the questions under investigation in human mathematics evolve over time and this evolution is clearly governed by a social process.  But I claim that this is consistent with the existence of an objective notion of mathematical significance.  A socially determined distribution over questions can be importance-corrected (in the statistical sense) in judging objective significance. The use of a sampling distribution different from the prior, which is then importance corrected, seems likely to be useful in any case.

Of course I am not the only one to be considering deep learning for mathematical inference.  There have now been three meetings of the Conference on AI and Theorem Proving.  Many of the papers in this conference propose neural architectures for “premise selection” — the problem of selecting facts from a mathematical library that are likely to be useful for a given problem.  Christian Szegedy, a co-author of batch-normalization and a principal developer of Google’s inception network for image classification, has been attending this conference and working on deep learning for theorem proving [1,2,3].  I myself have attended the last two meetings and have found them useful in framing my own thinking on deep architectures for inference.

In summary, DeepMind’s dramatic recent success in RL raises the question of whether RL can produce a super-human player of the game of mathematics.  I hereby throw this out as a challenge to DeepMind and to the AI community generally.  A super-human open-domain mathematician would seem a major step toward AGI.

This entry was posted in Uncategorized. Bookmark the permalink.

### 6 Responses to RL and the Game of Mathematics

1. pcptheorem says:

One big difference between an AlphaGo Zero type setup and theorem proving I feel is that in the former, you can “incrementally” improve your AI. In particular, since you’re engaged in self-play, pretty much at every step you get a signal as to whether a particular move is good or not based on whether your AI can then make a move for the opposing player that puts it in a better position (better position being defined by the valuation function of each state which is the same for both players).

If on the other hand, we were training AlphaGo with the opposing player being a ridiculously good black box player, I’m not sure it would be as easy to train it up as almost every move would be giving you pretty much negative reward signal.

With theorem proving on the other hand, you only get a positive reward signal when you’ve actually proved the theorem. Maybe it’s better to focus on then on propositions where one can get the positive reward signal quicker — for example for propositions in number theory, one could probably try and prove theorems for individual numbers instead of the entire set of integers/reals?

• McAllester says:

Yes, “curriculum learning” (easy problems first) seems called for. This is related to the use of a sampling distribution different from the prior that defines the objective. Using a sampling distribution different from the distribution of interest is known as “importance sampling” in statistics and machine learning. But making this work in a meaningful way does seem nontrivial.

2. tdietterich says:

I don’t see the connection to AGI, but a super-human open-domain mathematician would be super-human open-domain mathematician, and that would be incredibly valuable.

3. McAllester says:

Relevance to AGI depends on how “general” one takes open-domain mathematics to be and how one feels about the relationship between “intelligence” and mathematical ability. But an open-domain pure mathematician would not be aware of reality (I have known some people like that). That might be a good thing.

4. nycandre says:

This may sound simplistic, but how about starting with some really basic math games – like geometry, uh Cartesian, 2 dimensional – you know, with compass, and rulers .. there are a plethora of theorems to (re) discover, let’s see in AI can come up with new ones, or at least new proofs. The reward function … besides Occam’s razor, not sure yet ..

• McAllester says:

Automated geometry theorem proving goes back to the 60s and became very highly developed by the 90s. But it is domain specific. I am interested in general foundations and in particular foundations based on dependent type theory. Dependent type theory seems to capture the foundations of all mathematics and I believe that it is a reflection of some aspect of innate cognitive architecture. I want to focus on the G in AGI.