Within the deep learning community there is considerable interest in neural architecture. There are convolutional networks, recurrent networks, LSTMs, GRUs, attention mechanisms, highway networks, inception networks, residual networks, fractal networks and many more. Most of these architectures can be viewed as certain feed-forward circuit topologies. Circuits are a universal model of computation. However, human programmers find it more productive to specify algorithms in high-level programming languages. Presumably this applies to learning as well: learning should be easier with a higher-level architecture.

Of course the deep learning community is aware of the relationship between neural architecture and models of computation. General “high level” architectures have been proposed. We have neural Turing machines (actually random access machines), parsing architectures with stacks, and neural architectures for functional programming. There was a nice NIPS workshop on reasoning, attention and memory (RAM) addressing such fundamental architectural issues.

It seems reasonable to use classical models of computation as inspiration for neural architectures. But it is important to be aware of the large variety of classical architectural ideas. Various twentieth-century discrete architectures may provide a rich source of inspiration for twenty-first-century differentiable architectures. Here is a list of my favorite classical architectural ideas.

**Mathematical Logic:** Starting with the ancient Greeks, logic has been developed directly as a model of knowledge representation and thought. Mathematical logic organizes knowledge around entities and relations. Databases are closely related to predicate calculus. Logic is capable of representing knowledge in any domain of discourse. While entities and relations are central to logic, logic involves a variety of additional features such as function application, quantification, and types. Logic also provides the intellectual framework underlying mathematics. Achieving the singularity will presumably require machines to be capable of programming computers. Computer programming seems to require sound analytical (mathematical) reasoning.

**Production Systems and Logic Programming:** This style of architecture was championed by Herb Simon and Allen Newell. It is a way of making logical rules compute efficiently. I will interpret production systems fairly broadly to include various rule-based languages such as SOAR, OPS5, Prolog and Datalog. The cleanest of this family of architectures is bottom-up logic programming, which has a nice relationship to general dynamic programming and is the foundation of the Dyna programming language. Dynamic programming algorithms can be viewed as feed-forward networks where each entry in a dynamic programming table is a structured-output unit which computes its value from earlier units and provides its value to later units.
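As a rough illustration of bottom-up evaluation, here is a minimal Python sketch that runs two Datalog-style rules to a fixed point. The `edge`/`path` predicates, the function name `forward_chain`, and the example graph are all assumptions chosen for illustration; this is naive forward chaining, not how Dyna or any particular Datalog engine is implemented.

```python
def forward_chain(edges):
    """Naive bottom-up evaluation of two illustrative rules:
         path(X, Y) :- edge(X, Y).
         path(X, Z) :- path(X, Y), edge(Y, Z).
    Each derived path fact is computed from earlier facts and is
    available to later derivations, as in a dynamic programming table.
    """
    path = set(edges)  # first rule: every edge is a path
    changed = True
    while changed:  # repeat until no new facts can be derived
        changed = False
        for (x, y) in list(path):
            for (y2, z) in edges:
                if y == y2 and (x, z) not in path:
                    path.add((x, z))  # second rule fires
                    changed = True
    return path


# Hypothetical example graph: a -> b -> c -> d
edges = {("a", "b"), ("b", "c"), ("c", "d")}
paths = forward_chain(edges)
print(sorted(paths))
```

The loop stops when a full pass derives nothing new, which is the fixed point that defines the meaning of the rules.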

**Inductive Logic Programming:** This is a classical unification of machine learning and logic programming championed by Stephen Muggleton. The basic idea is to take a set of assertions in predicate calculus (observed data) and generalize them to a “theory” (a logic program) that is consistent and that implies the data.

**Frames, Scripts, and Object-Oriented Programming:** Frames and scripts were championed as a general framework for knowledge representation by Marvin Minsky, Roger Schank, and Charles Fillmore. Frames are related to object-oriented programming in the sense that an instance of the “room frame” (or room class) has fillers for fields such as “ceiling”, “windows” and “furniture”. Frames also seem related to the ontology of mathematics. For example, a mathematical field consists of a set together with two operations (addition and multiplication) satisfying certain properties. The term “structure” has a well defined meaning in model theory (a branch of logic) which is closely related to the notion of a class instance in object-oriented programming. A specific mathematical field is a structure (in the technical sense) and is an instance of the general mathematical class of fields.
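The analogy between a structure and a class instance can be made concrete with a small Python sketch. The class name `FieldStructure`, the explicit `zero` and `one` slots, and the brute-force axiom check are assumptions made for illustration, and the check only works for finite carriers.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, FrozenSet


@dataclass(frozen=True)
class FieldStructure:
    """A frame-like record: a carrier set plus two operations (hypothetical layout)."""
    carrier: FrozenSet[int]
    add: Callable[[int, int], int]
    mul: Callable[[int, int], int]
    zero: int  # additive identity, assumed to be supplied explicitly
    one: int   # multiplicative identity, assumed to be supplied explicitly


def satisfies_field_axioms(f):
    """Brute-force check of the field axioms over a finite carrier."""
    s = f.carrier
    pairs = list(product(s, s))
    triples = list(product(s, s, s))
    checks = [
        f.zero != f.one,
        all(f.add(a, b) in s and f.mul(a, b) in s for a, b in pairs),
        all(f.add(a, b) == f.add(b, a) and f.mul(a, b) == f.mul(b, a) for a, b in pairs),
        all(f.add(f.add(a, b), c) == f.add(a, f.add(b, c)) for a, b, c in triples),
        all(f.mul(f.mul(a, b), c) == f.mul(a, f.mul(b, c)) for a, b, c in triples),
        all(f.mul(a, f.add(b, c)) == f.add(f.mul(a, b), f.mul(a, c)) for a, b, c in triples),
        all(f.add(a, f.zero) == a and f.mul(a, f.one) == a for a in s),
        all(any(f.add(a, b) == f.zero for b in s) for a in s),
        all(any(f.mul(a, b) == f.one for b in s) for a in s if a != f.zero),
    ]
    return all(checks)


def integers_mod(n):
    """Build the structure of integers modulo n (a field only when n is prime)."""
    return FieldStructure(
        carrier=frozenset(range(n)),
        add=lambda a, b: (a + b) % n,
        mul=lambda a, b: (a * b) % n,
        zero=0,
        one=1,
    )


z5 = integers_mod(5)  # an instance of the class of fields
z6 = integers_mod(6)  # same slots filled, but 2 has no multiplicative inverse
print(satisfies_field_axioms(z5), satisfies_field_axioms(z6))
```

Here the dataclass plays the role of the frame, each instance fills its slots, and the axioms distinguish instances that are fields from those that merely have the right shape.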

**The Situation Calculus and Modal Logic:** In the situation calculus, statements take meaning in “situations”. A “fluent” is a mapping from situations to truth values. Actions change one situation into another. This leads to the STRIPS model of actions and planning. Situations are closely related to the possible worlds of modal logic.
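A minimal Python sketch of these ideas, assuming a situation is represented as the set of atoms true in it and a STRIPS action as preconditions plus an add list and a delete list. The `Action` class, the `fluent` helper, and the block-world atoms are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Assumption: a situation is the set of ground atoms that hold in it.
Situation = FrozenSet[str]


@dataclass(frozen=True)
class Action:
    name: str
    preconditions: FrozenSet[str]
    add_list: FrozenSet[str]
    delete_list: FrozenSet[str]

    def applicable(self, s):
        # The action may run only if every precondition holds in s.
        return self.preconditions <= s

    def apply(self, s):
        # Map one situation to the next: remove deleted atoms, then add new ones.
        if not self.applicable(s):
            raise ValueError(f"{self.name} is not applicable")
        return (s - self.delete_list) | self.add_list


def fluent(atom):
    """A fluent maps a situation to a truth value."""
    return lambda s: atom in s


# Hypothetical block-world action.
pick_up = Action(
    name="pick_up(block)",
    preconditions=frozenset({"on_table(block)", "hand_empty"}),
    add_list=frozenset({"holding(block)"}),
    delete_list=frozenset({"on_table(block)", "hand_empty"}),
)

s0 = frozenset({"on_table(block)", "hand_empty"})
s1 = pick_up.apply(s0)
holding = fluent("holding(block)")
print(holding(s0), holding(s1))
```

Planning in this setting amounts to searching for a sequence of applicable actions that leads from an initial situation to one where the goal fluents are true.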

**Monads:** Monads generalize the relationship between pure (stateless) functional programming, as in the programming language Haskell, and the more familiar effect-based programming, as in C or C++, where assignment statements change the state of the computation. The mapping (or compilation) from an effect-driven program to a pure (stateless) program defines the state monad. There are different monads. The state monad treats each action as a mapping from an input state to an output state. The power set monad (or non-determinism monad) treats each action as a mapping from a set of states to a set of states. The probability monad treats each action as a mapping from a probability distribution to a probability distribution. The probability monad gives rise to probabilistic programming languages. There are also more esoteric monads such as the CPS monad, which treats each action as mapping a state of the stack to a state of the stack, thereby converting recursion to iteration. In a pure language such as Haskell, the use of a monad to suppress a state argument typically makes code more readable. There also seems to be a relationship between the states of the common monads and the situations or possible worlds of the situation calculus and modal logic.
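To make the comparison concrete, here is a small Python sketch of the state, power set, and probability monads, each given by a `unit` and a `bind` function. Python is used only for illustration; the function names and the tiny example steps (`tick`, `step`, `coin`) are assumptions, and a Haskell version would use type classes instead.

```python
# State monad: a computation is a function state -> (value, new_state).
def state_unit(x):
    return lambda s: (x, s)

def state_bind(m, f):
    def run(s):
        x, s1 = m(s)     # run the first computation
        return f(x)(s1)  # thread the updated state into the second
    return run

# Power set (non-determinism) monad: a computation is a set of outcomes.
def set_unit(x):
    return frozenset({x})

def set_bind(m, f):
    return frozenset(y for x in m for y in f(x))

# Probability monad: a computation is a dict from outcome to probability.
def prob_unit(x):
    return {x: 1.0}

def prob_bind(m, f):
    out = {}
    for x, p in m.items():
        for y, q in f(x).items():
            out[y] = out.get(y, 0.0) + p * q  # sum over paths reaching y
    return out

# Hypothetical example actions.
tick = lambda s: (s, s + 1)              # return the counter, then increment it
step = lambda x: frozenset({x, x + 1})   # stay or advance, non-deterministically
coin = lambda x: {x: 0.5, x + 1: 0.5}    # stay or advance with equal probability

two_ticks = state_bind(tick, lambda a: state_bind(tick, lambda b: state_unit((a, b))))
state_result = two_ticks(0)
set_result = set_bind(set_bind(set_unit(0), step), step)
prob_result = prob_bind(prob_bind(prob_unit(0), coin), coin)
print(state_result, sorted(set_result), prob_result)
```

Fixing the action in `bind`, as in `lambda m: set_bind(m, step)`, gives the mapping from sets of states to sets of states described above, and `lambda m: prob_bind(m, coin)` does the same for distributions.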

## Conclusion

I believe that human learning is based on a differentiable universal learning architecture and that domain-specific priors are not required. But it is unclear how elaborate the general architecture is. It seems worth considering the above list of classical architectural ideas and the possibility that these discrete architectures can be made differentiable.