VAE = EM
This blog post observes that the expectation maximization algorithm (EM) is exactly alternating maximization of the variational autoencoder (VAE) objective function, provided that the VAE encoder is sufficiently expressive. This observation is mathematically straightforward, but I have not seen it … Continue reading
Deep Meaning Beyond Thought Vectors
I ended my last post by saying that I might write a follow-up post on current work that seems to exhibit progress toward natural language understanding. I am going to discuss a couple of sampled papers but of course these … Continue reading
The Plausibility of Near-Term Machine Sentience.
When should we expect “operational sentience” — the point where the most effective way to interact with a machine is to assume it is sentient — to assume that it understands what we tell it. I want to make an … Continue reading
Formalism, Platonism and Mentalese
This is a sequel to an earlier post on Tarski and Mentalese. I am writing this sequel for two reasons. First, I just posted a new version of my treatment of type theory which focuses on “naive semantics”. I want to explain … Continue reading
Comprehension Based Language Modeling
One of the holy grails of the modern deep learning community is to develop effective methods for unsupervised learning. I have always held out hope that the semantics of English could be learned from raw unlabeled text. The plausibility of a … Continue reading
Cognitive Architectures
Within the deep learning community there is considerable interest in neural architecture. There are convolutional networks, recurrent networks, LSTMs, GRUs, attention mechanisms, highway networks, inception networks, residual networks, fractal networks and many more. Most of these architectures can be viewed as certain feedforward circuit … Continue reading
Architectures and Language Instincts
This post is inspired by a recent book and article by Vyvyan Evans declaring the death of the Pinker-Chomsky notion of a language instinct. I have written previous posts on what I call the Chomsky-Hinton axis of innate knowledge vs. general learning and … Continue reading
